Re: db and multiple processes
Hi Tomas,

> that all picolisp processes spread over many machines managing many
> databases all communicating (interconnected, in what sense?) to solve
> one (or many unrelated) business problem? I looked at SmApper website
> and it seems quite a secret technology;-)

Well, then I cannot tell too much ;-)

These are not commercial but technical applications. Large companies have filer systems with many volumes, totaling several hundred terabytes of data. A sophisticated piece of SmApper software (in C++) operates on these filers and extracts or manipulates miscellaneous information.

These data are fed into a chain of PicoLisp databases, one chain for each mounted volume. A chain currently consists of three elements, communicating with each other back and forth, and the final element provides a total view common to all volumes. The C++ part operates on the filer data, depending on the prepared information and a rules engine in the common database, which in turn queries the preceding chain elements.

The mentioned admin part is separate from this. It starts, stops and monitors these processes.

> (app 1 server and admin 1 server share the same database: read-write)
> ... etc for other applications

What I'd like to know first: Is it necessary here that they both share a common database, or would it also be conceivable that each has its own database, and they just exchange information? These are two different concepts, but the second one has some advantages if the amount of shared data is not too big.

If you go with a common database, I would do it like this: The parent process only loads lib/http.l and a common er.l, defines a function 'admin'

   (de admin ()
      (load "admin/main.l")
      (startAdmin) )

and then starts

   (server port "lightweight/main.l")

If a user connects to port, he will see the lightweight application. A user who wants to call the admin application connects to

   http[s]://server/@admin

Basically, this is how I do it in that app with different GUIs.
You still have a single parent process but different programs.

> What do you mean by processes accessing db are children of single
> parent? Does it mean just the one above or any common parent up high
> the hierarchy tree?

Just one above. The limitation is caused by 'tell', which is called by 'commit' internally.

   (tell 'sym ['any ..]) -> any
   Family IPC: Send an executable list (sym any ..) to all family
   members (i.e. all children of the current process, and all other
   children of the parent process, see fork) for automatic execution.

So 'tell' relays changes only to its sister processes (and to its own children, but this has not proven to be of much practical value). Therefore, it is best if all processes that want to modify the database are of the same generation, with a common parent.

> As the http server forks new child process for each request, does not
> it mean that for each request I would have to load whole app 1 code
> depending whether the request is for app 1 server or admin 1 server if
> the parent must be just the one above?

I would expect the server to fork a new child process only upon the first connect. Then the app code is loaded, and a session is started. After that, the 'load' on each request is minimal (or zero if you use @function).

BTW, the same thing applies to the database caching, not only the 'load'ing of sources.

> If I understand it well, then the db stays consistent and properly
> synchronized no matter which process calls pool as long as the
> processes have the same direct parent process.

Yes, and as long as these processes behave well, i.e. only change objects between (dbSync) and (commit 'upd), or call the wrapper methods 'put!>', 'inc!>', 'del!>' etc.

> Sorry, I am missing something, where can I find this 'ext' function
> and '*Ext' global?

They exist since picoLisp-2.3.3. Is there no doc/refE.html#*Ext?

Cheers,
- Alex
--
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: db and multiple processes
On Fri, Nov 28, 2008 at 01:15:46PM +0100, Alexander Burger wrote:
> If you go with a common database, I would do it like this: The parent
> process only loads lib/http.l and a common er.l, defines a function

Another great advantage of this solution is that you don't even have to stop any of the applications after some source files (except the preloaded lib/http.l and er.l) have been changed.

Cheers,
- Alex
Re: db and multiple processes
Hi Tomas,

> is it fine to have two/many *independent* processes using the same
> database?

Usually not, though I do it in certain cases.

On the lowest level, the db consists of independent objects (external symbols). When a process accesses such a symbol, it obtains a read-lock on the db, reads it, and releases the lock. From then on, it operates on its objects in memory (i.e. the objects are cached), and a 'commit' will write all modified objects to the db (after obtaining a write-lock). Thus, any number of processes may read the db, but only a single one will be able to write at a time (while all reads are blocked during that time).

This works for independent processes, too, but only guarantees the consistency of individual objects (that is, no two processes can write at the same time, and an object read by one process cannot be changed by another process while the reading takes place). It does not, however, synchronize the state of objects cached in different processes. In particular, it may result in inconsistencies if a transaction changes a set of interdependent objects, and other processes have some of these objects cached and some not.

> The processes do not share a common parent process, so I just want to
> check that there is NOT any important communication going on between
> parent and child processes (something like synchronization
> signals/messages or so)?

In fact, there *is* an important communication going on. Before a process begins to change a set of objects for a transaction, it is supposed to call (dbSync). This will request a write-lock, and until that request is granted, all modifications done on the db by other processes will be relayed to the waiting process. This happens via pipes between the parent and the child processes. When the process is done modifying the objects, it should call (commit 'upd) to write the data to the db (or (rollback) to undo the changes) and to release the lock.
If you have an independent process accessing such a db, it must make sure never to write to that db (or write only objects which are guaranteed not to be written by other processes), and be aware that the state of its cached objects might be out of date (e.g. call (rollback) from time to time to cause a reload). This is usually not so easy to guarantee, because changing a single object often triggers the change of many other objects as a side effect, like objects connected via '+Joint', or whole branches of an index tree.

> Also, how does locking and transactions work in picolisp? Is there a
> single master lock for the whole database, for each database file
> etc.?

For the mechanism described above, the lock will always be for the whole database. It is possible to lock individual objects, too. This is done for example in lib/form.l to give a user exclusive access to an object.

To be on the safe side, I would recommend allowing only members of a single family to access a db, and implementing external accesses via some RPC mechanism (for queries, '*Ext' comes in handy here). Only when absolutely sure, begin to bypass these mechanisms.

This whole matter is a good candidate for the Wiki ;-)

Cheers,
- Alex
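The transaction discipline described in this message (call (dbSync), modify the objects, then (commit 'upd)) might look like the following minimal sketch. The entity class '+Item', its relations and the two function names are hypothetical, chosen only for illustration. The sketch also assumes that the database library is loaded and that 'pool' has already been called by a process of the family.

```picolisp
# Hypothetical entity class, for illustration only
(class +Item +Entity)
(rel nm (+String))      # name
(rel cnt (+Number))     # counter

# Explicit transaction changing several properties of an existing object
(de updateItem (Obj Name Cnt)
   (dbSync)                  # request the write-lock; changes made by
                             # other family members are relayed meanwhile
   (put> Obj 'nm Name)       # modify the cached object in memory
   (put> Obj 'cnt Cnt)
   (commit 'upd) )           # write to the db, release the lock and
                             # notify the other family members

# Single-property change via the wrapper method, which performs the
# (dbSync) ... (commit 'upd) sequence internally
(de renameItem (Obj Name)
   (put!> Obj 'nm Name) )
```

The explicit form is the one to reach for when several interdependent changes must become visible together. For a single property the wrapper method is shorter and harder to get wrong.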
Re: db and multiple processes
Hi Alex,

> Usually not, though I do it in certain cases.

In what cases, and how do you do it?

I want to split my app into two independent processes (process families): an admin part (quite complex, can change a lot and significantly, can stop quite often for upgrades etc.) and a public part (quite simple, changes little, minimize downtime). The reason is that I want to be able to kill and upgrade these two parts independently without affecting each other.

> If you have an independent process accessing such a db, it must make
> sure never to write to that db (or write only objects which are
> guaranteed not to be written by other processes), and be aware that
> the state of its cached objects might be out of date (e.g. call
> (rollback) from time to time to cause a reload). This is usually not
> so easy to guarantee, because changing a single object often triggers
> the change of many other objects as a side effect, like objects
> connected via '+Joint', or whole branches of an index tree.

I cannot guarantee that.

> In fact, there *is* an important communication going on.

Now how to achieve the above requirements? Maybe having a master parent process which would

- open the database 'pool'
- fork into two apps (each forked process would load code for different app)

This way I would have both apps in the same process family and could still kill/restart them independently.

> To be on the safe side, I would recommend to allow only members of a
> single family to access a db, and implement external accesses via some
> RPC mechanisms (for queries, '*Ext' comes in handy here).

What is '*Ext'? I cannot find anything about that.

> This whole matter is a good candidate for the Wiki ;-)

Yes, why not, I have to understand it first though. Or, feel free to put your thoughts in there;-)

Thank you,
Tomas
Re: db and multiple processes
Hi Tomas,

> In what cases and how you do it?

At SmApper we had a set of stand-alone shell tools (i.e. with the first line containing '#!bin/picolisp') that were used to do quick scans of some databases and build reports. They simply loaded the er.l of those applications, and called 'pool' on the db files. As they did a single pass over the data and then terminated, there was no cache problem, and they did not write anything to the databases.

But recently these tools were abandoned and replaced by new versions, which properly connect to the applications and issue queries.

> families): an admin part (quite complex, can change a lot and
> significantly, can stop quite often for upgrades etc.) and public part
> (quite simple, changes little, minimize downtime).

We have been using similar architectures. Our (now obsolete) 7fach system consisted of an admin application, and a separate application for each customer (we found only around 25 pilot customers and stopped the project). The SmApper system also has an application manager, and tens to hundreds of interconnected database applications on blade clusters.

It depends a lot on the logical structure, and on what exchange of information has to take place, but I think PicoLisp has the necessary mechanisms, mainly using TCP connections.

> - open the database 'pool'

I would use a separate 'pool' for each logical application.

> - fork into two apps (each forked process would load code for
>   different app)

Yes, but you have to keep in mind that it only works well if the processes accessing the db are children of a single parent (see 'tell' in the reference). In this case it would mean that the two forked apps should not do any further forks, which could be inconvenient.

If you want to go with a single parent but different sources, it is no problem though, just load the sources after the fork. I do this currently for one customer who is still using the old Java applet API and the new 'form' API in parallel (depending on the user).
This app has in fact three entry URLs, one for the AWT version, one for the Swing version, and the new one.

Another disadvantage of having all applications use the same pool is that you cannot put them on separate machines.

> What is '*Ext', I cannot find anything about that?

The global '*Ext', in cooperation with RPC functions and the 'ext' function, allows you to send Pilog queries or other requests to remote machines, have external symbols sent back from these machines, and operate on these external symbols just as if they resided in the local database. These objects are read-only, though.

>> This whole matter is a good candidate for the Wiki ;-)
>
> Yes, why not, I have to understand it first though. Or, feel free to
> put your thoughts in there;-)

Right. Posting to the Wiki is on my todo-list, but I have not found the time yet (all spare time goes to pico3 currently) ;-)

Cheers,
- Alex
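The "single parent, different sources" arrangement discussed in this message could be sketched roughly as follows. This is only an illustration under stated assumptions: the database path, the two source file names and the helper 'startApp' are hypothetical, and the sketch follows the shared-pool variant proposed in the question rather than the separate 'pool' per application recommended above.

```picolisp
# Parent: load only what both applications share
(load "lib/http.l" "er.l")
(pool "db/app.db")               # hypothetical database file

# Start one application in a child process. Its sources are loaded
# only after the fork, so the parent never holds application code.
(de startApp (File)              # hypothetical helper
   (unless (fork)                # 'fork' returns NIL in the child
      (load File)                # application-specific sources
      (wait) ) )                 # child: stay alive, handle events

(startApp "admin/main.l")        # hypothetical source files
(startApp "lightweight/main.l")
(wait)                           # parent: keep running
```

Both children are then direct children of the same parent, so changes committed by one are relayed to the other via 'tell'. As noted above, this only holds as long as the children do not fork further themselves.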