Re: db and multiple processes

2008-11-28 Thread Alexander Burger
Hi Tomas,

 that all picolisp processes spread over many machines managing many
 databases all communicating (interconnected, in what sense?) to solve
 one (or many unrelated) business problem?  I looked at SmApper
 website and it seems quite a secret technology;-)

Well, then I cannot tell too much ;-) These are not commercial but
technical applications. Large companies have filer systems with many
volumes, totaling several hundred terabytes of data. A sophisticated
piece of SmApper software (in C++) operates on these filers and extracts
or manipulates miscellaneous information. These data are fed into a
chain of PicoLisp databases, one chain for each mounted volume. A chain
currently consists of three elements, communicating with each other back
and forth, and the final element provides a total view common for all
volumes. The C++ part operates on the filer data, depending on the
dressed-up information and a rules engine in the common database, which
in turn queries the preceding chain elements. The mentioned admin part
is separate from this. It starts, stops and monitors these processes.


  (app 1 server and admin 1 server share the same database: read-write)
  ... etc for other applications

What I'd like to know first: Is it necessary here that they both share a
common database, or would it also be conceivable that each has its own
database, and they just exchange information?

These are two different concepts, but the second one has some advantages
if the amount of shared data is not too big.

If you go with a common database, I would do it like this: The parent
process only loads lib/http.l and a common er.l, defines a function
'admin'

   (de admin ()
      (load "admin/main.l")
      (startAdmin) )

and then starts (server port "lightweight/main.l").

If a user connects to that port, he will see the lightweight application.

A user that wants to call the admin application, connects to

   http[s]://server/@admin

Basically, this is how I do it in that app with different GUIs. You
still have a single parent process but different programs.
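Put together, a minimal sketch of such a parent process might look like
this (the file names, the port, and 'startAdmin' are only placeholders
for illustration):

   (load "lib/http.l" "er.l")          # Preload only HTTP support and the E/R model

   (de admin ()                        # Entry point for http[s]://server/@admin
      (load "admin/main.l")            # Loaded on demand, in the forked child
      (startAdmin) )

   (server 8080 "lightweight/main.l")  # Default (lightweight) application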




 What do you mean by processes accessing db are children of single
 parent?  Does it mean just the one above or any common parent up high
 the hierarchy tree?

Just one above. The limitation is caused by 'tell', which is called by
'commit' internally.

   (tell 'sym ['any ..]) -> any
      Family IPC: Send an executable list (sym any ..) to all family
      members (i.e. all children of the current process, and all other
      children of the parent process, see 'fork') for automatic execution.

So 'tell' relays changes only to its sister processes (and to its own
children, but this has not proven to be of much practical value).
Therefore, it is best if all processes that want to modify the database
are of the same generation, with a common ancestor.
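Just to illustrate the mechanism (not something the db code requires you
to do yourself), a process could use 'tell' directly, e.g. to ask all
siblings to drop their cached objects:

   (tell 'rollback)   # Each family member executes (rollback)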


 As the http server forks a new child process for each request, doesn't
 it mean that for each request I would have to load the whole app 1 code,
 depending on whether the request is for the app 1 server or the admin 1
 server, if the parent must be just the one above?

I would expect the server to fork a new child process only upon the
first connect. Then the app code is loaded, and a session is started.
After that, the 'load' on each request is minimal (or zero if you use
@function).

BTW, the same thing applies to the database caching, not only the
'load'ing of sources.


 If I understand it well, then the db stays consistent and properly
 synchronized no matter which process calls pool as long as the
 processes have the same direct parent process.

Yes, and as long as these processes behave well, i.e. only change
objects between (dbSync) and (commit 'upd), or call the wrapper methods
'put!', 'inc!', 'del!' etc.
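Concretely, a well-behaved update might look like one of these two forms
(the object and attribute names are made up):

   # Explicit transaction, grouping several changes
   (dbSync)                      # Get the lock, sync with siblings' changes
   (put> Obj 'nm "New Name")     # Modify cached objects in memory
   (put> Obj 'tel "12345")
   (commit 'upd)                 # Write, notify the family, release the lock

   # Wrapper method, for a single change
   (put!> Obj 'nm "New Name")    # Bundles the dbSync/commit sequence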


 Sorry, I am missing something, where can I find this 'ext' function
 and '*Ext' global?

They have existed since picoLisp-2.3.3. Is there no doc/refE.html#*Ext?

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: db and multiple processes

2008-11-28 Thread Alexander Burger
On Fri, Nov 28, 2008 at 01:15:46PM +0100, Alexander Burger wrote:
 If you go with a common database, I would do it like this: The parent
 process only loads lib/http.l and a common er.l, defines a function

Another great advantage of this solution is that you don't even have to
stop any of the applications when source files (other than the preloaded
lib/http.l and er.l) are changed.

Cheers,
- Alex


Re: db and multiple processes

2008-11-27 Thread Alexander Burger
Hi Tomas,

 is it fine to have two/many *independent* processes using the same
 database?

Usually not, though I do it in certain cases.

On the lowest level, the db consists of independent objects (external
symbols). When a process accesses such a symbol, it obtains a read-lock
on the db, reads it, and releases the lock. From then on, it operates on
these objects in memory (i.e. the objects are cached), and a
'commit' will write all modified objects to the db (after obtaining a
write-lock).

Thus, any number of processes may read the db, but only a single one
will be able to write at a time (while all reads are blocked during that
time).

This works for independent processes, too, but only guarantees the
consistency of individual objects (that is, no two processes can write
at the same time, and an object read by one process cannot be changed by
another process while the reading takes place).

It does not, however, synchronize the state of objects cached in
different processes. In particular, inconsistencies may result if a
transaction changes a set of interdependent objects while other
processes have some of these objects cached and some not.


 The processes do not share a common parent process, so I just want to
 check that there is NOT any important communication going on between
 parent and child processes (something like synchronization
 signals/messages or so)?

In fact, there *is* an important communication going on.

Before a process begins to change a set of objects for a transaction, it
is supposed to call (dbSync). This will request a write-lock, and until
that request is granted, all modifications done on the db by other
processes will be relayed to the waiting process. This happens via pipes
between the parent and the child processes. When the process is done
modifying the objects, it should call (commit 'upd) to write the data to
the db (or (rollback) to undo the changes) and to release the lock.


If you have an independent process accessing such a db, it must make
sure never to write to that db (or write only objects which are
guaranteed not to be written by other processes), and be aware that the
state of its cached objects might be out of date (e.g. call (rollback)
from time to time to cause a reload).
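An independent read-only process along those lines might be sketched as
follows (the db file, the 'scanReport' function and the interval are
hypothetical):

   (load "er.l")
   (pool "app.db")         # Attach to the same database file
   (loop
      (rollback)           # Drop all cached objects, forcing a fresh read
      (scanReport)         # Some read-only pass over the data
      (wait 60000) )       # Sleep one minute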

This is usually not so easy to guarantee, because changing a single
object often triggers the change of many other objects as a side effect,
like objects connected via '+Joint', or whole branches of an index tree.


 Also, how do locking and transactions work in picolisp?  Is there a
 single master lock for the whole database, for each database file
 etc.?

For the mechanism described above, the lock is always for the whole
database. It is possible to lock individual objects, too; this is done,
for example, in lib/form.l to give a user exclusive access rights to an
object.


To be on the safe side, I would recommend allowing only members of a
single family to access a db, and implementing external accesses via
some RPC mechanism (for queries, '*Ext' comes in handy here). Only
bypass these mechanisms when you are absolutely sure it is safe.

This whole matter is a good candidate for the Wiki ;-)

Cheers,
- Alex


Re: db and multiple processes

2008-11-27 Thread Tomas Hlavaty
Hi Alex,

 Usually not, though I do it in certain cases.

In what cases, and how do you do it?

I want to split my app into two independent processes (process
families): an admin part (quite complex, can change a lot and
significantly, may stop quite often for upgrades etc.) and a public
part (quite simple, changes little, should have minimal downtime).  The
reason is that I want to be able to kill and upgrade these two parts
independently without affecting each other.

 If you have an independent process accessing such a db, it must make
 sure never to write to that db (or write only objects which are
 guaranteed not to be written by other processes), and be aware that the
 state of its cached objects might be out of date (e.g. call (rollback)
 from time to time to cause a reload).

 This is usually not so easy to guarantee, because changing a single
 object often triggers the change of many other objects as a side effect,
 like objects connected via '+Joint', or whole branches of an index tree.

I cannot guarantee that.

 In fact, there *is* an important communication going on.

Now how to achieve the above requirements?  Maybe by having a master
parent process which would

- open the database with 'pool'
- fork into two apps (each forked process would load the code for a different app)

This way I would have both apps in the same process family and could
still kill/restart them independently.
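The two steps above might be sketched like this (all file and function
names are placeholders):

   (load "lib/http.l" "er.l")
   (pool "app.db")              # 1. Open the database in the parent
   (unless (fork)               # 2a. First child: the admin app
      (load "admin/main.l")
      (startAdmin)
      (bye) )
   (unless (fork)               # 2b. Second child: the public app
      (load "public/main.l")
      (startPublic)
      (bye) )
   (wait)                       # Parent just keeps the family alive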

 To be on the safe side, I would recommend to allow only members of a
 single family to access a db, and implement external accesses via
 some RPC mechanisms (for queries, '*Ext' comes in handy here).

What is '*Ext', I cannot find anything about that?

 This whole matter is a good candidate for the Wiki ;-)

Yes, why not, I have to understand it first though.  Or, feel free to
put your thoughts in there;-)

Thank you,

Tomas


Re: db and multiple processes

2008-11-27 Thread Alexander Burger
Hi Tomas,

 In what cases, and how do you do it?

At SmApper we had a set of stand-alone shell tools (i.e. with the first
line containing '#!bin/picolisp') that were used to do quick scans of
some databases and build reports. They simply loaded the er.l of those
applications, and called 'pool' on the db files. As they did a single
pass over the data and then terminated, there was no cache problem, and
they did not write anything to the databases. But recently these tools
were abandoned and replaced by new versions, which properly connect to
the applications and issue queries.


 families): an admin part (quite complex, can change a lot and
 significantly, can stop quite often for upgrades etc.) and public
 part (quite simple, changes little, minimize downtime).

We have been using similar architectures. Our (now obsolete) 7fach
system consisted of an admin application, and a separate application for
each customer (we found only around 25 pilot customers and stopped the
project). The SmApper system also has an application manager, and tens
to hundreds of interconnected database applications on blade clusters.

It depends a lot on the logical structure, and on what exchange of
information has to take place, but I think PicoLisp has the necessary
mechanisms, mainly TCP connections.


 - open the database 'pool'

I would use a separate 'pool' for each logical application.

 - fork into two apps (each forked process would load code for
 different app)

Yes, but you have to keep in mind that it works well only if the
processes accessing the db are children of a single parent (see 'tell'
in the reference). In this case it would mean that the two forked apps
should not do any further forks, which could be inconvenient.

If you want to go with a single parent but different sources, it is no
problem though, just load the sources after the fork. I do this
currently for one customer who is still using the old Java applet API,
and the new 'form' API in parallel (depending on the user). This app has
in fact three entry URLs, one for the AWT version, one for the Swing
version, and the new one.


Another disadvantage of having all applications use the same pool is
that you cannot put them on separate machines.


 What is '*Ext', I cannot find anything about that?

The global '*Ext', in cooperation with RPC functions and the 'ext'
function, lets you send Pilog queries or other requests to remote
machines, have external symbols sent back from those machines, and
operate on these external symbols just as if they resided in the
local database. These objects are read-only, though.


  This whole matter is a good candidate for the Wiki ;-)
 
 Yes, why not, I have to understand it first though.  Or, feel free to
 put your thoughts in there;-)

Right. Posting to the Wiki is on my todo-list, but I have not found the
time yet (all spare time currently goes to pico3) ;-)

Cheers,
- Alex