Hi Jorge,

> > An important question is: Does this parallel processing of database
> > objects also involve modifications of these objects? If so, the
> > necessary synchronization between the processes will produce additional
> > costs.
> 
> Yes, I need to update the database with results from the executed commands.

Then I suspect that parallelizing the task may make things worse.

Parallelizing something makes sense when CPU load is the bottleneck.
For database updates, however, the bottleneck is disk I/O and the
system's disk buffer cache. What makes sense in such cases is to
parallelize the work on separate databases, running on separate
machines.


> So, you suggest limiting the updates to happen only in the main process, 
> right?

Yes. To be precise, the recommended way is to have all database
operations run in child processes which are all children of a common
parent process. If we call this parent process the "main" process, then
the updates don't happen in that process but in one of the children.


Having said this, I see that your test program doesn't operate on a
database at all. Calling the C compiler in parallel tasks _does_ make
sense. Then we are talking about something completely different.

> In that case I would need something like the following to be able to
> invoke the shell commands and update the database with the results.

This would indeed make sense, if the shell commands induce a heavy load.

Then the CPU load is the bottleneck again, and the way to go is to fork
several processes to do the work (i.e. call the shell commands), but
still have a _single_ process operating on the database. This would be
optimal.


> I don't like it, too convoluted for my taste. Any suggestion on how to
> improve the performance/style? Perhaps a different approach would be
> better?

Yes, I think so too. If I understand you right, you want to call a
number of shell commands (in a batch list), and then store the results
in a database. If so, you could use something like this:

   (de processJobs (CPUs Batch)
      (let Slots (need CPUs "free")
         (for Exe Batch
            (let Pos
               (wait NIL
                  (seek
                     '((Pos)
                        (cond
                           ((== "free" (car Pos))  # Found a free slot
                              (set Pos "busy") )
                           ((n== "busy" (car Pos)) # Found a result
                              (msg (car Pos))      # Instead of 'msg': Store result in DB
                              (set Pos "busy") ) ) )
                     Slots ) )
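               (later Pos (eval Exe)) ) )
         (wait NIL (not (memq "busy" Slots)))   # Wait for the last jobs
         (for Res Slots                         # Handle their results
            (or (== "free" Res) (msg Res)) ) ) )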

You pass the number of CPUs and a list of executable expressions which
may do arbitrary work, including calls to a shell. The 'msg' call can be
replaced with something more useful, e.g. storing the result in a
database.
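
As a rough sketch of such a replacement, assuming a database has been
opened with 'pool' in the parent process: the entity class '+Result', its
relation 'val' and the helper 'storeResult' are made-up names for
illustration only, not something from your program.

   (class +Result +Entity)    # Hypothetical entity class
   (rel val (+Any))           # Holds the result of one job

   (de storeResult (Val)      # Hypothetical helper, called instead of 'msg'
      (new! '(+Result) 'val Val) )

Because 'storeResult' would be called only from the process that runs
'processJobs', all database writes stay in a single process.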

Example call:

   (processJobs 4
      (make
         (do 20
            (link '(in '(sh "-c" "sleep 1; echo $RANDOM") (read))) ) ) )

This outputs 20 random numbers via 'msg', in at most 4 parallel
processes.


I didn't completely analyze your code. Just a warning about an error:

> (de "completeJob" ("Pos")
>    (let (cdar "Pos")
>       (let RESULT (caar "Pos")
>          (eval (caddar "Pos")) ) )

The second line redefines the function 'cdar'. Is that intended?
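
To illustrate: when the first argument of 'let' is a list, each symbol in
that list is bound to the value that follows it. A minimal sketch (the
value 7 and the symbol 'X' are just for demonstration):

   (let (cdar 7)          # List form: binds the symbol 'cdar' to 7
      (println cdar) )    # The body sees the number, not the function

   (let X 7               # Single-symbol form: binds only 'X'
      (println X) )

The old value of 'cdar' is restored when the 'let' body is left, but any
call to 'cdar' inside the body will not do what you expect.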

Cheers,
- Alex