Re: Parallel command execution

2012-06-03 Thread Alexander Burger
Hi Jorge,

> OK, if I understood 'task' correctly this runs entirely on the 'main'
> process (the one that will be accessing the database and queueing the
> commands).

Yes.

> Looks like if the main process is busy in a very CPU-intensive task
> (no IO at all) it could miss some chances to launch additional
> processes, but I guess I can work around that by inserting a (wait 0)
> at some point in the expensive calculation to give the task a chance
> to run, right?

Yes. Besides during 'wait', the background tasks also run when 'dbSync',
'key' or 'listen' are executed. Of these, 'key' is perhaps the most
useful here: calling (key 0) checks for a key press without any delay.


Another possibility is to write an explicit 'job' (e.g. with 'curry') or
a coroutine, and call it both in the background task and in the
CPU-intensive code.
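The explicit-'job' idea translates outside PicoLisp too. A rough Python sketch (all names invented here): a closure that advances the queued work one step per call, so the same function can be invoked both from the periodic task and from inside the long calculation.

```python
def make_pump(queue, handle):
    """Build an explicit 'job': each call drains at most one queued item.
    Call the returned function both from the periodic background task and
    at convenient points inside the CPU-intensive calculation."""
    def pump():
        if queue:                     # anything pending?
            handle(queue.pop(0))      # advance the work by one step
    return pump

# Usage sketch: sprinkle pump() calls into the expensive loop.
results = []
pump = make_pump([1, 2, 3], results.append)
for _ in range(10):                   # stands in for the heavy computation
    pump()                            # give queued work a chance to run
```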

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: Parallel command execution

2012-06-02 Thread Jorge Acereda
Hi Alexander,

On Jun 2, 2012, at 1:01 PM, Alexander Burger wrote:
> 
>    (task -2000 0                        # Run once every 2 sec
>       Slots (need 4 "free")             # QuadCore
>       (map
>          '((Pos)
>             (cond
>                ((== "free" (car Pos))   # Found a free slot
>                   (when (fifo '*Batch)  # Any jobs?
>                      (set Pos "busy")   # Yes
>                      (later Pos (eval @)) ) )
>                ((n== "busy" (car Pos))  # Found a result
>                   (msg (car Pos))       # Handle result
>                   (ifn (fifo '*Batch)   # More jobs?
>                      (set Pos "free")   # No
>                      (set Pos "busy")   # Yes
>                      (later Pos (eval @)) ) ) ) )
>          Slots ) )

OK, if I understood 'task' correctly this runs entirely on the 'main' process 
(the one that will be accessing the database and queueing the commands).
Looks like if the main process is busy in a very CPU-intensive task (no
IO at all) it could miss some chances to launch additional processes,
but I guess I can work around that by inserting a (wait 0) at some point
in the expensive calculation to give the task a chance to run, right?
This is perfect, nice and simple.

Thanks,
  Jorge




Re: Parallel command execution

2012-06-02 Thread Alexander Burger
Hi Jorge,

> In my situation, a crash is not a big deal. Just reinvoke the command.

OK ;-)

> As for IPC, I don't foresee any communication between processes
> (besides the need to communicate the results back).

Then we have the simplest setup of all, in that you need only a single
process, which does the DB stuff and calls the external commands (as
discussed) to get the data. This is also the most efficient way.

Cheers,
- Alex


Re: Parallel command execution

2012-06-02 Thread Alexander Burger
Hi Henrik,

> Correct, and I should have added that in my case only trivial
> stuff was actually executed straight in the parent process, the more

OK, all right.

> hairy stuff was forked in the main, so a child calls the boss which
> creates new children in order to avoid children of children (afaik the
> db syncing only works with a single child "depth").

Right. It uses 'tell' internally, which sends messages only to all
sister processes, and to all direct child processes.

Cheers,
- Alex


Re: Parallel command execution

2012-06-02 Thread Alexander Burger
Hi Jorge,

> It was just an example. I'll have a database recording all the dependencies 
> for each target.
> 
> > 
> >> In that case I would need something like the following to be able to
> >> invoke the shell commands and update the database with the results.
> > 
> > This would indeed make sense, if the shell commands induce a heavy load.
> 
> It does, say we have to invoke in the order of 1 gcc commands. My
> toy application is a 'make' replacement.

OK



> > Yes, I think so too. If I understand you right, you want to call a
> > number of shell commands (in a batch list), and then store the results
> > in a database. If so, you could use something like that:
> 
> Not exactly, I would prefer not to compose that list in advance. Would
> be good to be able to keep queueing commands while previous commands are
> running.
> My guess is that deciding what to execute next will be time-consuming,
> as soon as it can be determined that a target is ready for execution its
> command should be queued.

I think you can do that easily with 'fifo', using a global '*Batch'
variable instead of the 'Batch' parameter in

> >   (de processJobs (CPUs Batch)


> I guess an additional step is needed that performs the equivalent of
> my "waitJobs":

Right. This was missing. The remaining running jobs must be waited for.


Taking all that into account, I propose the following solution: We
install a background 'task', which runs, say, every 2 seconds, and
handles all that. We use a global '*Batch' to hold the queue:

   # *Batch

   (task -2000 0                        # Run once every 2 sec
      Slots (need 4 "free")             # QuadCore
      (map
         '((Pos)
            (cond
               ((== "free" (car Pos))   # Found a free slot
                  (when (fifo '*Batch)  # Any jobs?
                     (set Pos "busy")   # Yes
                     (later Pos (eval @)) ) )
               ((n== "busy" (car Pos))  # Found a result
                  (msg (car Pos))       # Handle result
                  (ifn (fifo '*Batch)   # More jobs?
                     (set Pos "free")   # No
                     (set Pos "busy")   # Yes
                     (later Pos (eval @)) ) ) ) )
         Slots ) )

It checks the slots every 2 seconds, starts new processes when a free
slot is found, and handles the results if any are available.


We can use 'fifo' to batch a new job:

   (fifo '*Batch '(call "cc" ..))

   : (for X 10 (fifo '*Batch (list '* X X)))
   -> (* 10 10)
   : 1  
   4
   9
   16
   25
   36
   49
   64
   81
   100
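For readers more at home outside PicoLisp, the slot scheme above can be sketched roughly in Python (all names invented; a thread pool stands in for the forked 'later' children, a reasonable substitute when the jobs are external shell commands):

```python
import time
from queue import SimpleQueue, Empty
from concurrent.futures import ThreadPoolExecutor

def run_batch(jobs, n_slots=4, handle=print):
    """Poll a fixed set of slots, like the 2-second 'task' above:
    whenever a slot is free, start the next queued job; whenever a
    job has finished, handle its result and free the slot."""
    batch = SimpleQueue()
    for job in jobs:                  # like (fifo '*Batch ...)
        batch.put(job)
    slots = [None] * n_slots          # None = "free", a Future = busy
    results = []
    with ThreadPoolExecutor(max_workers=n_slots) as pool:
        while not batch.empty() or any(slots):
            for i in range(n_slots):
                slot = slots[i]
                if slot is not None and slot.done():   # found a result
                    results.append(slot.result())
                    handle(slot.result())              # like 'msg'
                    slots[i] = None                    # slot is free again
                if slots[i] is None:                   # free slot: any jobs?
                    try:
                        slots[i] = pool.submit(batch.get_nowait())
                    except Empty:
                        pass
            time.sleep(0.01)          # the PicoLisp task ticks every 2 s
    return results

# Queue ten jobs as zero-argument callables, at most 4 in flight:
squares = run_batch([lambda x=x: x * x for x in range(1, 11)],
                    4, handle=lambda r: None)
```

New jobs can keep being queued while earlier ones run, which matches the "don't compose the list in advance" requirement.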

Cheers,
- Alex


Re: Parallel command execution

2012-06-02 Thread Jorge Acereda

On Jun 2, 2012, at 10:23 AM, Alexander Burger wrote:

> Hi Henrik,
> 
>> I don't know if the boss function might be of help to you? It helped
>> me in order for the current forked instance of the web server to be
> 
> That's right.
> 
> But I'd like to point out (again) that 'boss' must be used with absolute
> care. It executes expressions in the parent process, but the parent
> process is the central coordinator for child synchronization and IPC. It
> should always be as responsive as possible (because otherwise all
> children might be slowed down or even blocked), and it should never!
> crash.

In my situation, a crash is not a big deal. Just reinvoke the command.
As for IPC, I don't foresee any communication between processes (besides
the need to communicate the results back).


> 
> A crashing or blocking child process, on the other hand, doesn't do any
> harm to the rest of the system. Therefore, I strongly recommend to do
> any database modifications in a child process, as is the standard setup
> in all PicoLisp examples.
> 
> Another reason for not putting any load on the parent process is: If
> the parent starts to do non-trivial work, it will increase its heap
> size. Subsequently, all fork'ed child processes will inherit this heap,
> including possible temporary data and cached objects. It is then
> difficult to get predictable behavior.

This could be a good reason, though perhaps I can avoid growing the heap
in that process with some manual GCs.

I'll also experiment with Henrik's suggestion.

Regards,
  Jorge



Re: Parallel command execution

2012-06-02 Thread Henrik Sarvell
Correct, and I should have added that in my case only trivial stuff was
actually executed straight in the parent process; the more hairy stuff
was forked in the main, so a child calls the boss, which creates new
children in order to avoid children of children (afaik the db syncing
only works with a single child "depth").

In main.l (the "boss"):

# workers
(de forkWorkHorse Args
   (unless (fork)
      (eval Args)
      (bye) ) )

(de updatePresentFeed (Feed)
   (dbSync)
   (aImportCommon> '+Rss Feed)
   (commit 'upd)
   (bye))

(de updatePresentFeeds (Feeds)
   (for Feed Feeds
      (let Cycles 30
         (if (fork)
            (let Pid @
               (while (and (n0 Cycles) (kill Pid 0))
                  (dec 'Cycles)
                  (wait 1000) )
               (kill Pid) )
            (updatePresentFeed Feed) ) ) ) )

The last 'updatePresentFeeds' is actually an ugly (but robust) hack to
make up for the fact that some external sites would provide RSS input
that would crash the child. At that point I didn't have the time or
energy to figure out why; I simply wanted the import process to move
along and forget about it. I.e. if something takes too long to finish we
kill it and go for the next in line.
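The fork/30-cycles/kill loop is essentially "run with a timeout". A hedged Python sketch of the same robustness hack (hypothetical helper name; `subprocess.run` kills the child when the timeout expires):

```python
import subprocess

def run_with_timeout(cmd, seconds):
    """Give an external command `seconds` to finish; on timeout the
    child is killed and None is returned, so the caller can simply
    go for the next in line."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None
```

E.g. `run_with_timeout(["fetch-feed", url], 30)` would mirror the 30-cycle budget above (a hypothetical command, for illustration only).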

In the child:

(dm aImportFleshLater> (Feed)
   (put!> Feed 'lastFetch (stamp> '+Gh))
   (let As (createArticles> This Feed T)
      (boss 'forkWorkHorse 'fleshArticles (lit As))
      As ) )

Here we import the articles so that, for instance, the headlines display
ASAP for the user; other intensive metadata generation (data the user
can wait for) is handled by calling forkWorkHorse in main.l with the
appropriate data.




On Sat, Jun 2, 2012 at 3:23 PM, Alexander Burger  wrote:
> Hi Henrik,
>
>> I don't know if the boss function might be of help to you? It helped
>> me in order for the current forked instance of the web server to be
>
> That's right.
>
> But I'd like to point out (again) that 'boss' must be used with absolute
> care. It executes expressions in the parent process, but the parent
> process is the central coordinator for child synchronization and IPC. It
> should always be as responsive as possible (because otherwise all
> children might be slowed down or even blocked), and it should never!
> crash.
>
> A crashing or blocking child process, on the other hand, doesn't do any
> harm to the rest of the system. Therefore, I strongly recommend to do
> any database modifications in a child process, as is the standard setup
> in all PicoLisp examples.
>
> Another reason for not putting any load on the parent process is: If
> the parent starts to do non-trivial work, it will increase its heap
> size. Subsequently, all fork'ed child processes will inherit this heap,
> including possible temporary data and cached objects. It is then
> difficult to get predictable behavior.
>
> In a nutshell: Keep your hands off 'boss', unless you are absolutely
> sure what you are doing!
>
> Cheers,
> - Alex


Re: Parallel command execution

2012-06-02 Thread Jorge Acereda
Hi,

On Jun 2, 2012, at 10:10 AM, Alexander Burger wrote:
> 
> 
> Having said this, I see that your test program doesn't operate on a
> database at all. Calling the C compiler in parallel tasks _does_ make
> sense. Then we talk about something completely different.

It was just an example. I'll have a database recording all the dependencies for 
each target.

> 
>> In that case I would need something like the following to be able to
>> invoke the shell commands and update the database with the results.
> 
> This would indeed make sense, if the shell commands induce a heavy load.

It does, say we have to invoke in the order of 1 gcc commands. My toy 
application is a 'make' replacement.


> 
>> I don't like it, too convoluted for my taste. Any suggestion on how to
>> improve the performance/style? Perhaps a different approach would be
>> better?
> 
> Yes, I think so too. If I understand you right, you want to call a
> number of shell commands (in a batch list), and then store the results
> in a database. If so, you could use something like that:

Not exactly, I would prefer not to compose that list in advance. Would be good 
to be able to keep queueing commands while previous commands are running.
My guess is that deciding what to execute next will be time-consuming, as soon 
as it can be determined that a target is ready for execution its command should 
be queued.


> 
>    (de processJobs (CPUs Batch)
>       (let Slots (need CPUs "free")
>          (for Exe Batch
>             (let Pos
>                (wait NIL
>                   (seek
>                      '((Pos)
>                         (cond
>                            ((== "free" (car Pos))  # Found a free slot
>                               (set Pos "busy") )
>                            ((n== "busy" (car Pos)) # Found a result
>                               (msg (car Pos))      # Instead of 'msg': store result in DB
>                               (set Pos "busy") ) ) )
>                      Slots ) )
>                (later Pos (eval Exe)) ) ) ) )


I guess an additional step is needed that performs the equivalent of my 
"waitJobs":


(bench
   (processJobs 2 (mapcar '((X) (list '* X X)) (range 1 10))) )
1
4
9
16
25
36
49
64
0.006 sec

No result for the last two jobs (9 and 10).

> 
> I didn't completely analyze your code. Just a warning about an error:
> 
>> (de "completeJob" ("Pos")
>>   (let (cdar "Pos")
>>  (let RESULT (caar "Pos")
>> (eval (caddar "Pos")) ) )
> 
> The second line redefines the function 'cdar'. Is that intended?

Of course not :-)
It should read something like:
(bind (cadar "Pos") ...

Each slot contains (result env cont), and that line tries to restore the
environment of the instant when the command was queued, so that all the
bound variables are restored before evaluating the expression.

Time to experiment a bit more. Thanks.



Re: Parallel command execution

2012-06-02 Thread Alexander Burger
Hi Henrik,

> I don't know if the boss function might be of help to you? It helped
> me in order for the current forked instance of the web server to be

That's right.

But I'd like to point out (again) that 'boss' must be used with absolute
care. It executes expressions in the parent process, but the parent
process is the central coordinator for child synchronization and IPC. It
should always be as responsive as possible (because otherwise all
children might be slowed down or even blocked), and it should never!
crash.

A crashing or blocking child process, on the other hand, doesn't do any
harm to the rest of the system. Therefore, I strongly recommend to do
any database modifications in a child process, as is the standard setup
in all PicoLisp examples.

Another reason for not putting any load on the parent process is: If the
parent starts to do non-trivial work, it will increase its heap size.
Subsequently, all fork'ed child processes will inherit this heap,
including possible temporary data and cached objects. It is then
difficult to get predictable behavior.

In a nutshell: Keep your hands off 'boss', unless you are absolutely
sure what you are doing!

Cheers,
- Alex


Re: Parallel command execution

2012-06-02 Thread Alexander Burger
Hi Jorge,

> > An important question is: Does this parallel processing of database
> > objects also involve modifications of these objects? If so, the
> > necessary synchronization between the processes will produce additional
> > costs.
> 
> Yes, I need to update the database with results from the executed commands.

Then I suspect that parallelizing the task may make things worse.

Parallelizing something makes sense when CPU load is the bottleneck.
For database updates, however, the bottleneck is disk I/O and the
system's disk buffer cache. What makes sense in such cases is to
parallelize the work on separate databases, running on separate
machines.


> So, you suggest limiting the updates to happen only in the main process, 
> right?

Yes. To be precise, the recommended way is to have all database
operations run in child processes which are all children of a common
parent process. If we call this parent process the "main" process, then
the updates don't happen in that process but in one of the children.


Having said this, I see that your test program doesn't operate on a
database at all. Calling the C compiler in parallel tasks _does_ make
sense. Then we talk about something completely different.

> In that case I would need something like the following to be able to
> invoke the shell commands and update the database with the results.

This would indeed make sense, if the shell commands induce a heavy load.

Then the CPU load is the bottleneck again, and the way to go is to fork
several processes to do the work (i.e. call the shell commands), but
still have a _single_ process operating on the database. This would be
optimal.


> I don't like it, too convoluted for my taste. Any suggestion on how to
> improve the performance/style? Perhaps a different approach would be
> better?

Yes, I think so too. If I understand you right, you want to call a
number of shell commands (in a batch list), and then store the results
in a database. If so, you could use something like that:

   (de processJobs (CPUs Batch)
      (let Slots (need CPUs "free")
         (for Exe Batch
            (let Pos
               (wait NIL
                  (seek
                     '((Pos)
                        (cond
                           ((== "free" (car Pos))  # Found a free slot
                              (set Pos "busy") )
                           ((n== "busy" (car Pos)) # Found a result
                              (msg (car Pos))      # Instead of 'msg': store result in DB
                              (set Pos "busy") ) ) )
                     Slots ) )
               (later Pos (eval Exe)) ) ) ) )

You pass the number of CPUs and a list of executable expressions which
may do arbitrary work, including calls to a shell. The 'msg' call can be
replaced with something more useful, e.g. storing the result in a
database.

Example call:

   (processJobs 4
      (make
         (do 20
            (link '(in '(sh "-c" "sleep 1; echo $RANDOM") (read))) ) ) )

This outputs 20 random numbers via 'msg', in at most 4 parallel
processes.
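The same cap on parallelism can be sketched in Python with a worker pool (invented names; unlike the PicoLisp version, results here come back in submission order rather than as slots free up):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def process_jobs(cpus, batch):
    """Run the shell commands in `batch` with at most `cpus` in
    parallel; where the PicoLisp code calls 'msg', store the result
    in the database instead."""
    def run_one(cmd):
        return subprocess.run(cmd, shell=True, capture_output=True,
                              text=True).stdout.strip()
    with ThreadPoolExecutor(max_workers=cpus) as pool:
        return list(pool.map(run_one, batch))

# e.g. process_jobs(4, ["sleep 1; echo $RANDOM"] * 20)
```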


I didn't completely analyze your code. Just a warning about an error:

> (de "completeJob" ("Pos")
>(let (cdar "Pos")
>   (let RESULT (caar "Pos")
>  (eval (caddar "Pos")) ) )

The second line redefines the function 'cdar'. Is that intended?

Cheers,
- Alex


Re: Parallel command execution

2012-06-01 Thread Henrik Sarvell
I don't know if the boss function might be of help to you? It helped me
by letting the current forked instance of the web server call the main
process and do various slow writes there. The reason was that the user
should not have to wait for stuff like that, i.e. the current child
process was able to return an HTTP answer much faster while the heavy
stuff happened in the background.

http://www.mail-archive.com/picolisp@software-lab.de/msg01917.html

http://www.mail-archive.com/picolisp@software-lab.de/msg01745.html


On Sat, Jun 2, 2012 at 6:48 AM, Jorge Acereda  wrote:
> Hi again,
>
>
> On May 29, 2012, at 4:40 PM, Alexander Burger wrote:
>
>>
>>
>>> Another option to avoid the fork() would be to have a pool of
>>> pre-forked instances reading from a jobs queue or something like that
>>> (perhaps taking advantage of an additional database to implement the
>>> queue?), but my skills are still lacking on how to implement that.
>>
>> That's also an interesting approach.
>>
>>
>> An important question is: Does this parallel processing of database
>> objects also involve modifications of these objects? If so, the
>> necessary synchronization between the processes will produce additional
>> costs.
>
> Yes, I need to update the database with results from the executed commands.
>
> So, you suggest limiting the updates to happen only in the main process, 
> right?
>
> In that case I would need something like the following to be able to invoke 
> the shell commands and update the database with the results.
> I don't like it, too convoluted for my taste. Any suggestion on how to 
> improve the performance/style? Perhaps a different approach would be better?
>
> Note that *maxProcesses should be set to something like the number of cores 
> and that increasing the length of *L can result in fork() errors in the 
> simple later/wait benchmark.
>
> Thanks,
>  Jorge
>
> (setq *maxProcesses 2)
> (setq "*idle" '(NIL NIL nil))
> (setq "*batches" (make (do *maxProcesses (link "*idle"))))
>
> (de "findPred" (Pred L)
>   (seek '((X) (Pred (car X))) L) )
>
> (de "completeJob" ("Pos")
>   (let (cdar "Pos")
>      (let RESULT (caar "Pos")
>         (eval (caddar "Pos")) ) )
>   (set "Pos" "*idle")
>   "Pos" )
>
> (de "waitJob" ()
>   ("completeJob" (wait NIL ("findPred" 'pair "*batches"))))
>
> (de waitAll ()
>   (wait NIL (not ("findPred" 'atom "*batches")))
>   (map '"completeJob" "*batches"))
>
> (de batch ("Cont" . "Job")
>   (let "Pos" ("waitJob")
>      (set "Pos" 'busy)
>      (later "Pos"
>         (list (run "Job") (env) "Cont") ) ) )
>
> (mapcar '((X) (batch '(msg RESULT) (* X X))) (range 1 10))
> (waitAll)
>
> (de mycmd ()
>   (in '(cc "-c" "../picolisp/src/main.c" "-o" "/dev/null")) T)
>
> (setq *L (range 1 50))
>
> (bench
>   (mapcar '(() (batch '(nil RESULT) (mycmd))) *L)
>   (waitAll) )
>
> (bench
>   (prog1
>      (mapcan '(() (later (cons) (mycmd))) *L)
>      (wait NIL (full @))
>      ) )
>
> (bench
>   (mapcan '(() (mycmd)) *L))
>


Re: Parallel command execution

2012-06-01 Thread Jorge Acereda
Hi again,


On May 29, 2012, at 4:40 PM, Alexander Burger wrote:

> 
> 
>> Another option to avoid the fork() would be to have a pool of
>> pre-forked instances reading from a jobs queue or something like that
>> (perhaps taking advantage of an additional database to implement the
>> queue?), but my skills are still lacking on how to implement that.
> 
> That's also an interesting approach.
> 
> 
> An important question is: Does this parallel processing of database
> objects also involve modifications of these objects? If so, the
> necessary synchronization between the processes will produce additional
> costs.

Yes, I need to update the database with results from the executed commands.

So, you suggest limiting the updates to happen only in the main process, right?

In that case I would need something like the following to be able to invoke the 
shell commands and update the database with the results. 
I don't like it, too convoluted for my taste. Any suggestion on how to improve 
the performance/style? Perhaps a different approach would be better?

Note that *maxProcesses should be set to something like the number of cores and 
that increasing the length of *L can result in fork() errors in the simple 
later/wait benchmark.

Thanks,
  Jorge

(setq *maxProcesses 2)
(setq "*idle" '(NIL NIL nil))
(setq "*batches" (make (do *maxProcesses (link "*idle"))))

(de "findPred" (Pred L)
   (seek '((X) (Pred (car X))) L) ) 

(de "completeJob" ("Pos")
   (let (cdar "Pos")
  (let RESULT (caar "Pos")
 (eval (caddar "Pos")) ) )
   (set "Pos" "*idle")
   "Pos" )

(de "waitJob" ()
   ("completeJob" (wait NIL ("findPred" 'pair "*batches"))))

(de waitAll ()
   (wait NIL (not ("findPred" 'atom "*batches")))
   (map '"completeJob" "*batches"))

(de batch ("Cont" . "Job")
   (let "Pos" ("waitJob")
  (set "Pos" 'busy)
  (later "Pos"
 (list (run "Job") (env) "Cont") ) ) )

(mapcar '((X) (batch '(msg RESULT) (* X X))) (range 1 10))
(waitAll)

(de mycmd ()
   (in '(cc "-c" "../picolisp/src/main.c" "-o" "/dev/null")) T)

(setq *L (range 1 50))

(bench 
   (mapcar '(() (batch '(nil RESULT) (mycmd))) *L)
   (waitAll) )

(bench
   (prog1
  (mapcan '(() (later (cons) (mycmd))) *L)
  (wait NIL (full @))
  ) )

(bench
   (mapcan '(() (mycmd)) *L))



Re: Parallel command execution

2012-05-29 Thread Alexander Burger
Hi Jorge,

> >   (de processCustomers (N . Prg)
> >      (let Lst (need N)
> >         (iter (tree 'nr '+CuSu)
> >            '((This)
> >               (let Pos (wait NIL (memq NIL Lst))
> >                  (set Pos T)
> >                  (later Pos
> >                     (run Prg)
> >                     NIL ) ) ) ) ) )
> 
> Thanks, that looks good. My only objection would be the additional
> fork() implied by 'later'.

Not sure. I rather believe that the database file I/O is the bottleneck.
Then a direct, single-threaded iteration might even be the fastest.

The parallel stuff only makes sense if the processing in the 'Prg' is
very CPU-intensive.


> Another option to avoid the fork() would be to have a pool of
> pre-forked instances reading from a jobs queue or something like that
> (perhaps taking advantage of an additional database to implement the
> queue?), but my skills are still lacking on how to implement that.

That's also an interesting approach.


An important question is: Does this parallel processing of database
objects also involve modifications of these objects? If so, the
necessary synchronization between the processes will produce additional
costs.

Cheers,
- Alex


Re: Parallel command execution

2012-05-29 Thread Jorge Acereda

On May 29, 2012, at 9:15 AM, Alexander Burger wrote:



> 
>    (de processCustomers (N . Prg)
>       (let Lst (need N)
>          (iter (tree 'nr '+CuSu)
>             '((This)
>                (let Pos (wait NIL (memq NIL Lst))
>                   (set Pos T)
>                   (later Pos
>                      (run Prg)
>                      NIL ) ) ) ) ) )

Thanks, that looks good. My only objection would be the additional fork() 
implied by 'later'.
I'll try to get it running with that approach and do some benchmarking. Later, 
I'll try to compare that to a patched version that uses vfork() instead of 
fork() and see if that makes a difference.
Another option to avoid the fork() would be to have a pool of pre-forked 
instances reading from a jobs queue or something like that (perhaps taking 
advantage of an additional database to implement the queue?), but my skills are 
still lacking on how to implement that. I'll worry about that if the fork() 
time is noticeable.

Thanks,
  Jorge



Re: Parallel command execution

2012-05-29 Thread Jorge Acereda

On May 29, 2012, at 12:30 AM, José Romero wrote:

> On Mon, 28 May 2012 23:51:21 +0200
> Jorge Acereda  wrote:
> 
>> Hi,
>> 
>> I need to invoke external commands for each entry in my database and
>> I want to run those in parallel, but restricting the number of
>> simultaneous jobs to a certain number to avoid bringing the machine
>> to its knees (sort of 'make -jN').
>> 
>> How would you approach such a problem?
>> 
> First, a bit more information is needed: what do you need to
> parallelize? The processing steps within each entry, or the processing
> of the entries themselves? Are there any dependencies that should be
> taken into account?

I'll try to explain better what I'm trying to do.
As a learning exercise, I'm implementing a build tool (sort of 'scons').
A target can have explicit dependencies (specified when the target is
inserted in the database) or calculated dependencies (in the case of a C
target, obtained via 'cc -M', 'mkdep' or something similar).
I still don't know how fast the stage that determines the build order
will be, so at this stage I'm only worried about being able to run the
mkdeps stage and the build stage in parallel.

Regards,
  Jorge


> 
>> Thanks,
>>  Jorge
>> 
> 
> Cheers,
> José



Re: Parallel command execution

2012-05-29 Thread Alexander Burger
Hi Jorge, Henrik,

On Tue, May 29, 2012 at 11:26:42AM +0700, Henrik Sarvell wrote:
> More complicated (but prettier) would be a later -> wait combo (if
> possible in your situation):

Yes, I would also say that 'later' / 'wait' is the way to go. But as I
understood Jorge, he wants to do something with (a subset of) all
database objects of a given (single) database.

In the general case, I would select those objects with a 'pilog'
expression, and then process them with 'later'. If -- as in Jorge's
question -- all objects are desired, I'd use 'iter' to directly iterate
a tree:

   (de processCustomers (N . Prg)
      (let Lst (need N)
         (iter (tree 'nr '+CuSu)
            '((This)
               (let Pos (wait NIL (memq NIL Lst))
                  (set Pos T)
                  (later Pos
                     (run Prg)
                     NIL ) ) ) ) ) )

I've tested it in the demo 'app'

   $ ./pil app/main.l lib/too.l -main +
   ...
   : (processCustomers 5 (msg This) (wait 500))
   {2-1}
   {2-5}
   ...

The 'Lst' variable holds a list of the desired length, initialized to
all 'NIL' elements.

Then all customer objects are iterated, using the 'nr' number index.

'memq' finds the next free slot, and points 'Pos' to it. 'set' marks
that slot with 'T', then 'later' starts to do the work in 'Prg'. Finally
the 'later' body returns 'NIL', to mark that slot as "free" again.

(Note that for a general solution (e.g. in a library), it is recommended
to use transient symbols for all involved parameters and variables.)

Cheers,
- Alex


Re: Parallel command execution

2012-05-28 Thread Henrik Sarvell
Ugly, but: chunk up the job, and then call 'wait' with a sufficiently
big number that each chunk's pipes have time to terminate before the
next chunk starts.

More complicated (but prettier) would be a 'later' -> 'wait' combo (if
possible in your situation):

(dm evalAll> @
   (let Result
      (make
         (for N (getSockNums> This)
            (later (chain (cons "void"))
               (eval> This N (rest)) ) ) )
      (wait 5000 (not (memq "void" Result)))
      Result ) )

The above example queries an arbitrary number of external databases in
parallel; it waits for 5 seconds, or until all of them have returned
something.
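A rough Python analogue of this scatter-gather-with-timeout pattern (invented names; futures still pending when the timeout expires keep their "void" placeholder, like the un-replaced cells in Result above):

```python
from concurrent.futures import ThreadPoolExecutor, wait

def eval_all(queries, timeout=5.0):
    """Scatter zero-argument callables to parallel workers, then gather
    for at most `timeout` seconds; any query that has not answered by
    then is left as the "void" placeholder, like
    (wait 5000 (not (memq "void" Result)))."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(queries)))
    futures = [pool.submit(q) for q in queries]
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)         # don't block on the stragglers
    return ["void" if f in not_done else f.result() for f in futures]
```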


On Tue, May 29, 2012 at 5:30 AM, José Romero  wrote:
> On Mon, 28 May 2012 23:51:21 +0200
> Jorge Acereda  wrote:
>
>> Hi,
>>
>> I need to invoke external commands for each entry in my database and
>> I want to run those in parallel, but restricting the number of
>> simultaneous jobs to a certain number to avoid bringing the machine
>> to its knees (sort of 'make -jN').
>>
>> How would you approach such a problem?
>>
> First, a bit more information is needed: what do you need to
> parallelize? The processing steps within each entry, or the processing
> of the entries themselves? Are there any dependencies that should be
> taken into account?
>
>> Thanks,
>>   Jorge
>>
>
> Cheers,
> José


Re: Parallel command execution

2012-05-28 Thread José Romero
On Mon, 28 May 2012 23:51:21 +0200
Jorge Acereda  wrote:

> Hi,
> 
> I need to invoke external commands for each entry in my database and
> I want to run those in parallel, but restricting the number of
> simultaneous jobs to a certain number to avoid bringing the machine
> to its knees (sort of 'make -jN').
> 
> How would you approach such a problem?
> 
First, a bit more information is needed: what do you need to
parallelize? The processing steps within each entry, or the processing
of the entries themselves? Are there any dependencies that should be
taken into account?

> Thanks,
>   Jorge
> 

Cheers,
José


Parallel command execution

2012-05-28 Thread Jorge Acereda
Hi,

I need to invoke external commands for each entry in my database and I want to 
run those in parallel, but restricting the number of simultaneous jobs to a 
certain number to avoid bringing the machine to its knees (sort of 'make -jN').

How would you approach such a problem?

Thanks,
  Jorge
