[CODE4LIB] many processes, one result

2008-02-18 Thread Eric Lease Morgan

How do I write a computer program that spawns many processes but
returns one result?

I suppose the classic example of my query is the federated search. Get
user input. Send it to many remote indexes. Wait. Combine results.
Return. In this scenario when one of the remote indexes is slow things
grind to a halt.

I have a more modern example. Suppose I want to take advantage of many
Web Services. One might be a spell checker. Another might be a
thesaurus. Another might be an index. Another might be a user lookup
function. Given this environment, where each Web Service will return
different sets of streams, how do I query each of them simultaneously
and then aggregate the result? I don't want to do this sequentially. I
want to fork them all at once and wait for their return before a
specific time out. In Perl I can use the system command to fork a
process, but I must wait for it to return. There is another Perl
command allowing me to fork a process and keep going but I don't
remember what it is. Neither one of these solutions seems feasible. Is
the idea of threading in Java supposed to be able to address this
problem?

--
Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604


[CODE4LIB] many processes, one result

2008-02-18 Thread Glen Newton - NRC/CNRC CISTI/ICIST Research
 How do I write a computer program that spawns many processes but
 returns one result?

 I suppose the classic example of my query is the federated search. Get
 user input. Send it to many remote indexes. Wait. Combine results.
 Return. In this scenario when one of the remote indexes is slow things
 grind to a halt.

 I have a more modern example. Suppose I want to take advantage of many
 Web Services. One might be a spell checker. Another might be a
 thesaurus. Another might be an index. Another might be a user lookup
 function. Given this environment, where each Web Service will return
 different sets of streams, how do I query each of them simultaneously
 and then aggregate the result? I don't want to do this sequentially. I
 want to fork them all at once and wait for their return before a
 specific time out. In Perl I can use the system command to fork a
 process, but I must wait for it to return. There is another Perl
 command allowing me to fork a process and keep going but I don't
 remember what it is. Neither one of these solutions seems feasible. Is
 the idea of threading in Java supposed to be able to address this
 problem?

Yes. I do this sort of thing all the time for a variety of tasks (and
to take advantage of multi-CPU and multi-core machines). Java threads
are more lightweight than forked processes.

---
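// MyThread is assumed to be a subclass of java.lang.Thread
MyThread[] myThreads = new MyThread[20];

// start all threads
for (int i = 0; i < myThreads.length; i++)
{
    MyThread m = new MyThread();
    m.start();
    myThreads[i] = m;
}

for (int i = 0; i < myThreads.length; i++)
{
    // wait for each to complete. Note that a thread may be
    // completed before this method is called.
    // join() throws the checked InterruptedException, so the
    // enclosing method must catch or declare it.
    myThreads[i].join();
}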

Note that there is also a join(long millis) method, which waits at most
the given number of milliseconds for the thread to finish.
Note that the threads can be doing all sorts of different things (like
the situation you describe).
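
To make the timeout variant concrete, here is a minimal sketch. It is an illustration only, not code from this thread: the Worker class, the "task-N" labels, and the sleep delays are hypothetical stand-ins for calls to remote services. All threads share one deadline, and only results from threads that finished in time are collected.

```java
import java.util.ArrayList;
import java.util.List;

class JoinWithDeadline {

    // Hypothetical worker: sleeps to simulate a slow remote call, then stores a result.
    static class Worker extends Thread {
        private final String label;
        private final long delayMillis;
        private volatile String result; // stays null until the work finishes

        Worker(String label, long delayMillis) {
            this.label = label;
            this.delayMillis = delayMillis;
            setDaemon(true); // a straggler must not keep the JVM alive
        }

        @Override
        public void run() {
            try {
                Thread.sleep(delayMillis);
                result = label + " done";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        String getResult() {
            return result;
        }
    }

    // Start every worker, wait until one shared deadline, return what finished.
    static List<String> runAll(long[] delays, long timeoutMillis) {
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < delays.length; i++) {
            Worker w = new Worker("task-" + i, delays[i]);
            w.start();
            workers.add(w);
        }
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            for (Worker w : workers) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining > 0) {
                    w.join(remaining); // join(0) would wait forever, hence the guard
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<String> results = new ArrayList<>();
        for (Worker w : workers) {
            if (w.getResult() != null) {
                results.add(w.getResult());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Two fast tasks and one that is far slower than the timeout.
        System.out.println(runAll(new long[] {10, 20, 10000}, 1000));
    }
}
```

With these delays the two fast tasks should normally be the only ones reported; the slow one is simply left out of the combined result.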

-Glen

--

Glen Newton | [EMAIL PROTECTED]
Researcher, Information Science, CISTI Research
 NRC W3C Advisory Committee Representative
http://tinyurl.com/yvchmu
tel/tél: 613-990-9163 | facsimile/télécopieur 613-952-8246
Canada Institute for Scientific and Technical Information (CISTI)
National Research Council Canada (NRC)| M-55, 1200 Montreal Road
http://www.nrc-cnrc.gc.ca/
Institut canadien de l'information scientifique et technique (ICIST)
Conseil national de recherches Canada | M-55, 1200 chemin Montréal
Ottawa, Ontario K1A 0R6
Government of Canada | Gouvernement du Canada
--


Re: [CODE4LIB] many processes, one result

2008-02-18 Thread Kevin S. Clarke
Hi Eric, you wrote:
 How do I write a computer program that spawns many processes but
 returns one result?

...

 Is
 the idea of threading in Java supposed to be able to address this
 problem?

Yes, Java threading addresses this problem.  You can spawn different
threads to accomplish different tasks and pull all the results
together in the end.
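
One way to sketch "spawn different tasks and pull the results together", assuming the java.util.concurrent classes available since Java 5 (the lambda syntax needs Java 8). The service() helper and the names "spell", "thesaurus", and "index" are hypothetical placeholders for real Web Service calls; each one just sleeps and returns a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class FanOut {

    // Hypothetical stand-in for a remote service call.
    static Callable<String> service(String name, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis);
            return name + ": ok";
        };
    }

    // Run all tasks at once and combine whatever finished before the timeout.
    static List<String> queryAll(List<Callable<String>> tasks, long timeoutMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        List<String> combined = new ArrayList<>();
        try {
            // invokeAll blocks until every task is done or the timeout expires;
            // tasks still running at that point are cancelled.
            List<Future<String>> futures =
                    pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            for (Future<String> f : futures) {
                try {
                    combined.add(f.get());
                } catch (CancellationException | ExecutionException e) {
                    // timed out or failed: leave this service out of the result
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Callable<String>> tasks = Arrays.asList(
                service("spell", 10), service("thesaurus", 20), service("index", 10000));
        System.out.println(queryAll(tasks, 1000));
    }
}
```

The returned list keeps the order of the submitted tasks, so a slow service drops out without disturbing the others.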

http://java.sun.com/docs/books/tutorial/essential/concurrency/

Kevin


--
There are two kinds of people in the world: those who believe there
are two kinds of people and those who know better.


Re: [CODE4LIB] many processes, one result

2008-02-18 Thread John Fereira

Eric Lease Morgan wrote:

How do I write a computer program that spawns many processes but
returns one result?

I suppose the classic example of my query is the federated search. Get
user input. Send it to many remote indexes. Wait. Combine results.
Return. In this scenario when one of the remote indexes is slow things
grind to a halt.

I have a more modern example. Suppose I want to take advantage of many
Web Services. One might be a spell checker. Another might be a
thesaurus. Another might be an index. Another might be a user lookup
function. Given this environment, where each Web Service will return
different sets of streams, how do I query each of them simultaneously
and then aggregate the result? I don't want to do this sequentially. I
want to fork them all at once and wait for their return before a
specific time out. In Perl I can use the system command to fork a
process, but I must wait for it to return. There is another Perl
command allowing me to fork a process and keep going but I don't
remember what it is. Neither one of these solutions seems feasible. Is
the idea of threading in Java supposed to be able to address this
problem?

Yes.  Take a look at Brian Goetz's book, Java Concurrency in Practice.
It's the best resource I have found on creating multi-threaded applications.

A recent project I worked on has a workflow with several steps: it
takes a very large set of files, moves them from one server to another,
does some QA work and some data transformation, and finally stores a
set of artifacts in a digital repository.

I used ActiveMQ to build a message-based system so that all of this
work can occur simultaneously.  It may seem that simply transferring
files from one server to another would be a fairly basic operation, but
when you're dealing with hundreds of thousands of files that are
anywhere from 100 MB to over a GB in size, a sequential process just
can't handle the amount of data.
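
The message-based idea can be sketched in miniature without a broker. The following is not ActiveMQ and not the system described above; it is an in-process stand-in that assumes a java.util.concurrent.BlockingQueue as the "message queue", hypothetical file names as the messages, and an upper-casing step as a placeholder for the real QA and transformation work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueuePipeline {

    // Sentinel message telling a worker to stop (assumes no file has this name).
    private static final String DONE = "__DONE__";

    // Feed file names through a queue to several workers running at once.
    static List<String> process(List<String> fileNames, int workerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> stored = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < workerCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String name = queue.take(); // blocks until a message arrives
                        if (name.equals(DONE)) {
                            break;
                        }
                        stored.add(name.toUpperCase()); // placeholder for the real work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
            workers.add(t);
        }

        try {
            for (String name : fileNames) {
                queue.put(name);
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(DONE); // one stop message per worker
            }
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("a.tif", "b.tif", "c.tif"), 2));
    }
}
```

The order of the output depends on which worker picks up which message, so callers should not rely on it.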




Re: [CODE4LIB] many processes, one result

2008-02-18 Thread Dr R. Sanderson

And Perl's fork() explained:

  http://hell.jedicoder.net/?p=82

Enjoy :)

Rob


Re: [CODE4LIB] many processes, one result

2008-02-18 Thread Smith,Devon
One of Erlang's real strengths is its approach to concurrent
programming.[1][2]
It differs from threaded programming - the more common approach - in
several ways. From the programmer's point of view, Erlang's approach is
just easier to write and debug.

[1] http://www.erlang.org/course/concurrent_programming.html
[2] http://www.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?pName=dso_level1path=dsonline/2007/10file=w5tow.xmlxsl=article.xsl;jsessionid=H5f2QTzQWh2zMWy36pYGytVqvVDQLjKF2mYnRhSTpwPs4qyY1JWh!1418919023


Re: [CODE4LIB] many processes, one result

2008-02-18 Thread Eric Lease Morgan

On Feb 18, 2008, at 1:42 PM, Eric Lease Morgan wrote:


How do I write a computer program that spawns many processes but
returns one result?




Thank you for the many prompt and useful replies. I am adding the link
below simply for the historical record. (I may need it in the future.)
It was given to me in channel by lbay (I think):

  http://search.cpan.org/~dlux/Parallel-ForkManager-0.7.5/

--
Eric Lease Morgan


Re: [CODE4LIB] many processes, one result

2008-02-18 Thread Jonathan Rochkind

The short answer is you want a book or article on 'concurrent
programming'. The main programming abstraction for doing this is
generally 'threads', which are supported in different ways in different
environments (languages and OSs).  Another way this is sometimes done,
especially in the UNIX environment, is with fork and exec of processes
at the OS level, rather than 'threads'.  Concurrent programming brings
its own challenges, and conventional design patterns/abstractions to
solve them; I'd definitely recommend looking for some reading on this
topic, but don't have anything in particular to recommend, I'm afraid.

And indeed, I need to address exactly the problems you mentioned in both
my Umlaut application and in federated search. The solution is indeed
concurrent programming of some kind.  In Umlaut, it's a pain since Rails
isn't particularly happy about concurrent programming.

Jonathan

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu