[CODE4LIB] many processes, one result
How do I write a computer program that spawns many processes but returns one result?

I suppose the classic example of my query is the federated search. Get user input. Send it to many remote indexes. Wait. Combine results. Return. In this scenario, when one of the remote indexes is slow, things grind to a halt.

I have a more modern example. Suppose I want to take advantage of many Web Services. One might be a spell checker. Another might be a thesaurus. Another might be an index. Another might be a user lookup function. Given this environment, where each Web Service will return different sets of streams, how do I query each of them simultaneously and then aggregate the results? I don't want to do this sequentially. I want to fork them all at once and wait for their return before a specific timeout.

In Perl I can use the system command to fork a process, but I must wait for it to return. There is another Perl command allowing me to fork a process and keep going, but I don't remember what it is. Neither of these solutions seems feasible. Is the idea of threading in Java supposed to be able to address this problem?

--
Eric Lease Morgan
University Libraries of Notre Dame
(574) 631-8604
[CODE4LIB] many processes, one result
Eric Lease Morgan wrote:
How do I write a computer program that spawns many processes but returns one result? ... Is the idea of threading in Java supposed to be able to address this problem?

Yes. I do this sort of thing all the time for various purposes (and to take advantage of multi-CPU and multi-core machines). Java threading is more lightweight than forking.

---
    // MyThread is assumed to extend java.lang.Thread.
    MyThread[] myThreads = new MyThread[20];

    // Start all threads.
    for (int i = 0; i < 20; i++) {
        MyThread m = new MyThread();
        m.start();
        myThreads[i] = m;
    }

    // Wait for each thread to complete. A thread may already have
    // finished before join() is called; join() then returns at once.
    // join() throws InterruptedException, which the caller must handle.
    for (int i = 0; i < 20; i++) {
        myThreads[i].join();
    }

Note that there is also a join(long timeoutMillis) method. Note too that the threads can be doing all sorts of different things (like the situation you describe).
-Glen

--
Glen Newton
Researcher, Information Science, CISTI Research
NRC W3C Advisory Committee Representative
http://tinyurl.com/yvchmu
tel: 613-990-9163 | facsimile: 613-952-8246
Canada Institute for Scientific and Technical Information (CISTI)
National Research Council Canada (NRC)
M-55, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
http://www.nrc-cnrc.gc.ca/
Government of Canada
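For the archive, here is a minimal sketch of the start-then-join pattern with an overall deadline, using the join(long) variant mentioned above. The class name JoinWithDeadline, the gather and slowly helpers, and the three pretend services are all hypothetical, invented for illustration. It assumes Java 11 or later and is one way to do it, not the only one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class JoinWithDeadline {

    /**
     * Starts one thread per named task, waits until one shared deadline,
     * and returns whatever results arrived in time.
     */
    static Map<String, String> gather(Map<String, Supplier<String>> tasks, long timeoutMillis) {
        Map<String, String> results = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();

        // Start all threads before waiting on any of them.
        for (Map.Entry<String, Supplier<String>> task : tasks.entrySet()) {
            Thread t = new Thread(() -> results.put(task.getKey(), task.getValue().get()));
            t.setDaemon(true); // a straggler must not keep the JVM alive
            t.start();
            threads.add(t);
        }

        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            for (Thread t : threads) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    break; // out of time; note that join(0) would wait forever
                }
                t.join(remainingMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        // Return a snapshot so late finishers cannot change the result.
        return new HashMap<>(results);
    }

    /** Hypothetical stand-in for a slow remote service. */
    static String slowly(String value, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> tasks = new LinkedHashMap<>();
        tasks.put("spelling", () -> "did you mean: concurrency");
        tasks.put("thesaurus", () -> "parallelism, multitasking");
        tasks.put("index", () -> slowly("42 hits", 2_000)); // misses the 300 ms deadline
        System.out.println(gather(tasks, 300));
    }
}
```

The slow "index" task is simply left out of the combined result, so one sluggish service no longer stalls the whole request.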
Re: [CODE4LIB] many processes, one result
Hi Eric, you wrote:

How do I write a computer program that spawns many processes but returns one result? ... Is the idea of threading in Java supposed to be able to address this problem?

Yes, Java threading addresses this problem. You can spawn different threads to accomplish different tasks and pull all the results together in the end.

http://java.sun.com/docs/books/tutorial/essential/concurrency/

Kevin

--
There are two kinds of people in the world: those who believe there are two kinds of people and those who know better.
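As a rough illustration of "spawn different tasks and pull the results together", the sketch below uses the standard ExecutorService.invokeAll method with a timeout. The class name FederatedQuery, the queryAll helper, and the pretend services are assumptions made up for this example; real services would make network calls instead. It assumes Java 11 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class FederatedQuery {

    /**
     * Runs every service call at once and returns the answers that arrived
     * before the timeout, in the same order as the input list.
     */
    static List<String> queryAll(List<Callable<String>> services, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, services.size()));
        List<String> merged = new ArrayList<>();
        try {
            // invokeAll blocks until all tasks finish or the timeout expires;
            // tasks still running at that point are cancelled.
            List<Future<String>> futures =
                    pool.invokeAll(services, timeoutMillis, TimeUnit.MILLISECONDS);
            for (Future<String> future : futures) {
                if (future.isCancelled()) {
                    continue; // this service missed the deadline
                }
                try {
                    merged.add(future.get());
                } catch (ExecutionException e) {
                    // This service failed; skip it and keep the other answers.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow(); // interrupt any worker that is still busy
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Callable<String>> services = List.of(
                () -> "spelling: no corrections",
                () -> "thesaurus: parallelism",
                () -> { Thread.sleep(5_000); return "index: 42 hits"; }); // too slow
        System.out.println(queryAll(services, 500));
    }
}
```

A pool is used rather than raw threads mainly because Future carries each task's return value or failure back to the caller, which makes the aggregation step short.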
Re: [CODE4LIB] many processes, one result
Eric Lease Morgan wrote:
How do I write a computer program that spawns many processes but returns one result? ... Is the idea of threading in Java supposed to be able to address this problem?

Yes. Take a look at Brian Goetz's book, Java Concurrency in Practice. It's the best resource I have found on creating multi-threaded applications.

On a recent project I worked on, a workflow takes a very large set of files, moves them from one server to another, does some QA work and some data transformation, and finally stores a set of artifacts in a digital repository. I used ActiveMQ to build a message-based system so that all of this work can occur simultaneously. It may seem that simply transferring files from one server to another would be a fairly basic operation, but when you're dealing with hundreds of thousands of files that are anywhere from 100 MB to over a GB in size, a sequential process just can't handle the amount of data.
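To give a feel for the message-based idea, here is a small in-process sketch in which stages hand work to each other through queues. It is not the ActiveMQ system described above and does not use any ActiveMQ API; a java.util.concurrent.BlockingQueue stands in for the message broker. The class name FilePipeline, the stage names, and the "__DONE__" end marker are hypothetical, and the sketch assumes no real file is named "__DONE__".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class FilePipeline {

    // End-of-stream marker passed down the pipeline (assumed not to be a real file name).
    private static final String DONE = "__DONE__";

    /** Pushes each file name through a transfer stage and a QA stage that run concurrently. */
    static List<String> process(List<String> fileNames) {
        BlockingQueue<String> transferred = new LinkedBlockingQueue<>();
        BlockingQueue<String> checked = new LinkedBlockingQueue<>();
        List<String> stored = new ArrayList<>();

        // Stage 1: pretend to copy each file, then announce it on the first queue.
        Thread transfer = new Thread(() -> {
            for (String name : fileNames) {
                transferred.add("copied:" + name);
            }
            transferred.add(DONE);
        });

        // Stage 2: pretend to run QA on each copied file as soon as it is announced.
        Thread qa = new Thread(() -> {
            try {
                for (String msg = transferred.take(); !msg.equals(DONE); msg = transferred.take()) {
                    checked.add("qa-ok:" + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                checked.add(DONE); // always let the final stage terminate
            }
        });

        transfer.start();
        qa.start();

        // Stage 3 (the calling thread): "store" each checked item as it arrives.
        try {
            for (String msg = checked.take(); !msg.equals(DONE); msg = checked.take()) {
                stored.add(msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("a.tif", "b.tif", "c.tif")));
    }
}
```

Because each stage starts on an item as soon as the previous stage releases it, the QA work overlaps with the transfers instead of waiting for all of them to finish.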
Re: [CODE4LIB] many processes, one result
And Perl's fork() explained:

http://hell.jedicoder.net/?p=82

Enjoy :)

Rob

On Mon, 18 Feb 2008, Kevin S. Clarke wrote:
Yes, Java threading addresses this problem. You can spawn different threads to accomplish different tasks and pull all the results together in the end. http://java.sun.com/docs/books/tutorial/essential/concurrency/
Re: [CODE4LIB] many processes, one result
One of Erlang's real strengths is its approach to concurrent programming.[1][2] It differs from threaded programming, the more common approach, in several ways. From the programmer's point of view, Erlang's approach is just easier to write and debug.

[1] http://www.erlang.org/course/concurrent_programming.html
[2] http://www.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?pName=dso_level1&path=dsonline/2007/10&file=w5tow.xml&xsl=article.xsl

-----Original Message-----
From: Code for Libraries On Behalf Of Eric Lease Morgan
Sent: Monday, February 18, 2008 1:43 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] many processes, one result

How do I write a computer program that spawns many processes but returns one result? ...
Re: [CODE4LIB] many processes, one result
On Feb 18, 2008, at 1:42 PM, Eric Lease Morgan wrote:

How do I write a computer program that spawns many processes but returns one result?

Thank you for the many prompt and useful replies. I am adding the link below simply for the historical record. (I may need it in the future.) It was given to me in channel by lbay (I think):

http://search.cpan.org/~dlux/Parallel-ForkManager-0.7.5/

--
Eric Lease Morgan
Re: [CODE4LIB] many processes, one result
The short answer is that you want a book or article on 'concurrent programming'. The main programming abstraction for doing this is generally 'threads', which are supported in different ways in different environments (languages and OSs). Another way this is sometimes done, especially in the UNIX environment, is with fork and exec of processes at the OS level, rather than 'threads'.

Concurrent programming brings its own challenges, along with conventional design patterns/abstractions to solve them. I'd definitely recommend looking for some reading on this topic, but I don't have anything in particular to recommend, I'm afraid.

And indeed, I need to address exactly the problems you mentioned in both my Umlaut application and in federated search. The solution is indeed concurrent programming of some kind. In Umlaut, it's a pain since Rails isn't particularly happy about concurrent programming.

Jonathan

Eric Lease Morgan wrote:
How do I write a computer program that spawns many processes but returns one result? ... Is the idea of threading in Java supposed to be able to address this problem?

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
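The process-level alternative to threads can also be sketched from Java, which has no fork() of its own but can start OS processes without waiting for them. The example below uses the standard ProcessBuilder and Process.waitFor(long, TimeUnit) calls. The class name SpawnAndWait and the runAll helper are hypothetical, and the demo command (the running JVM's own java binary with -version) is only an assumption chosen because it should exist wherever this code runs. It assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

class SpawnAndWait {

    /**
     * Starts every command without blocking, then waits for all of them
     * against one shared deadline. Returns one exit code per command,
     * or -1 if the command could not start or did not finish in time.
     */
    static List<Integer> runAll(List<List<String>> commands, long timeoutMillis) {
        List<Process> started = new ArrayList<>();
        for (List<String> command : commands) {
            try {
                // start() returns immediately; the child runs on its own.
                started.add(new ProcessBuilder(command)
                        .redirectErrorStream(true)
                        .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                        .start());
            } catch (IOException e) {
                started.add(null); // could not launch this command
            }
        }

        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        List<Integer> exitCodes = new ArrayList<>();
        for (Process process : started) {
            if (process == null) {
                exitCodes.add(-1);
                continue;
            }
            try {
                long remainingNanos = deadline - System.nanoTime();
                if (process.waitFor(remainingNanos, TimeUnit.NANOSECONDS)) {
                    exitCodes.add(process.exitValue());
                } else {
                    process.destroyForcibly(); // past the deadline: give up on it
                    exitCodes.add(-1);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                process.destroyForcibly();
                exitCodes.add(-1);
            }
        }
        return exitCodes;
    }

    public static void main(String[] args) {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        List<List<String>> commands = List.of(
                List.of(javaBin, "-version"),
                List.of(javaBin, "-version"));
        System.out.println(runAll(commands, 60_000));
    }
}
```

Processes cost more to start than threads, as noted earlier in the thread, but they isolate failures: a crashed child cannot corrupt the parent's memory.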