Re: JVM Spawning
LocalJobRunner allows you to test your code with everything running in a single JVM. Just set mapred.job.tracker=local.

-- Doug

Ryan LeCompte wrote:
> I see... so there really isn't a way for me to test a map/reduce
> program using a single node without incurring the overhead of
> upping/downing JVMs... My input is broken up into 5 text files; is
> there a way I could start the job such that it only uses 1 map to
> process the whole thing? I guess I'd have to concatenate the files
> into 1 file and somehow turn off splitting?
>
> Ryan
>
> On Wed, Sep 3, 2008 at 12:09 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>> On Sep 2, 2008, at 9:00 PM, Ryan LeCompte wrote:
>>> Beginner's question:
>>>
>>> If I have a cluster with a single node that has a max of 1 map/1
>>> reduce, and the job submitted has 50 maps... Then it will process only
>>> 1 map at a time. Does that mean that it's spawning 1 new JVM for each
>>> map processed? Or re-using the same JVM when a new map can be
>>> processed?
>>
>> It creates a new JVM for each task. Devaraj is working on
>> https://issues.apache.org/jira/browse/HADOOP-249
>> which will allow the JVMs to run multiple tasks sequentially.
>>
>> -- Owen
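[Editor's note: the standalone setup Doug describes can also be put in the job's configuration file. A minimal sketch of a hadoop-site.xml fragment, using the 0.18-era property names:]

```xml
<!-- hadoop-site.xml: run jobs entirely in-process with LocalJobRunner -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- "local" selects LocalJobRunner instead of a real JobTracker -->
    <value>local</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <!-- use the local filesystem instead of HDFS -->
    <value>file:///</value>
  </property>
</configuration>
```

The same two properties can be set programmatically on a JobConf before submitting, which is handy for unit tests.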
Re: JVM Spawning
I posted an idea for an extension for MultipleFileInputFormat if someone has any extra time. *smile* https://issues.apache.org/jira/browse/HADOOP-4057 -- Owen
Re: JVM Spawning
On Tue, Sep 2, 2008 at 9:13 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> I see... so there really isn't a way for me to test a map/reduce
> program using a single node without incurring the overhead of
> upping/downing JVMs... My input is broken up into 5 text files; is
> there a way I could start the job such that it only uses 1 map to
> process the whole thing? I guess I'd have to concatenate the files
> into 1 file and somehow turn off splitting?

There is a MultipleFileInputFormat; it is less useful than it should be, but it is a good place to start. Defining a MultipleFileInputFormat that reads text files should be pretty easy, and it will give you a single map for your job. Otherwise, yes, you'll need to make a single file and ask for a single map.

-- Owen
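[Editor's note: a hedged sketch of the input format Owen suggests. In the 0.18 mapred API the class is spelled MultiFileInputFormat, paired with MultiFileSplit; the class and method names below follow that API, but treat this as an illustration rather than tested code. It chains a LineRecordReader over each file in the split, so all input files can be consumed by one map.]

```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Sketch: a multi-file input format that reads text files line by line.
public class MultiFileTextInputFormat
    extends MultiFileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new MultiFileLineRecordReader(job, (MultiFileSplit) split);
  }

  // Reads the files of a MultiFileSplit one after another,
  // delegating to a LineRecordReader per file.
  static class MultiFileLineRecordReader
      implements RecordReader<LongWritable, Text> {
    private final MultiFileSplit split;
    private final JobConf job;
    private int index = 0;
    private RecordReader<LongWritable, Text> current;

    MultiFileLineRecordReader(JobConf job, MultiFileSplit split)
        throws IOException {
      this.job = job;
      this.split = split;
      advance();
    }

    // Close the current file's reader and open the next file, if any.
    private boolean advance() throws IOException {
      if (current != null) current.close();
      if (index >= split.getNumPaths()) { current = null; return false; }
      Path path = split.getPath(index);
      long length = split.getLength(index);
      index++;
      current = new LineRecordReader(job, new FileSplit(path, 0, length, job));
      return true;
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      while (current != null) {
        if (current.next(key, value)) return true;
        if (!advance()) return false;
      }
      return false;
    }

    public LongWritable createKey() { return new LongWritable(); }
    public Text createValue() { return new Text(); }
    public long getPos() { return index; }
    public void close() throws IOException {
      if (current != null) current.close();
    }
    public float getProgress() {
      int n = split.getNumPaths();
      return n == 0 ? 1.0f : (float) index / n;
    }
  }
}
```

Request a single map with conf.setNumMapTasks(1) so the format packs all five files into one split.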
Re: JVM Spawning
I see... so there really isn't a way for me to test a map/reduce program using a single node without incurring the overhead of upping/downing JVMs... My input is broken up into 5 text files; is there a way I could start the job such that it only uses 1 map to process the whole thing? I guess I'd have to concatenate the files into 1 file and somehow turn off splitting?

Ryan

On Wed, Sep 3, 2008 at 12:09 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> On Sep 2, 2008, at 9:00 PM, Ryan LeCompte wrote:
>
>> Beginner's question:
>>
>> If I have a cluster with a single node that has a max of 1 map/1
>> reduce, and the job submitted has 50 maps... Then it will process only
>> 1 map at a time. Does that mean that it's spawning 1 new JVM for each
>> map processed? Or re-using the same JVM when a new map can be
>> processed?
>
> It creates a new JVM for each task. Devaraj is working on
> https://issues.apache.org/jira/browse/HADOOP-249
> which will allow the JVMs to run multiple tasks sequentially.
>
> -- Owen
>
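[Editor's note: on the "somehow turn off splitting" part of the question, the old mapred API lets you subclass TextInputFormat and override isSplitable, so a single concatenated input file is handed to exactly one map. A minimal sketch, assuming the 0.18-era method signature:]

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Sketch: a text input format whose files are never split,
// so one concatenated input file yields exactly one map task.
public class NonSplittableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;  // each file becomes a single split, hence a single map
  }
}
```

Wire it in with conf.setInputFormat(NonSplittableTextInputFormat.class) before submitting the job.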
Re: JVM Spawning
On Sep 2, 2008, at 9:00 PM, Ryan LeCompte wrote:
> Beginner's question:
>
> If I have a cluster with a single node that has a max of 1 map/1
> reduce, and the job submitted has 50 maps... Then it will process only
> 1 map at a time. Does that mean that it's spawning 1 new JVM for each
> map processed? Or re-using the same JVM when a new map can be
> processed?

It creates a new JVM for each task. Devaraj is working on
https://issues.apache.org/jira/browse/HADOOP-249
which will allow the JVMs to run multiple tasks sequentially.

-- Owen
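[Editor's note: for readers following the JIRA, the task-JVM reuse tracked by HADOOP-249 is, in the releases that eventually include it, controlled by a single configuration property; the property name below is taken from that later work and is not available in 0.18:]

```xml
<!-- Hedged: property introduced by HADOOP-249 (not in 0.18 releases).
     Maximum number of tasks a task JVM runs before exiting;
     -1 means reuse without limit. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```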
JVM Spawning
Beginner's question: If I have a cluster with a single node that has a max of 1 map/1 reduce, and the job submitted has 50 maps... Then it will process only 1 map at a time. Does that mean that it's spawning 1 new JVM for each map processed? Or re-using the same JVM when a new map can be processed? Thanks, Ryan