Re: JVM Spawning

2008-09-05 Thread Doug Cutting
LocalJobRunner allows you to test your code with everything running in a 
single JVM.  Just set mapred.job.tracker=local.
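To make "everything in a single JVM" concrete, here is a toy model. It is not Hadoop's LocalJobRunner and does not use its API. A word-count map phase, an in-memory sort/group, and a reduce phase all run as plain method calls in one process, with no child JVMs. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only (hypothetical names): map, shuffle/sort and reduce all run as
// ordinary method calls in the current JVM, which is the idea behind local mode.
final class InProcessWordCount {

    // Map phase: emit (word, 1) for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) out.add(Map.entry(token, 1));
        }
        return out;
    }

    // Reduce phase: sum all counts emitted for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Runs every "task" sequentially in this JVM; no processes are spawned.
    static Map<String, Integer> run(List<String> inputLines) {
        // Shuffle/sort: group intermediate values by key, keys kept sorted.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : inputLines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

In a real job the mapper and reducer would be your own classes. Setting mapred.job.tracker=local is what keeps them in the submitting JVM.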


Doug

Ryan LeCompte wrote:

I see... so there really isn't a way for me to test a map/reduce
program using a single node without incurring the overhead of
starting and stopping JVMs... My input is broken up into 5 text files. Is
there a way I could start the job such that it only uses 1 map to
process the whole thing? I guess I'd have to concatenate the files
into 1 file and somehow turn off splitting?

Ryan





Re: JVM Spawning

2008-09-02 Thread Ryan LeCompte
I see... so there really isn't a way for me to test a map/reduce
program using a single node without incurring the overhead of
starting and stopping JVMs... My input is broken up into 5 text files. Is
there a way I could start the job such that it only uses 1 map to
process the whole thing? I guess I'd have to concatenate the files
into 1 file and somehow turn off splitting?

Ryan
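Ryan's first idea, concatenating the five inputs into one file, can be done before the data is copied into the job's input directory. Here is a minimal sketch using only java.nio.file. The class and method names are hypothetical, and it assumes each file is small enough to read into memory at once. It inserts a newline between files that do not already end in one, so the last line of one file is not glued to the first line of the next.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: merges several text inputs into one file, in order.
final class ConcatInputs {

    static void concat(List<Path> inputs, Path output) throws IOException {
        // Opening the output truncates it, so it must not also be an input.
        if (inputs.contains(output)) {
            throw new IllegalArgumentException("output must not be one of the inputs");
        }
        // Default options: create the file if missing, truncate it if present.
        try (OutputStream out = Files.newOutputStream(output)) {
            for (Path in : inputs) {
                byte[] bytes = Files.readAllBytes(in);
                out.write(bytes);
                // Keep record boundaries: end each file's last line with '\n'.
                if (bytes.length > 0 && bytes[bytes.length - 1] != '\n') {
                    out.write('\n');
                }
            }
        }
    }
}
```

This covers only the concatenation. Whether the merged file then becomes exactly one map still depends on how the input format splits it.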


On Wed, Sep 3, 2008 at 12:09 AM, Owen O'Malley [EMAIL PROTECTED] wrote:

 On Sep 2, 2008, at 9:00 PM, Ryan LeCompte wrote:

 Beginner's question:

 If I have a cluster with a single node that has a max of 1 map/1
 reduce, and the job submitted has 50 maps... Then it will process only
 1 map at a time. Does that mean that it's spawning 1 new JVM for each
 map processed? Or re-using the same JVM when a new map can be
 processed?

 It creates a new JVM for each task. Devaraj is working on
 https://issues.apache.org/jira/browse/HADOOP-249
 which will allow the JVMs to run multiple tasks sequentially.

 -- Owen
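The cost Ryan is worried about is easy to count. On a node with one map slot, 50 maps run one after another. With a fresh JVM per task, that is 50 JVM start-ups. If each JVM may run up to N tasks in sequence, it is the ceiling of 50 / N. To our knowledge, the HADOOP-249 work later shipped as the mapred.job.reuse.jvm.num.tasks setting (1 = a new JVM per task, -1 = no limit). The class below is a hypothetical model of that arithmetic, not Hadoop code.

```java
// Hypothetical model: JVM launches needed by one task slot that runs
// `tasks` tasks sequentially, letting each JVM run up to `tasksPerJvm` tasks.
final class JvmLaunches {

    static int launches(int tasks, int tasksPerJvm) {
        if (tasks < 0) throw new IllegalArgumentException("tasks must be >= 0");
        if (tasks == 0) return 0;
        if (tasksPerJvm == -1) return 1; // -1 models "reuse without limit"
        if (tasksPerJvm < 1) throw new IllegalArgumentException("tasksPerJvm must be >= 1 or -1");
        return (tasks + tasksPerJvm - 1) / tasksPerJvm; // ceiling division
    }

    public static void main(String[] args) {
        // prints 50, 5 and 1 launches for the three settings
        for (int perJvm : new int[] {1, 10, -1}) {
            System.out.println("tasksPerJvm=" + perJvm + " -> " + launches(50, perJvm) + " JVM launches");
        }
    }
}
```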



Re: JVM Spawning

2008-09-02 Thread Owen O'Malley
On Tue, Sep 2, 2008 at 9:13 PM, Ryan LeCompte [EMAIL PROTECTED] wrote:

 I see... so there really isn't a way for me to test a map/reduce
 program using a single node without incurring the overhead of
 starting and stopping JVMs... My input is broken up into 5 text files. Is
 there a way I could start the job such that it only uses 1 map to
 process the whole thing? I guess I'd have to concatenate the files
 into 1 file and somehow turn off splitting?


There is a MultiFileInputFormat. It is less useful than it should be, but it
is a good place to start. Defining a MultiFileInputFormat subclass that reads
text files should be pretty easy, and it will give you a single map for your job.
Otherwise, yes, you'll need to make a single file and ask for a single map.

-- Owen
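"Make a single file and ask for a single map" works because of how the old-API FileInputFormat turns the requested map count into split sizes. The model below is our approximate reading of that logic, not a copy of the source:

* goal size = total bytes / requested maps
* split size = max(minSize, min(goalSize, blockSize))
* the last split may be up to 10% oversized

In the old API, the request is made with JobConf.setNumMapTasks, which is only a hint. To our knowledge, splitting is switched off by overriding isSplitable in a TextInputFormat subclass. The SplitModel class and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, approximate model of old-API split sizing for ONE input file.
final class SplitModel {
    static final double SPLIT_SLOP = 1.1; // last split may be up to 10% larger

    // Returns the byte length of each split for a file of `length` bytes.
    static List<Long> splits(long length, int numMapTasksHint, long blockSize,
                             long minSize, boolean splitable) {
        if (length <= 0) throw new IllegalArgumentException("model covers non-empty files only");
        if (blockSize < 1 || minSize < 1) throw new IllegalArgumentException("sizes must be >= 1");
        List<Long> result = new ArrayList<>();
        if (!splitable) {            // splitting turned off: the whole file is one split
            result.add(length);
            return result;
        }
        long goalSize = length / Math.max(1, numMapTasksHint);
        long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));
        long remaining = length;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            result.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) result.add(remaining); // tail split
        return result;
    }
}
```

Under this model, with 64 MB blocks:

* A 30 MB file with one map requested gives one split.
* The same file with two maps requested gives two 15 MB splits.
* A 200 MB file gives four splits (64, 64, 64 and 8 MB) unless splitting is turned off.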


Re: JVM Spawning

2008-09-02 Thread Owen O'Malley
I posted an idea for an extension to MultiFileInputFormat, if someone has
any extra time. *smile*

https://issues.apache.org/jira/browse/HADOOP-4057

-- Owen
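The MultiFileInputFormat approach avoids concatenation altogether. It builds splits out of whole files, so several small files can share one split and therefore one map. The sketch below is a hypothetical, self-contained model of that grouping idea: whole files are packed greedily, in order, into at most numSplits groups of roughly the average size. It is not Hadoop's implementation and does not model the HADOOP-4057 extension. The file names in the usage below are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: pack whole files into multi-file splits.
final class FileGrouping {

    // Stand-in for a file's metadata: name and length in bytes.
    record InputFile(String name, long length) {}

    // Greedily packs files, in order, into at most numSplits groups whose
    // total lengths are roughly total/numSplits. Files are never cut.
    static List<List<InputFile>> group(List<InputFile> files, int numSplits) {
        if (numSplits < 1) throw new IllegalArgumentException("numSplits must be >= 1");
        long total = 0;
        for (InputFile f : files) total += f.length();
        long target = (total + numSplits - 1) / numSplits; // ceiling of the average
        List<List<InputFile>> splits = new ArrayList<>();
        List<InputFile> current = new ArrayList<>();
        long currentLen = 0;
        for (InputFile f : files) {
            current.add(f);
            currentLen += f.length();
            // Close this group once it reaches the target, but leave the last
            // allowed group open so the remaining files all land in it.
            if (currentLen >= target && splits.size() < numSplits - 1) {
                splits.add(current);
                current = new ArrayList<>();
                currentLen = 0;
            }
        }
        if (!current.isEmpty()) splits.add(current);
        return splits;
    }
}
```

With numSplits = 1, all five of Ryan's files end up in one group, which is the "single map" case Owen describes.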