Chris Dyer wrote:
In my task logs I see the message:
"attempt to override final parameter: mapred.child.ulimit;  Ignoring."
which doesn't exactly inspire confidence that I'm on the right path.
Chances are the param has been marked final in the task tracker's running
config, which will prevent you from overriding the value with a job-specific
configuration.
Do you have any idea how one unmarks such a thing?  Do I just need to
edit the configuration file for the task tracker?
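Roughly, yes. A final parameter in the tasktracker's config file looks something like the sketch below (the file name varies by version, e.g. hadoop-site.xml or mapred-site.xml, and the value shown is only an illustration):

```xml
<!-- In the tasktracker's config file. The <final>true</final> element
     is what blocks per-job overrides of this property; the value here
     is just an example. -->
<property>
  <name>mapred.child.ulimit</name>
  <value>2097152</value>
  <final>true</final>
</property>
```

Removing the <final>true</final> line (or the whole property) and restarting the tasktracker should let job-specific values take effect again.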

Depending upon how many tasks per node, that may not be enough. Streaming
jobs eat a crapton (I'm pretty sure that is an SI unit) of memory.
Is there any particular reason for the excessive memory use?  I
realize this is Java, but it's just sloshing data down to my
processes...


Java 6u14 and later lets you run with "compressed pointers"; everyone is still experimenting with that, but it does appear to reduce 64-bit memory use. If you were using 32-bit JVMs, stay with them: even with compressed pointers, a 64-bit JVM uses more memory per object instance.
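If you want to try it on your task JVMs, the flag can be passed through the child JVM options, along the lines of this sketch (the heap size shown is only an illustration, not a recommendation):

```xml
<!-- Pass -XX:+UseCompressedOops to each task's child JVM.
     Requires Java 6u14 or later; the -Xmx value is just an example. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m -XX:+UseCompressedOops</value>
</property>
```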

How does one change the number of map slots per
node?  I'm a hadoop configuration newbie (which is why I was
originally excited about the Cloudera EC2 scripts...)

From the code in front of my IDE:

   maxMapSlots    = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
   maxReduceSlots = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);

Those are conf values you have to tune.
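To change the slot counts, set those two properties in each tasktracker's config and restart it. A sketch (the values here are only illustrative; tune them to your cores and memory, per the defaults of 2 each in the code above):

```xml
<!-- Per-node task slots for this tasktracker. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```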
