You can set mapred.child.java.opts on a per-job basis
either via -D mapred.child.java.opts="java options" or via
conf.set("mapred.child.java.opts", "java options").

Note: the conf.set must be done before the job is submitted.
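As a minimal sketch (not from the thread), the per-job value can be built by merging a stack size into whatever java opts the job already has. The helper name `withStackSize`, the placeholder `MyJob`, and the `128k` size are illustrative assumptions. The Hadoop calls appear only in a comment because they need Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

final class ChildOpts {
    private ChildOpts() {}

    /**
     * Drops any existing -Xss flag from a mapred.child.java.opts value and
     * appends -Xss<stackSize>, keeping the other flags in their original order.
     */
    static String withStackSize(String opts, String stackSize) {
        List<String> kept = new ArrayList<>();
        for (String token : opts.trim().split("\\s+")) {
            // "".split(...) yields one empty token, so skip blanks too.
            if (!token.isEmpty() && !token.startsWith("-Xss")) {
                kept.add(token);
            }
        }
        kept.add("-Xss" + stackSize);
        return String.join(" ", kept);
    }

    // Hypothetical usage with the old org.apache.hadoop.mapred API (MyJob is a
    // placeholder). The value must be set on the JobConf before submission:
    //
    //   JobConf conf = new JobConf(MyJob.class);
    //   String base = conf.get("mapred.child.java.opts", "-Xmx200m");
    //   conf.set("mapred.child.java.opts", ChildOpts.withStackSize(base, "128k"));
    //   JobClient.runJob(conf);
}
```

For example, `withStackSize("-Xmx200m", "128k")` gives `-Xmx200m -Xss128k`. Replacing rather than duplicating `-Xss` avoids depending on which occurrence the JVM honors.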

On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger <phi...@cloudera.com> wrote:

 You could add "-Xss<n>" to the "mapred.child.java.opts" configuration
 setting.  That's controlling the Java stack size, which I think is the
 relevant bit for you.
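-Xss sets the default stack size for threads the JVM creates. Java also lets code request a stack size for an individual thread through the four-argument `Thread` constructor. That is a different mechanism from the -Xss flag suggested here, and its Javadoc warns that the JVM may treat the value as a suggestion or ignore it on some platforms. A small sketch, with the worker count and 256 KB size chosen arbitrarily:

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SmallStackThreads {
    private SmallStackThreads() {}

    /**
     * Starts `count` threads, each created with a requested stack size of
     * `stackBytes`, waits for all of them, and returns how many ran.
     * The JVM may round the request up to its minimum or ignore it.
     */
    static int runWorkers(int count, long stackBytes) {
        AtomicInteger finished = new AtomicInteger();
        Thread[] workers = new Thread[count];
        for (int i = 0; i < count; i++) {
            // Thread(ThreadGroup, Runnable, String, long stackSize);
            // a null group means the current thread's group.
            workers[i] = new Thread(null, finished::incrementAndGet, "worker-" + i, stackBytes);
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining workers", e);
        }
        return finished.get();
    }
}
```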

That's part of it, but there's also native memory used when you start a thread with most JREs.

See the lengthy article at http://www.ibm.com/developerworks/java/library/j-nativememory-linux/index.html for more details than you probably ever wanted to know :) I haven't tried the sample code on my EC2 instances, but will try to do so next week and post results.

In the past, with Fedora Core 4 and (I think) Fedora Core 6, we definitely needed to constrain the OS stack size to avoid running out of native memory when spawning lots of Java threads.
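To make the native-memory point concrete, here is a back-of-the-envelope estimate. It assumes each thread reserves its full stack as native address space. Actual per-thread overhead varies by JRE and kernel, and 8192 KB is only a common Linux default for `ulimit -s`.

```java
final class StackBudget {
    private StackBudget() {}

    /** Native address space, in MB, reserved by `threads` stacks of `stackKb` KB each. */
    static long reservedMb(int threads, long stackKb) {
        // int * long promotes to long, so large thread counts do not overflow.
        return threads * stackKb / 1024;
    }
}
```

With 2,000 threads, `reservedMb(2000, 8192)` is 16,000 MB, while `reservedMb(2000, 256)` is 500 MB. This is why the per-thread stack size can matter more than the heap size when a task spawns many threads.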

-- Ken




 <property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx200m</value>
   <description>Java opts for the task tracker child processes.
   The following symbol, if present, will be interpolated: @taskid@ is replaced
   by current TaskID. Any other occurrences of '@' will go unchanged.
   For example, to enable verbose gc logging to a file named for the taskid in
   /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
         -Xmx1024m -verbose:gc -Xloggc:/tmp/@tas...@.gc

   The configuration variable mapred.child.ulimit can be used to control the
   maximum virtual memory of the child processes.
   </description>
 </property>
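The @taskid@ rule in the description above amounts to a literal string substitution. The sketch below mimics it. The method name and the sample task ID are hypothetical, and this is not Hadoop's own launcher code.

```java
final class TaskIdInterpolation {
    private TaskIdInterpolation() {}

    /** Replaces every literal "@taskid@" with the task ID; any other '@' is left alone. */
    static String interpolate(String javaOpts, String taskId) {
        // String.replace(CharSequence, CharSequence) replaces all occurrences, no regex.
        return javaOpts.replace("@taskid@", taskId);
    }
}
```

For example, `interpolate("-Xmx200m -Xloggc:/tmp/gc-@taskid@.log -Duser=a@b", "attempt_0001_r_000000_0")` yields `-Xmx200m -Xloggc:/tmp/gc-attempt_0001_r_000000_0.log -Duser=a@b`.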


 On Fri, May 8, 2009 at 11:16 AM, Ken Krugler <kkrugler_li...@transpac.com> wrote:

 > Hi there,
 >
 > For a very specific type of reduce task, we currently need to use a large
 > number of threads.
 >
 > To avoid running out of memory, I'd like to constrain the Linux stack size
 > via a "ulimit -s xxx" shell script command before starting up the JVM. I
 > could do this for the entire system at boot time, but it would be better to
 > have it for just the Hadoop JVM(s).
 >
 > Any suggestions for how best to handle this?
 >
 > Thanks,
 >
 > -- Ken
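One generic way to scope the limit to a single JVM is to launch it through a shell that lowers the limit and then `exec`s the real command. Only that process tree inherits the smaller stack. The sketch below does this with `ProcessBuilder`. It assumes a POSIX `/bin/sh` whose `ulimit` accepts `-s` (in KB), and it does not show how the TaskTracker itself starts child JVMs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class UlimitLauncher {
    private UlimitLauncher() {}

    /**
     * Wraps a command as: /bin/sh -c 'ulimit -s KB && exec "$0" "$@"' cmd args...
     * With sh -c, the first word after the script becomes $0 and the rest $1..$n,
     * so exec replaces the shell with the original command under the new limit.
     */
    static ProcessBuilder withStackLimit(int stackKb, List<String> command) {
        List<String> full = new ArrayList<>();
        full.add("/bin/sh");
        full.add("-c");
        full.add("ulimit -s " + stackKb + " && exec \"$0\" \"$@\"");
        full.addAll(command);
        return new ProcessBuilder(full).redirectErrorStream(true);
    }

    /** Runs the process to completion and returns its combined stdout/stderr, trimmed. */
    static String runAndCapture(ProcessBuilder pb) {
        try {
            Process p = pb.start();
            String out;
            try (InputStream in = p.getInputStream()) {
                out = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            p.waitFor();
            return out.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A hypothetical call for a child JVM would be `withStackLimit(256, List.of("java", "-Xss128k", "-Xmx200m", "-cp", classpath, mainClass))`, where `classpath` and `mainClass` are placeholders.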

--
Ken Krugler
+1 530-210-6378
