Repository: incubator-systemml
Updated Branches:
  refs/heads/master 77b8e0888 -> fa73a1b85


Add troubleshooting info for OOM error in reduce phase in Hadoop Batch Mode

Closes #128.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/fa73a1b8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/fa73a1b8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/fa73a1b8

Branch: refs/heads/master
Commit: fa73a1b8505edddc766482d73fe589b74c0360c5
Parents: 77b8e08
Author: Yifan (Ethan) Xu <etha...@us.ibm.com>
Authored: Tue May 3 12:14:54 2016 -0700
Committer: Deron Eriksson <de...@us.ibm.com>
Committed: Tue May 3 12:14:54 2016 -0700

----------------------------------------------------------------------
 docs/troubleshooting-guide.md | 44 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/fa73a1b8/docs/troubleshooting-guide.md
----------------------------------------------------------------------
diff --git a/docs/troubleshooting-guide.md b/docs/troubleshooting-guide.md
index f8cc745..db8f060 100644
--- a/docs/troubleshooting-guide.md
+++ b/docs/troubleshooting-guide.md
@@ -50,3 +50,47 @@ from `provided` to `compile`.
 SystemML can then be rebuilt with the `commons-math3` dependency using
 Maven (`mvn clean package -P distribution`).
 
+## OutOfMemoryError in Hadoop Reduce Phase 
+In Hadoop MapReduce, outputs from mapper nodes are copied to reducer nodes and then sorted (known as the *shuffle* phase) before being consumed by reducers. The shuffle phase uses several buffers that share memory space with other MapReduce tasks, and a reducer will throw an `OutOfMemoryError` if the shuffle buffers take too much space:
+
+    Error: java.lang.OutOfMemoryError: Java heap space
+        at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:357)
+        at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:419)
+        at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
+        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:348)
+        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:368)
+        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
+        ...
+  
+One way to fix this issue is to lower the following buffer thresholds:
+
+    mapred.job.shuffle.input.buffer.percent # default 0.70; try 0.20 
+    mapred.job.shuffle.merge.percent # default 0.66; try 0.20
+    mapred.job.reduce.input.buffer.percent # default 0.0; keep 0.0
+
+These settings can be changed **globally** by adding or modifying the following properties in `mapred-site.xml`:
+
+    <property>
+     <name>mapred.job.shuffle.input.buffer.percent</name>
+     <value>0.2</value>
+    </property>
+    <property>
+     <name>mapred.job.shuffle.merge.percent</name>
+     <value>0.2</value>
+    </property>
+    <property>
+     <name>mapred.job.reduce.input.buffer.percent</name>
+     <value>0.0</value>
+    </property>
+
+They can also be configured on a **per SystemML-task basis** by inserting the following in `SystemML-config.xml`.
+
+    <mapred.job.shuffle.merge.percent>0.2</mapred.job.shuffle.merge.percent>
+    <mapred.job.shuffle.input.buffer.percent>0.2</mapred.job.shuffle.input.buffer.percent>
+    <mapred.job.reduce.input.buffer.percent>0</mapred.job.reduce.input.buffer.percent>
+
+Note: The default `SystemML-config.xml` is located in `<path to SystemML root>/conf/`. It is passed to SystemML using the `-config` argument:
+
+    hadoop jar SystemML.jar [-? | -help | -f <filename>] (-config=<config_filename>) ([-args | -nvargs] <args-list>)
+    
+See [Invoking SystemML in Hadoop Batch Mode](hadoop-batch-mode.html) for details of the syntax.
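
As a rough illustration of why lowering these thresholds relieves memory pressure, the sketch below estimates how much of a reducer's heap the shuffle may claim. It is not part of the commit. It assumes, as described in Hadoop's `mapred-default.xml` documentation, that `mapred.job.shuffle.input.buffer.percent` is a fraction of the reducer's maximum heap and that `mapred.job.shuffle.merge.percent` is a fraction of that buffer. The function name `shuffle_memory_budget` and the 1 GiB heap are hypothetical choices for the example.

```python
def shuffle_memory_budget(heap_bytes, input_buffer_percent, merge_percent):
    """Estimate shuffle memory usage for one reduce task.

    Returns (buffer_bytes, merge_trigger_bytes):
      buffer_bytes        -- heap reserved for holding copied map outputs
      merge_trigger_bytes -- buffer fill level at which an in-memory merge starts
    This is an approximation for illustration, not Hadoop's exact accounting.
    """
    buffer_bytes = int(heap_bytes * input_buffer_percent)
    merge_trigger_bytes = int(buffer_bytes * merge_percent)
    return buffer_bytes, merge_trigger_bytes


if __name__ == "__main__":
    heap = 1024 * 1024 * 1024  # hypothetical 1 GiB reducer heap
    settings = (
        ("default", 0.70, 0.66),    # defaults quoted in the guide
        ("suggested", 0.20, 0.20),  # lowered values quoted in the guide
    )
    for label, input_buffer, merge in settings:
        buf, trigger = shuffle_memory_budget(heap, input_buffer, merge)
        mib = 1024 * 1024
        print(f"{label}: buffer ~{buf // mib} MiB, merge starts at ~{trigger // mib} MiB")
```

With the lowered values, the shuffle buffer occupies a much smaller share of the heap and merges are triggered earlier, leaving more room for the rest of the reduce task at the cost of additional spills to disk.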
