Hey guys, I am running Hive and trying to join two tables (2.2 GB and 136 MB) on a 9-node cluster (replication = 3).
Hadoop version - 0.20.2
Each data node memory - 2GB
HADOOP_HEAPSIZE - 1000MB
Other heap settings are defaults.

My Hive job launches 40 map tasks, and every task fails with the same error:

2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 300
2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

It looks like I need to tweak some of the heap settings so the TTs handle memory efficiently. I can't work out which variables to modify (there are too many related to heap sizes). Is there anything specific I should look at?

Thanks,
jS