Hi Kostas,

Can you provide the container size you were running with? Were you storing any data to ObjectRegistry in this job? Please feel free to open a JIRA and it would be helpful if you could upload the app-logs along with tez-site.xml.

~Rajesh.B

On Thu, Nov 6, 2014 at 11:09 AM, Kostas Tzoumas <[email protected]> wrote:

> I am running into the same error [1] with plain Tez (not Hive):
>
> Any advice on what configuration parameters I should start looking at?
>
> Kostas
>
> [1] java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
>     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
>     at org.apache.tez.runtime.library.common.shuffle.MemoryFetchedInput.<init>(MemoryFetchedInput.java:38)
>     at org.apache.tez.runtime.library.common.shuffle.impl.SimpleFetchedInputAllocator.allocate(SimpleFetchedInputAllocator.java:139)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:713)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:485)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:394)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.call(Fetcher.java:189)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.call(Fetcher.java:71)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> On Tue, Aug 26, 2014 at 4:26 PM, Suma Shivaprasad <[email protected]> wrote:
>
> > Am using Tez 0.4.0 and counters for the query run are as below
> >
> > 2014-08-26 14:06:41,203 INFO [Thread-13]: exec.Task (TezTask.java:execute(171)) - org.apache.tez.common.counters.DAGCounter:
> > 2014-08-26 14:06:41,205 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - NUM_FAILED_TASKS: 67
> > 2014-08-26 14:06:41,205 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - NUM_KILLED_TASKS: 312
> > 2014-08-26 14:06:41,205 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - TOTAL_LAUNCHED_TASKS: 259
> > 2014-08-26 14:06:41,205 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - DATA_LOCAL_TASKS: 59
> > 2014-08-26 14:06:41,205 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - RACK_LOCAL_TASKS: 27
> > 2014-08-26 14:06:41,207 INFO [Thread-13]: exec.Task (TezTask.java:execute(171)) - File System Counters:
> > 2014-08-26 14:06:41,208 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - FILE: BYTES_READ: 0
> > 2014-08-26 14:06:41,208 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - FILE: BYTES_WRITTEN: 3201156949
> > 2014-08-26 14:06:41,208 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - FILE: READ_OPS: 0
> > 2014-08-26 14:06:41,209 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - FILE: LARGE_READ_OPS: 0
> > 2014-08-26 14:06:41,209 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - FILE: WRITE_OPS: 0
> > 2014-08-26 14:06:41,209 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - HDFS: BYTES_READ: 30052072845
> > 2014-08-26 14:06:41,209 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - HDFS: BYTES_WRITTEN: 0
> > 2014-08-26 14:06:41,209 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - HDFS: READ_OPS: 768
> > 2014-08-26 14:06:41,209 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - HDFS: LARGE_READ_OPS: 0
> > 2014-08-26 14:06:41,209 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - HDFS: WRITE_OPS: 0
> > 2014-08-26 14:06:41,211 INFO [Thread-13]: exec.Task (TezTask.java:execute(171)) - org.apache.tez.common.counters.TaskCounter:
> > 2014-08-26 14:06:41,211 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - GC_TIME_MILLIS: 148639
> > 2014-08-26 14:06:41,211 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - CPU_MILLISECONDS: 1420020
> > 2014-08-26 14:06:41,211 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - PHYSICAL_MEMORY_BYTES: 304725393408
> > 2014-08-26 14:06:41,211 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - VIRTUAL_MEMORY_BYTES: 440084279296
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - COMMITTED_HEAP_BYTES: 337806557184
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - INPUT_RECORDS_PROCESSED: 722420718
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - OUTPUT_RECORDS: 144488481
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - OUTPUT_BYTES: 6876509984
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - OUTPUT_BYTES_WITH_OVERHEAD: 7165487118
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - OUTPUT_BYTES_PHYSICAL: 3201154197
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(171)) - org.apache.hadoop.hive.ql.exec.FilterOperator$Counter:
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - FILTERED: 863123081
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - PASSED: 215782564
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(171)) - org.apache.hadoop.hive.ql.exec.MapOperator$Counter:
> > 2014-08-26 14:06:41,212 INFO [Thread-13]: exec.Task (TezTask.java:execute(173)) - DESERIALIZE_ERRORS: 0
> >
> > Thanks
> > Suma
> >
> > On Tue, Aug 26, 2014 at 7:47 PM, Suma Shivaprasad <[email protected]> wrote:
> >
> > > Trying to run a query on Tez with the following configurations
> > >
> > > *set hive.tez.container.size=5120*
> > > *set mapreduce.map.child.java.opts=-Xmx5120M*
> > > *set hive.tez.java.opts=-Xmx4096M*
> > > *set hive.auto.convert.join.noconditionaltask.size=805306000*
> > > *set tez.am.resource.memory.mb=5120*
> > > *set tez.am.java.opts=-Xmx4096M*
> > >
> > > The above config settings were set after running
> > > https://github.com/hortonworks/hdp-configuration-utils/blob/master/2.1/hdp-configuration-utils.py
> > > to get the right memory configs
> > >
> > > Tried with both
> > >
> > > set tez.runtime.io.sort.mb=512
> > > set mapreduce.task.io.sort.mb=512
> > >
> > > and
> > >
> > > set tez.runtime.io.sort.mb=2048
> > > set mapreduce.task.io.sort.mb=2048
> > >
> > > The query I am trying to run is
> > >
> > > *select sum(tab1.m1),sum(tab1.m2)*
> > > * from tab1 join tab2 dm on tab1.col1=tab2.col1*
> > > * where tab1.dt = '2014-06-01' *
> > > * and tab2.col2 = '..'*
> > > * and tab2.col3 IN ('..')*
> > > * group by TAB1.col1*
> > >
> > > *where TAB1.col1 has high cardinality (around 700-800 million)*
> > >
> > > And it's going OOM during the shuffle phase.
> > >
> > > errorMessage=Fetch failed
> > > Container released by application,
> > > AttemptID:attempt_1407396011310_1577_1_01_000000_4 Info:Error:
> > > exceptionThrown=java.lang.OutOfMemoryError: Java heap space
> > >     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
> > >     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
> > >     at org.apache.tez.runtime.library.shuffle.common.MemoryFetchedInput.<init>(MemoryFetchedInput.java:38)
> > >     at org.apache.tez.runtime.library.shuffle.common.impl.SimpleFetchedInputAllocator.allocate(SimpleFetchedInputAllocator.java:137)
> > >     at org.apache.tez.runtime.library.shuffle.common.Fetcher.fetchInputs(Fetcher.java:252)
> > >     at org.apache.tez.runtime.library.shuffle.common.Fetcher.call(Fetcher.java:184)
> > >     at org.apache.tez.runtime.library.shuffle.common.Fetcher.call(Fetcher.java:59)
> > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >     at java.lang.Thread.run(Thread.java:662)
> > >
> > > Please advise whether the configurations look OK. Do I need to change anything?
> > >
> > > Thanks
> > > Suma
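
One quick way to sanity-check the memory settings quoted in this thread is to express them relative to each other. The following is only a minimal sketch using the numbers posted above; the helper names (`parse_xmx_mb`, `summarize`) are made up for illustration, and the ratios are plain arithmetic, not thresholds recommended by Tez or Hive.

```python
import re

# Values copied from the settings quoted in this thread.
CONTAINER_MB = 5120                    # hive.tez.container.size
JAVA_OPTS = "-Xmx4096M"                # hive.tez.java.opts
SORT_MB_TRIED = [512, 2048]            # tez.runtime.io.sort.mb values that were tried
MAP_JOIN_THRESHOLD_BYTES = 805306000   # hive.auto.convert.join.noconditionaltask.size


def parse_xmx_mb(java_opts: str) -> int:
    """Return the -Xmx heap size in MB (hypothetical helper; handles M and G suffixes only)."""
    match = re.search(r"-Xmx(\d+)([MmGg])", java_opts)
    if match is None:
        raise ValueError(f"no -Xmx setting found in {java_opts!r}")
    value, unit = int(match.group(1)), match.group(2).lower()
    return value * 1024 if unit == "g" else value


def summarize() -> dict:
    """Express the posted settings as ratios of heap and container size."""
    heap_mb = parse_xmx_mb(JAVA_OPTS)
    return {
        "heap_mb": heap_mb,
        "heap_to_container": heap_mb / CONTAINER_MB,
        "sort_share_of_heap": [mb / heap_mb for mb in SORT_MB_TRIED],
        "map_join_threshold_mb": MAP_JOIN_THRESHOLD_BYTES / (1024 * 1024),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

With the posted values the heap works out to 80% of the container, and the 2048 MB sort buffer alone would be half of that heap.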
