[jira] [Commented] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
[ https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843369#comment-15843369 ]

Hive QA commented on HIVE-15693:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849667/HIVE-15693.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11003 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3220/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3220/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3220/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849667 - PreCommit-HIVE-Build

> LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
>
> Key: HIVE-15693
> URL: https://issues.apache.org/jira/browse/HIVE-15693
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Critical
> Attachments: HIVE-15693.1.patch, HIVE-15693.2.patch, HIVE-15693.3.patch, HIVE-15693.4.patch, HIVE-15693.5.patch
>
> branch: master
> {noformat}
> 2017-01-22T19:52:42,470 WARN [IPC Server handler 3 on 34642 ()] org.apache.hadoop.ipc.Server: IPC Server handler 3 on 34642, call org.apache.hadoop.hive.llap.protocol.LlapProtocolBlockingPB.submitWork ...Call#17257 Retry#0
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method) ~[?:1.8.0_77]
>     at java.lang.Thread.start(Thread.java:714) [?:1.8.0_77]
>     at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950) ~[?:1.8.0_77]
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) ~[?:1.8.0_77]
>     at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>     at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>     at org.apache.hadoop.hive.llap.daemon.impl.AMReporter.taskKilled(AMReporter.java:231) ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>     at org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl$KilledTaskHandlerImpl.taskKilled(ContainerRunnerImpl.java:501) ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
[ https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842613#comment-15842613 ]

Hive QA commented on HIVE-15693:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849667/HIVE-15693.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11003 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3212/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3212/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3212/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

[jira] [Commented] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
[ https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842416#comment-15842416 ]

Lefty Leverenz commented on HIVE-15693:
---

Config review: The parameter description should have newlines (\n) just like the previous parameter's description, to avoid overlong lines in the generated template file hive-default.xml.template.

Also a couple of nits: The second line of the description doesn't need extra indentation. And you could add a period at the end.
[jira] [Commented] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
[ https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840758#comment-15840758 ]

Siddharth Seth commented on HIVE-15693:
---

+1. If we can override the maxThreads (based on numExecutors) - I think that should be mentioned in the description of the property before committing.
[jira] [Commented] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
[ https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840626#comment-15840626 ]

Siddharth Seth commented on HIVE-15693:
---

Maybe we can have -1/0 as the value for which we auto-determine the thread count, with any other value acting as an override.
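The -1/0 convention suggested in the comment above could be sketched roughly as follows. This is only an illustration and not the code from any HIVE-15693 patch: the class name `ReporterThreadCount`, the method `resolve`, and the 2x-executors auto rule used in `main` are all assumptions made for the example.

```java
// Hypothetical helper; the names are illustrative and not taken from the Hive code base.
final class ReporterThreadCount {
    private ReporterThreadCount() {}

    /**
     * Resolves the effective thread count.
     *
     * @param configured value read from configuration; -1 or 0 means "auto-determine"
     * @param autoValue  thread count the daemon would pick on its own (assumed >= 1)
     * @return autoValue when configured is -1 or 0, otherwise the configured override
     */
    static int resolve(int configured, int autoValue) {
        if (autoValue < 1) {
            throw new IllegalArgumentException("autoValue must be >= 1: " + autoValue);
        }
        if (configured < -1) {
            throw new IllegalArgumentException("unsupported thread count: " + configured);
        }
        return (configured == -1 || configured == 0) ? autoValue : configured;
    }

    public static void main(String[] args) {
        int numExecutors = 4;             // assumed daemon setting
        int autoValue = 2 * numExecutors; // placeholder auto rule, chosen only for the demo
        System.out.println(resolve(-1, autoValue)); // auto-determined
        System.out.println(resolve(0, autoValue));  // auto-determined
        System.out.println(resolve(16, autoValue)); // explicit override
    }
}
```

Treating both -1 and 0 as "auto" means a zero-thread pool can never be configured by accident; values below -1 are rejected here rather than silently mapped to auto, which is a design choice of this sketch.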
[jira] [Commented] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
[ https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839399#comment-15839399 ]

Siddharth Seth commented on HIVE-15693:
---

Instead of 2x executors, I think this needs to be based on the concurrency. A new config parameter to set an upper bound? A lower bound of the number of executors? The number of killed attempts is more likely to depend on the number of AMs communicating than on the number of executors in the daemon.

Eventually, I think we need to have a certain number of threads per AM, and also ensure that all threads don't end up blocking because of one bad AM. I'll create a follow-up JIRA for this.
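For readers unfamiliar with the failure mode in the issue title: a pool from `Executors.newCachedThreadPool()` starts a new thread whenever no idle one is available, so a burst of submissions can exhaust native threads. A minimal sketch of a bounded alternative is shown below, assuming a fixed upper bound is acceptable; the class name `BoundedReporterPool` and the bound of 8 are hypothetical, and this is not the change made in the HIVE-15693 patches.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the AMReporter implementation.
final class BoundedReporterPool {
    private BoundedReporterPool() {}

    /** Creates a pool that never runs more than maxThreads threads; extra tasks wait in a queue. */
    static ThreadPoolExecutor create(int maxThreads) {
        if (maxThreads < 1) {
            throw new IllegalArgumentException("maxThreads must be >= 1: " + maxThreads);
        }
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,               // core == max, so the thread count is capped
            60L, TimeUnit.SECONDS,                // idle threads are reclaimed after 60 seconds
            new LinkedBlockingQueue<Runnable>()); // unbounded queue absorbs bursts of tasks
        pool.allowCoreThreadTimeOut(true);        // let the pool shrink to zero when idle
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(8);
        for (int i = 0; i < 1000; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(1); // stand-in for a short report call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        // The largest pool size stays at or below the bound, however many tasks were submitted.
        System.out.println("largest pool size: " + pool.getLargestPoolSize());
    }
}
```

The trade-off is that the queue is unbounded, so memory pressure moves from threads to queued tasks, and a single slow endpoint can still occupy every thread. The comment above raises the same concern about one bad AM blocking all threads.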