[jira] [Commented] (HIVE-13525) HoS hangs when job is empty
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275468#comment-15275468 ] Xuefu Zhang commented on HIVE-13525:

Thanks for the explanation, [~lirui]. +1.

> HoS hangs when job is empty
> ---
>
> Key: HIVE-13525
> URL: https://issues.apache.org/jira/browse/HIVE-13525
> Project: Hive
> Issue Type: Bug
> Reporter: Rui Li
> Assignee: Rui Li
> Attachments: HIVE-13525.1.patch, HIVE-13525.2.patch, HIVE-13525.3.patch
>
> Observed in local tests. This should be the cause of HIVE-13402.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275463#comment-15275463 ] Rui Li commented on HIVE-13525:

Thanks [~szehon] and [~xuefuz] for the review. The deserialization error is triggered by {{NoClassDefFoundError: org/antlr/runtime/tree/CommonTree}}. It only happens in local-cluster mode. I'm not sure why the class is needed, but adding the antlr jar to the driver's class path works around the issue. Otherwise, any failed task may cause the job to hang.
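Because the missing class only surfaces later as a deserialization failure, it can help to check directly whether it is visible to a given class loader. The sketch below is a hypothetical diagnostic (the `ClassPathCheck` name is illustrative, not part of Hive or Spark) built only on `Class.forName(String, boolean, ClassLoader)`; it assumes it is run inside the driver JVM, since that is the class path the workaround above changes.

```java
// Hypothetical diagnostic: is a class visible to a given class loader?
final class ClassPathCheck {
    private ClassPathCheck() {}

    /**
     * Returns true if the named class can be loaded through the given loader.
     * The class is not initialized, so no static initializers run.
     */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class is not on the class path at all.
            // LinkageError (e.g. NoClassDefFoundError): the class was found but
            // one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // The class named in the NoClassDefFoundError reported above.
        String name = "org.antlr.runtime.tree.CommonTree";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(name + " loadable: " + isLoadable(name, loader));
    }
}
```

Passing `false` for the initialize flag keeps the check side-effect free; it only answers "can this loader find the class?", not why the driver needs it.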
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275251#comment-15275251 ] Xuefu Zhang commented on HIVE-13525:

Hi [~lirui], thanks for working on this. The patch looks good to me. One thing I'm not clear on: what's the relationship between the deserialization error and the need for the antlr jar?
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274894#comment-15274894 ] Szehon Ho commented on HIVE-13525:

+1, it looks good to me. It sounds like SPARK-14958 is pretty important to fix.
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273476#comment-15273476 ] Rui Li commented on HIVE-13525:

The test failures are not related. [~xuefuz] and [~szehon], could you help review the patch when you have time? Thanks.
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265915#comment-15265915 ] Hive QA commented on HIVE-13525:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12801448/HIVE-13525.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 42 failed/errored test(s), 9989 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.llap.daemon.impl.comparator.TestShortestJobFirstComparator.testWaitQueueComparatorWithinDagPriority
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestHiveMetaStoreGetMetaConf.testGetMetaConfDefault
org.apache.hadoop.hive.metastore.TestHiveMetaStoreGetMetaConf.testGetMetaConfDefaultEmptyString
org.apache.hadoop.hive.metastore.TestHiveMetaStoreGetMetaConf.testGetMetaConfOverridden
org.apache.hadoop.hive.metastore.TestHiveMetaStoreGetMetaConf.testGetMetaConfUnknownPreperty
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.metastore.hbase.TestHBaseImport.org.apache.hadoop.hive.metastore.hbase.TestHBaseImport
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.lockConflictDbTable
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestExtendedAcls.org.apache.hadoop.hive.ql.security.TestExtendedAcls
org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions
org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testDelegationTokenSharedStore
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testMetastoreProxyUser
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testSaslWithHiveMetaStore
org.apache.hive.hcatalog.listener.TestDbNotificationListener.dropDatabase
org.apache.hive.hcatalog.listener.TestDbNotificationListener.sqlInsertPartition
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.org.apache.hive.service.TestHS2ImpersonationWithRemoteMS
{noformat}

Test results: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/144/testReport
Console output: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/144/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-144/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260193#comment-15260193 ] Rui Li commented on HIVE-13525:

I did more debugging and found that some of the hanging cases are related to Spark not handling failed tasks properly. I created SPARK-14958 for it. [~vanzin], it'd be great if you could help look into it. Thanks. In the meantime, I'll see if there's anything we can do on the Hive side to work around it.
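If Spark never reports a failed task as finished, anything that blocks on the job's future waits forever. One generic way to turn such a hang into a visible error is a bounded wait with `java.util.concurrent.Future#get(long, TimeUnit)`. The sketch below only illustrates that idea; it is not necessarily what the HIVE-13525 patch does, and the `BoundedWait` name and the timeout values are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: wait for a job, but never forever.
final class BoundedWait {
    private BoundedWait() {}

    /**
     * Waits at most timeoutMs for the job future and returns its result.
     * Throws TimeoutException (after cancelling the future) instead of
     * blocking indefinitely if the job never completes.
     */
    static <T> T awaitJob(Future<T> job, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            return job.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up on a job that appears stuck, e.g. because a
            // failed task was never reported as finished.
            job.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // A future that is never completed stands in for a hung Spark job.
        CompletableFuture<String> hung = new CompletableFuture<>();
        try {
            awaitJob(hung, 100);
        } catch (TimeoutException e) {
            System.out.println("job timed out; cancelled = " + hung.isCancelled());
        }
    }
}
```

The trade-off is choosing a timeout: too short and slow but healthy jobs get cancelled, so a real deployment would presumably make it configurable.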
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258256#comment-15258256 ] Rui Li commented on HIVE-13525:

The {{TestSparkClient.testMetricsCollection}} failure is related. The problem is that if {{RemoteDriver}} considers a job done as soon as future#get returns, we may send the JobResult before the listener has handled the TaskEnd event and sent the metrics. On the client side, the job handle is removed once the JobResult is received, so any metrics that arrive later are simply discarded. I'll think about how to solve this; any ideas are welcome.
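The ordering problem can be reproduced in isolation. The sketch below models only the client side as described above: a job handle collects metrics until the JobResult arrives and is then removed, so metrics sent afterwards are dropped. `ClientJobHandles`, `submit`, `onMetrics` and `onJobResult` are hypothetical names chosen for illustration, not the actual spark-client API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the client side: one handle per in-flight job. */
final class ClientJobHandles {
    private final Map<String, List<String>> metricsByJob = new HashMap<>();

    /** Registers a handle when a job is submitted. */
    void submit(String jobId) {
        metricsByJob.put(jobId, new ArrayList<>());
    }

    /** Records metrics for a job; returns false if the handle is already gone. */
    boolean onMetrics(String jobId, String metrics) {
        List<String> handle = metricsByJob.get(jobId);
        if (handle == null) {
            return false; // JobResult already arrived, so late metrics are discarded
        }
        handle.add(metrics);
        return true;
    }

    /** Removes the handle when the JobResult arrives; returns the metrics collected so far. */
    List<String> onJobResult(String jobId) {
        List<String> collected = metricsByJob.remove(jobId);
        return collected == null ? Collections.<String>emptyList() : collected;
    }

    public static void main(String[] args) {
        ClientJobHandles client = new ClientJobHandles();

        // Expected order: TaskEnd metrics first, then JobResult.
        client.submit("job-1");
        client.onMetrics("job-1", "task-0 metrics");
        System.out.println("in order: " + client.onJobResult("job-1"));

        // Racy order: JobResult is sent before the listener handles TaskEnd.
        client.submit("job-2");
        System.out.println("racy: " + client.onJobResult("job-2"));
        System.out.println("late metrics kept: " + client.onMetrics("job-2", "task-0 metrics"));
    }
}
```

In this model the client has no way to recover the late metrics, which is consistent with the comment above: the driver must not send the JobResult until the listener has processed the job's TaskEnd events.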
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257745#comment-15257745 ] Hive QA commented on HIVE-13525:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800441/HIVE-13525.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 47 failed/errored test(s), 9918 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-auto_join30.q-script_pipe.q-vector_decimal_10_0.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-cbo_windowing.q-tez_join.q-bucket_map_join_tez1.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-constprog_dpp.q-dynamic_partition_pruning.q-tez_insert_overwrite_local_directory_1.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-dynpart_sort_optimization2.q-tez_dynpart_hashjoin_3.q-orc_vectorization_ppd.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testPreemptionQueueComparator
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions
org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccessWithReadOnly
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testDelegationTokenSharedStore
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testMetastoreProxyUser
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testSaslWithHiveMetaStore
org.apache.hive.hcatalog.listener.TestDbNotifi
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246965#comment-15246965 ] Rui Li commented on HIVE-13525:

Thanks, Xuefu. I guess there's something wrong with our Jenkins job?
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245821#comment-15245821 ] Xuefu Zhang commented on HIVE-13525:

Yes, I meant to suggest moving the increment outside the log message. Thanks for fixing that. +1
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244806#comment-15244806 ] Xuefu Zhang commented on HIVE-13525:

Thanks for working on this, [~lirui]. The explanation makes sense. However, I'm wondering whether incrementing the variable inside the debug message is safe:
{code}
+ LOG.debug("Client job {}: {} of {} Spark jobs finished.",
+ req.id, ++completed, jobs.size());
{code}
Other than that, the patch looks good to me as well.
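With an SLF4J-style placeholder call, Java evaluates every argument before `debug` runs, so `++completed` executes whether or not debug logging is enabled. The likely risk is maintenance: the counter update silently disappears if the log call is later removed or wrapped in an `isDebugEnabled()` guard. The sketch below shows the suggested shape, with the increment as its own statement. `JobCounter`, `awaitAll` and the `debug` helper are hypothetical stand-ins for the driver code and logger, not Hive's actual classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class JobCounter {
    private JobCounter() {}

    // Hypothetical stand-in for an SLF4J-style logger's debug level.
    static boolean debugEnabled = false;

    static void debug(String format, Object... args) {
        if (debugEnabled) {
            System.out.println(format + " " + Arrays.toString(args));
        }
    }

    /** Waits for each Spark job of a client job and returns how many finished. */
    static int awaitAll(String reqId, List<? extends Future<?>> jobs)
            throws InterruptedException, ExecutionException {
        int completed = 0;
        for (Future<?> job : jobs) {
            job.get();
            completed++; // side effect kept out of the log call's argument list
            debug("Client job {}: {} of {} Spark jobs finished.", reqId, completed, jobs.size());
        }
        // An empty job list skips the loop and returns 0 immediately.
        return completed;
    }

    public static void main(String[] args) throws Exception {
        debugEnabled = true;
        List<CompletableFuture<String>> jobs = Arrays.asList(
                CompletableFuture.completedFuture("a"),
                CompletableFuture.completedFuture("b"));
        System.out.println("finished: " + awaitAll("req-1", jobs));
        System.out.println("finished (empty): "
                + awaitAll("req-2", Collections.<Future<String>>emptyList()));
    }
}
```

Keeping the log call free of side effects means the counter's behavior no longer depends on how, or whether, the message is logged.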
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244500#comment-15244500 ] Marcelo Vanzin commented on HIVE-13525:

Patch LGTM.
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243998#comment-15243998 ] Rui Li commented on HIVE-13525:

Sorry, I didn't notice HIVE-13223 when creating this JIRA. [~szehon], do you think it's a duplicate?
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243271#comment-15243271 ] Szehon Ho commented on HIVE-13525:

Yeah, it looks related to HIVE-13223, which we should investigate.