[ https://issues.apache.org/jira/browse/HIVE-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989775#comment-15989775 ]
Hive QA commented on HIVE-16487: -------------------------------- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865547/HIVE-16487.02.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_2] (batchId=234) org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=223) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4933/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4933/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4933/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865547 - PreCommit-HIVE-Build > Serious Zookeeper exception is logged when a race condition happens > ------------------------------------------------------------------- > > Key: HIVE-16487 > URL: https://issues.apache.org/jira/browse/HIVE-16487 > Project: Hive > Issue Type: Bug > Components: Locking > Affects Versions: 3.0.0 > Reporter: Peter Vary > Assignee: Peter Vary > Attachments: HIVE-16487.02.patch, HIVE-16487.patch > > > A customer started to see this in the logs, but happily everything was > working as intended: > {code} > 2017-03-30 12:01:59,446 ERROR ZooKeeperHiveLockManager: > [HiveServer2-Background-Pool: Thread-620]: Serious Zookeeper exception: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /hive_zookeeper_namespace/<TABLE_NAME>/LOCK-SHARED- > {code} > This was happening, because a race condition between the lock releasing, and > lock acquiring. The thread releasing the lock removes the parent ZK node just > after the thread acquiring the lock made sure, that the parent node exists. > Since this can happen without any real problem, I plan to add NODEEXISTS, and > NONODE as a transient ZooKeeper exception, so the users are not confused. > Also, the original author of ZooKeeperHiveLockManager maybe planned to handle > different ZooKeeperExceptions differently, and the code is hard to > understand. See the {{continue}} and the {{break}}. The {{break}} only breaks > the switch, and not the loop which IMHO is not intuitive: > {code} > do { > try { > [..] > ret = lockPrimitive(key, mode, keepAlive, parentCreated, > } catch (Exception e1) { > if (e1 instanceof KeeperException) { > KeeperException e = (KeeperException) e1; > switch (e.code()) { > case CONNECTIONLOSS: > case OPERATIONTIMEOUT: > LOG.debug("Possibly transient ZooKeeper exception: ", e); > continue; > default: > LOG.error("Serious Zookeeper exception: ", e); > break; > } > } > [..] > } > } while (tryNum < numRetriesForLock); > {code} > If we do not want to try again in case of a "Serious Zookeeper exception:", > then we should add a label to the do loop, and break it in the switch. > If we do want to try regardless of the type of the ZK exception, then we > should just change the {{continue;}} to {{break;}} and move the lines part of > the code which did not run in case of {{continue}} to the {{default}} switch, > so it is easier to understand the code. > Any suggestions or ideas [~ctang.ma] or [~szehon]? -- This message was sent by Atlassian JIRA (v6.3.15#6346)