[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues
[ https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338735#comment-15338735 ]

Siddharth Seth commented on HIVE-14003:
---
Thanks for the reviews [~prasanth_j], [~sershe]. Test failures are unrelated. Committing.
Wonder why the test failures are up to 10 now. It was down to 4-5 a while ago.

> queries running against llap hang at times - preemption issues
> --
>
> Key: HIVE-14003
> URL: https://issues.apache.org/jira/browse/HIVE-14003
> Project: Hive
> Issue Type: Bug
> Components: llap
> Affects Versions: 2.1.0
> Reporter: Takahiko Saito
> Assignee: Siddharth Seth
> Attachments: HIVE-14003.01.patch, HIVE-14003.02.patch
>
>
> The preemption logic in the Hive processor needs some more work. There are
> definitely windows where the abort flag is completely dropped within the Hive
> processor.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues
[ https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337469#comment-15337469 ]

Hive QA commented on HIVE-14003:
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811266/HIVE-14003.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10220 tests executed

*Failed tests:*
{noformat}
TestMiniTezCliDriver-join1.q-mapjoin_decimal.q-vectorized_distinct_gby.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repair
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_table_nonprintable
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testPartitionsCheck
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testTableCheck
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/158/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/158/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-158/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12811266 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues
[ https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337250#comment-15337250 ]

Sergey Shelukhin commented on HIVE-14003:
-
+1
[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues
[ https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335111#comment-15335111 ]

Prasanth Jayachandran commented on HIVE-14003:
--
Can you make the TODOs as follow up jiras?
nit: "KKK" can be removed. Also "Reviewer" comments.
Other than that LGTM +1
[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues
[ https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335050#comment-15335050 ]

Siddharth Seth commented on HIVE-14003:
---
Which ones specifically? My intent was to fix the rest of the TODOs left in the code as follow-ups (after getting more clarity, and some more support from Tez). Fix one known problem for now. Create new jiras to fix others.
[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues
[ https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334987#comment-15334987 ]

Sergey Shelukhin commented on HIVE-14003:
-
looks good to me with comments addressed...
[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues
[ https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332229#comment-15332229 ]

Siddharth Seth commented on HIVE-14003:
---
[~sershe], [~prasanth_j] - is the patch good to go in (with the comments changed to point to new jiras which will be created)? I think there are additional cases which need to be addressed; they can be addressed in a different jira. This one, in its current form, does get rid of queries getting stuck on hash table creation.
[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues
[ https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329052#comment-15329052 ]

Siddharth Seth commented on HIVE-14003:
---
[~hagleitn] - mind taking a look at the patch, and providing some more information on dummyOps / mergeOps?
An interrupt would ideally stop an operation - however it's really a suggestion, and we cannot rely on libraries to handle them correctly. I suspect most of Hadoop has issues here. An HDFS jira was created and has already been fixed. The abort flag serves to protect against operations which reset the interrupt status - which is where the "avoid blocking op" comment comes in. In most cases we'll be OK with an abort flag check.
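The point about the abort flag protecting against a reset interrupt status can be illustrated with a minimal sketch. This is not the Hive processor code or the HIVE-14003 patch; `AbortableLoop`, `abort`, `run`, and `libraryCallThatSwallowsInterrupt` are hypothetical names made up for illustration, and the "library call" is simulated with `Thread.interrupted()`, which clears the calling thread's interrupt status.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch only; names are illustrative, not from Hive or Tez.
class AbortableLoop {
    // Sticky flag: once set, it stays set even if the interrupt status is cleared.
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    // Called by whoever preempts the task: set the flag, then interrupt the worker.
    void abort(Thread worker) {
        aborted.set(true);
        worker.interrupt(); // best-effort wake-up for blocking calls
    }

    // Stand-in for a library call that resets the interrupt status.
    static void libraryCallThatSwallowsInterrupt() {
        Thread.interrupted(); // returns and clears the current thread's interrupt status
    }

    // Processes up to maxRows rows and returns how many were completed.
    int run(int maxRows) {
        int processed = 0;
        while (processed < maxRows) {
            libraryCallThatSwallowsInterrupt();
            // Check the flag after the library call: the interrupt may be gone by now.
            if (aborted.get()) {
                break;
            }
            processed++;
        }
        return processed;
    }
}
```

In this sketch the loop stops on the flag even though the interrupt itself has been lost by the time the check runs, which is the window the comment above describes.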
[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues
[ https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328067#comment-15328067 ]

Sergey Shelukhin commented on HIVE-14003:
-
I think the "TODO: Reviewer:" items need to be investigated (and other TODOs may need to be fixed). I don't have exact answers to most of them. Dummy ops are used for mapjoin. Prasanth may know more about merge. The global map does not need to be cleaned up explicitly; you can check the existing cleanup. As far as I can tell, this patch wouldn't interfere with it.
Overall, I think we should be able to interrupt the execution with an interrupt exception, so I am not sure why some comments say that having a blocking op is a problem - shouldn't an interrupt there abort properly? The abort flag would only be an optimization then. The only problem is other library calls that can swallow interrupt exceptions... as long as the abort flag is checked after those, it should be alright.
Also, we should file JIRAs to fix the code if these are Hadoop libraries/components. At the very least they should restore the interrupt flag. There should be no retries on interrupts, etc.
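As a rough illustration of "restore the interrupt flag, no retries on interrupts", the following sketch shows the usual Java idiom. It is not taken from Hadoop or Hive; `InterruptAwareWait` and `waitQuietly` are hypothetical names, and `Thread.sleep` merely stands in for any blocking call that throws `InterruptedException`.

```java
// Hypothetical sketch only; not code from Hadoop, Tez, or Hive.
class InterruptAwareWait {
    // Returns true if the wait completed, false if it was interrupted.
    static boolean waitQuietly(long millis) {
        try {
            Thread.sleep(millis); // stand-in for any blocking library operation
            return true;
        } catch (InterruptedException e) {
            // Throwing InterruptedException clears the interrupt status, so set it
            // again to let callers further up the stack still observe the interrupt.
            Thread.currentThread().interrupt();
            return false; // give up instead of retrying the blocking call
        }
    }
}
```

A caller that sees `false` can then check `Thread.currentThread().isInterrupted()` (or its own abort flag) and unwind, rather than continuing as if nothing happened.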