[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298910#comment-14298910 ] Alan Gates commented on HIVE-8966: -- I confirmed that it is already in 1.1, based on the git logs.

> Delta files created by hive hcatalog streaming cannot be compacted
> -------------------------------------------------------------------
>
>                 Key: HIVE-8966
>                 URL: https://issues.apache.org/jira/browse/HIVE-8966
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.14.0
>         Environment: hive
>            Reporter: Jihong Liu
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: HIVE-8966-branch-1.patch, HIVE-8966.2.patch, HIVE-8966.3.patch, HIVE-8966.4.patch, HIVE-8966.5.patch, HIVE-8966.6.patch, HIVE-8966.patch
>
> Hive HCatalog streaming also creates a file named bucket_n_flush_length in each delta directory, where "n" is the bucket number. compactor.CompactorMR assumes this file also needs to be compacted; since it of course cannot be, CompactorMR does not proceed with the compaction.
> In a test, after the bucket_n_flush_length file was removed, the "alter table partition compact" statement finished successfully. If that file is not deleted, nothing is compacted.
> This is probably a very severe bug. Both 0.13 and 0.14 have this issue.
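The failure mode is mechanical: the streaming side files follow a fixed naming pattern, so a compactor-side filter can recognize and skip them when building its file list. Below is a minimal illustrative sketch of such a filter; the class and helper names are invented for illustration, and the committed fix takes a different, getAcidState-based approach discussed in the comments below.

{code:java}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import java.util.ArrayList;
import java.util.List;

public class SideFileFilter {
  // Streaming writes a length side file next to each bucket file,
  // e.g. bucket_00000_flush_length next to bucket_00000.
  private static final String FLUSH_LENGTH_SUFFIX = "_flush_length";

  /** Return only the bucket files that the compactor should read. */
  public static List<FileStatus> compactableFiles(FileStatus[] deltaFiles) {
    List<FileStatus> result = new ArrayList<>();
    for (FileStatus stat : deltaFiles) {
      Path p = stat.getPath();
      // Skip the side file itself; it only records how many bytes of the
      // ORC file are valid, and must not be fed to the compactor as data.
      if (!p.getName().endsWith(FLUSH_LENGTH_SUFFIX)) {
        result.add(stat);
      }
    }
    return result;
  }
}
{code}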
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297960#comment-14297960 ] Lefty Leverenz commented on HIVE-8966: -- Does this also need to be checked into branch-1.1 (formerly known as 0.15)?
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297958#comment-14297958 ] Jihong Liu commented on HIVE-8966: -- Thanks Alan.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295328#comment-14295328 ] Alan Gates commented on HIVE-8966: -- [~leftylev] no, we just made what should have worked before work properly.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294911#comment-14294911 ] Lefty Leverenz commented on HIVE-8966: -- Any documentation needed?
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292622#comment-14292622 ] Alan Gates commented on HIVE-8966: -- Fixed.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292615#comment-14292615 ] Brock Noland commented on HIVE-8966: thx
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292601#comment-14292601 ] Alan Gates commented on HIVE-8966: -- I did svn add instead of svn rm on a couple of files that moved. I'll fix it.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292593#comment-14292593 ] Brock Noland commented on HIVE-8966: Looks like this was committed but I am seeing:

{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-common: Compilation failure: Compilation failure:
[ERROR] /Users/noland/workspaces/hive-apache/hive/common/src/java/org/apache/hadoop/hive/common/ValidTxnListImpl.java:[23,8] org.apache.hadoop.hive.common.ValidTxnListImpl is not abstract and does not override abstract method getInvalidTransactions() in org.apache.hadoop.hive.common.ValidTxnList
[ERROR] /Users/noland/workspaces/hive-apache/hive/common/src/java/org/apache/hadoop/hive/common/ValidTxnListImpl.java:[46,3] method does not override or implement a method from a supertype
[ERROR] /Users/noland/workspaces/hive-apache/hive/common/src/java/org/apache/hadoop/hive/common/ValidTxnListImpl.java:[54,3] method does not override or implement a method from a supertype
[ERROR] /Users/noland/workspaces/hive-apache/hive/common/src/java/org/apache/hadoop/hive/common/ValidTxnListImpl.java:[121,3] method does not override or implement a method from a supertype
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hive-common
{noformat}
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290583#comment-14290583 ] Hive QA commented on HIVE-8966: ---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694321/HIVE-8966.6.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7370 tests executed

*Failed tests:*
{noformat}
TestSparkCliDriver-parallel_join1.q-avro_joins.q-groupby_ppr.q-and-12-more - did not produce a TEST-*.xml file
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2506/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2506/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2506/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694321 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286267#comment-14286267 ] Vikram Dixit K commented on HIVE-8966: -- +1 for branch-1.0.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284935#comment-14284935 ] Owen O'Malley commented on HIVE-8966: - After a little more thought, I'm worried that someone will accidentally create a ValidCompactorTxnList and get confused by the different behavior. I think it would make sense to move it into the compactor package to minimize the chance that someone uses it by mistake.
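The confusion Owen is worried about comes from the two lists answering "is this transaction valid?" differently: a reader treats everything at or below the high-water mark (minus the open/aborted exceptions) as readable, while the compactor must treat everything at or above the lowest open transaction as invalid so that an open delta is never folded into a base. A simplified sketch of the contrast, with invented names (the real ValidTxnList implementations also carry exception lists and string serialization that are omitted here):

{code:java}
// Simplified contrast between the two validity policies.
interface TxnValidity {
  boolean isTxnValid(long txnId);
}

// Reader view: anything at or below the high-water mark that is not
// in the open/aborted exception set is readable.
class ReadView implements TxnValidity {
  private final long highWatermark;
  private final java.util.Set<Long> exceptions;

  ReadView(long highWatermark, java.util.Set<Long> exceptions) {
    this.highWatermark = highWatermark;
    this.exceptions = exceptions;
  }

  public boolean isTxnValid(long txnId) {
    return txnId <= highWatermark && !exceptions.contains(txnId);
  }
}

// Compactor view: nothing at or above the lowest open transaction may be
// compacted, or an open delta could be merged into a base and later lost.
class CompactorView implements TxnValidity {
  private final long minOpenTxn;

  CompactorView(long minOpenTxn) {
    this.minOpenTxn = minOpenTxn;
  }

  public boolean isTxnValid(long txnId) {
    return txnId < minOpenTxn;
  }
}
{code}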
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284927#comment-14284927 ] Owen O'Malley commented on HIVE-8966: - This looks good, Alan. +1. One minor nit: the class javadoc for ValidReadTxnList has "And" instead of the intended "An".
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278515#comment-14278515 ] Hive QA commented on HIVE-8966: ---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12692048/HIVE-8966.5.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7330 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2369/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2369/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2369/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12692048 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272443#comment-14272443 ] Hive QA commented on HIVE-8966: ---

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691437/HIVE-8966.4.patch

{color:green}SUCCESS:{color} +1 6764 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2322/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2322/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2322/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691437 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271601#comment-14271601 ] Jihong Liu commented on HIVE-8966: -- Makes sense. It would be great if that solution could be implemented. Thanks
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270282#comment-14270282 ] Alan Gates commented on HIVE-8966: -- The issue is that since the writer died with an unclosed batch, it left the ORC file in a state where it cannot be read without the length file; removing the length file means any reader of that file will fail. The proper solution is for the compactor to stop at that partition until it has determined that all transactions in that file have committed or aborted. Then it should compact the file, using the length file to determine how much of it to read, while leaving the length file itself out of the compaction. I'll work on the fix.
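For context on why the length file is load-bearing: the streaming writer appends a length to bucket_n_flush_length each time it flushes, recording how many bytes of the still-open ORC file are valid, and a reader must stop at the last recorded offset because the ORC footer has not been written yet. A minimal sketch of recovering that offset, assuming the side file is a plain sequence of big-endian longs, one appended per flush (the helper name is invented):

{code:java}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class FlushLengthReader {
  /**
   * Return the last flushed length recorded in a bucket_n_flush_length
   * side file, or -1 if the file is empty. A reader that finds the side
   * file must not read the ORC file past this offset.
   */
  static long lastValidLength(InputStream sideFile) throws IOException {
    long last = -1;
    try (DataInputStream in = new DataInputStream(sideFile)) {
      while (true) {
        last = in.readLong(); // one big-endian long per flush
      }
    } catch (EOFException expected) {
      // Reached the end of the side file; fall through with the last value.
    }
    return last;
  }
}
{code}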
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266854#comment-14266854 ] Jihong Liu commented on HIVE-8966: -- The error occur when doing the mapreduce job. Following is log in hivemetastore.log 2015-01-06 16:42:22,506 INFO [sfdmgctmn003.gid.gap.com-32]: compactor.Worker (Worker.java:run(137)) - Starting MAJOR compaction for ds_infra.event_metrics.date=2014-12-24 2015-01-06 16:42:22,564 INFO [sfdmgctmn003.gid.gap.com-32]: impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(285)) - Timeline service address: http://sfdmgctmn003.gid.gap.com:8188/ws/v1/timeline/ 2015-01-06 16:42:22,622 INFO [sfdmgctmn003.gid.gap.com-32]: impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(285)) - Timeline service address: http://sfdmgctmn003.gid.gap.com:8188/ws/v1/timeline/ 2015-01-06 16:42:22,628 WARN [sfdmgctmn003.gid.gap.com-32]: mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(153)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 2015-01-06 16:42:22,753 WARN [sfdmgctmn003.gid.gap.com-32]: split.JobSplitWriter (JobSplitWriter.java:writeOldSplits(168)) - Max block location exceeded for split: CompactorInputSplit{base: hdfs://sfdmgct/apps/hive/warehouse/ds_infra/event_metrics/date=2014-12-24/base_0035304, bucket: 1, length: 292280, deltas: [delta_0035311_0035313, delta_0035479_0035481, delta_0035491_0035493, delta_0035515_0035517, delta_0035533_0035535, delta_0035548_0035550, delta_0035563_0035565, delta_0035578_0035580, delta_0035593_0035595, delta_0035599_0035601, delta_0035656_0035658, delta_0035671_0035673, delta_0035686_0035688, delta_0035701_0035703, delta_0035716_0035718, delta_0035731_0035733, delta_0035746_0035748, delta_0035761_0035763, delta_0035776_0035778, delta_0035791_0035793, delta_0035806_0035808, delta_0035821_0035823, delta_0035830_0035832, delta_0035842_0035844, delta_0035854_0035856, delta_0035866_0035868, delta_0035878_0035880]} splitsize: 27 maxsize: 10 2015-01-06 16:42:22,753 WARN [sfdmgctmn003.gid.gap.com-32]: split.JobSplitWriter (JobSplitWriter.java:writeOldSplits(168)) - Max block location exceeded for split: CompactorInputSplit{base: null, bucket: 3, length: 199770, deltas: [delta_0035311_0035313, delta_0035479_0035481, delta_0035491_0035493, delta_0035515_0035517, delta_0035533_0035535, delta_0035548_0035550, delta_0035563_0035565, delta_0035578_0035580, delta_0035593_0035595, delta_0035599_0035601, delta_0035656_0035658, delta_0035671_0035673, delta_0035686_0035688, delta_0035701_0035703, delta_0035716_0035718, delta_0035731_0035733, delta_0035746_0035748, delta_0035761_0035763, delta_0035776_0035778, delta_0035791_0035793, delta_0035806_0035808, delta_0035821_0035823, delta_0035830_0035832, delta_0035842_0035844, delta_0035854_0035856, delta_0035866_0035868, delta_0035878_0035880]} splitsize: 21 maxsize: 10 2015-01-06 16:42:22,753 WARN [sfdmgctmn003.gid.gap.com-32]: split.JobSplitWriter (JobSplitWriter.java:writeOldSplits(168)) - Max block location exceeded for split: CompactorInputSplit{base: hdfs://sfdmgct/apps/hive/warehouse/ds_infra/event_metrics/date=2014-12-24/base_0035304, bucket: 0, length: 172391, deltas: [delta_0035311_0035313, delta_0035479_0035481, delta_0035491_0035493, delta_0035515_0035517, delta_0035533_0035535, delta_0035548_0035550, delta_0035563_0035565, delta_0035578_0035580, delta_0035593_0035595, delta_0035599_0035601, delta_0035656_0035658, 
delta_0035671_0035673, delta_0035686_0035688, delta_0035701_0035703, delta_0035716_0035718, delta_0035731_0035733, delta_0035746_0035748, delta_0035761_0035763, delta_0035776_0035778, delta_0035791_0035793, delta_0035806_0035808, delta_0035821_0035823, delta_0035830_0035832, delta_0035842_0035844, delta_0035854_0035856, delta_0035866_0035868, delta_0035878_0035880]} splitsize: 30 maxsize: 10 2015-01-06 16:42:22,777 INFO [sfdmgctmn003.gid.gap.com-32]: mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(494)) - number of splits:4 2015-01-06 16:42:22,793 INFO [sfdmgctmn003.gid.gap.com-32]: mapreduce.JobSubmitter (JobSubmitter.java:printTokens(583)) - Submitting tokens for job: job_1419291043936_1639 2015-01-06 16:42:23,000 INFO [sfdmgctmn003.gid.gap.com-32]: impl.YarnClientImpl (YarnClientImpl.java:submitApplication(251)) - Submitted application application_1419291043936_1639 2015-01-06 16:42:23,001 INFO [sfdmgctmn003.gid.gap.com-32]: mapreduce.Job (Job.java:submit(1300)) - The url to track the job: http://sfdmgctmn002.gid.gap.com:8088/proxy/application_1419291043936_1639/ 2015-01-06 16:42:23,001 INFO [sfdmgctmn003.gid.gap.com-32]: mapreduce.Job (Job.java:monitorAndPrintJob(1345)) - Running job: job_1419291043936_1639 2015-01-06 16:42:30,042 INFO [sfdmgctmn003.gid.gap.com-32]: mapreduce.Job (Job.java:mon
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265257#comment-14265257 ] Alan Gates commented on HIVE-8966: -- What error message does it give when it fails? I would expect this to work.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264177#comment-14264177 ] Jihong Liu commented on HIVE-8966: -- Did a test. Generally the new version works as expected. But for the following case, the compaction will always fail:
1. For some reason, the writer exits without closing a batch, so the "length" file is still there. This can happen, for example, when the program is killed or the Hive server restarts.
2. The program is restarted, so a new writer and a new batch are created and continue to write into the same partition. The data goes to a new delta.
3. Now we manually delete that "length" file in the previous delta and run compaction, but it fails.
Even if we exit the program completely, so that there is no open batch and no "length" file left, the compaction never succeeds for this partition. However, the current Hive 0.14.0 works fine in the above case.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256700#comment-14256700 ] Hive QA commented on HIVE-8966: ---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12688699/HIVE-8966.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6724 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_lvj_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2168/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2168/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2168/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12688699 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240701#comment-14240701 ] Jihong Liu commented on HIVE-8966: -- Alan, your idea is very good, but there is an issue: we should only do this compactability test on the most recent delta, not on all deltas. Here is an example of why. Assume there are two deltas:
1. delta_00011_00020 - this delta has an open transaction batch
2. delta_00021_00030 - this delta has no open transaction batch; all batches are closed.
The first delta has an open transaction batch, the second does not, and the second delta is the most recent one. This case is possible, especially when multiple threads write to the same partition. If we skip the first delta, the compaction will succeed and create a base such as base_00030. The cleaner will then delete both deltas, since their transaction ids are less than or equal to the base transaction id, and the data in the first delta (the one that was never compacted) will be lost. This is why we should only test the most recent delta; all earlier deltas are automatically included in the list. In this case the compaction fails while the "flush_length" file is present and succeeds only once all transaction batches are closed. Although this is not perfect, at least no data is lost. Since the delta files and transaction ids used in a compaction are not saved anywhere, this is probably the only solution for now. In my removeNotCompactableDeltas() method, we first sort the deltas and then only check the last one. But the name "removeNotCompactableDeltas" is not good and easily causes confusion; it would be clearer to name it "removeLastDeltaIfNotCompactable". Thanks
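A sketch of the check Jihong describes, reconstructed from the comment (the Delta holder type and the open-batch flag are assumptions; the method name is the clearer one proposed above):

{code:java}
import java.util.Comparator;
import java.util.List;

public class DeltaPruning {
  static class Delta {
    final long minTxn;
    final long maxTxn;
    final boolean hasFlushLengthFile; // open transaction batch indicator
    Delta(long minTxn, long maxTxn, boolean hasFlushLengthFile) {
      this.minTxn = minTxn;
      this.maxTxn = maxTxn;
      this.hasFlushLengthFile = hasFlushLengthFile;
    }
  }

  /**
   * Sort the deltas by transaction range and drop only the most recent one
   * if it still has an open batch. Earlier deltas are kept, so the compaction
   * either covers a contiguous prefix of transactions or fails outright;
   * this avoids producing a base that silently excludes an older open delta,
   * which the cleaner would then delete, losing its data.
   */
  static void removeLastDeltaIfNotCompactable(List<Delta> deltas) {
    deltas.sort(Comparator.comparingLong(d -> d.maxTxn));
    if (!deltas.isEmpty() && deltas.get(deltas.size() - 1).hasFlushLengthFile) {
      deltas.remove(deltas.size() - 1);
    }
  }
}
{code}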
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240482#comment-14240482 ] Hive QA commented on HIVE-8966: ---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686124/HIVE-8966.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6704 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2013/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2013/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2013/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686124 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240415#comment-14240415 ] Owen O'Malley commented on HIVE-8966: - Alan, your patch looks good. +1
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240004#comment-14240004 ] Jihong Liu commented on HIVE-8966: -- I see. Basically there are two solutions. One is that when we get the delta list, we don't include the current delta if it has an open transaction, i.e. we update AcidUtils.getAcidState() directly. The other is what I posted here: we first get the delta list, and then, when doing the compaction, we don't compact the last delta if it has an open transaction. The first solution is better, as long as changing getAcidState() doesn't affect other existing code, since it is a public static method. By the way, we should only do that for the current delta (the delta with the largest transaction id), not for all deltas that have open transactions. If I am correct, the base file is named after the largest transaction id in the deltas. So if the latest delta is closed but an earlier delta has an open transaction, we should not do anything and should simply let the compaction fail. Otherwise the base will be named by the last transaction id and all earlier deltas will be removed, which will cause data loss. This is my understanding; please correct me if it is not right. Thanks
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239750#comment-14239750 ] Alan Gates commented on HIVE-8966: -- Rather than going in and removing these directories from the list of deltas, I think it makes more sense to change Directory.getAcidState to not include these deltas. We obviously can't do that in all cases, as readers need to see these deltas, but we can change it to detect that the caller is the compactor and exclude them in that case. I'll post a patch with this change.
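A rough sketch of the shape of that change, with invented types and a boolean flag standing in for the real mechanism (the actual getAcidState inspects the filesystem and a ValidTxnList rather than taking a flag):

{code:java}
import java.util.ArrayList;
import java.util.List;

public class AcidStateSketch {
  static class Delta {
    final long minTxn;
    final long maxTxn;
    Delta(long minTxn, long maxTxn) {
      this.minTxn = minTxn;
      this.maxTxn = maxTxn;
    }
  }

  /**
   * Readers see every delta that is at all visible; the compactor only
   * sees deltas whose whole transaction range can no longer change.
   */
  static List<Delta> getAcidState(List<Delta> allDeltas, long minOpenTxn,
                                  boolean forCompactor) {
    List<Delta> visible = new ArrayList<>();
    for (Delta d : allDeltas) {
      if (forCompactor && d.maxTxn >= minOpenTxn) {
        continue; // may still receive flushes; leave for a later compaction
      }
      visible.add(d);
    }
    return visible;
  }
}
{code}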
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239636#comment-14239636 ] Alan Gates commented on HIVE-8966: -- Don't worry about the results from testing; those tests are flaky. I'll review the patch.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237257#comment-14237257 ] Jihong Liu commented on HIVE-8966: -- I am confused about the QA test. The errors don't look related to HIVE-8966.patch. First, was this patch really included in the build? Also, this patch is for 0.14.1, not for trunk.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237080#comment-14237080 ] Hive QA commented on HIVE-8966: ---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12685590/HIVE-8966.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6696 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1986/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1986/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1986/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12685590 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237067#comment-14237067 ] Jihong Liu commented on HIVE-8966: -- Alan, I uploaded a wrong patch about an hour ago; before I removed it, QA automatically ran the test above. Please ignore that run and review the currently attached patch. I think it really solves the issue.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237062#comment-14237062 ] Jihong Liu commented on HIVE-8966: -- Hi Alan, I have created a new patch and it works fine. The patch is attached to this JIRA, and I also added a comment explaining the logic. Please have a look. Thanks and have a good day. Jihong
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237057#comment-14237057 ] Hive QA commented on HIVE-8966: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12685584/HIVE-8966.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1985/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1985/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1985/ Messages: {noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
...
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java'
Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java'
+ rm -rf target ... [remainder of build-cleanup listing truncated]
{noformat} This message is automatically generated.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237048#comment-14237048 ] Jihong Liu commented on HIVE-8966: -- By the way, Hive may need another cleaning process that automatically removes the bucket_n_flush_length file once the connection that created it is actually closed. A program may fail to close a transaction batch for many reasons: the network disconnects, the server shuts down, the application is killed, and so on. So if the connection that created a batch has been closed, its bucket_n_flush_length file needs to be removed; otherwise that delta, and every delta after it, can never be compacted unless the file is removed manually.
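To make this cleanup idea concrete, here is a minimal, hypothetical sketch. It is not part of Hive: the class and method names are invented, and it assumes the caller already knows which transaction batches are closed (detecting that reliably is the hard part of the suggestion).

{code:java}
import java.io.IOException;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical cleanup pass: remove orphaned bucket_n_flush_length side files
// from delta directories whose transaction batches are known to be closed.
public class FlushLengthCleaner {
  private static final String SIDE_FILE_SUFFIX = "_flush_length";

  public static void cleanClosedDeltas(Configuration conf, Path partitionDir,
                                       Set<String> closedDeltaDirs) throws IOException {
    FileSystem fs = partitionDir.getFileSystem(conf);
    for (FileStatus delta : fs.listStatus(partitionDir)) {
      // Only touch delta directories the caller has verified are closed.
      if (!delta.isDirectory() || !closedDeltaDirs.contains(delta.getPath().getName())) {
        continue;
      }
      for (FileStatus f : fs.listStatus(delta.getPath())) {
        if (f.getPath().getName().endsWith(SIDE_FILE_SUFFIX)) {
          // Safe to delete once the writer is gone; the side file only tracks
          // flush offsets for readers of a still-open delta.
          fs.delete(f.getPath(), false);
        }
      }
    }
  }
}
{code}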
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237047#comment-14237047 ] Jihong Liu commented on HIVE-8966: -- Solution: if the last delta contains any file that matches the bucket-file pattern but is not actually a bucket file, do not compact that delta. While a transaction batch is still open, its delta contains a file like bucket_n_flush_length, which is not a bucket file. More generally, whenever the last delta contains a file that matches the bucket-file pattern but cannot be compacted, we should skip that delta: compaction removes the deltas it processes, so if a delta cannot be compacted as a whole it should be left exactly as it is.

In the scenario above, the second delta is therefore not compacted, and the cleaner will not remove it, because its transaction IDs are higher than those of the newly created compaction output (base or delta).

The reason we only do this for the last delta is the case where two or more transaction batches are open and the last one is closed first. If the last delta were compacted, the transaction ID in the new base would be high, so the cleaner would remove all the deltas and data could be lost. In that case at least one delta in the compaction list still has a bucket_n_flush_length file inside; since we do not skip it, the compaction simply fails and nothing happens, so no data is lost. Compaction can then run only after all transaction batches are closed. That is not ideal, but at least no data is lost.

The patch is attached. It adds one method that tests whether the last delta needs to be removed from the delta list, and runs that method before processing the list. After applying this patch no data is lost, and we can run either major or minor compaction while loading data at the same time.
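A minimal sketch of the check described above, under illustrative names (this is not the patch itself; the real change would live in CompactorMR, and the exact bucket-file pattern is an assumption):

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeltaFilter {
  // Real bucket files match exactly: bucket_00000, bucket_00001, ...
  private static final Pattern BUCKET_FILE = Pattern.compile("bucket_\\d+");

  // True if the newest delta holds a file that looks bucket-like but is not a
  // real bucket file (e.g. bucket_00000_flush_length from an open batch).
  static boolean lastDeltaStillOpen(FileSystem fs, Path lastDelta) throws IOException {
    for (FileStatus stat : fs.listStatus(lastDelta)) {
      String name = stat.getPath().getName();
      if (name.startsWith("bucket_") && !BUCKET_FILE.matcher(name).matches()) {
        return true;
      }
    }
    return false;
  }

  // Drop the newest delta from the compaction list when it is still open.
  static void pruneOpenDelta(FileSystem fs, List<Path> deltasOldestFirst) throws IOException {
    int last = deltasOldestFirst.size() - 1;
    if (last >= 0 && lastDeltaStillOpen(fs, deltasOldestFirst.get(last))) {
      deltasOldestFirst.remove(last);
    }
  }
}
{code}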
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237046#comment-14237046 ] Jihong Liu commented on HIVE-8966: -- The data-loss scenario: assume that when compaction starts there are two deltas, delta_00011_00020 and delta_00021_00030, where the transaction batch in the first is closed and the second still has an open transaction batch. After compaction finishes, the status in the compaction queue becomes "ready for cleaning" and the cleaning process is triggered. The cleaner removes every delta whose transaction IDs are lower than those of the newly created base, provided there is no lock on it. Meanwhile, we are still loading data into the second delta. When loading finishes and the transaction batch is closed, the cleaner sees no lock on that delta and deletes it, so the new data added after compaction started is lost.
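The cleaner's decision, reduced to a one-line predicate for illustration only (invented names; Hive's actual cleaner logic is more involved):

{code:java}
public class CleanerRule {
  // A delta is removable once the new base covers its transactions and nobody
  // holds a lock on it -- which is exactly what bites the still-open delta here.
  static boolean cleanerMayRemove(long deltaMaxTxnId, long newBaseTxnId, boolean locked) {
    return deltaMaxTxnId <= newBaseTxnId && !locked;
  }
}
{code}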
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235923#comment-14235923 ] Jihong Liu commented on HIVE-8966: -- Great. I am working on that now. I will update you after I finish testing.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235645#comment-14235645 ] Alan Gates commented on HIVE-8966: -- Jihong, thanks for doing the testing on this. We could change this to not compact the current delta file, or we could change the cleaner to not remove the delta file that was still open during compaction. I'll try to look at this in the next couple of days. We need to get this fixed for 0.14.1.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234769#comment-14234769 ] Jihong Liu commented on HIVE-8966: -- I think we may have to withdraw this patch for now. It looks like Hive currently cannot support compacting and loading a partition at the same time. Without this patch, if loading into a partition has not completely finished, compaction always fails, so nothing happens. With this patch, compaction goes through and finishes, but we may lose data! I ran a test: data can be lost if we compact while loading is still in progress. Keeping the current behavior is a real limitation for Hive, though: if streaming loads into a partition for a long period, performance suffers because the partition can never be compacted. To solve this completely, my initial thinking is that delta files with open transactions should not be compacted; currently they are always included, which is probably the cause of the data loss. Closed delta files, on the other hand, should be compactable, so that compaction and loading can run at the same time.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233790#comment-14233790 ] Jihong Liu commented on HIVE-8966: -- The patch is attached. Please review. Thanks
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232306#comment-14232306 ] Jihong Liu commented on HIVE-8966: -- Thanks. So now the fix is in 0.14.1?
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227045#comment-14227045 ] Gunther Hagleitner commented on HIVE-8966: -- +1 for 0.14.1
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226943#comment-14226943 ] Alan Gates commented on HIVE-8966: -- Ok, that makes sense. Your current delta has the file because it's still open and being written to. That also explains why my tests don't see it: they don't run long enough, so the streaming is always done by the time the compactor kicks in. Why don't you post a patch to this JIRA with the change for option 1, and I can get that committed. [~hagleitn], I'd like to put this in 0.14.1 as well as trunk if you're ok with it, since it blocks compaction for users of the streaming interface.
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226925#comment-14226925 ] Jihong Liu commented on HIVE-8966: -- That flush_length file is only in the most recent delta. By the way, with streaming loading a transaction batch is probably always open, since data keeps coming. Is it possible to run compaction in a streaming-loading environment? Thanks
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226890#comment-14226890 ] Alan Gates commented on HIVE-8966: -- Option 1 might be the right thing to do; option 2 breaks backward compatibility. Before we do that, though, I'd like to understand why you still see the flush length files hanging around. In my tests I don't see this issue because the flush length file is properly cleaned up. I want to make sure that its existence doesn't mean something else is wrong. Do you see the flush length files in all delta directories or only the most recent?
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226872#comment-14226872 ] Jihong Liu commented on HIVE-8966: -- Yes, I closed the transaction batch. I suggest doing either of the following two updates, or both:

1. If a file is a non-bucket file, don't try to compact it. In org.apache.hadoop.hive.ql.txn.compactor.CompactorMR, change:

{code:java}
private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                          Map<Integer, BucketTracker> splitToBucketMap) {
  if (!matcher.find()) {
    LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
        file.toString());
  }
  ...
{code}

to:

{code:java}
private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
                          Map<Integer, BucketTracker> splitToBucketMap) {
  if (!matcher.find()) {
    LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
        file.toString());
    return; // skip non-bucket files instead of falling through
  }
  ...
{code}

2. Don't name the "flush_length" side file after the bucket-file pattern. In org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater, change:

{code:java}
static Path getSideFile(Path main) {
  return new Path(main + "_flush_length");
}
{code}

to:

{code:java}
static Path getSideFile(Path main) {
  if (main.getName().startsWith("bucket_")) {
    // Rename the prefix so the side file no longer matches the bucket pattern.
    return new Path(main.getParent(), "bkt" + main.getName().substring(6) + "_flush_length");
  }
  return new Path(main + "_flush_length");
}
{code}

After making these updates and recompiling hive-exec.jar, compaction works fine.
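A quick demonstration of why the side file trips the compactor, assuming the compactor matches bucket files with a pattern along the lines of "bucket_(\d+)" applied via Matcher.find():

{code:java}
import java.util.regex.Pattern;

public class BucketPatternDemo {
  public static void main(String[] args) {
    Pattern p = Pattern.compile("bucket_(\\d+)");
    // find() locates the pattern anywhere in the name, so the side file is
    // mistaken for a bucket file; matches() would accept only the real one.
    System.out.println(p.matcher("bucket_00000").find());                 // true
    System.out.println(p.matcher("bucket_00000_flush_length").find());    // true
    System.out.println(p.matcher("bucket_00000_flush_length").matches()); // false
  }
}
{code}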
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226794#comment-14226794 ] Alan Gates commented on HIVE-8966: -- This flush length file should be removed when the batch is closed. Are you closing the transaction batch on a regular basis?