[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=523683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-523683 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Dec/20 00:53
            Start Date: 14/Dec/20 00:53
    Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #1415:
URL: https://github.com/apache/hive/pull/1415

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 523683)
    Time Spent: 12.5h (was: 12h 20m)

> Make sure transactions get cleaned if they are aborted before addPartitions
> is called
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-21052
>                 URL: https://issues.apache.org/jira/browse/HIVE-21052
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0, 3.1.1
>            Reporter: Jaume M
>            Assignee: Jaume M
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch,
>                      HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch,
>                      HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch,
>                      HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch,
>                      HIVE-21052.8.patch, HIVE-21052.9.patch
>
>          Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has
> been written to the table, the transaction manager will think it's an empty
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables.
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when
>   addPartitions is called, remove this entry from TXN_COMPONENTS and add the
>   corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with the special marker in TXN_COMPONENTS,
>   indicating that a transaction was opened and then aborted, it must generate
>   jobs for the worker for every possible partition available.
> cc [~ewohlstadter]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
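The two-step marker lifecycle proposed in the issue description can be sketched as a pair of SQL statements. This is only an illustration, not the actual patch: the column names follow the TXN_COMPONENTS usage elsewhere in this thread, the 'p' operation-type code is borrowed from the TC_OPERATION_TYPE='p' check in the test discussion later in the thread, and the string-built SQL (no parameter binding) is a deliberate simplification.

```java
// Sketch of the marker-entry lifecycle described above. Hypothetical helper:
// the real patch's SQL and operation-type codes may differ.
public class TxnMarkerSketch {

    // At openTxn: write a marker row so that an abort before addPartitions
    // still leaves a trace for the cleaner to find.
    static String openTxnMarker(long txnId, String db, String table) {
        return "INSERT INTO TXN_COMPONENTS (TC_TXNID, TC_DATABASE, TC_TABLE, TC_OPERATION_TYPE)"
                + " VALUES (" + txnId + ", '" + db + "', '" + table + "', 'p')";
    }

    // At addPartitions: replace the marker with a concrete partition entry,
    // so a later abort is cleaned per partition rather than per table.
    static String markerToPartition(long txnId, String partition) {
        return "UPDATE TXN_COMPONENTS SET TC_PARTITION = '" + partition + "',"
                + " TC_OPERATION_TYPE = 'i'"
                + " WHERE TC_TXNID = " + txnId + " AND TC_OPERATION_TYPE = 'p'";
    }

    public static void main(String[] args) {
        System.out.println(openTxnMarker(42L, "default", "cws"));
        System.out.println(markerToPartition(42L, "ds=1"));
    }
}
```

If the transaction aborts while the 'p' marker row is still present, the cleaner knows the transaction may have written anywhere in the table, which is why the description says it must generate worker jobs for every possible partition.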
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=520841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520841 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Dec/20 00:50
            Start Date: 07/Dec/20 00:50
    Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #1415:
URL: https://github.com/apache/hive/pull/1415#issuecomment-739601932

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------

    Worklog Id: (was: 520841)
    Time Spent: 12h 20m (was: 12h 10m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=503053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503053 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Oct/20 07:51
            Start Date: 21/Oct/20 07:51
    Worklog Time Spent: 10m

Work Description: deniskuzZ merged pull request #1548:
URL: https://github.com/apache/hive/pull/1548

Issue Time Tracking
-------------------

    Worklog Id: (was: 503053)
    Time Spent: 12h 10m (was: 12h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=503051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503051 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Oct/20 07:37
            Start Date: 21/Oct/20 07:37
    Worklog Time Spent: 10m

Work Description: klcopp commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r509054430

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java

@@ -414,76 +403,30 @@ public void markCleaned(CompactionInfo info) throws MetaException {
      * aborted TXN_COMPONENTS above tc_writeid (and consequently about aborted txns).
      * See {@link ql.txn.compactor.Cleaner.removeFiles()}
      */
-    s = "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" WHERE \"TXN_ID\" = \"TC_TXNID\" "
-        + "AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " AND \"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
-    if (info.highestWriteId != 0) s += " AND \"TC_WRITEID\" <= ?";
-    if (info.partName != null) s += " AND \"TC_PARTITION\" = ?";
-
+    s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN (" +

Review comment: never mind, LGTM

Issue Time Tracking
-------------------

    Worklog Id: (was: 503051)
    Time Spent: 12h (was: 11h 50m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=503049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503049 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Oct/20 07:20
            Start Date: 21/Oct/20 07:20
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508809944

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java

@@ -589,4 +593,9 @@ private void checkInterrupt() throws InterruptedException {
       throw new InterruptedException("Compaction execution is interrupted");
     }
   }
-}
+
+  private static boolean isDynPartAbort(Table t, CompactionInfo ci) {

Review comment: those are actually 2 diff methods, the only common part is the check for isDynPart. Also there is no CompactionUtils, only CompactorUtil, that contains thread factory stuff.

Issue Time Tracking
-------------------

    Worklog Id: (was: 503049)
    Time Spent: 11h 50m (was: 11h 40m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=502838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502838 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Oct/20 20:14
            Start Date: 20/Oct/20 20:14
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508809944

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java

@@ -589,4 +593,9 @@ private void checkInterrupt() throws InterruptedException {
       throw new InterruptedException("Compaction execution is interrupted");
     }
   }
-}
+
+  private static boolean isDynPartAbort(Table t, CompactionInfo ci) {

Review comment: could be, do you know if there is some helper class I could move the isDynPartAbort method to?

Issue Time Tracking
-------------------

    Worklog Id: (was: 502838)
    Time Spent: 11h 40m (was: 11.5h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=502837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502837 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Oct/20 20:10
            Start Date: 20/Oct/20 20:10
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508807499

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java

@@ -414,76 +403,30 @@ public void markCleaned(CompactionInfo info) throws MetaException {
      * aborted TXN_COMPONENTS above tc_writeid (and consequently about aborted txns).
      * See {@link ql.txn.compactor.Cleaner.removeFiles()}
      */
-    s = "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" WHERE \"TXN_ID\" = \"TC_TXNID\" "
-        + "AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " AND \"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
-    if (info.highestWriteId != 0) s += " AND \"TC_WRITEID\" <= ?";
-    if (info.partName != null) s += " AND \"TC_PARTITION\" = ?";
-
+    s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN (" +

Review comment: @pvary, could you please take a quick look? thanks!

Issue Time Tracking
-------------------

    Worklog Id: (was: 502837)
    Time Spent: 11.5h (was: 11h 20m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=502835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502835 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Oct/20 20:05
            Start Date: 20/Oct/20 20:05
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508805163

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java

@@ -414,76 +403,30 @@ public void markCleaned(CompactionInfo info) throws MetaException {
      * aborted TXN_COMPONENTS above tc_writeid (and consequently about aborted txns).
      * See {@link ql.txn.compactor.Cleaner.removeFiles()}
      */
-    s = "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" WHERE \"TXN_ID\" = \"TC_TXNID\" "
-        + "AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " AND \"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
-    if (info.highestWriteId != 0) s += " AND \"TC_WRITEID\" <= ?";
-    if (info.partName != null) s += " AND \"TC_PARTITION\" = ?";
-
+    s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN (" +

Review comment: this is an optimization that makes everything in 1 db request instead of 2 (select + delete)

Issue Time Tracking
-------------------

    Worklog Id: (was: 502835)
    Time Spent: 11h 20m (was: 11h 10m)
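The single-round-trip optimization discussed in this review thread can be sketched as follows. The column names come from the quoted diff; the inner SELECT, the txn-state literal, and the clause order are reconstructed from the removed SELECT purely for illustration and may not match the merged patch exactly.

```java
// Sketch: fold the old "SELECT aborted txn ids, then DELETE them" pair
// into one DELETE whose IN clause does the selection, saving a db round trip.
public class SingleRequestDelete {

    static String buildDelete(boolean hasHighestWriteId, boolean hasPartName) {
        String s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN "
                + "(SELECT \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = 'a')"
                + " AND \"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
        if (hasHighestWriteId) {
            s += " AND \"TC_WRITEID\" <= ?";   // same optional clause as the old SELECT
        }
        if (hasPartName) {
            s += " AND \"TC_PARTITION\" = ?";  // partition filter, when compacting one partition
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildDelete(true, false));
    }
}
```

The behaviour is unchanged: the DELETE removes exactly the TXN_COMPONENTS rows whose txn ids the old code first fetched with a separate SELECT, which is why the reviewers treat it as a refactoring plus an optimization rather than a semantic change.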
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=502834&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502834 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Oct/20 20:04
            Start Date: 20/Oct/20 20:04
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508804039

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java

@@ -400,11 +389,11 @@ public void markCleaned(CompactionInfo info) throws MetaException {
       pStmt.setString(paramCount++, info.partName);
     }
     if(info.highestWriteId != 0) {
-      pStmt.setLong(paramCount++, info.highestWriteId);
+      pStmt.setLong(paramCount, info.highestWriteId);

Review comment: redundant post increment

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java

@@ -134,9 +132,6 @@ public CompactionTxnHandler() {
         response.add(info);
       }
     }
-
-    LOG.debug("Going to rollback");
-    dbConn.rollback();

Review comment: no idea :)

Issue Time Tracking
-------------------

    Worklog Id: (was: 502834)
    Time Spent: 11h 10m (was: 11h)
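The "redundant post increment" point above is easy to see in isolation: a `paramCount++` on the final parameter bind updates a counter that is never read again. A minimal illustration, with a hypothetical `bindAll` helper standing in for the `PreparedStatement.setString`/`setLong` calls in `markCleaned`:

```java
import java.util.ArrayList;
import java.util.List;

public class ParamCountDemo {

    // Bind three parameters the way markCleaned does. The final bind uses
    // paramCount without ++ because nothing reads the counter afterwards;
    // with or without the increment, the bound indexes are identical.
    static List<String> bindAll() {
        List<String> bound = new ArrayList<>();
        int paramCount = 1;
        bound.add(paramCount++ + "=default"); // db name: more binds follow, increment needed
        bound.add(paramCount++ + "=cws");     // table name: same
        bound.add(paramCount + "=12345");     // highestWriteId: last use, ++ would be dead code
        return bound;
    }

    public static void main(String[] args) {
        System.out.println(bindAll()); // prints [1=default, 2=cws, 3=12345]
    }
}
```

Dropping the trailing `++` is purely cosmetic here; some static analyzers flag the dead store, which is presumably why the patch removed it.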
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=502802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502802 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Oct/20 17:25
            Start Date: 20/Oct/20 17:25
    Worklog Time Spent: 10m

Work Description: klcopp commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508641433

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java

@@ -589,4 +593,9 @@ private void checkInterrupt() throws InterruptedException {
       throw new InterruptedException("Compaction execution is interrupted");
     }
   }
-}
+
+  private static boolean isDynPartAbort(Table t, CompactionInfo ci) {

Review comment: This can be consolidated with most of isDynPartIngest in CompactionUtils

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java

@@ -400,11 +389,11 @@ public void markCleaned(CompactionInfo info) throws MetaException {
       pStmt.setString(paramCount++, info.partName);
     }
     if(info.highestWriteId != 0) {
-      pStmt.setLong(paramCount++, info.highestWriteId);
+      pStmt.setLong(paramCount, info.highestWriteId);

Review comment: Why was this changed?

## File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java

@@ -2128,24 +2129,601 @@ public void testCleanerForTxnToWriteId() throws Exception {
     0, TxnDbUtil.countQueryAgent(hiveConf, "select count(*) from TXN_TO_WRITE_ID"));
   }

-  private void verifyDirAndResult(int expectedDeltas) throws Exception {
-    FileSystem fs = FileSystem.get(hiveConf);
-    // Verify the content of subdirs
-    FileStatus[] status = fs.listStatus(new Path(TEST_WAREHOUSE_DIR + "/" +
-        (Table.MMTBL).toString().toLowerCase()), FileUtils.HIDDEN_FILES_PATH_FILTER);
+  @Test
+  public void testMmTableAbortWithCompaction() throws Exception {

Review comment: FYI MM tests are usually in TestTxnCommandsForMmTable.java but I don't really care about this

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java

@@ -134,9 +132,6 @@ public CompactionTxnHandler() {
         response.add(info);
       }
     }
-
-    LOG.debug("Going to rollback");
-    dbConn.rollback();

Review comment: Any ideas about why this was here? Just curious

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java

@@ -414,76 +403,30 @@ public void markCleaned(CompactionInfo info) throws MetaException {
      * aborted TXN_COMPONENTS above tc_writeid (and consequently about aborted txns).
      * See {@link ql.txn.compactor.Cleaner.removeFiles()}
      */
-    s = "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" WHERE \"TXN_ID\" = \"TC_TXNID\" "
-        + "AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " AND \"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
-    if (info.highestWriteId != 0) s += " AND \"TC_WRITEID\" <= ?";
-    if (info.partName != null) s += " AND \"TC_PARTITION\" = ?";
-
+    s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN (" +

Review comment: This is just refactoring right? LGTM but can you make sure @pvary sees this as well?

Issue Time Tracking
-------------------

    Worklog Id: (was: 502802)
    Time Spent: 11h (was: 10h 50m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=498499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-498499 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 09/Oct/20 14:12
Start Date: 09/Oct/20 14:12
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501713463

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ##
@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
     Lists.newArrayList(5, 6), 1);
 }
+  @Test
+  public void testCleanAbortCompactAfterAbortTwoPartitions() throws Exception {
+    String dbName = "default";
+    String tblName = "cws";
+
+    HiveStreamingConnection connection1 = prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+    HiveStreamingConnection connection2 = prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+
+    connection1.beginTransaction();
+    connection1.write("1,1".getBytes());
+    connection1.write("2,2".getBytes());
+    connection1.abortTransaction();
+
+    connection2.beginTransaction();
+    connection2.write("1,3".getBytes());
+    connection2.write("2,3".getBytes());
+    connection2.write("3,3".getBytes());
+    connection2.abortTransaction();
+
+    assertAndCompactCleanAbort(dbName, tblName);
+
+    connection1.close();
+    connection2.close();
+  }
+
+  @Test
+  public void testCleanAbortCompactAfterAbort() throws Exception {
+    String dbName = "default";
+    String tblName = "cws";
+
+    // Create three folders with two different transactions
+    HiveStreamingConnection connection1 = prepareTableAndConnection(dbName, tblName, 1);
+    HiveStreamingConnection connection2 = prepareTableAndConnection(dbName, tblName, 1);
+
+    connection1.beginTransaction();
+    connection1.write("1,1".getBytes());
+    connection1.write("2,2".getBytes());
+    connection1.abortTransaction();
+
+    connection2.beginTransaction();
+    connection2.write("1,3".getBytes());
+    connection2.write("2,3".getBytes());
+    connection2.write("3,3".getBytes());
+    connection2.abortTransaction();
+
+    assertAndCompactCleanAbort(dbName, tblName);
+
+    connection1.close();
+    connection2.close();
+  }
+
+  private void assertAndCompactCleanAbort(String dbName, String tblName) throws Exception {
+    IMetaStoreClient msClient = new HiveMetaStoreClient(conf);
+    TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+    Table table = msClient.getTable(dbName, tblName);
+    FileSystem fs = FileSystem.get(conf);
+    FileStatus[] stat =
+        fs.listStatus(new Path(table.getSd().getLocation()));
+    if (3 != stat.length) {
+      Assert.fail("Expecting three directories corresponding to three partitions, FileStatus[] stat " + Arrays.toString(stat));
+    }
+
+    int count = TxnDbUtil.countQueryAgent(conf, "select count(*) from TXN_COMPONENTS where TC_OPERATION_TYPE='p'");
+    // We should have two rows corresponding to the two aborted transactions
+    Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from TXN_COMPONENTS"), 2, count);
+
+    runInitiator(conf);
+    count = TxnDbUtil.countQueryAgent(conf, "select count(*) from COMPACTION_QUEUE where CQ_TYPE='p'");
+    // Only one job is added to the queue per table. This job corresponds to all the entries for a particular table
+    // with rows in TXN_COMPONENTS
+    Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from COMPACTION_QUEUE"), 1, count);
+
+    ShowCompactResponse rsp = txnHandler.showCompact(new ShowCompactRequest());
+    Assert.assertEquals(1, rsp.getCompacts().size());
+    Assert.assertEquals(TxnStore.CLEANING_RESPONSE, rsp.getCompacts().get(0).getState());
+    Assert.assertEquals("cws", rsp.getCompacts().get(0).getTablename());
+    Assert.assertEquals(CompactionType.CLEAN_ABORTED,
+        rsp.getCompacts().get(0).getType());
+
+    runCleaner(conf);
+
+    // After the cleaner runs TXN_COMPONENTS and COMPACTION_QUEUE should have zero rows, also the folders should have been deleted.
+    count = TxnDbUtil.countQueryAgent(conf, "select count(*) from TXN_COMPONENTS");
+    Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from TXN_COMPONENTS"), 0, count);
+
+    count = TxnDbUtil.countQueryAgent(conf, "select count(*) from COMPACTION_QUEUE");
+    Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from COMPACTION_QUEUE"), 0, count);
+
+    RemoteIterator it =
+        fs.listFiles(new Path(table.getSd().getLocation()), true);
+    if (it.hasNext()) {
+      Assert.fail("Expecting compaction to have cleaned the directories, FileStatus[] stat " + Arrays.toString(stat));

Review comment: I think this assert is quite misleading. I
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=498199=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-498199 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 09/Oct/20 13:48 Start Date: 09/Oct/20 13:48 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #1548: URL: https://github.com/apache/hive/pull/1548#issuecomment-705545322 looks like master is broken right now This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 498199) Time Spent: 10h 40m (was: 10.5h) > Make sure transactions get cleaned if they are aborted before addPartitions > is called > - > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0, 3.1.1 >Reporter: Jaume M >Assignee: Jaume M >Priority: Critical > Labels: pull-request-available > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, > HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, > HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, > HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, > HIVE-21052.8.patch, HIVE-21052.9.patch > > Time Spent: 10h 40m > Remaining Estimate: 0h > > If the transaction is aborted between openTxn and addPartitions and data has > been written on the table the transaction manager will think it's an empty > transaction and no cleaning will be done. > This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and when addPartitions is called remove this entry from TXN_COMPONENTS and add the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that specifies that a transaction was opened and it was aborted, it must generate jobs for the worker for every possible partition available.
> cc [~ewohlstadter]
-- This message was sent by Atlassian Jira (v8.3.4#803005)
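The two-step scheme proposed above can be illustrated with a small model. This is a hypothetical sketch only: the in-memory map stands in for the metastore's TXN_COMPONENTS table, and the `"p"` marker value is borrowed from the `TC_OPERATION_TYPE='p'` rows exercised by the tests elsewhere in this thread. It is not Hive's actual metastore code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the TXN_COMPONENTS marker lifecycle proposed in the issue.
// Keys are txn ids; values are the single component recorded for that txn.
class TxnComponentsModel {
    static final String DYNPART_MARKER = "p"; // special marker written at openTxn

    final Map<Long, String> txnComponents = new HashMap<>();

    // openTxn: record the special marker so an abort before addPartitions
    // still leaves evidence that data may have been written.
    void openTxn(long txnId) {
        txnComponents.put(txnId, DYNPART_MARKER);
    }

    // addPartitions: replace the marker with the real partition entry.
    void addPartitions(long txnId, String partitionEntry) {
        txnComponents.put(txnId, partitionEntry);
    }

    // Cleaner's view: a txn aborted while still holding the marker needs jobs
    // for every possible partition; a concrete partition entry can be cleaned directly.
    boolean needsFullPartitionScan(long txnId) {
        return DYNPART_MARKER.equals(txnComponents.get(txnId));
    }
}
```

The point of the model is the state transition: an abort between `openTxn` and `addPartitions` leaves the marker in place, so the cleaner can no longer mistake the transaction for an empty one.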
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=498134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-498134 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 09/Oct/20 13:42
Start Date: 09/Oct/20 13:42
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499754423

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
 }
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List deleteDeltaDirectories(Path rootPartition, Configuration conf, Set writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment: changed, included delete_delta as well

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ##
@@ -97,9 +100,9 @@ public void run() {
     long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
     LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
     List cleanerList = new ArrayList<>();
-    for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+    for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
       cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() ->
-          clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+          clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment: 1. In original patch Map tableLock = new ConcurrentHashMap<>() was used to prevent a concurrent p-clean (where the whole table will be scanned). I think that is resolved by grouping p-cleans and recording the list of writeIds that need to be removed. @vpnvishv is that correct? Also, we do not allow concurrent Cleaners; their execution is mutexed. 2. was related to the following issue based on Map tableLock = new
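The directory filter discussed in the AcidUtils.java review above matches delta directories named after the aborted writeIds while skipping partition directories (whose names contain `=`), and was extended to cover `delete_delta` as well. A standalone sketch of that predicate follows; the `delta_<writeId>_<writeId>` naming mirrors Hive's delta directory scheme, but the seven-digit zero padding here is an assumption for illustration:

```java
import java.util.Set;

// Sketch of the PathFilter predicate from deleteDeltaDirectories: a directory
// name matches when it starts with the delta (or delete_delta) prefix for one
// of the given writeIds and is not a partition directory (no '=' in the name).
class DeltaDirFilter {

    // Assumed naming scheme: delta_<writeId>_<writeId>, zero-padded to 7 digits.
    static String deltaSubdir(long writeId) {
        return String.format("delta_%07d_%07d", writeId, writeId);
    }

    static String deleteDeltaSubdir(long writeId) {
        return String.format("delete_delta_%07d_%07d", writeId, writeId);
    }

    static boolean matches(String name, Set<Long> writeIds) {
        if (name.contains("=")) {
            return false; // partition directories like p=1 are never deleted here
        }
        for (long wId : writeIds) {
            if (name.startsWith(deltaSubdir(wId)) || name.startsWith(deleteDeltaSubdir(wId))) {
                return true;
            }
        }
        return false;
    }
}
```

Using `startsWith` rather than an exact match also catches statement-suffixed directories such as `delta_0000005_0000005_0001`.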
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497551 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 08/Oct/20 19:54
Start Date: 08/Oct/20 19:54
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501883427

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
     // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
     // past time threshold
     boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-    final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\","
-        + "MIN(\"TXN_STARTED\"), COUNT(*)"
+    String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", "
+        + "MIN(\"TXN_STARTED\"), COUNT(*), "
+        + "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment: I still don't follow. Aborted txn check is done per db/table/partition, so if you have db1/tbl1/p1/type=NOT_DP and db1/tbl1/null/type=DP - that should generate 2 entries in potential compactions.
Issue Time Tracking
---
Worklog Id: (was: 497551)
Time Spent: 10h 20m (was: 10h 10m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497462 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 08/Oct/20 17:35
Start Date: 08/Oct/20 17:35
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501884660

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
     // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
     // past time threshold
     boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-    final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\","
-        + "MIN(\"TXN_STARTED\"), COUNT(*)"
+    String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", "
+        + "MIN(\"TXN_STARTED\"), COUNT(*), "
+        + "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment: oh, sorry, I only considered time based threshold for DYN_PART
Issue Time Tracking
---
Worklog Id: (was: 497462)
Time Spent: 10h 10m (was: 10h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497455=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497455 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 08/Oct/20 17:19
Start Date: 08/Oct/20 17:19
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501884660

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
     // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
     // past time threshold
     boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-    final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\","
-        + "MIN(\"TXN_STARTED\"), COUNT(*)"
+    String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", "
+        + "MIN(\"TXN_STARTED\"), COUNT(*), "
+        + "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment: oh, sorry, I only considered time based threshold
Issue Time Tracking
---
Worklog Id: (was: 497455)
Time Spent: 10h (was: 9h 50m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497453 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 08/Oct/20 17:17
Start Date: 08/Oct/20 17:17
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501883427

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
     // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
     // past time threshold
     boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-    final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\","
-        + "MIN(\"TXN_STARTED\"), COUNT(*)"
+    String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", "
+        + "MIN(\"TXN_STARTED\"), COUNT(*), "
+        + "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment: I still don't follow. Aborted txn check is done per db/table/partition, so if you have db1/tbl1/{p1-p100}/type=NOT_DP and db1/tbl1/null/type=DP - that should generate 2 entries in potential compactions.
Issue Time Tracking
---
Worklog Id: (was: 497453)
Time Spent: 9h 50m (was: 9h 40m)
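The `IS_DP` aggregate debated in the sCheckAborted review above can be re-derived outside SQL: per (db, table, partition) group, count the aborted components and flag the group when any of its rows has the dynamic-partition operation type. A hedged Java sketch follows; the row shape and the `'p'` operation-type letter are illustrative (the letter is borrowed from the `TC_OPERATION_TYPE='p'` rows in the tests), while the grouping logic mirrors `MAX(CASE WHEN ... THEN 1 ELSE 0 END)`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Re-derives the per-group (db, table, partition) aggregate computed by the
// sCheckAborted query: a count of aborted components plus an "is dynamic
// partition" flag that is set when any row in the group has operation type 'p'.
class AbortedTxnCheck {
    record Row(String db, String table, String partition, char opType) {}
    record Agg(int count, boolean isDp) {}

    static Map<String, Agg> checkAborted(List<Row> rows) {
        Map<String, Agg> groups = new HashMap<>();
        for (Row r : rows) {
            // GROUP BY TC_DATABASE, TC_TABLE, TC_PARTITION (null partition kept distinct)
            String key = r.db() + "/" + r.table() + "/" + r.partition();
            Agg prev = groups.getOrDefault(key, new Agg(0, false));
            // MAX(CASE WHEN op = 'p' THEN 1 ELSE 0 END) == "any row in the group is DYNPART"
            groups.put(key, new Agg(prev.count() + 1, prev.isDp() || r.opType() == 'p'));
        }
        return groups;
    }
}
```

Note how this matches the reviewer's point: a `db1/tbl1/p1` group with a non-DP row and a `db1/tbl1/null` group with DP rows come out as two separate entries, only the latter flagged.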
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497368=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497368 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 08/Oct/20 14:28
Start Date: 08/Oct/20 14:28
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501766620

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ##
@@ -232,6 +240,51 @@ public Object run() throws Exception {
   private static String idWatermark(CompactionInfo ci) {
     return " id=" + ci.id;
   }
+
+  private void cleanAborted(CompactionInfo ci) throws MetaException {
+    if (ci.writeIds == null || ci.writeIds.size() == 0) {
+      LOG.warn("Attempted cleaning aborted transaction with empty writeId list");

Review comment: fixed
Issue Time Tracking
---
Worklog Id: (was: 497368)
Time Spent: 9h 40m (was: 9.5h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497358=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497358 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 08/Oct/20 14:17
Start Date: 08/Oct/20 14:17
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501757123

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ##
@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497353 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 08/Oct/20 13:52
Start Date: 08/Oct/20 13:52
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501737923

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ##
@@ -414,77 +436,56 @@ public void markCleaned(CompactionInfo info) throws MetaException {
      * aborted TXN_COMPONENTS above tc_writeid (and consequently about aborted txns).
      * See {@link ql.txn.compactor.Cleaner.removeFiles()}
      */
-    s = "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" WHERE \"TXN_ID\" = \"TC_TXNID\" "
-        + "AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " AND \"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
-    if (info.highestWriteId != 0) s += " AND \"TC_WRITEID\" <= ?";
-    if (info.partName != null) s += " AND \"TC_PARTITION\" = ?";
+    List queries = new ArrayList<>();
+    Iterator writeIdsIter = null;
+    List counts = null;
-    pStmt = dbConn.prepareStatement(s);
-    paramCount = 1;
-    pStmt.setString(paramCount++, info.dbname);
-    pStmt.setString(paramCount++, info.tableName);
-    if(info.highestWriteId != 0) {
-      pStmt.setLong(paramCount++, info.highestWriteId);
+    s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN ("
+        + " SELECT \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + TxnStatus.ABORTED + ") " +

Review comment: cool
Issue Time Tracking
---
Worklog Id: (was: 497353) Time Spent: 9h 20m (was: 9h 10m)

> Make sure transactions get cleaned if they are aborted before addPartitions is called
> --------------------------------------------------------------------------------------
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.0.0, 3.1.1
> Reporter: Jaume M
> Assignee: Jaume M
> Priority: Critical
> Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, HIVE-21052.8.patch, HIVE-21052.9.patch
>
> Time Spent: 9h 20m
> Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has been written on the table, the transaction manager will think it's an empty transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables.
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when addPartitions is called, removing this entry from TXN_COMPONENTS and adding the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS specifying that a transaction was opened and aborted, it must generate jobs for the worker for every possible partition available.
> cc [~ewohlstadter]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
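The markCleaned() hunk above replaces a SELECT with optional predicates (followed by per-row deletes) with a single DELETE ... IN (subquery). The pre-patch pattern of appending predicates conditionally and binding parameters positionally can be sketched in isolation; the class and method names below are illustrative stand-ins, not Hive's actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class AbortedTxnQueryBuilder {
    // Mirrors the pre-patch markCleaned() logic: optional predicates are
    // appended to the SQL string and the bind values collected in the same
    // order, so they can later be set on a PreparedStatement positionally.
    static String build(String db, String table, long highestWriteId,
                        String partName, List<Object> params) {
        StringBuilder s = new StringBuilder(
            "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" "
          + "WHERE \"TXN_ID\" = \"TC_TXNID\" AND \"TXN_STATE\" = 'a' "
          + "AND \"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?");
        params.add(db);
        params.add(table);
        if (highestWriteId != 0) {      // only bounded compactions filter by write id
            s.append(" AND \"TC_WRITEID\" <= ?");
            params.add(highestWriteId);
        }
        if (partName != null) {         // partition-level compaction only
            s.append(" AND \"TC_PARTITION\" = ?");
            params.add(partName);
        }
        return s.toString();
    }

    public static void main(String[] args) {
        List<Object> params = new ArrayList<>();
        System.out.println(build("default", "cws", 7L, "p=1", params));
        System.out.println(params.size()); // 4
    }
}
```

The DELETE-based rewrite avoids this two-phase round trip entirely, which is why the reviewer approved it.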
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497352 ] ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 08/Oct/20 13:45
Start Date: 08/Oct/20 13:45
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r501732794

File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java

@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
   // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
   // past time threshold
   boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-  final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\","
-      + "MIN(\"TXN_STARTED\"), COUNT(*)"
+  String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", "
+      + "MIN(\"TXN_STARTED\"), COUNT(*), "
+      + "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment: Previously, if you had aborted txns above the threshold, this would generate a "normal" compaction that cleaned up everything. Now, however, if even one dynamic-partition transaction is aborted, the type will be CLEAN_ABORTED, which only cleans the write ids belonging to p-type records and leaves everything else. This will delay the normal cleaning. I am not sure whether that is a problem or not.
Issue Time Tracking
---
Worklog Id: (was: 497352) Time Spent: 9h 10m (was: 9h)
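The new MAX(CASE WHEN "TC_OPERATION_TYPE" = 'p' THEN 1 ELSE 0 END) AS "IS_DP" column means a single aborted dynamic-partition write is enough to flag its whole (database, table, partition) group. The same aggregation can be modeled in plain Java; the row layout below is a simplified, hypothetical stand-in for TXN_COMPONENTS, not Hive's real schema:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IsDpAggregate {
    // One TXN_COMPONENTS row, reduced to the columns the initiator groups on.
    record Row(String db, String table, String part, char opType, long txnStarted) {}

    // Per-group result: MIN(TXN_STARTED), COUNT(*), MAX(op = 'p' ? 1 : 0).
    record Agg(long minStarted, int count, int isDp) {}

    static Map<String, Agg> aggregate(List<Row> rows) {
        Map<String, Agg> out = new HashMap<>();
        for (Row r : rows) {
            String key = r.db() + "/" + r.table() + "/" + r.part();
            int dp = r.opType() == 'p' ? 1 : 0;  // 'p' marks a dynamic-partition insert
            out.merge(key, new Agg(r.txnStarted(), 1, dp),
                (a, b) -> new Agg(Math.min(a.minStarted(), b.minStarted()),
                                  a.count() + b.count(),
                                  Math.max(a.isDp(), b.isDp())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("default", "cws", "p=1", 'i', 100L),  // normal insert
            new Row("default", "cws", "p=1", 'p', 200L)); // dynpart marker
        // One 'p' row is enough for the group to be treated as CLEAN_ABORTED.
        System.out.println(aggregate(rows).get("default/cws/p=1").isDp()); // 1
    }
}
```

This is exactly the behavior the reviewer is questioning: the MAX collapses mixed groups to "dynamic partition", so a single 'p' row changes how the whole group is cleaned.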
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497348 ] ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 08/Oct/20 13:44
Start Date: 08/Oct/20 13:44
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r501731802

File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java

@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
     Lists.newArrayList(5, 6), 1);
   }

+  @Test
+  public void testCleanAbortCompactAfterAbortTwoPartitions() throws Exception {
+    String dbName = "default";
+    String tblName = "cws";
+
+    HiveStreamingConnection connection1 = prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+    HiveStreamingConnection connection2 = prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+
+    connection1.beginTransaction();
+    connection1.write("1,1".getBytes());
+    connection1.write("2,2".getBytes());
+    connection1.abortTransaction();
+
+    connection2.beginTransaction();
+    connection2.write("1,3".getBytes());
+    connection2.write("2,3".getBytes());
+    connection2.write("3,3".getBytes());
+    connection2.abortTransaction();
+
+    assertAndCompactCleanAbort(dbName, tblName);
+
+    connection1.close();
+    connection2.close();
+  }
+
+  @Test
+  public void testCleanAbortCompactAfterAbort() throws Exception {
+    String dbName = "default";
+    String tblName = "cws";
+
+    // Create three folders with two different transactions
+    HiveStreamingConnection connection1 = prepareTableAndConnection(dbName, tblName, 1);
+    HiveStreamingConnection connection2 = prepareTableAndConnection(dbName, tblName, 1);
+
+    connection1.beginTransaction();
+    connection1.write("1,1".getBytes());
+    connection1.write("2,2".getBytes());
+    connection1.abortTransaction();
+
+    connection2.beginTransaction();
+    connection2.write("1,3".getBytes());
+    connection2.write("2,3".getBytes());
+    connection2.write("3,3".getBytes());
+    connection2.abortTransaction();
+
+    assertAndCompactCleanAbort(dbName, tblName);
+
+    connection1.close();
+    connection2.close();
+  }
+
+  private void assertAndCompactCleanAbort(String dbName, String tblName) throws Exception {
+    IMetaStoreClient msClient = new HiveMetaStoreClient(conf);
+    TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+    Table table = msClient.getTable(dbName, tblName);
+    FileSystem fs = FileSystem.get(conf);
+    FileStatus[] stat =
+        fs.listStatus(new Path(table.getSd().getLocation()));
+    if (3 != stat.length) {
+      Assert.fail("Expecting three directories corresponding to three partitions, FileStatus[] stat " + Arrays.toString(stat));
+    }
+
+    int count = TxnDbUtil.countQueryAgent(conf, "select count(*) from TXN_COMPONENTS where TC_OPERATION_TYPE='p'");
+    // We should have two rows corresponding to the two aborted transactions
+    Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from TXN_COMPONENTS"), 2, count);
+
+    runInitiator(conf);
+    count = TxnDbUtil.countQueryAgent(conf, "select count(*) from COMPACTION_QUEUE where CQ_TYPE='p'");
+    // Only one job is added to the queue per table. This job corresponds to all the entries for a particular table
+    // with rows in TXN_COMPONENTS
+    Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from COMPACTION_QUEUE"), 1, count);
+
+    ShowCompactResponse rsp = txnHandler.showCompact(new ShowCompactRequest());
+    Assert.assertEquals(1, rsp.getCompacts().size());
+    Assert.assertEquals(TxnStore.CLEANING_RESPONSE, rsp.getCompacts().get(0).getState());
+    Assert.assertEquals("cws", rsp.getCompacts().get(0).getTablename());
+    Assert.assertEquals(CompactionType.CLEAN_ABORTED,
+        rsp.getCompacts().get(0).getType());
+
+    runCleaner(conf);
+
+    // After the cleaner runs, TXN_COMPONENTS and COMPACTION_QUEUE should have zero rows; the folders should also have been deleted.
+    count = TxnDbUtil.countQueryAgent(conf, "select count(*) from TXN_COMPONENTS");
+    Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from TXN_COMPONENTS"), 0, count);
+
+    count = TxnDbUtil.countQueryAgent(conf, "select count(*) from COMPACTION_QUEUE");
+    Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from COMPACTION_QUEUE"), 0, count);
+
+    RemoteIterator<LocatedFileStatus> it =
+        fs.listFiles(new Path(table.getSd().getLocation()), true);
+    if (it.hasNext()) {
+      Assert.fail("Expecting compaction to have cleaned the directories, FileStatus[] stat " + Arrays.toString(stat));
+    }
+
+    rsp = txnHandler.showCompact(new
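One assertion in assertAndCompactCleanAbort() is that the initiator enqueues exactly one CLEAN_ABORTED job per table, no matter how many aborted transactions left p-type rows in TXN_COMPONENTS. A toy model of that grouping step (types and names below are illustrative, not the initiator's real ones):

```java
import java.util.List;
import java.util.stream.Collectors;

public class InitiatorGrouping {
    // A p-type TXN_COMPONENTS row left by an aborted streaming transaction.
    record TxnComponent(long txnId, String db, String table) {}

    record CompactionJob(String db, String table, String type) {}

    // All aborted dynamic-partition components of one table collapse into a
    // single CLEAN_ABORTED job, regardless of how many txns aborted.
    static List<CompactionJob> toJobs(List<TxnComponent> pTypeRows) {
        return pTypeRows.stream()
            .map(c -> c.db() + "." + c.table())
            .distinct()
            .map(k -> new CompactionJob(k.split("\\.")[0], k.split("\\.")[1], "CLEAN_ABORTED"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two aborted streaming txns on the same table -> one queue entry,
        // matching the test's COMPACTION_QUEUE count of 1.
        List<CompactionJob> jobs = toJobs(List.of(
            new TxnComponent(11, "default", "cws"),
            new TxnComponent(12, "default", "cws")));
        System.out.println(jobs.size()); // 1
    }
}
```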
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497347 ] ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 08/Oct/20 13:44
Start Date: 08/Oct/20 13:44
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r501731697

File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java

@@ -232,6 +240,51 @@ public Object run() throws Exception {
   private static String idWatermark(CompactionInfo ci) {
     return " id=" + ci.id;
   }
+
+  private void cleanAborted(CompactionInfo ci) throws MetaException {
+    if (ci.writeIds == null || ci.writeIds.size() == 0) {
+      LOG.warn("Attempted cleaning aborted transaction with empty writeId list");

Review comment: yep, good catch!
Issue Time Tracking
---
Worklog Id: (was: 497347) Time Spent: 8h 50m (was: 8h 40m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497333=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497333 ] ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 08/Oct/20 13:33
Start Date: 08/Oct/20 13:33
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r501723853

File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java

@@ -232,6 +240,51 @@ public Object run() throws Exception {

Review comment: Shouldn't you mark the compaction failed or cleaned?
Issue Time Tracking
---
Worklog Id: (was: 497333) Time Spent: 8h 40m (was: 8.5h)
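The review thread above suggests that cleanAborted() should not merely warn on an empty writeId list but mark the compaction failed (or cleaned) so the queue entry does not linger. A minimal sketch of that guard, with the markFailed side effect represented as a returned outcome rather than the real TxnStore call:

```java
import java.util.List;

public class CleanAbortedGuard {
    enum Outcome { CLEANED, MARKED_FAILED }

    // Sketch of the suggested Cleaner.cleanAborted() behavior: refuse to
    // proceed on a missing/empty writeId list and surface a terminal state
    // instead of silently returning after a log warning.
    static Outcome cleanAborted(List<Long> writeIds) {
        if (writeIds == null || writeIds.isEmpty()) {
            // LOG.warn("Attempted cleaning aborted transaction with empty writeId list");
            return Outcome.MARKED_FAILED; // reviewer's point: don't leave the entry dangling
        }
        // ... delete the files and metadata belonging to the given writeIds ...
        return Outcome.CLEANED;
    }

    public static void main(String[] args) {
        System.out.println(cleanAborted(List.of()));        // MARKED_FAILED
        System.out.println(cleanAborted(List.of(5L, 6L)));  // CLEANED
    }
}
```

Without a terminal state, a CLEAN_ABORTED entry with no write ids would stay in COMPACTION_QUEUE forever, which is the failure mode the comment is guarding against.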
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497332=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497332 ] ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 08/Oct/20 13:32
Start Date: 08/Oct/20 13:32
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r501723043

File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java

@@ -97,9 +100,9 @@ public void run() {
   long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
   LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
   List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
-  for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+  for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
     cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() ->
-        clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+        clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment: I agree, it can be addressed in a follow-up Jira.
Issue Time Tracking
---
Worklog Id: (was: 497332) Time Spent: 8.5h (was: 8h 20m)
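The hunk under discussion only adjusts whitespace, but the underlying pattern — one CompletableFuture per ready-to-clean compaction, run on a dedicated executor and joined at the end — can be sketched self-contained. The counter below stands in for the real clean(compactionInfo, minOpenTxnId) call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class CleanerDispatch {
    // Submit one async task per ready-to-clean compaction, then block until
    // all have finished, mirroring the dispatch loop in Cleaner.run().
    static int runAll(int readyToClean) {
        ExecutorService cleanerExecutor = Executors.newFixedThreadPool(2);
        try {
            AtomicInteger cleaned = new AtomicInteger();
            List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
            for (int i = 0; i < readyToClean; i++) {
                // stand-in for clean(compactionInfo, minOpenTxnId)
                cleanerList.add(CompletableFuture.runAsync(cleaned::incrementAndGet, cleanerExecutor));
            }
            CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
            return cleaned.get();
        } finally {
            cleanerExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(4)); // 4
    }
}
```

The join on allOf() is what keeps one slow partition clean from silently outliving the cleaner cycle that started it.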
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497328 ] ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 08/Oct/20 13:25
Start Date: 08/Oct/20 13:25
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r501718248

File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java

@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497326 ] ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 08/Oct/20 13:21
Start Date: 08/Oct/20 13:21
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r501715311

File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java

@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497325=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497325 ] ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 08/Oct/20 13:19
Start Date: 08/Oct/20 13:19
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r501713463

File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java

@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
+    if (it.hasNext()) {
+      Assert.fail("Expecting compaction to have cleaned the directories, FileStatus[] stat " + Arrays.toString(stat));

Review comment: I think this assert is quite misleading. I
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497306=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497306 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Oct/20 12:49
Start Date: 08/Oct/20 12:49
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on pull request #1548:
URL: https://github.com/apache/hive/pull/1548#issuecomment-705545322

looks like master is broken right now

Issue Time Tracking
-------------------

Worklog Id: (was: 497306)
Time Spent: 7h 50m (was: 7h 40m)

> Make sure transactions get cleaned if they are aborted before addPartitions
> is called
> ---------------------------------------------------------------------------
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.0.0, 3.1.1
> Reporter: Jaume M
> Assignee: Jaume M
> Priority: Critical
> Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch,
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch,
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch,
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch,
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
> Time Spent: 7h 50m
> Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has
> been written on the table the transaction manager will think it's an empty
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables.
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and
> when addPartitions is called remove this entry from TXN_COMPONENTS and add
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that
> specifies that a transaction was opened and it was aborted it must generate
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
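The two-step marker scheme described above can be sketched in miniature as follows. This is an illustrative model only: `TxnComponentsStore`, `OP_MARKER`, and the method names are hypothetical, and the real logic lives in the metastore's TXN_COMPONENTS table rather than an in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of the TXN_COMPONENTS marker scheme proposed above.
// Names are hypothetical, not the actual Hive metastore code.
class TxnComponentsStore {
    static final String OP_MARKER = "p";  // "data written, partitions unknown" marker
    static final String OP_INSERT = "i";

    // txnId -> list of (table, partition, operationType) rows
    final Map<Long, List<String[]>> rows = new HashMap<>();

    // At openTxn: record a marker row so an abort before addPartitions
    // is not mistaken for an empty transaction.
    void openTxn(long txnId, String table) {
        rows.computeIfAbsent(txnId, k -> new ArrayList<>())
            .add(new String[]{table, null, OP_MARKER});
    }

    // At addPartitions: replace the marker with one row per real partition.
    void addPartitions(long txnId, String table, List<String> partitions) {
        List<String[]> txnRows = rows.get(txnId);
        txnRows.removeIf(r -> OP_MARKER.equals(r[2]));
        for (String p : partitions) {
            txnRows.add(new String[]{table, p, OP_INSERT});
        }
    }

    // Cleaner side: an aborted txn that still carries a marker row needs a
    // whole-table clean, because the dirty partitions are unknown.
    boolean needsFullTableClean(long txnId) {
        return rows.getOrDefault(txnId, List.of()).stream()
                   .anyMatch(r -> OP_MARKER.equals(r[2]));
    }
}
```

The key property is that the marker exists exactly in the window between openTxn and addPartitions, which is precisely the window in which an abort would otherwise leave orphaned deltas.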
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497260 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Oct/20 11:47
Start Date: 08/Oct/20 11:47
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501646861

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -97,9 +100,9 @@ public void run() {
       long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
       LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
       List cleanerList = new ArrayList<>();
-      for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+      for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
         cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() ->
-            clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+            clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
1. In the original patch a Map tableLock = new ConcurrentHashMap<>() was used to prevent a concurrent p-clean (where the whole table will be scanned). I think that is resolved by grouping p-cleans and recording the list of writeIds that needs to be removed: https://github.com/apache/hive/pull/1548/files#diff-9cf3ae764b7a33b568a984d695aff837R328 @vpnvishv is that correct? Also we do not allow concurrent Cleaners; their execution is mutexed.
2. was related to the following issue with the Map tableLock = new ConcurrentHashMap<>() design: "Suppose you have a p-type clean on table T that is running (i.e. has the Write lock) and you have 30 different partition clean requests (in T). The 30 per-partition cleans will get blocked but they will tie up every thread in the pool while they are blocked, right? If so, no other clean (on any other table) will actually make progress until the p-type on T is done."
I think it's not valid now.

Issue Time Tracking
-------------------

Worklog Id: (was: 497260)
Time Spent: 7h 20m (was: 7h 10m)
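The grouping idea discussed in the comment above can be sketched as follows: instead of one whole-table scan per aborted transaction, all p-type clean requests for a table are collapsed into one request carrying the union of aborted writeIds. `PCleanGrouper` and its methods are illustrative names, not the actual CompactionTxnHandler code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of grouping p-cleans per table with a merged writeId list,
// so two cleans for the same table never race over the same deltas.
// Names are illustrative, not the actual Hive metastore code.
class PCleanGrouper {
    // "db.table" -> aborted writeIds whose delta dirs still need deletion
    final Map<String, Set<Long>> pending = new HashMap<>();

    void addAbortedWrite(String dbTable, long writeId) {
        pending.computeIfAbsent(dbTable, k -> new TreeSet<>()).add(writeId);
    }

    // One clean task per table: the task removes exactly these writeIds,
    // then the entry disappears, so no second scan of the table is queued.
    Set<Long> drainTable(String dbTable) {
        return pending.remove(dbTable);
    }
}
```

With this shape, concurrency control reduces to "one Cleaner at a time" plus "one task per table per cycle", which is why the per-table lock map becomes unnecessary.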
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497264 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Oct/20 11:53
Start Date: 08/Oct/20 11:53
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501646861

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -97,9 +100,9 @@ public void run() {
       long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
       LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
       List cleanerList = new ArrayList<>();
-      for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+      for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
         cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() ->
-            clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+            clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
1. In the original patch a Map tableLock = new ConcurrentHashMap<>() was used to prevent a concurrent p-clean (where the whole table will be scanned). I think that is resolved by grouping p-cleans and recording the list of writeIds that needs to be removed: https://github.com/apache/hive/pull/1548/files#diff-9cf3ae764b7a33b568a984d695aff837R328 @vpnvishv is that correct? Also we do not allow concurrent Cleaners; their execution is mutexed.
2. was related to the following issue with the Map tableLock = new ConcurrentHashMap<>() design: "Suppose you have a p-type clean on table T that is running (i.e. has the Write lock) and you have 30 different partition clean requests (in T). The 30 per-partition cleans will get blocked but they will tie up every thread in the pool while they are blocked, right? If so, no other clean (on any other table) will actually make progress until the p-type on T is done."
Yes, it's still the case that we'll have to wait for all tasks to complete, and if there is one long-running task we won't be able to submit new ones. However, I'm not sure it's a critical issue; I think we can address it in a separate jira.

Issue Time Tracking
-------------------

Worklog Id: (was: 497264)
Time Spent: 7h 40m (was: 7.5h)
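The dispatch-and-wait behavior discussed in the review above can be modeled in a few lines: every ready-to-clean entry is submitted to a pool, and the cycle blocks on all of them before returning, so one long-running task delays the next cycle. This is a minimal sketch of the pattern, not the actual Cleaner class; the method and class names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal model of the Cleaner loop shape discussed above: dispatch all
// ready-to-clean tasks, then wait for every one before the cycle ends.
class CleanerCycleSketch {
    static int runCycle(List<Runnable> cleanTasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (Runnable task : cleanTasks) {
                futures.add(CompletableFuture.runAsync(task, pool));
            }
            // Analogue of waiting on cleanerList before the next cycle:
            // a single slow task holds up everything submitted after it.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return futures.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

This makes the deferred limitation concrete: progress on other tables is only blocked across cycle boundaries, not within a cycle, which is why the reviewers judged it acceptable to fix separately.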
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497224=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497224 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Oct/20 10:39
Start Date: 08/Oct/20 10:39
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499754423

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List deleteDeltaDirectories(Path rootPartition, Configuration conf, Set writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment: changed, included delete_delta as well

Issue Time Tracking
-------------------

Worklog Id: (was: 497224)
Time Spent: 6h 40m (was: 6.5h)
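The filter discussed above, extended to cover delete_delta directories as the review requested, can be sketched as a plain string predicate. The zero-padded naming below mirrors Hive's delta directory convention (e.g. delta_0000005_0000005), but treat the exact format and the class name as assumptions of this sketch, not the actual AcidUtils code.

```java
import java.util.Set;

// Sketch of a directory filter that matches both delta_ and delete_delta_
// directories whose min/max writeId equal one of the aborted writeIds.
// Illustrative only; the real logic lives in AcidUtils.deleteDeltaDirectories.
class AbortedDeltaFilter {
    // Assumed naming convention: delta_<7-digit writeId>_<7-digit writeId>
    static String deltaSubdir(long writeId) {
        return String.format("delta_%07d_%07d", writeId, writeId);
    }

    static boolean matches(String dirName, Set<Long> abortedWriteIds) {
        if (dirName.contains("=")) {
            return false; // partition directories (e.g. part=1) are never deleted here
        }
        for (long w : abortedWriteIds) {
            String d = deltaSubdir(w);
            if (dirName.startsWith(d) || dirName.startsWith("delete_" + d)) {
                return true;
            }
        }
        return false;
    }
}
```

Checking the `delete_` prefix in the same pass is what the "included delete_delta as well" fix amounts to: without it, aborted delete deltas would survive the clean.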
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=496724=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-496724 ]

ASF GitHub Bot logged work on HIVE-21052:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Oct/20 16:17
Start Date: 07/Oct/20 16:17
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1415:
URL: https://github.com/apache/hive/pull/1415#discussion_r501141180

## File path: standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -386,15 +427,27 @@ public void markCleaned(CompactionInfo info) throws MetaException {
           pStmt.setLong(paramCount++, info.highestWriteId);
         }
         LOG.debug("Going to execute update <" + s + ">");
-        if (pStmt.executeUpdate() < 1) {
-          LOG.error("Expected to remove at least one row from completed_txn_components when " +
-              "marking compaction entry as clean!");
+        if ((updCount = pStmt.executeUpdate()) < 1) {
+          // In the case of clean abort commit hasn't happened so completed_txn_components hasn't been filled
+          if (!info.isCleanAbortedCompaction()) {
+            LOG.error(
+                "Expected to remove at least one row from completed_txn_components when "
+                + "marking compaction entry as clean!");
+          }
         }
         s = "select distinct txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
             TXN_ABORTED + "' and tc_database = ? and tc_table = ?";
         if (info.highestWriteId != 0) s += " and tc_writeid <= ?";
         if (info.partName != null) s += " and tc_partition = ?";
+        if (info.writeIds != null && info.writeIds.size() > 0) {
+          String[] wriStr = new String[info.writeIds.size()];
+          int i = 0;
+          for (Long writeId: writeIds) {
+            wriStr[i++] = writeId.toString();
+          }
+          s += " and tc_writeid in (" + String.join(",", wriStr) + ")";

Review comment: is this even used, statement was already compiled?

Issue Time Tracking
-------------------

Worklog Id: (was: 496724)
Time Spent: 6.5h (was: 6h 20m)
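The review question above rests on a JDBC fact: once `prepareStatement` has been called, the PreparedStatement keeps the SQL text it was compiled with, and later edits to the Java string have no effect. The safe pattern is to finish building the string, including the `IN (...)` clause, before preparing. The builder below is an illustrative sketch with made-up names, not the actual CompactionTxnHandler code; only the query skeleton is taken from the diff above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: assemble the full SQL text, IN clause included, before the caller
// would hand it to Connection.prepareStatement and bind the ? parameters.
class AbortedTxnQueryBuilder {
    static String build(String partName, List<Long> writeIds) {
        StringBuilder s = new StringBuilder(
            "select distinct txn_id from TXNS, TXN_COMPONENTS"
            + " where txn_id = tc_txnid and txn_state = 'a'"
            + " and tc_database = ? and tc_table = ?");
        if (partName != null) {
            s.append(" and tc_partition = ?");
        }
        if (writeIds != null && !writeIds.isEmpty()) {
            // writeIds are numeric values we generated, so inlining them is
            // safe here; user-supplied values would need bound parameters.
            s.append(" and tc_writeid in (")
             .append(writeIds.stream().map(String::valueOf).collect(Collectors.joining(",")))
             .append(")");
        }
        return s.toString();
    }
}
```

Appending to `s` after `pStmt` was created, as the quoted diff appears to do, only changes what the next `prepareStatement(s)` call would see, which is exactly the ambiguity the reviewer is flagging.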
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=496087=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-496087 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 06/Oct/20 18:15 Start Date: 06/Oct/20 18:15 Worklog Time Spent: 10m Work Description: deniskuzZ edited a comment on pull request #1548: URL: https://github.com/apache/hive/pull/1548#issuecomment-704456432 > @deniskuzZ Yes.. Sorry I missed those while porting changes from my private branch to open source [Hive-3](https://issues.apache.org/jira/browse/HIVE-3). > I think you have already covered it now. Still if you want to refer I have Updated [Hive-3](https://issues.apache.org/jira/browse/HIVE-3) PR with those tests, [4596b50](https://github.com/apache/hive/commit/4596b50e0f4e00c363a744bad13a8c750330d9a0) Thanks @vpnvishv, I'll check them. Also noticed that new test property is not actually configured + delete_delta isn't cleaned up. Fixed in master patch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 496087) Time Spent: 6h 20m (was: 6h 10m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=496086=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-496086 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 06/Oct/20 18:13 Start Date: 06/Oct/20 18:13 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #1548: URL: https://github.com/apache/hive/pull/1548#issuecomment-704456432 > @deniskuzZ Yes.. Sorry I missed those while porting changes from my private branch to open source [Hive-3](https://issues.apache.org/jira/browse/HIVE-3). > I think you have already covered it now. Still if you want to refer I have Updated [Hive-3](https://issues.apache.org/jira/browse/HIVE-3) PR with those tests, [4596b50](https://github.com/apache/hive/commit/4596b50e0f4e00c363a744bad13a8c750330d9a0) Thanks @vpnvishv, I'll check them. Also noticed that new test property is not actually configured. Fixed in master patch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 496086) Time Spent: 6h 10m (was: 6h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=496081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-496081 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 06/Oct/20 18:08
Start Date: 06/Oct/20 18:08
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1415:
URL: https://github.com/apache/hive/pull/1415#discussion_r500497046

## File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java

@@ -2503,6 +2504,214 @@ public void testFullACIDAbortWithManyPartitions() throws Exception {
     List rs = runStatementOnDriver("select a,b from " + Table.ACIDTBLPART + " order by a");
     Assert.assertEquals(stringifyValues(resultData2), rs);
   }
+
+  @Test
+  public void testInsertIntoDPWithAborts() throws Exception {
+    d.destroy();
+    hiveConf.setVar(HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
+    hiveConf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD, 0);
+    hiveConf.setTimeVar(HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD, 0, TimeUnit.MILLISECONDS);
+    int[][] resultData = new int[][] {{1,1}, {2,2}};
+    runStatementOnDriver("insert into " + Table.ACIDTBLPART + " partition(p) values(1,1,'p1'),(2,2,'p1')");
+    verifyDeltaDirAndResult(1, Table.ACIDTBLPART.toString(), "p=p1", resultData);
+
+    // forcing a txn to abort before addDynamicPartitions
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEFAILLOADDYNAMICPARTITION, true);

Review comment: What does this property do? Can't find any logic connected to it.

This is an automated message from the Apache Git Service.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 496081) Time Spent: 6h (was: 5h 50m)
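The two thresholds the test above sets to zero (HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD and HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD) gate when aborted transactions become eligible for cleanup: either the number of aborted txns on a table passes the count threshold, or the oldest aborted txn is older than the time threshold, with a negative time threshold disabling the age check. This is a sketch of that gating logic under those assumptions, not the Initiator's actual code:

```java
// Illustrative predicate: cleanup fires when either the count gate or
// the age gate opens. With both thresholds set to 0, as in the test,
// a single aborted txn is enough to trigger it.
class AbortedTxnCheck {
  static boolean shouldClean(int abortedCount, long oldestStartMillis, long nowMillis,
                             int countThreshold, long timeThresholdMillis) {
    boolean pastCount = abortedCount > countThreshold;
    boolean pastAge = timeThresholdMillis >= 0
        && (nowMillis - oldestStartMillis) > timeThresholdMillis;
    return pastCount || pastAge;
  }
}
```

Zeroing both knobs in the test makes the very first aborted write immediately visible to the cleanup path, which is what the test needs to exercise.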
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=496037=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-496037 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 06/Oct/20 17:14 Start Date: 06/Oct/20 17:14 Worklog Time Spent: 10m Work Description: vpnvishv commented on pull request #1548: URL: https://github.com/apache/hive/pull/1548#issuecomment-704423354 @deniskuzZ Yes.. Sorry I missed those while porting changes from my private branch to open source Hive-3. I think you have already covered it now. Still if you want to refer I have Updated Hive-3 PR with those tests, https://github.com/apache/hive/pull/1415/commits/4596b50e0f4e00c363a744bad13a8c750330d9a0 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 496037) Time Spent: 5h 50m (was: 5h 40m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495882 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 06/Oct/20 12:51 Start Date: 06/Oct/20 12:51 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #1548: URL: https://github.com/apache/hive/pull/1548#issuecomment-704246954 @vpnvishv I can't find any tests that would actually cover dynamic part abort. HIVETESTMODEFAILLOADDYNAMICPARTITION is not used anywhere. Am I missing something here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 495882) Time Spent: 5h 40m (was: 5.5h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495528 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 05/Oct/20 19:39 Start Date: 05/Oct/20 19:39 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r499827644 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java ## @@ -237,6 +237,7 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor } JobConf job = createBaseJobConf(conf, jobName, t, sd, writeIds, ci); +QueryCompactor.Util.removeAbortedDirsForAcidTable(conf, dir); Review comment: removed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 495528) Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495463=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495463 ]

ASF GitHub Bot logged work on HIVE-21052:
- Author: ASF GitHub Bot
Created on: 05/Oct/20 17:21
Start Date: 05/Oct/20 17:21
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499754946

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java

@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+          return true;
+        } else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+          return true;
+        }
+      }
+      return false;
+    };
+    List<FileStatus> deleted = new ArrayList<>();
+    deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+    return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List<FileStatus> deleted)
+      throws IOException {
+    RemoteIterator<FileStatus> it = listIterator(fs, root, null);
+
+    while (it.hasNext()) {
+      FileStatus fStatus = it.next();
+      if (fStatus.isDirectory()) {
+        if (filter.accept(fStatus.getPath())) {
+          fs.delete(fStatus.getPath(), true);
+          deleted.add(fStatus);
+        } else {
+          deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted);
+          if (isDirectoryEmpty(fs, fStatus.getPath())) {
+            fs.delete(fStatus.getPath(), false);
+            deleted.add(fStatus);
+          }
+        }
+      }
+    }
+  }
+
+  private static boolean isDirectoryEmpty(FileSystem fs, Path path) throws IOException {
+    RemoteIterator<FileStatus> it = listIterator(fs, path, null);
+    return !it.hasNext();
+  }
+
+  private static RemoteIterator<FileStatus> listIterator(FileSystem fs, Path path, PathFilter filter)
+      throws IOException {
+    try {
+      return new ToFileStatusIterator(SHIMS.listLocatedHdfsStatusIterator(fs, path, filter));
+    } catch (Throwable t) {

Review comment: removed it

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 495463)
Time Spent: 5h 20m (was: 5h 10m)
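The recursive walk in the deleteDeltaDirectories hunk above — delete directories matching the write-id filter, then prune parents the deletion left empty — can be sketched with plain java.nio.file. Everything below is illustrative: the real code goes through Hadoop's FileSystem, PathFilter, and shim layer, and the directory names merely imitate Hive's delta layout:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of the delete-then-prune walk: matching directories are removed
// recursively; non-matching directories are descended into and deleted
// afterwards only if the descent emptied them.
class DeltaPruner {
  static List<Path> deleteMatching(Path root, Predicate<String> filter) throws IOException {
    List<Path> deleted = new ArrayList<>();
    walk(root, filter, deleted);
    return deleted;
  }

  private static void walk(Path dir, Predicate<String> filter, List<Path> deleted) throws IOException {
    try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
      for (Path child : children) {
        if (!Files.isDirectory(child)) {
          continue;                               // only directories are candidates
        }
        if (filter.test(child.getFileName().toString())) {
          deleteRecursively(child);               // matching delta/base dir: drop it whole
          deleted.add(child);
        } else {
          walk(child, filter, deleted);           // descend first,
          if (isEmpty(child)) {                   // then prune if nothing is left
            Files.delete(child);
            deleted.add(child);
          }
        }
      }
    }
  }

  private static void deleteRecursively(Path p) throws IOException {
    if (Files.isDirectory(p)) {
      try (DirectoryStream<Path> ds = Files.newDirectoryStream(p)) {
        for (Path c : ds) {
          deleteRecursively(c);
        }
      }
    }
    Files.delete(p);
  }

  private static boolean isEmpty(Path p) throws IOException {
    try (DirectoryStream<Path> ds = Files.newDirectoryStream(p)) {
      return !ds.iterator().hasNext();
    }
  }
}
```

One caveat raised later in this thread holds for any variant of this walk: pruning an empty partition directory on disk does not remove the partition from the metastore, so the two can diverge.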
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495460 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 05/Oct/20 17:20 Start Date: 05/Oct/20 17:20 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r499754142 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map tblProps) { tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES); } + /** + * Look for delta directories matching the list of writeIds and deletes them. + * @param rootPartition root partition to look for the delta directories + * @param conf configuration + * @param writeIds list of writeIds to look for in the delta directories + * @return list of deleted directories. + * @throws IOException + */ + public static List deleteDeltaDirectories(Path rootPartition, Configuration conf, Set writeIds) + throws IOException { +FileSystem fs = rootPartition.getFileSystem(conf); + +PathFilter filter = (p) -> { + String name = p.getName(); + for (Long wId : writeIds) { +if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) { + return true; +} else if (name.startsWith(baseDir(wId)) && !name.contains("=")) { + return true; +} + } + return false; +}; +List deleted = new ArrayList<>(); +deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted); +return deleted; + } + + private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List deleted) Review comment: changed to use getHdfsDirSnapshots, @pvargacl do you know. if i should access cached data somehow? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 495460) Time Spent: 5h (was: 4h 50m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495461 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 05/Oct/20 17:20 Start Date: 05/Oct/20 17:20 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r499754423 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map tblProps) { tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES); } + /** + * Look for delta directories matching the list of writeIds and deletes them. + * @param rootPartition root partition to look for the delta directories + * @param conf configuration + * @param writeIds list of writeIds to look for in the delta directories + * @return list of deleted directories. + * @throws IOException + */ + public static List deleteDeltaDirectories(Path rootPartition, Configuration conf, Set writeIds) + throws IOException { +FileSystem fs = rootPartition.getFileSystem(conf); + +PathFilter filter = (p) -> { + String name = p.getName(); + for (Long wId : writeIds) { +if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) { Review comment: changed, also excluded base directory from listing This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 495461) Time Spent: 5h 10m (was: 5h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495459 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 05/Oct/20 17:12 Start Date: 05/Oct/20 17:12 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r499749748 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map tblProps) { tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES); } + /** + * Look for delta directories matching the list of writeIds and deletes them. + * @param rootPartition root partition to look for the delta directories + * @param conf configuration + * @param writeIds list of writeIds to look for in the delta directories + * @return list of deleted directories. + * @throws IOException + */ + public static List deleteDeltaDirectories(Path rootPartition, Configuration conf, Set writeIds) + throws IOException { +FileSystem fs = rootPartition.getFileSystem(conf); + +PathFilter filter = (p) -> { + String name = p.getName(); + for (Long wId : writeIds) { +if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) { + return true; +} else if (name.startsWith(baseDir(wId)) && !name.contains("=")) { + return true; +} + } + return false; +}; +List deleted = new ArrayList<>(); +deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted); +return deleted; + } + + private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List deleted) + throws IOException { +RemoteIterator it = listIterator(fs, root, null); + +while (it.hasNext()) { + FileStatus fStatus = it.next(); + if (fStatus.isDirectory()) { +if (filter.accept(fStatus.getPath())) { + fs.delete(fStatus.getPath(), true); + deleted.add(fStatus); +} else { + 
deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted); + if (isDirectoryEmpty(fs, fStatus.getPath())) { Review comment: + partitions are not removed in HMS This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 495459) Time Spent: 4h 50m (was: 4h 40m) > Make sure transactions get cleaned if they are aborted before addPartitions > is called > - > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0, 3.1.1 >Reporter: Jaume M >Assignee: Jaume M >Priority: Critical > Labels: pull-request-available > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, > HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, > HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, > HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, > HIVE-21052.8.patch, HIVE-21052.9.patch > > Time Spent: 4h 50m > Remaining Estimate: 0h > > If the transaction is aborted between openTxn and addPartitions and data has > been written on the table the transaction manager will think it's an empty > transaction and no cleaning will be done. > This is currently an issue in the streaming API and in micromanaged tables. > As proposed by [~ekoifman] this can be solved by: > * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and > when addPartitions is called remove this entry from TXN_COMPONENTS and add > the corresponding partition entry to TXN_COMPONENTS. > * If the cleaner finds and entry with a special marker in TXN_COMPONENTS that > specifies that a transaction was opened and it was aborted it must generate > jobs for the worker for every possible partition available. 
> cc [~ewohlstadter] -- This message was sent by Atlassian Jira (v8.3.4#803005)
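The filter in deleteDeltaDirectories above matches directory names against per-writeId delta/base prefixes and skips partition directories. A minimal, dependency-free sketch of that name matching, assuming Hive's usual zero-padded naming convention (delta_<writeId>_<writeId> and base_<writeId>); the helper names here are illustrative, not the PR's code:

```java
import java.util.Set;

public class AbortedDirFilter {
    // Assumed naming convention: 7-digit zero padding, as in AcidUtils.
    static String deltaSubdir(long wId) {
        return String.format("delta_%07d_%07d", wId, wId);
    }

    static String baseDir(long wId) {
        return String.format("base_%07d", wId);
    }

    // True iff the directory name belongs to one of the given writeIds.
    static boolean matches(String name, Set<Long> writeIds) {
        if (name.contains("=")) {
            return false; // a partition directory such as "ds=2020-10-05"
        }
        for (Long wId : writeIds) {
            if (name.startsWith(deltaSubdir(wId)) || name.startsWith(baseDir(wId))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Long> aborted = Set.of(3L);
        System.out.println(matches("delta_0000003_0000003", aborted)); // true
        System.out.println(matches("delta_0000001_0000001", aborted)); // false
        System.out.println(matches("ds=2020", aborted));               // false
    }
}
```

Note the `=` check is what keeps the filter from ever deleting a partition directory itself, which is why the review comment points out that partitions are not removed in HMS either.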
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495457 ]

ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 05/Oct/20 17:08
Start Date: 05/Oct/20 17:08
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499747334

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
## @@ -107,11 +107,12 @@ public CompactionTxnHandler() {
     // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
     // past time threshold
     boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-    final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\","
-        + "MIN(\"TXN_STARTED\"), COUNT(*)"
+    String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", "
+        + "MIN(\"TXN_STARTED\"), COUNT(*), "
+        + "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment: why is that? aborted dynPart is just a special case that would be handled separately (IS_DP=1).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 495457)
Time Spent: 4h 40m (was: 4.5h)
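As a sanity check of what the new IS_DP column computes: per (database, table, partition) group, MAX(CASE WHEN ... THEN 1 ELSE 0 END) is 1 iff any aborted TXN_COMPONENTS row in the group carries the dynamic-partition operation type. A plain-Java stand-in for that SQL aggregate; the row shape and the "p" marker code are assumptions for illustration, not the actual metastore schema:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class IsDpFlag {
    // Illustrative stand-in for a TXN_COMPONENTS row (assumed shape).
    record TxnComponent(String db, String table, String partition, String opType) {}

    // Equivalent of: SELECT ..., MAX(CASE WHEN TC_OPERATION_TYPE = 'p' THEN 1 ELSE 0 END)
    // GROUP BY TC_DATABASE, TC_TABLE, TC_PARTITION
    static Map<String, Integer> isDpByGroup(List<TxnComponent> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            r -> r.db() + "." + r.table() + "/" + r.partition(),
            Collectors.reducing(0, r -> "p".equals(r.opType()) ? 1 : 0, Integer::max)));
    }

    public static void main(String[] args) {
        List<TxnComponent> rows = List.of(
            new TxnComponent("db", "t1", "ds=1", "i"), // plain insert
            new TxnComponent("db", "t2", "", "p"),     // dyn-part marker, partitions unknown yet
            new TxnComponent("db", "t2", "", "i"));
        System.out.println(isDpByGroup(rows));
    }
}
```

A group whose flag is 1 is the "special case" the comment refers to: the initiator knows the txn wrote via dynamic partitioning but not which partitions, so it has to be handled separately.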
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495383 ]

ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 05/Oct/20 15:10
Start Date: 05/Oct/20 15:10
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499671848

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition in which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment: You are right, I got confused; the p entry will solve this.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 495383)
Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495369 ]

ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 05/Oct/20 14:49
Start Date: 05/Oct/20 14:49
Worklog Time Spent: 10m

Work Description: vpnvishv commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499656397

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition in which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment: I was also wondering the same; this code was there in the earlier patches, so I just kept it. We can remove this.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 495369)
Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495368 ]

ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 05/Oct/20 14:47
Start Date: 05/Oct/20 14:47
Worklog Time Spent: 10m

Work Description: vpnvishv commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499655306

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition in which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment: @pvargacl Sorry, I may be missing something here, but with this change, how can the compactor read the data of an aborted delta? It should be in the aborted list, right, due to this dummy p-type entry in TXN_COMPONENTS?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 495368)
Time Spent: 4h 10m (was: 4h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495366 ]

ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 05/Oct/20 14:40
Start Date: 05/Oct/20 14:40
Worklog Time Spent: 10m

Work Description: vpnvishv commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499649919

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
## @@ -237,6 +237,7 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor
     }
     JobConf job = createBaseJobConf(conf, jobName, t, sd, writeIds, ci);
+    QueryCompactor.Util.removeAbortedDirsForAcidTable(conf, dir);

Review comment: @pvargacl You are right, this is not required, as the compactor now runs in a transaction and the cleaner has a validTxnList with the aborted bits set. We added this w.r.t. Hive 3, in which the cleaner doesn't have the aborted bits set, as we create the validWriteIdList for the cleaner based on the highestWriteId.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 495366)
Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495358 ]

ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 05/Oct/20 14:27
Start Date: 05/Oct/20 14:27
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499639831

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition in which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+          return true;
+        } else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+          return true;
+        }
+      }
+      return false;
+    };
+    List<FileStatus> deleted = new ArrayList<>();
+    deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+    return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List<FileStatus> deleted)
+      throws IOException {
+    RemoteIterator<FileStatus> it = listIterator(fs, root, null);
+
+    while (it.hasNext()) {
+      FileStatus fStatus = it.next();
+      if (fStatus.isDirectory()) {
+        if (filter.accept(fStatus.getPath())) {
+          fs.delete(fStatus.getPath(), true);
+          deleted.add(fStatus);
+        } else {
+          deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted);
+          if (isDirectoryEmpty(fs, fStatus.getPath())) {

Review comment: agree, that would simplify re-use of getHdfsDirSnapshots

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Issue Time Tracking
---
Worklog Id: (was: 495358)
Time Spent: 3h 50m (was: 3h 40m)
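The recursive walk in deleteDeltaDirectoriesAux can be sketched without Hadoop: a directory that matches the filter is deleted wholesale; otherwise we recurse into it and then prune it if the deletes left it empty. Here java.nio stands in for the HDFS FileSystem API, so the details are illustrative only, not the PR's code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DeleteWalk {
    // Mirrors deleteDeltaDirectoriesAux: delete matched dirs, recurse otherwise,
    // and prune parents (e.g. partition dirs) emptied by the deletes.
    static void deleteMatching(Path root, Predicate<Path> filter, List<Path> deleted) throws IOException {
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(root)) {
            for (Path child : ds) {
                if (!Files.isDirectory(child)) {
                    continue;
                }
                if (filter.test(child)) {
                    deleteRecursively(child);        // fs.delete(path, true) equivalent
                    deleted.add(child);
                } else {
                    deleteMatching(child, filter, deleted);
                    if (isEmpty(child)) {
                        Files.delete(child);         // isDirectoryEmpty(fs, path) equivalent
                    }
                }
            }
        }
    }

    static void deleteRecursively(Path p) throws IOException {
        try (Stream<Path> s = Files.walk(p)) {
            for (Path q : s.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(q); // children before parents
            }
        }
    }

    static boolean isEmpty(Path dir) throws IOException {
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            return !ds.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("acid");
        Files.createDirectories(root.resolve("p=1").resolve("delta_0000003_0000003"));
        Files.createFile(root.resolve("p=1").resolve("delta_0000003_0000003").resolve("bucket_00000"));
        List<Path> deleted = new ArrayList<>();
        deleteMatching(root, p -> p.getFileName().toString().startsWith("delta_0000003"), deleted);
        System.out.println(deleted.size() + " " + Files.notExists(root.resolve("p=1")));
    }
}
```

Note that pruning only removes the empty directory on the filesystem; as the review comment in the earlier message points out, the partition itself still exists in HMS.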
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495356 ]

ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 05/Oct/20 14:25
Start Date: 05/Oct/20 14:25
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499623864

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition in which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment: Why would it read the aborted data as valid if the txn is still in aborted state?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 495356)
Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495341 ]

ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 05/Oct/20 14:05
Start Date: 05/Oct/20 14:05
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499623864

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition in which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment: Why would it read the aborted data as valid if the txn is in aborted state?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 495341)
Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495338 ]

ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 05/Oct/20 14:00
Start Date: 05/Oct/20 14:00
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499620518

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition in which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+          return true;
+        } else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+          return true;
+        }
+      }
+      return false;
+    };
+    List<FileStatus> deleted = new ArrayList<>();
+    deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+    return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List<FileStatus> deleted)

Review comment: getHdfsDirSnapshots does the same recursive listing, isn't it?
```
RemoteIterator<LocatedFileStatus> itr = fs.listFiles(path, true);
while (itr.hasNext()) {
```

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Issue Time Tracking
---
Worklog Id: (was: 495338)
Time Spent: 3h 20m (was: 3h 10m)
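For contrast with the hand-rolled recursion above, a getHdfsDirSnapshots-style listing drives everything from one flat recursive iterator, which is what `fs.listFiles(path, true)` provides on HDFS. A rough local-filesystem analogue using java.nio for illustration only (not Hive code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveListing {
    // One flat pass over every regular file under root, no explicit recursion;
    // Files.walk plays the role of fs.listFiles(path, true).
    static List<Path> listAll(Path root) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("acid");
        Files.createDirectories(root.resolve("delta_0000001_0000001"));
        Files.createFile(root.resolve("delta_0000001_0000001").resolve("bucket_00000"));
        System.out.println(listAll(root).size()); // prints 1
    }
}
```

The trade-off the comment hints at: a single recursive listing is one round of RPCs and easy to reuse for snapshots, whereas per-directory recursion is needed if you want to prune or delete directories as you descend.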
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495288 ]

ASF GitHub Bot logged work on HIVE-21052:
Author: ASF GitHub Bot
Created on: 05/Oct/20 11:59
Start Date: 05/Oct/20 11:59
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499545127

## File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
## @@ -2128,6 +2128,395 @@ public void testCleanerForTxnToWriteId() throws Exception {
     0, TxnDbUtil.countQueryAgent(hiveConf, "select count(*) from TXN_TO_WRITE_ID"));
   }

+  @Test
+  public void testMmTableAbortWithCompaction() throws Exception {
+    // 1. Insert some rows into MM table
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(1,2)");
+    // There should be 1 delta directory
+    int[][] resultData1 = new int[][] {{1,2}};
+    verifyDeltaDirAndResult(1, Table.MMTBL.toString(), "", resultData1);
+    List<String> r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+    Assert.assertEquals("1", r1.get(0));
+
+    // 2. Let a transaction be aborted
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true);
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(3,4)");
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, false);
+    // There should be 1 delta and 1 base directory. The base one is the aborted one.
+    verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData1);
+
+    r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+    Assert.assertEquals("1", r1.get(0));
+
+    // Verify query result
+    int[][] resultData2 = new int[][] {{1,2}, {5,6}};
+
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(5,6)");
+    verifyDeltaDirAndResult(3, Table.MMTBL.toString(), "", resultData2);
+    r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+    Assert.assertEquals("2", r1.get(0));
+
+    // 4. Perform a MINOR compaction, expectation is it should remove aborted base dir
+    runStatementOnDriver("alter table " + Table.MMTBL + " compact 'MINOR'");
+    // The worker should remove the subdir for aborted transaction
+    runWorker(hiveConf);
+    verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData2);
+    verifyBaseDirAndResult(0, Table.MMTBL.toString(), "", resultData2);
+    // 5. Run Cleaner. Shouldn't impact anything.
+    runCleaner(hiveConf);
+    // 6. Run initiator to remove the aborted entry from the TXNS table
+    runInitiator(hiveConf);
+
+    // Verify query result
+    List<String> rs = runStatementOnDriver("select a,b from " + Table.MMTBL + " order by a");
+    Assert.assertEquals(stringifyValues(resultData2), rs);
+
+    int[][] resultData3 = new int[][] {{1,2}, {5,6}, {7,8}};
+    // 7. add a few more rows
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(7,8)");
+    // 8. add one more aborted delta
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true);
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(9,10)");
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, false);
+
+    // 9. Perform a MAJOR compaction, expectation is it should remove aborted base dir
+    runStatementOnDriver("alter table " + Table.MMTBL + " compact 'MAJOR'");
+    verifyDeltaDirAndResult(4, Table.MMTBL.toString(), "", resultData3);
+    runWorker(hiveConf);
+    verifyDeltaDirAndResult(3, Table.MMTBL.toString(), "", resultData3);
+    verifyBaseDirAndResult(1, Table.MMTBL.toString(), "", resultData3);
+    runCleaner(hiveConf);
+    verifyDeltaDirAndResult(0, Table.MMTBL.toString(), "", resultData3);
+    verifyBaseDirAndResult(1, Table.MMTBL.toString(), "", resultData3);
+    runInitiator(hiveConf);
+    verifyDeltaDirAndResult(0, Table.MMTBL.toString(), "", resultData3);
+    verifyBaseDirAndResult(1, Table.MMTBL.toString(), "", resultData3);
+
+    // Verify query result
+    rs = runStatementOnDriver("select a,b from " + Table.MMTBL + " order by a");
+    Assert.assertEquals(stringifyValues(resultData3), rs);
+  }
+
+  @Test
+  public void testMmTableAbortWithCompactionNoCleanup() throws Exception {
+    // 1. Insert some rows into MM table
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(1,2)");
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(5,6)");
+    // There should be 2 delta directories
+    int[][] resultData1 = new int[][] {{1,2}, {5,6}};
+    verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData1);
+    List<String> r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+    Assert.assertEquals("2", r1.get(0));
+
+    // 2. Let a transaction be aborted
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495289=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495289 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 05/Oct/20 11:59 Start Date: 05/Oct/20 11:59 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #1548: URL: https://github.com/apache/hive/pull/1548#issuecomment-703585092 > @deniskuzZ Overall change LGTM. > Looked into the test failures: one of the tests requires a change in expected values wrt the master branch; the other two look like genuine failures to me. Please check the inline comments. @vpnvishv, thank you for the review! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 495289) Time Spent: 3h 10m (was: 3h) > Make sure transactions get cleaned if they are aborted before addPartitions > is called > - > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions > Affects Versions: 3.0.0, 3.1.1 > Reporter: Jaume M > Assignee: Jaume M > Priority: Critical > Labels: pull-request-available > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, HIVE-21052.8.patch, HIVE-21052.9.patch > > Time Spent: 3h 10m > Remaining Estimate: 0h > > If the transaction is aborted between openTxn and addPartitions and data has been written to the table, the transaction manager will think it's an empty transaction and no cleaning will be done. > This is currently an issue in the streaming API and in micromanaged tables.
> As proposed by [~ekoifman] this can be solved by: > * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when addPartitions is called, remove this entry from TXN_COMPONENTS and add the corresponding partition entries to TXN_COMPONENTS. > * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that specifies that a transaction was opened and then aborted, it must generate jobs for the worker for every possible partition available. > cc [~ewohlstadter] -- This message was sent by Atlassian Jira (v8.3.4#803005)
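The marker-based protocol proposed above can be sketched with a toy in-memory model. This is illustrative only — the marker value and method names are hypothetical, and the real implementation writes rows to the metastore's TXN_COMPONENTS table, not to a map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the proposal: openTxn writes a placeholder row; addPartitions
// swaps it for real partition rows; the cleaner treats a surviving placeholder
// on an aborted txn as "generate cleaning jobs for every possible partition".
public class TxnComponentsSketch {
    static final String MARKER = "_DYNPART_PENDING_";        // hypothetical marker value
    static final Map<Long, List<String>> TXN_COMPONENTS = new HashMap<>();

    static void openTxn(long txnId) {
        TXN_COMPONENTS.put(txnId, new ArrayList<>(List.of(MARKER)));
    }

    static void addPartitions(long txnId, List<String> partitions) {
        List<String> rows = TXN_COMPONENTS.get(txnId);
        rows.remove(MARKER);                 // drop the placeholder...
        rows.addAll(partitions);             // ...and record the real partition entries
    }

    // True when an aborted txn still carries the marker, i.e. it aborted between
    // openTxn and addPartitions, so the cleaner must scan all partitions.
    static boolean needsFullScan(long txnId) {
        return TXN_COMPONENTS.getOrDefault(txnId, List.of()).contains(MARKER);
    }

    public static void main(String[] args) {
        openTxn(1L);                                   // aborted before addPartitions
        openTxn(2L);
        addPartitions(2L, List.of("p=1", "p=2"));      // reached addPartitions normally
        System.out.println(needsFullScan(1L));         // true
        System.out.println(needsFullScan(2L));         // false
    }
}
```

The point of the marker is precisely the difference between txn 1 and txn 2 here: without it, an abort before addPartitions leaves no component row at all, so the transaction looks empty to the cleaner.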
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495287=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495287 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 05/Oct/20 11:58 Start Date: 05/Oct/20 11:58 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r499544579 ## File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java ## @@ -2128,6 +2128,395 @@ public void testCleanerForTxnToWriteId() throws Exception { 0, TxnDbUtil.countQueryAgent(hiveConf, "select count(*) from TXN_TO_WRITE_ID")); } + @Test +public void testMmTableAbortWithCompaction() throws Exception { +// 1. Insert some rows into MM table +runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(1,2)"); +// There should be 1 delta directory +int [][] resultData1 = new int[][] {{1,2}}; +verifyDeltaDirAndResult(1, Table.MMTBL.toString(), "", resultData1); +List r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL); +Assert.assertEquals("1", r1.get(0)); + +// 2. Let a transaction be aborted +hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true); +runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(3,4)"); +hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, false); +// There should be 1 delta and 1 base directory. The base one is the aborted one. 
+verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData1); + +r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL); +Assert.assertEquals("1", r1.get(0)); + +// Verify query result +int [][] resultData2 = new int[][] {{1,2}, {5,6}}; + +runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(5,6)"); +verifyDeltaDirAndResult(3, Table.MMTBL.toString(), "", resultData2); +r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL); +Assert.assertEquals("2", r1.get(0)); Review comment: fixed, turned off StatsOptimizer ## File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java ## @@ -2128,6 +2128,395 @@ public void testCleanerForTxnToWriteId() throws Exception { 0, TxnDbUtil.countQueryAgent(hiveConf, "select count(*) from TXN_TO_WRITE_ID")); } + @Test +public void testMmTableAbortWithCompaction() throws Exception { +// 1. Insert some rows into MM table +runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(1,2)"); +// There should be 1 delta directory +int [][] resultData1 = new int[][] {{1,2}}; +verifyDeltaDirAndResult(1, Table.MMTBL.toString(), "", resultData1); +List r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL); +Assert.assertEquals("1", r1.get(0)); + +// 2. Let a transaction be aborted +hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true); +runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(3,4)"); +hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, false); +// There should be 1 delta and 1 base directory. The base one is the aborted one. 
+verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData1); + +r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL); +Assert.assertEquals("1", r1.get(0)); + +// Verify query result +int [][] resultData2 = new int[][] {{1,2}, {5,6}}; + +runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(5,6)"); +verifyDeltaDirAndResult(3, Table.MMTBL.toString(), "", resultData2); +r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL); +Assert.assertEquals("2", r1.get(0)); + +// 4. Perform a MINOR compaction, expectation is it should remove aborted base dir +runStatementOnDriver("alter table "+ Table.MMTBL + " compact 'MINOR'"); +// The worker should remove the subdir for aborted transaction +runWorker(hiveConf); +verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData2); +verifyBaseDirAndResult(0, Table.MMTBL.toString(), "", resultData2); +// 5. Run Cleaner. Shouldn't impact anything. +runCleaner(hiveConf); +// 6. Run initiator remove aborted entry from TXNS table +runInitiator(hiveConf); + +// Verify query result +List rs = runStatementOnDriver("select a,b from " + Table.MMTBL + " order by a"); +Assert.assertEquals(stringifyValues(resultData2), rs); + +int [][] resultData3 = new int[][] {{1,2}, {5,6}, {7,8}}; +// 7. add few more rows +runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(7,8)"); +// 8. add one more aborted delta +hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true); +
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495235 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 05/Oct/20 09:49 Start Date: 05/Oct/20 09:49 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r499475796 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ## @@ -107,11 +107,12 @@ public CompactionTxnHandler() { // Check for aborted txns: number of aborted txns past threshold and age of aborted txns // past time threshold boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0; -final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\"," -+ "MIN(\"TXN_STARTED\"), COUNT(*)" +String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", " ++ "MIN(\"TXN_STARTED\"), COUNT(*), " ++ "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" " Review comment: I might be mistaken here, but does this mean that if we have many "normal" aborted txns and one aborted dynpart txn, we will not initiate a normal compaction until the dynpart stuff is cleaned up? Is this ok? Shouldn't we be doing both? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 495235) Time Spent: 2h 40m (was: 2.5h)
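The aggregation in the SQL snippet above can be mirrored in plain Java to see the behavior pvargacl is asking about: the MAX(CASE WHEN ...) flag is 1 for a whole (db, table, partition) group as soon as any component in it is a dynamic-partition abort. A toy sketch — the record type and helper are illustrative, not metastore code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Per-table summary of aborted txn components, mirroring the initiator query:
// COUNT(*), MIN(TXN_STARTED), and MAX(CASE WHEN TC_OPERATION_TYPE = DYNPART
// THEN 1 ELSE 0 END) AS IS_DP. 'p' stands in for the DYNPART operation type.
public class AbortedGroupSketch {
    record Component(String table, long txnStarted, char opType) {}

    // value layout: {count, minStarted, isDp}
    static Map<String, long[]> summarize(List<Component> aborted) {
        return aborted.stream().collect(Collectors.toMap(
            Component::table,
            c -> new long[]{1, c.txnStarted(), c.opType() == 'p' ? 1 : 0},
            (a, b) -> new long[]{a[0] + b[0], Math.min(a[1], b[1]), Math.max(a[2], b[2])}));
    }

    public static void main(String[] args) {
        var rows = List.of(
            new Component("t1", 100L, 'i'),   // normal aborted insert
            new Component("t1", 90L,  'p'),   // aborted dynamic-partition txn
            new Component("t2", 200L, 'i'));
        Map<String, long[]> s = summarize(rows);
        System.out.println(Arrays.toString(s.get("t1"))); // [2, 90, 1]
        System.out.println(Arrays.toString(s.get("t2"))); // [1, 200, 0]
    }
}
```

Note how t1's group comes back with IS_DP = 1 even though only one of its two aborted txns was a dynpart write — which is exactly the coupling the review comment questions.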
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495226=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495226 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 05/Oct/20 09:20 Start Date: 05/Oct/20 09:20 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r499457494 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java ## @@ -237,6 +237,7 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor } JobConf job = createBaseJobConf(conf, jobName, t, sd, writeIds, ci); +QueryCompactor.Util.removeAbortedDirsForAcidTable(conf, dir); Review comment: @vpnvishv Why do we do this here? I understand we can, but why don't we let the Cleaner to delete the files? This just makes the compactor slower. Do we have a functionality reason for this? After this change it will run in CompactorMR and in MMQueryCompactors, but not in normal QueryCompactors? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 495226) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495221=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495221 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 05/Oct/20 09:08 Start Date: 05/Oct/20 09:08 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r499449636 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -97,9 +100,9 @@ public void run() { long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner(); LOG.info("Cleaning based on min open txn id: " + minOpenTxnId); List cleanerList = new ArrayList<>(); - for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) { + for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) { cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(() -> -clean(compactionInfo, minOpenTxnId)), cleanerExecutor)); + clean(compactionInfo, minOpenTxnId)), cleanerExecutor)); Review comment: Two questions here: 1. In the original Jira there was discussion about not allowing concurrent cleanings of the same stuff (partition / table). Should we worry about this? 2. The slow cleanAborted will clog the executor service, we should do something about this, either in this patch, or follow up something like https://issues.apache.org/jira/browse/HIVE-21150 immediately after this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 495221) Time Spent: 2h 20m (was: 2h 10m)
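The fan-out pattern under discussion in the Cleaner diff can be sketched with JDK classes alone. This is a minimal sketch of the structure — one CompletableFuture per ready-to-clean entry, all submitted to a shared fixed-size pool and then joined — and it also shows why a single slow "clean aborted" task ties up one of the pool's workers for its full duration, which is the clogging concern raised above. Names are illustrative, not Hive's:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CleanerFanOutSketch {
    // Submit one cleaning task per compaction entry, mirroring Cleaner.run()'s loop,
    // and block until all of them finish. Returns how many entries were cleaned.
    static int cleanAll(List<String> readyToClean) {
        ExecutorService cleanerExecutor = Executors.newFixedThreadPool(2);
        List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
        List<String> cleaned = Collections.synchronizedList(new ArrayList<>());
        for (String ci : readyToClean) {
            // A long-running body here would occupy one of the two pool threads,
            // delaying every task queued behind it.
            cleanerList.add(CompletableFuture.runAsync(() -> cleaned.add(ci), cleanerExecutor));
        }
        CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
        cleanerExecutor.shutdown();
        return cleaned.size();
    }

    public static void main(String[] args) {
        System.out.println(cleanAll(List.of("db.t1", "db.t2", "db.t3/p=1"))); // 3
    }
}
```

Nothing in this structure serializes two tasks that target the same table or partition, which is why the concurrent-cleaning question in the review is a real design decision rather than something the executor already guarantees.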
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495204=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495204 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 05/Oct/20 07:56 Start Date: 05/Oct/20 07:56 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1548: URL: https://github.com/apache/hive/pull/1548#discussion_r499405728 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map tblProps) { tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES); } + /** + * Look for delta directories matching the list of writeIds and deletes them. + * @param rootPartition root partition to look for the delta directories + * @param conf configuration + * @param writeIds list of writeIds to look for in the delta directories + * @return list of deleted directories. + * @throws IOException + */ + public static List deleteDeltaDirectories(Path rootPartition, Configuration conf, Set writeIds) + throws IOException { +FileSystem fs = rootPartition.getFileSystem(conf); + +PathFilter filter = (p) -> { + String name = p.getName(); + for (Long wId : writeIds) { +if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) { + return true; +} else if (name.startsWith(baseDir(wId)) && !name.contains("=")) { + return true; +} + } + return false; +}; +List deleted = new ArrayList<>(); +deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted); +return deleted; + } + + private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List deleted) + throws IOException { +RemoteIterator it = listIterator(fs, root, null); + +while (it.hasNext()) { + FileStatus fStatus = it.next(); + if (fStatus.isDirectory()) { +if (filter.accept(fStatus.getPath())) { + fs.delete(fStatus.getPath(), true); + deleted.add(fStatus); +} else { + 
deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted); + if (isDirectoryEmpty(fs, fStatus.getPath())) { +fs.delete(fStatus.getPath(), false); +deleted.add(fStatus); + } +} + } +} + } + + private static boolean isDirectoryEmpty(FileSystem fs, Path path) throws IOException { +RemoteIterator it = listIterator(fs, path, null); +return !it.hasNext(); + } + + private static RemoteIterator listIterator(FileSystem fs, Path path, PathFilter filter) + throws IOException { +try { + return new ToFileStatusIterator(SHIMS.listLocatedHdfsStatusIterator(fs, path, filter)); +} catch (Throwable t) { Review comment: This should be similar to tryListLocatedHdfsStatus don't catch all Throwable. And maybe add all this to the HdfsUtils class This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 495204) Time Spent: 2h 10m (was: 2h) > Make sure transactions get cleaned if they are aborted before addPartitions > is called > - > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0, 3.1.1 >Reporter: Jaume M >Assignee: Jaume M >Priority: Critical > Labels: pull-request-available > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, > HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, > HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, > HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, > HIVE-21052.8.patch, HIVE-21052.9.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > If the transaction is aborted between openTxn and addPartitions and data has > been written on the table the transaction manager will think it's an empty > transaction and no cleaning will be done. 
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495202=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495202 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 05/Oct/20 07:53
Start Date: 05/Oct/20 07:53
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499404466

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java

@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition under which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+          return true;
+        } else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+          return true;
+        }
+      }
+      return false;
+    };
+    List<FileStatus> deleted = new ArrayList<>();
+    deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+    return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List<FileStatus> deleted)
+      throws IOException {
+    RemoteIterator<FileStatus> it = listIterator(fs, root, null);
+
+    while (it.hasNext()) {
+      FileStatus fStatus = it.next();
+      if (fStatus.isDirectory()) {
+        if (filter.accept(fStatus.getPath())) {
+          fs.delete(fStatus.getPath(), true);
+          deleted.add(fStatus);
+        } else {
+          deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted);
+          if (isDirectoryEmpty(fs, fStatus.getPath())) {

Review comment: Are we doing this to delete newly created partitions when there are no other writes? Is this OK? What if we find a valid empty partition that is registered in the HMS? We should not delete that. I think this can be skipped altogether; an empty partition dir will not bother anybody.

Issue Time Tracking
---
Worklog Id: (was: 495202)
Time Spent: 1h 50m (was: 1h 40m)

> Make sure transactions get cleaned if they are aborted before addPartitions
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.0.0, 3.1.1
> Reporter: Jaume M
> Assignee: Jaume M
> Priority: Critical
> Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch,
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch,
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch,
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch,
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has
> been written to the table, the transaction manager will think it's an empty
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables.
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and,
> when addPartitions is called, removing this entry from TXN_COMPONENTS and adding
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that
> specifies that a transaction was opened and it was aborted, it must generate
> jobs for the worker for every possible partition available.
>
> cc [~ewohlstadter]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495201=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495201 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 05/Oct/20 07:50
Start Date: 05/Oct/20 07:50
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499402826

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java

@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition under which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment: I am wondering: are we covering all the use cases here? Is it possible that this dynamic-partition query was writing to an existing partition with older writes, and a compaction ran before we managed to delete the aborted delta? I think in this case, sadly, we are still going to read the aborted data as valid. Could you add a test case to check whether this is indeed a problem or not? (I do not have an idea for a solution...)

Issue Time Tracking
---
Worklog Id: (was: 495201)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495199=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495199 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 05/Oct/20 07:44
Start Date: 05/Oct/20 07:44
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499399709

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java

@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition under which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment: Why the contains("=") check? Are we checking for a partition where the user named the column exactly like a valid delta dir? I don't think we should support that.

Issue Time Tracking
---
Worklog Id: (was: 495199)
Time Spent: 1.5h (was: 1h 20m)
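The guard being questioned above can be demonstrated without Hadoop on the classpath. The sketch below is illustrative only: `deltaSubdir`/`baseDir` are inlined as `String.format` calls assuming Hive's zero-padded `delta_<writeId>_<writeId>` / `base_<writeId>` directory naming, and a plain `Predicate<String>` stands in for Hadoop's `PathFilter`:

```java
import java.util.Set;
import java.util.function.Predicate;

public class AbortedDirFilter {
    // Assumed to mirror Hive's zero-padded delta/base naming (AcidUtils
    // owns the real formatting in the patch under review).
    static String deltaSubdir(long writeId) {
        return String.format("delta_%07d_%07d", writeId, writeId);
    }

    static String baseDir(long writeId) {
        return String.format("base_%07d", writeId);
    }

    // Same shape as the PathFilter in the patch, expressed over plain names.
    static Predicate<String> filterFor(Set<Long> writeIds) {
        return name -> {
            for (Long wId : writeIds) {
                if ((name.startsWith(deltaSubdir(wId)) || name.startsWith(baseDir(wId)))
                        && !name.contains("=")) {
                    return true;
                }
            }
            return false;
        };
    }

    public static void main(String[] args) {
        Predicate<String> f = filterFor(Set.of(5L));
        System.out.println(f.test("delta_0000005_0000005"));   // true: aborted delta, delete
        System.out.println(f.test("base_0000005"));            // true: aborted base, delete
        System.out.println(f.test("delta_0000005_0000005=v")); // false: partition dir, keep
        System.out.println(f.test("delta_0000006_0000006"));   // false: other writeId, keep
    }
}
```

A name such as `delta_0000005_0000005=v` can only arise when a user names a partition column exactly like a valid delta dir, which is the corner case the review comment argues need not be supported.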
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495191 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 05/Oct/20 07:21
Start Date: 05/Oct/20 07:21
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499388356

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java

@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }

+  /**
+   * Looks for delta directories matching the given writeIds and deletes them.
+   * @param rootPartition root partition under which to look for the delta directories
+   * @param conf configuration
+   * @param writeIds set of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+          return true;
+        } else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+          return true;
+        }
+      }
+      return false;
+    };
+    List<FileStatus> deleted = new ArrayList<>();
+    deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+    return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List<FileStatus> deleted)

Review comment: This is going to issue many filesystem listings on a table with many partitions, which is going to be very slow on S3. I think you should consider changing this logic to be similar to getHdfsDirSnapshots: do one recursive listing, iterate all the files, collect the deltas that need to be deleted, and delete them at the end (possibly concurrently).

Issue Time Tracking
---
Worklog Id: (was: 495191)
Time Spent: 1h 20m (was: 1h 10m)
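The single-listing alternative suggested in the review (in the spirit of `getHdfsDirSnapshots`) can be sketched as follows. This is an assumption-laden stand-in, not the Hive implementation: it uses `java.nio.file.Files.walk` on a local directory in place of one recursive HDFS listing, and it only collects the directories to delete; the actual (possibly concurrent) deletion step is omitted:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SingleListingCleanup {
    // One recursive walk (stand-in for a single recursive HDFS listing):
    // collect every aborted delta/base directory, then delete at the end
    // instead of issuing a separate listing per partition directory.
    static List<Path> collectAbortedDirs(Path root, Set<Long> writeIds) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isDirectory)
                .filter(p -> {
                    String name = p.getFileName().toString();
                    // Same matching rule as the PathFilter in the patch;
                    // the zero-padded naming is an assumption here.
                    return writeIds.stream().anyMatch(w ->
                        (name.startsWith(String.format("delta_%07d_%07d", w, w))
                            || name.startsWith(String.format("base_%07d", w)))
                        && !name.contains("="));
                })
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("part");
        Files.createDirectories(root.resolve("p=1/delta_0000005_0000005"));
        Files.createDirectories(root.resolve("p=2/delta_0000006_0000006"));
        List<Path> aborted = collectAbortedDirs(root, Set.of(5L));
        System.out.println(aborted.size()); // prints 1: only the aborted delta under p=1
    }
}
```

One walk over the whole partition tree amortizes the round-trips that a per-directory listing pays on object stores like S3, which is the point of the review comment.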
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=494272=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494272 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 03/Oct/20 10:54
Start Date: 03/Oct/20 10:54
Worklog Time Spent: 10m

Work Description: vpnvishv commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499137459

## File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java

@@ -2128,6 +2128,395 @@ public void testCleanerForTxnToWriteId() throws Exception {
     0, TxnDbUtil.countQueryAgent(hiveConf, "select count(*) from TXN_TO_WRITE_ID"));
   }

+  @Test
+  public void testMmTableAbortWithCompaction() throws Exception {
+    // 1. Insert some rows into MM table
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(1,2)");
+    // There should be 1 delta directory
+    int[][] resultData1 = new int[][] {{1,2}};
+    verifyDeltaDirAndResult(1, Table.MMTBL.toString(), "", resultData1);
+    List<String> r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+    Assert.assertEquals("1", r1.get(0));
+
+    // 2. Let a transaction be aborted
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true);
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(3,4)");
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, false);
+    // There should be 1 delta and 1 base directory. The base one is the aborted one.
+    verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData1);
+
+    r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+    Assert.assertEquals("1", r1.get(0));
+
+    // Verify query result
+    int[][] resultData2 = new int[][] {{1,2}, {5,6}};
+
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(5,6)");
+    verifyDeltaDirAndResult(3, Table.MMTBL.toString(), "", resultData2);
+    r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+    Assert.assertEquals("2", r1.get(0));
+
+    // 4. Perform a MINOR compaction, expectation is it should remove aborted base dir
+    runStatementOnDriver("alter table " + Table.MMTBL + " compact 'MINOR'");
+    // The worker should remove the subdir for the aborted transaction
+    runWorker(hiveConf);
+    verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData2);
+    verifyBaseDirAndResult(0, Table.MMTBL.toString(), "", resultData2);
+    // 5. Run Cleaner. Shouldn't impact anything.
+    runCleaner(hiveConf);
+    // 6. Run initiator to remove aborted entry from TXNS table
+    runInitiator(hiveConf);
+
+    // Verify query result
+    List<String> rs = runStatementOnDriver("select a,b from " + Table.MMTBL + " order by a");
+    Assert.assertEquals(stringifyValues(resultData2), rs);
+
+    int[][] resultData3 = new int[][] {{1,2}, {5,6}, {7,8}};
+    // 7. Add a few more rows
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(7,8)");
+    // 8. Add one more aborted delta
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true);
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(9,10)");
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, false);
+
+    // 9. Perform a MAJOR compaction, expectation is it should remove aborted base dir
+    runStatementOnDriver("alter table " + Table.MMTBL + " compact 'MAJOR'");
+    verifyDeltaDirAndResult(4, Table.MMTBL.toString(), "", resultData3);
+    runWorker(hiveConf);
+    verifyDeltaDirAndResult(3, Table.MMTBL.toString(), "", resultData3);
+    verifyBaseDirAndResult(1, Table.MMTBL.toString(), "", resultData3);
+    runCleaner(hiveConf);
+    verifyDeltaDirAndResult(0, Table.MMTBL.toString(), "", resultData3);
+    verifyBaseDirAndResult(1, Table.MMTBL.toString(), "", resultData3);
+    runInitiator(hiveConf);
+    verifyDeltaDirAndResult(0, Table.MMTBL.toString(), "", resultData3);
+    verifyBaseDirAndResult(1, Table.MMTBL.toString(), "", resultData3);
+
+    // Verify query result
+    rs = runStatementOnDriver("select a,b from " + Table.MMTBL + " order by a");
+    Assert.assertEquals(stringifyValues(resultData3), rs);
+  }
+
+  @Test
+  public void testMmTableAbortWithCompactionNoCleanup() throws Exception {
+    // 1. Insert some rows into MM table
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(1,2)");
+    runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(5,6)");
+    // There should be 2 delta directories
+    int[][] resultData1 = new int[][] {{1,2}, {5,6}};
+    verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData1);
+    List<String> r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+    Assert.assertEquals("2", r1.get(0));
+
+    // 2. Let a transaction be aborted
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=494269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494269 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 03/Oct/20 10:43
Start Date: 03/Oct/20 10:43
Worklog Time Spent: 10m

Work Description: vpnvishv commented on pull request #1415:
URL: https://github.com/apache/hive/pull/1415#issuecomment-703083771

@deniskuzZ Thanks for reviewing and porting to master.

Issue Time Tracking
---
Worklog Id: (was: 494269)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=494005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494005 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 02/Oct/20 15:38
Start Date: 02/Oct/20 15:38
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on pull request #1548:
URL: https://github.com/apache/hive/pull/1548#issuecomment-702804902

@vpnvishv, could you please check?

Issue Time Tracking
---
Worklog Id: (was: 494005)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=493954=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493954 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 02/Oct/20 13:45
Start Date: 02/Oct/20 13:45
Worklog Time Spent: 10m

Work Description: deniskuzZ opened a new pull request #1548:
URL: https://github.com/apache/hive/pull/1548

### What changes were proposed in this pull request?

Design: taken from https://issues.apache.org/jira/secure/attachment/12954375/Aborted%20Txn%20w_Direct%20Write.pdf

**Overview:**
1. Add a dummy row to TXN_COMPONENTS with operation type 'p' in enqueueLockWithRetry; it will be removed in addDynamicPartitions.
2. If the txn is aborted at any point, this dummy entry will block the initiator from removing this txnId from TXNS.
3. The initiator will add a row to COMPACTION_QUEUE (with type 'p') for the aborted txn with state READY_FOR_CLEANING; at any given time there will be a single entry of this type per table in COMPACTION_QUEUE.
4. The cleaner will pick up this request directly and process it via the new cleanAborted code path (scan all partitions and remove aborted dirs); once successful, the cleaner will remove the dummy row from TXN_COMPONENTS.

**Cleaner Design:**
- The cleaner stays single-threaded; this new type of cleanup is handled like any regular cleanup.

**Aborted dirs cleanup:**
- In p-type cleanup, the cleaner iterates over all partitions and removes all delta/base dirs matching the given aborted writeId list.
- Cleanup of aborted base/delta dirs was also added in the worker.

**TXN_COMPONENTS cleanup:**
- If successful, the p-type entry is removed from TXN_COMPONENTS during addDynamicPartitions.
- If aborted, the cleaner removes it in markCleaned after successful processing of the p-type cleanup.

**TXNS cleanup:**
- No change; cleaned up by the initiator.

### Why are the changes needed?
To fix the issue described above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests added.

Issue Time Tracking
---
Worklog Id: (was: 493954)
Time Spent: 0.5h (was: 20m)
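The marker lifecycle from the overview above can be sketched in memory. This is not the metastore code: the real rows live in the RDBMS-backed TXN_COMPONENTS table, only the 'p' operation type comes from the PR description, and the 'i' (insert) type plus all class and method names below are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// In-memory sketch of the 'p'-marker lifecycle described in the PR overview.
public class MarkerLifecycle {
    static class TxnComponent {
        final long txnId;
        final char opType;      // 'p' = dummy marker; 'i' (insert) is assumed here
        final String partition; // null for the marker row
        TxnComponent(long txnId, char opType, String partition) {
            this.txnId = txnId;
            this.opType = opType;
            this.partition = partition;
        }
    }

    private final List<TxnComponent> txnComponents = new ArrayList<>();

    // Step 1: enqueueLockWithRetry writes a dummy row with operation type 'p'.
    public void openTxn(long txnId) {
        txnComponents.add(new TxnComponent(txnId, 'p', null));
    }

    // addDynamicPartitions removes the marker and records the real partitions.
    public void addDynamicPartitions(long txnId, List<String> partitions) {
        txnComponents.removeIf(c -> c.txnId == txnId && c.opType == 'p');
        for (String p : partitions) {
            txnComponents.add(new TxnComponent(txnId, 'i', p));
        }
    }

    // Steps 2-3: if the txn aborts while the marker is still present, the
    // initiator sees it and schedules a full-table cleaning pass.
    public boolean hasAbortMarker(long txnId) {
        return txnComponents.stream().anyMatch(c -> c.txnId == txnId && c.opType == 'p');
    }

    public static void main(String[] args) {
        MarkerLifecycle m = new MarkerLifecycle();
        m.openTxn(42);
        System.out.println(m.hasAbortMarker(42)); // true: abort now would need cleaning
        m.addDynamicPartitions(42, List.of("p=1", "p=2"));
        System.out.println(m.hasAbortMarker(42)); // false: marker replaced by real entries
    }
}
```

The point of the marker is exactly this window: an abort between `openTxn` and `addDynamicPartitions` leaves the 'p' row behind, so the transaction no longer looks empty to the initiator.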
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=493956=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493956 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 02/Oct/20 13:45
Start Date: 02/Oct/20 13:45
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on pull request #1415:
URL: https://github.com/apache/hive/pull/1415#issuecomment-702742918

Created the same for master: https://github.com/apache/hive/pull/1548

Issue Time Tracking
---
Worklog Id: (was: 493956)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=493001=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493001 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 30/Sep/20 15:00
Start Date: 30/Sep/20 15:00
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on pull request #1415:
URL: https://github.com/apache/hive/pull/1415#issuecomment-701447242

@vpnvishv, thank you for your efforts, the patch looks good! Could you please submit a similar pull request for master?

Issue Time Tracking
---
Worklog Id: (was: 493001)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=473016=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473016 ]

ASF GitHub Bot logged work on HIVE-21052:
-
Author: ASF GitHub Bot
Created on: 20/Aug/20 18:11
Start Date: 20/Aug/20 18:11
Worklog Time Spent: 10m

Work Description: vpnvishv opened a new pull request #1415:
URL: https://github.com/apache/hive/pull/1415

### What changes were proposed in this pull request?

The below changes are only with respect to branch-3.1.

Design: taken from https://issues.apache.org/jira/secure/attachment/12954375/Aborted%20Txn%20w_Direct%20Write.pdf

**Overview:**
1. Add a dummy row to TXN_COMPONENTS with operation type 'p' in enqueueLockWithRetry; it will be removed in addDynamicPartitions.
2. If the txn is aborted at any point, this dummy entry will block the initiator from removing this txnId from TXNS.
3. The initiator will add a row to COMPACTION_QUEUE (with type 'p') for the aborted txn with state READY_FOR_CLEANING; at any given time there will be a single entry of this type per table in COMPACTION_QUEUE.
4. The cleaner will pick up this request directly and process it via the new cleanAborted code path (scan all partitions and remove aborted dirs); once successful, the cleaner will remove the dummy row from TXN_COMPONENTS.

**Cleaner Design:**
- The cleaner stays single-threaded; this new type of cleanup is handled like any regular cleanup.

**Aborted dirs cleanup:**
- In p-type cleanup, the cleaner iterates over all partitions and removes all delta/base dirs matching the given aborted writeId list.
- Cleanup of aborted base/delta dirs was also added in the worker.

**TXN_COMPONENTS cleanup:**
- If successful, the p-type entry is removed from TXN_COMPONENTS during addDynamicPartitions.
- If aborted, the cleaner removes it in markCleaned after successful processing of the p-type cleanup.

**TXNS cleanup:**
- No change; cleaned up by the initiator.

### Why are the changes needed?
To fix the issue described above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests added.

Issue Time Tracking
---
Worklog Id: (was: 473016)
Remaining Estimate: 0h
Time Spent: 10m