[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360131#comment-15360131 ]

Hive QA commented on HIVE-13369:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12815793/HIVE-13369.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 10287 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion3
org.apache.hadoop.hive.ql.io.TestAcidUtils.testBaseDeltas
org.apache.hadoop.hive.ql.io.TestAcidUtils.testOriginalDeltas
org.apache.hadoop.hive.ql.io.TestAcidUtils.testOverlapingDelta
org.apache.hadoop.hive.ql.io.TestAcidUtils.testOverlapingDelta2
org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.cleanupAfterMajorPartitionCompaction
org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.cleanupAfterMajorPartitionCompactionNoBase
org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.cleanupAfterMajorTableCompaction
org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.cleanupAfterMinorPartitionCompaction
org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.cleanupAfterMinorTableCompaction
org.apache.hadoop.hive.ql.txn.compactor.TestCleaner2.cleanupAfterMajorPartitionCompaction
org.apache.hadoop.hive.ql.txn.compactor.TestCleaner2.cleanupAfterMajorPartitionCompactionNoBase
org.apache.hadoop.hive.ql.txn.compactor.TestCleaner2.cleanupAfterMajorTableCompaction
org.apache.hadoop.hive.ql.txn.compactor.TestCleaner2.cleanupAfterMinorPartitionCompaction
org.apache.hadoop.hive.ql.txn.compactor.TestCleaner2.cleanupAfterMinorTableCompaction
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/347/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/347/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-347/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12815793 - PreCommit-HIVE-MASTER-Build

> AcidUtils.getAcidState() is not paying attention to ValidTxnList when 
> choosing the "best" base file
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13369
>                 URL: https://issues.apache.org/jira/browse/HIVE-13369
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Wei Zheng
>            Priority: Blocker
>         Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>    preserve the history, we can't use a base directory that includes a
>    transaction id that we must exclude."
> which is correct, but there is nothing in the code that actually enforces this.
> And if we detect a situation where txn X must be excluded but there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen in auto-commit mode, but with multi-statement txns it's possible.
> Suppose some long-running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==========================================================
> Here is a more concrete example.  Let's say the files for table A are as 
> follows, created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example, the user ran a major compaction)
> Let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>.
> Assume that all txns <= 20 commit.
> The reader can't use base_17 because it contains the result of txn 16.  So it 
> should choose base_5 as "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object (sketched below).  This would represent 
> an acceptable snapshot for such a reader.
> The issue arises if the Cleaner process is running at the same time.  It will 
> see everything with txnid < 17 as obsolete.  Then it will check lock manager 
> state and decide to delete (as there may not be any locks in the LM for table 
> A).  The order in which the files are deleted is currently undefined.  It may 
> delete delta_16_16 and delta_17_17 first, and right at this moment the read 
> request with ValidTxnList(20:16) arrives (such a snapshot may have been 
> locked in by some multi-stmt txn that started some time ago or, at least in 
> theory, by an auto-commit one, e.g. due to a long GC pause).  It acquires 
> locks after the Cleaner checks LM state, and then calls getAcidState().  This 
> request will choose base_5, but it won't see delta_16_16 and delta_17_17, and 
> thus will return the snapshot without the modifications made by those txns.
> This is a subtle but possible race condition.
> 1. The safest thing to do to ensure correctness is to use the latest base_x 
> as the "best" base, check it against the exceptions in ValidTxnList, and 
> throw an exception if there is an exception <= x (sketched below).
> 2. A better option is to keep 2 exception lists, aborted and open, and only 
> throw if there is an open txn <= x.  Compaction throws away data from aborted 
> txns, so there is no harm in using a base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the Cleaner from cleaning any delta whose id range includes this open 
> txn id, for any txn that is still running.  This has the drawback of 
> potentially delaying GC of old files for arbitrarily long periods, so it 
> should be a user config choice.  The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note: if 2 deltas have overlapping ID ranges, then one must be a subset 
> of the other.
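
For illustration, here is a minimal, self-contained sketch of the selection
behavior described in the example above: base_17 is rejected for
ValidTxnList(20:16), and base_5 plus delta_16_16/delta_17_17 are chosen. The
SimpleValidTxnList class, isValidBase(), and chooseSnapshot() are simplified
stand-ins for this discussion, not Hive's actual AcidUtils/ValidTxnList API
(the real getAcidState() also has to handle original files, compacted deltas,
overlapping ranges, etc.).

{noformat}
import java.util.*;

public class BestBaseSketch {

  /** Snapshot with a high-water mark and an exception list, e.g. (20:16). */
  static class SimpleValidTxnList {
    final long highWatermark;
    final Set<Long> exceptions;
    SimpleValidTxnList(long highWatermark, Set<Long> exceptions) {
      this.highWatermark = highWatermark;
      this.exceptions = exceptions;
    }
    /** base_x is usable only if every txn <= x is visible to this snapshot. */
    boolean isValidBase(long baseTxnId) {
      if (baseTxnId > highWatermark) return false;
      for (long excluded : exceptions) {
        if (excluded <= baseTxnId) return false; // base folded in an excluded txn
      }
      return true;
    }
  }

  /** Pick the newest base the snapshot may read, then the deltas above it. */
  static void chooseSnapshot(List<String> dirs, SimpleValidTxnList txns) {
    long bestBase = -1;
    for (String d : dirs) {
      if (d.startsWith("base_")) {
        long txnId = Long.parseLong(d.substring("base_".length()));
        if (txns.isValidBase(txnId) && txnId > bestBase) bestBase = txnId;
      }
    }
    System.out.println("best base: " + (bestBase < 0 ? "none" : "base_" + bestBase));
    for (String d : dirs) {
      if (d.startsWith("delta_")) {
        String[] range = d.substring("delta_".length()).split("_");
        long min = Long.parseLong(range[0]);
        long max = Long.parseLong(range[1]);
        // Deltas keep per-txn history, so a delta covering an excluded txn can
        // still be listed; its rows are filtered against the snapshot at read time.
        if (min > bestBase && max <= txns.highWatermark) {
          System.out.println("use " + d);
        }
      }
    }
  }

  public static void main(String[] args) {
    List<String> dirs = Arrays.asList("delta_4_4", "delta_5_5", "delta_4_5",
        "base_5", "delta_16_16", "delta_17_17", "base_17");
    // ValidTxnList(20:16): HWM = 20, exception list = {16}
    chooseSnapshot(dirs, new SimpleValidTxnList(20, new HashSet<>(Arrays.asList(16L))));
    // Prints base_5, delta_16_16, delta_17_17: base_17 is skipped because the
    // major compaction that produced it did not preserve txn 16's history.
  }
}
{noformat}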
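
And a minimal sketch of option 1 (take the newest base and fail fast rather
than silently fall back to an older base), with a note on where option 2 would
relax it. The method name and exception type here are illustrative only, not
what the attached patch does.

{noformat}
import java.util.*;

public class LatestBaseCheckSketch {

  /** Option 1: always take the newest base_x, but throw if any excluded txn <= x. */
  static long latestSafeBase(List<Long> baseTxnIds, Set<Long> exceptions) {
    long best = baseTxnIds.stream().mapToLong(Long::longValue).max().orElse(-1L);
    if (best < 0) return -1L;              // no base at all: read deltas only
    for (long excluded : exceptions) {
      if (excluded <= best) {
        // The compaction that produced base_<best> folded txn <excluded> into
        // it and that history cannot be separated out again, so fail fast.
        // Option 2 would throw only when <excluded> is an *open* txn; a base
        // covering an aborted txn is safe, since compaction drops aborted data.
        throw new IllegalStateException(
            "base_" + best + " may contain excluded txn " + excluded);
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // base_5 and base_17 exist; txn 16 is excluded => option 1 throws instead
    // of quietly choosing base_5 (and racing with the Cleaner over the deltas).
    try {
      latestSafeBase(Arrays.asList(5L, 17L), Collections.singleton(16L));
    } catch (IllegalStateException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{noformat}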



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
