[ 
https://issues.apache.org/jira/browse/HIVE-23725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga updated HIVE-23725:
-------------------------------
    Description: 
When the ValidTxnManager invalidates the snapshot during a merge insert and 
the query starts to read transactions that had not yet been committed when 
the query was compiled, partial reads can occur if one of those transactions 
created a new partition in the source or target table.

The fix should not only regenerate the snapshot but also recompile the query 
and acquire the locks again.
You could construct an example like this:
1. Open and compile transaction 1, which merge inserts data from a 
partitioned source table that has a few partitions.
2. Open, run and commit transaction 2, which inserts data into an old and a 
new partition of the source table.
3. Open, run and commit transaction 3, which inserts data into the target 
table of the merge statement. This retriggers snapshot generation in 
transaction 1.
4. Run transaction 1. The snapshot is regenerated, and the query reads only 
part of the data from transaction 2, breaking the ACID properties.
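
The four steps above can be illustrated with a toy model. This is a sketch 
and not Hive code: compile_plan, run and the partition names are hypothetical, 
and it assumes the partial read comes from a plan whose partition list was 
fixed at compile time while the snapshot was regenerated later.

```python
# Toy model, not Hive code. A table maps partition name -> list of (txn_id, row).
source = {"p=old": [(0, "a")]}


def compile_plan(table):
    """Assumption: compilation fixes the list of partitions to scan."""
    return sorted(table)


def run(table, plan, snapshot):
    """Read only the planned partitions, keeping rows whose txn is in the snapshot."""
    return sorted(
        row for part in plan for txn, row in table[part] if txn in snapshot
    )


# Step 1: transaction 1 is compiled; its snapshot contains only txn 0.
plan = compile_plan(source)
snapshot = {0}

# Step 2: transaction 2 commits rows into an old and a new partition.
source["p=old"].append((2, "b"))
source["p=new"] = [(2, "c")]

# Steps 3 and 4: a commit on the target table invalidates the snapshot. It is
# regenerated and now contains txn 2, but the plan is not recompiled.
snapshot = {0, 2}

partial = run(source, plan, snapshot)                 # stale plan
full = run(source, compile_plan(source), snapshot)    # recompiled plan

print(partial)  # ['a', 'b']
print(full)     # ['a', 'b', 'c']
```

With the stale plan, row "b" of transaction 2 is visible but row "c" is not, 
because "p=new" was never part of the compiled plan.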

A different setup, with the transaction order switched:
1. Compile transaction 1, which inserts data into an old and a new partition 
of the source table.
2. Compile transaction 2, which inserts data into the target table.
3. Compile transaction 3, which merge inserts data from the source table into 
the target table.
4. Run and commit transaction 1.
5. Run and commit transaction 2.
6. Run transaction 3. Since it contains transactions 1 and 2 in its snapshot, 
isValidTxnListState is triggered and we do a partial read of transaction 1 
for the same reasons.
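
The proposed fix can be sketched on this second ordering with a toy model. 
This is not Hive code: Table, compile_plan, read and run_merge are 
hypothetical names, the snapshot comparison only stands in for the real 
validity check, and lock acquisition is not modeled.

```python
# Toy model, not Hive code. All names here are hypothetical.
class Table:
    def __init__(self):
        self.partitions = {}  # partition name -> list of (txn_id, row)

    def insert(self, txn_id, part, row):
        self.partitions.setdefault(part, []).append((txn_id, row))


def compile_plan(table):
    # Assumption: compilation freezes the set of partitions to scan.
    return set(table.partitions)


def read(table, plan, snapshot):
    # Read the planned partitions, keeping rows from txns in the snapshot.
    return sorted(
        row
        for part in plan
        for txn, row in table.partitions[part]
        if txn in snapshot
    )


def run_merge(source, plan, snapshot, committed, recompile):
    # Stand-in for the validity check: stale if newer commits exist.
    if snapshot != committed:
        snapshot = set(committed)  # regenerate the snapshot
        if recompile:
            # Proposed fix: plan again (the real fix would also re-lock).
            plan = compile_plan(source)
    return read(source, plan, snapshot)


source = Table()
source.insert(0, "p=old", "a")
committed = {0}

# Transaction 3 (the merge) is compiled while transactions 1 and 2 are open.
plan, snapshot = compile_plan(source), set(committed)

# Transaction 1 commits into an old and a new source partition.
source.insert(1, "p=old", "b")
source.insert(1, "p=new", "c")
committed.add(1)
# Transaction 2 commits into the target table; only its commit matters here.
committed.add(2)

without_fix = run_merge(source, plan, snapshot, committed, recompile=False)
with_fix = run_merge(source, plan, snapshot, committed, recompile=True)

print(without_fix)  # ['a', 'b']
print(with_fix)     # ['a', 'b', 'c']
```

In this model, regenerating the snapshot alone still misses the row in the 
new partition, while regenerating and recompiling reads all of transaction 1.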

  was:
When the ValidTxnManager invalidates the snapshot during a merge insert and 
the query starts to read transactions that had not yet been committed when 
the query was compiled, partial reads can occur if one of those transactions 
created a new partition in the source or target table.

The fix should not only regenerate the snapshot but also recompile the query 
and acquire the locks again.


> ValidTxnManager snapshot outdating causing partial reads in merge insert
> ------------------------------------------------------------------------
>
>                 Key: HIVE-23725
>                 URL: https://issues.apache.org/jira/browse/HIVE-23725
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Peter Varga
>            Assignee: Peter Varga
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
