> On May 20, 2020, 3:16 p.m., Peter Varga wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java
> > Lines 686 (patched)
> > <https://reviews.apache.org/r/72528/diff/1/?file=2232587#file2232587line686>
> >
> >     I have concerns here, but I am not sure if they are well founded or 
> > not. I think this will break what the outside world thinks of snapshot 
> > isolation. I might have a hypothetical client that inserts lots of data in 
> > a source table and sometimes issue a merge statement from the source to the 
> > target table. They have some requirement that the target table can not have 
> > partial data regarding some property. Example they inserting sales data, 
> > and the target table can not contain half the data of a day, it can either 
> > have all or none. So what the clients does, it will issue the inserts into 
> > the source table synchronously ordered by the date and when it gets to a 
> > next day it issue a merge statement asynchronously and continues to inserts 
> > the data for the next day synchronously. And it might think that it is save 
> > to do so, since the merge statement has a snapshot it will not see the data 
> > inserted afterwards. But with this change it will break.
> >     It might not be the best example, since how would the client know when 
> > the snapshot is actually captured. But I am not familiar enough with the 
> > ecosystem, does anything use the Hive by issuing the compile and run 
> > separately? Because there you could be sure before this change, that the 
> > compilation order also meant snapshot order. So summarized, I don't know 
> > what the outside world excepts of the snapshot isolation.
> 
> Denys Kuzmenko wrote:
>     insert into source and merge from source into target won't conflict with 
> each other, they touch different tables. Maybe I missing something here...
> 
> Peter Varga wrote:
>     My example was not perfect. I don't mean that it will conflict with the 
> insert into the source table. It can conflict with some other client's 
> transaction. My main point is, after the conflict is noticed and you 
> regenerate the snapshot it will starts to read results from transactions that 
> were opened and committed after the original query was compiled, and I'm just 
> trying to figure out, what kinf of problems can it cause, if any. In my 
> example you start to read records inserted later, but what if somebody added 
> a new partition since the compilation, wouldn't it cause problem?
> 
> Denys Kuzmenko wrote:
>     probably there might be an issue as we won't create any locks for the 
> newly created partition, however we'll start reading it.
>     instead of rollback & retry on Hive side we might consider to just fail 
> and let the user re-try.
> 
> Denys Kuzmenko wrote:
>     however it still leaves the question what happens now in Hive when 
> somebody adds a new partition (insert with dynamic partitioning) since the 
> compilation (merge insert). I'll test this out.

You could construct an example like this:
1. open and compile transaction 1 that merge inserts data from a partitioned 
source table that has a few partition.
2. Open, run and commit transaction 2 that inserts data to an old and a new 
partition to the source table.
3. Open, run and commit transaction 3 that inserts data to the target table of 
the merge statement, that will retrigger a snapshot generation in transaction 1.
4. Run transaction 1, the snapshot will be regenerated, and I think it will 
read partial data from transaction 2 breaking the ACID properties.

But we should try the test without your patch with a little different setup.
Switch the transaction order:
1. compile transaction 1 that inserts data to an old and a new partition of the 
source table.
2. compile transaction 2 that insert data to the target table
2. compile transaction 3 that merge inserts data from the source table to the 
target table
3. run and commit transaction 1
4. run and commit transaction 2
5. run transaction 3, since it cointains 1 and 2 in its snaphot the 
isValidTxnListState will be triggered without your patch and I think it 
possible that we do a partial read of the transaction 1 for the same reasons.


- Peter


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72528/#review220838
-----------------------------------------------------------


On May 19, 2020, 11:19 a.m., Denys Kuzmenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72528/
> -----------------------------------------------------------
> 
> (Updated May 19, 2020, 11:19 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, Peter Varga, and Peter Vary.
> 
> 
> Bugs: HIVE-23503
>     https://issues.apache.org/jira/browse/HIVE-23503
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> ValidTxnManager doesn't consider txns opened and committed between snapshot 
> generation and locking when evaluating ValidTxnListState. This cause issues 
> like duplicate insert in case of concurrent merge insert & insert.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java e70c92eef4 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java a8c83fc504 
>   ql/src/java/org/apache/hadoop/hive/ql/ValidTxnManager.java 7d49c57dda 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 71afcbdc68 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 
> 0383881acc 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 
> 600289f837 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 8a15b7cc5d 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  65df9c2ba9 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
>  887d4303f4 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
>  312936efa8 
>   storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java 
> b8ff03f9c4 
>   storage-api/src/java/org/apache/hadoop/hive/common/ValidTxnList.java 
> d4c3b09730 
> 
> 
> Diff: https://reviews.apache.org/r/72528/diff/1/
> 
> 
> Testing
> -------
> 
> DbTxnManager tests.
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>

Reply via email to