[ 
https://issues.apache.org/jira/browse/HIVE-21757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844284#comment-16844284
 ] 

Todd Lipcon commented on HIVE-21757:
------------------------------------

This is particularly important if we want to cache the file listing for a 
table, keyed on the table's latest write ID. If the compactor changes the set 
of files without changing the write ID, that caching strategy becomes 
impossible, because a cached listing can no longer be validated by write ID 
alone. Since file listing is pretty expensive on cloud stores, caching 
listings can be quite useful for low-latency queries.
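
To make the caching concern concrete, here is a minimal sketch of a listing 
cache keyed on (table, latest write ID). The class name FileListingCache, the 
method list, and the lister callback are hypothetical names for illustration 
and are not part of Hive's API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch, not Hive code: caches a table's file listing under a
// key built from the table name and the table's latest write ID.
final class FileListingCache {
    private final Map<String, List<String>> listings = new ConcurrentHashMap<>();

    // Returns the cached listing for (table, latestWriteId). The expensive
    // lister is only invoked when no entry exists for that key.
    List<String> list(String table, long latestWriteId, Supplier<List<String>> lister) {
        String key = table + "@" + latestWriteId;
        return listings.computeIfAbsent(key, k -> List.copyOf(lister.get()));
    }
}
```

Under this scheme, a compaction that rewrites files but leaves the write ID 
unchanged maps to the same key, so the cache keeps returning the 
pre-compaction listing. If compaction allocated a new write ID, the key would 
change and the listing would be refreshed on the next lookup.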

It seems likely this could cause problems for things like replication as well.

> ACID: use a new write id for compaction's output instead of the visibility id
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-21757
>                 URL: https://issues.apache.org/jira/browse/HIVE-21757
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Vaibhav Gumashta
>            Priority: Major
>
> HIVE-20823 added support for running compaction within a transaction. To 
> control the visibility of the output directory, it uses 
> base_writeId_visibilityId, where visibilityId is the transaction id of the 
> transaction that the compactor ran in. Perhaps we can keep using the 
> base_writeId format, by allocating a new writeId for the compactor and 
> creating the new base/delta with that.



