>  For performance reasons, you prefer the second option that I rejected
>  where users give a file and the system finds the deletes from there.  I can
>  buy that.

That's at least simpler to understand and debug: the logs from ORC alone are
enough to find consistency issues.

Beyond a base file and the current transaction state, the rest of the details
are implicit to the implementation.

This is almost exactly how the LLAP ACID cache patch works today: it looks up
the cache by base file and applies local transaction state per query
(i.e. the valid txns list, which hides the committed deletes from an older query).
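
As a rough sketch of that split between shared cache and per-query state (the
names BaseFileCache, DeleteEvent and the example path are made up for
illustration, and this is not the actual LLAP code):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeleteEvent:
    txn_id: int  # transaction that issued the delete
    row_id: int  # row in the base file that it deletes


class BaseFileCache:
    """Hypothetical cache keyed on the base file only, shared by all queries."""

    def __init__(self):
        self._entries = {}

    def put(self, base_file, rows, deletes):
        # rows: {row_id: value}; deletes: every delete event known so far
        self._entries[base_file] = (dict(rows), list(deletes))

    def read(self, base_file, valid_txns):
        rows, deletes = self._entries[base_file]
        # Only deletes from transactions this query considers valid apply,
        # so an older query does not see a delete committed after it began.
        hidden = {d.row_id for d in deletes if d.txn_id in valid_txns}
        return [v for rid, v in sorted(rows.items()) if rid not in hidden]


cache = BaseFileCache()
path = "base_0000005/bucket_00000"  # illustrative path, not a real file
cache.put(path, {0: "a", 1: "b", 2: "c"}, [DeleteEvent(txn_id=7, row_id=1)])

old_query = cache.read(path, valid_txns={5})     # started before txn 7 committed
new_query = cache.read(path, valid_txns={5, 7})  # sees the committed delete
```

The cached entry never changes between the two reads; only the valid txns
set passed in by each query differs.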

>  I don’t follow your last comment about ROW__ID being projected out to the
> user.  ORC isn’t currently hiding that field from the reader is it?

A BI tool of some kind over ACID probably cares about the data and not the
metadata about which rows belong to which transaction.

Hiding ROW__ID makes the consumer side of the reader identical between ACID and 
non-ACID, unless it is being read by a "SELECT FOR UPDATE" reader.
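
To illustrate what I mean by hiding it (a toy sketch only: the project()
helper and the for_update flag are assumptions of mine, not an existing ORC
reader option):

```python
ROW_ID = "ROW__ID"


def project(row, for_update=False):
    """Drop ROW__ID unless the caller needs it to update or delete rows."""
    if for_update:
        return dict(row)
    return {k: v for k, v in row.items() if k != ROW_ID}


# ROW__ID modelled here as an opaque tuple; the real layout is not assumed.
acid_row = {ROW_ID: (7, 0, 42), "name": "x", "qty": 3}
plain_row = {"name": "x", "qty": 3}

consumer_view = project(acid_row)                  # same shape as a non-ACID row
updater_view = project(acid_row, for_update=True)  # keeps ROW__ID
```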

Cheers,
Gopal

