> For performance reasons, you prefer the second option that I rejected
> where users give a file and the system finds the deletes from there. I can
> buy that.

That's at least simpler to understand and debug: the logs from ORC alone
are enough to find consistency issues. The rest of the details are implicit
to the implementation, beyond a base file and the current transaction state.

This is nearly exactly how the LLAP ACID cache patch works today: it looks
up the cache on the base file and applies local transaction state per query
(i.e. the valid txns list, which hides the committed deletes from an older
query).

> I don’t follow your last comment about ROW__ID being projected out to the
> user. ORC isn’t currently hiding that field from the reader is it?

In general, a BI tool of some kind over ACID probably cares about the data
and not the metadata about which rows belong to which transaction.

Hiding ROW__ID makes the consumer side of the reader identical between ACID
and non-ACID, unless it is being read by a "SELECT FOR UPDATE" reader.

Cheers,
Gopal