Yeah, I'd suggest adding to:

OrcFile.ReaderOptions:
  exposeAcidRowId(boolean); -- so that the returned schema includes the ACID row id

Reader.Options:
  setValidTransactions(TransactionList); -- apply transaction filtering

Then it will read a single file (or a range, using Reader.Options.range(long,long)).

.. Owen

On Thu, Sep 14, 2017 at 4:52 PM, Gopal Vijayaraghavan wrote:

> > For performance reasons, you prefer the second option that I rejected
> > where users give a file and the system finds the deletes from there. I can
> > buy that.
>
> That's simpler, at least to understand and debug; the logs from ORC alone
> are enough to find consistency issues.
>
> The rest of the details are implicit to the implementation, beyond a base
> file and the current transaction state.
>
> This is nearly exactly what the LLAP ACID cache patch does today: it
> looks up the cache by the base file and applies local transaction state per
> query (i.e. a valid txns list, which hides the committed deletes from an
> older query).
>
> > I don't follow your last comment about ROW__ID being projected out to the
> > user. ORC isn't currently hiding that field from the reader, is it?
>
> In general, a BI tool of some kind over ACID probably cares about the data
> and not the metadata about which rows belong to which transaction.
>
> Hiding ROW__ID makes the consumer side of the reader identical between
> ACID and non-ACID, unless it is being read by a "SELECT FOR UPDATE" reader.
>
> Cheers,
> Gopal
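To make the two proposed options concrete, here is a minimal plain-Java sketch under stated assumptions. Everything in it is hypothetical: `AcidReadSketch`, `RowId`, `Row`, `Delete`, `ValidTxns` and `read` are illustrative names, not ORC or Hive APIs. The snapshot model (valid means at or below a high watermark and not in a set of open or aborted transactions) is an assumption, and the ROW__ID triple is only loosely modeled on the (originalTransaction, bucket, rowId) columns of Hive ACID files. The `snapshot` argument plays the role of `setValidTransactions(TransactionList)`, the boolean plays the role of `exposeAcidRowId(boolean)`, and the same cached base rows can be reused across queries with different snapshots, in the spirit of the LLAP cache described above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch of the proposed reader options; not a real ORC API. */
final class AcidReadSketch {

    /** Assumed ROW__ID layout: (originalTransaction, bucket, rowId). */
    static final class RowId {
        final long originalTxn;
        final int bucket;
        final long rowId;

        RowId(long originalTxn, int bucket, long rowId) {
            this.originalTxn = originalTxn;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof RowId)) return false;
            RowId r = (RowId) o;
            return originalTxn == r.originalTxn && bucket == r.bucket && rowId == r.rowId;
        }

        @Override public int hashCode() { return Objects.hash(originalTxn, bucket, rowId); }

        @Override public String toString() { return "{" + originalTxn + "," + bucket + "," + rowId + "}"; }
    }

    /** A row in the (cached) base file: its ROW__ID plus the user data. */
    static final class Row {
        final RowId id;
        final String data;
        Row(RowId id, String data) { this.id = id; this.data = data; }
    }

    /** A delete event: the row it targets and the transaction that deleted it. */
    static final class Delete {
        final RowId target;
        final long txn;
        Delete(RowId target, long txn) { this.target = target; this.txn = txn; }
    }

    /** Assumed per-query snapshot: high watermark plus open/aborted exceptions. */
    static final class ValidTxns {
        final long highWatermark;
        final Set<Long> invalid;
        ValidTxns(long highWatermark, Set<Long> invalid) {
            this.highWatermark = highWatermark;
            this.invalid = invalid;
        }
        boolean isValid(long txn) { return txn <= highWatermark && !invalid.contains(txn); }
    }

    /**
     * Applies one query's snapshot to shared base rows and delete events.
     * Only deletes valid in this snapshot hide rows, so an older query still sees
     * a row that a later transaction deleted. ROW__ID is included only on request.
     */
    static List<String> read(List<Row> base, List<Delete> deletes,
                             ValidTxns snapshot, boolean exposeAcidRowId) {
        Set<RowId> deleted = new HashSet<>();
        for (Delete d : deletes) {
            if (snapshot.isValid(d.txn)) deleted.add(d.target);
        }
        List<String> out = new ArrayList<>();
        for (Row r : base) {
            // Skip rows inserted by transactions this query cannot see, and visible deletes.
            if (!snapshot.isValid(r.id.originalTxn) || deleted.contains(r.id)) continue;
            out.add(exposeAcidRowId ? r.id + " " + r.data : r.data);
        }
        return out;
    }
}
```

With a snapshot whose high watermark is below the delete's transaction, `read` still returns the deleted row, while a newer snapshot hides it; with `exposeAcidRowId` set to false the caller sees only the data, which is the non-ACID-looking view a BI tool would want, and a "SELECT FOR UPDATE"-style caller would pass true.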
