I’ve been looking at the OrcFile.createReader method and thinking about what I will need to do to read acid files. The first thing that strikes me is that createReader takes a file. But for acid, you need to pass the directory, because the reader needs to look there for any relevant delta files. Acid also requires a ValidTxnList. We can add that to the ReaderOptions.
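To make the directory requirement concrete, here is a minimal sketch of the file selection an acid reader would have to do, which a single-file createReader has no way to do. It assumes, hypothetically, that the directory holds entries named `base_<txn>` and `delta_<minTxn>_<maxTxn>`, and it reduces the ValidTxnList to a single high-watermark transaction id. A real ValidTxnList would also need to exclude open and aborted transactions. `AcidDirSketch` and `relevantFiles` are illustrative names only, not existing ORC or Hive code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: chooses which entries of an acid directory a reader needs.
// Assumes entries named "base_<txn>" and "delta_<minTxn>_<maxTxn>", and reduces the
// ValidTxnList to a single high watermark (no open/aborted transaction exceptions).
final class AcidDirSketch {
    static List<String> relevantFiles(List<String> dirEntries, long highWatermark) {
        // Pick the newest base that is fully visible under the high watermark.
        String bestBase = null;
        long bestBaseTxn = -1;
        for (String name : dirEntries) {
            if (name.startsWith("base_")) {
                long txn = Long.parseLong(name.substring("base_".length()));
                if (txn <= highWatermark && txn > bestBaseTxn) {
                    bestBase = name;
                    bestBaseTxn = txn;
                }
            }
        }
        List<String> result = new ArrayList<>();
        if (bestBase != null) {
            result.add(bestBase);
        }
        // Keep deltas that add transactions after the base and overlap the visible range.
        for (String name : dirEntries) {
            if (name.startsWith("delta_")) {
                String[] parts = name.split("_");
                long minTxn = Long.parseLong(parts[1]);
                long maxTxn = Long.parseLong(parts[2]);
                if (maxTxn > bestBaseTxn && minTxn <= highWatermark) {
                    result.add(name);
                }
            }
        }
        return result;
    }
}
```

For example, given the entries base_5, base_10, delta_6_8, delta_11_12 and delta_13_20 with a high watermark of 12, this picks base_10 and delta_11_12. None of that can be decided from a single file path.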
It seems the best way to handle reading acid data is to add a new method, OrcFile.createAcidReader, that takes a directory. I don’t like that the user has to make a different call in the acid case. But the user will have to set the ValidTxnList in the reader options anyway, so the calling code will already need separate acid and non-acid paths. Every way I could think of for createReader to decide whether it was dealing with an acid directory or a non-acid file seemed to create muddled semantics. Does the user pass a directory in the acid case but a file in the non-acid case? Yuck. Does the user pass a base file in the acid case and have the code walk up the path to find the relevant directory? That seems error prone and slow.

Related to this is my assumption that I will need to write new implementations of Reader and RecordReader that understand acid. That seems better than putting a bunch of branches into the existing code to try to handle both cases.

Thoughts?

Alan
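A rough sketch of the API split being proposed, using hypothetical stand-in types: `SketchReader`, `SketchReaderOptions`, `OrcFileSketch`, `PlainFileReader` and `AcidDirReader` are not real ORC or Hive classes, paths are plain strings, and the ValidTxnList is again simplified to a high watermark. It shows the acid entry point taking a directory, refusing to proceed without a valid transaction list, and returning a separate reader implementation rather than branching inside the existing one.

```java
// Hypothetical sketch of the proposed API split; none of these are real ORC/Hive types.
interface SketchReader {
    String describe();
}

/** Reader options; the valid-transaction snapshot is only used by acid readers. */
final class SketchReaderOptions {
    // Simplified stand-in for a ValidTxnList: just a high watermark (null = not set).
    private Long highWatermark;

    SketchReaderOptions validTxns(long highWatermark) {
        this.highWatermark = highWatermark;
        return this;
    }

    Long validTxns() {
        return highWatermark;
    }
}

/** Existing non-acid path: one file, no transaction filtering. */
final class PlainFileReader implements SketchReader {
    private final String file;

    PlainFileReader(String file) {
        this.file = file;
    }

    public String describe() {
        return "file " + file;
    }
}

/** Separate acid implementation instead of branches inside the plain reader. */
final class AcidDirReader implements SketchReader {
    private final String dir;
    private final long highWatermark;

    AcidDirReader(String dir, long highWatermark) {
        this.dir = dir;
        this.highWatermark = highWatermark;
    }

    public String describe() {
        return "acid dir " + dir + " up to txn " + highWatermark;
    }
}

final class OrcFileSketch {
    /** Non-acid entry point: takes a single file. */
    static SketchReader createReader(String file, SketchReaderOptions opts) {
        return new PlainFileReader(file);
    }

    /** Acid entry point: takes a directory so the reader can find base and delta files. */
    static SketchReader createAcidReader(String dir, SketchReaderOptions opts) {
        Long hw = opts.validTxns();
        if (hw == null) {
            throw new IllegalArgumentException("acid reads require a valid transaction list");
        }
        return new AcidDirReader(dir, hw);
    }
}
```

A caller would then write something like `isAcid ? OrcFileSketch.createAcidReader(dir, opts) : OrcFileSketch.createReader(file, opts)`, which is the split the caller already needs because only the acid path sets a ValidTxnList.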
