A similar issue was reported here: https://issues.apache.org/jira/browse/DRILL-827
I'm not quite sure about the fix they made, though.

On Fri, Aug 29, 2014 at 8:09 AM, pratik khadloya <[email protected]> wrote:

> Hello,
>
> I have written the following two column readers for Parquet: the first one
> opens a Parquet file once and reads all columns, and the second one re-opens
> the Parquet file for every column it reads.
>
> With the first one, I get an exception while reading some columns.
>
> Exception in thread "main" parquet.io.ParquetDecodingException: Can't read
> value in column [description] BINARY at value 44899 out of 57096, 44899 out
> of 57096 in currentPage. repetition level: 0, definition level: 1
>
> *1st:* https://gist.github.com/tispratik/f0044dd84dc8d8c6cbcf
>
> With the second one, I do not get any exception. But reading the columns
> this way, re-opening the file for every column, is not efficient.
>
> *2nd:* https://gist.github.com/tispratik/f7a66f6a40b7ae3b98ad
>
> Does anyone know what's going on here? I suspect a bug in the
> ParquetFileReader class, where it is storing some state that it is not able
> to flush out completely.
>
> Any help is appreciated.
>
> Thanks,
> Pratik
>
