Hello,

I have written the following two column readers for Parquet. The first one
opens a Parquet file once and reads all columns; the second one re-opens
the Parquet file for every column it reads.

With the first one, I get an exception while reading some columns:

Exception in thread "main" parquet.io.ParquetDecodingException: Can't read
value in column [description] BINARY at value 44899 out of 57096, 44899 out
of 57096 in currentPage. repetition level: 0, definition level: 1

*1st:* https://gist.github.com/tispratik/f0044dd84dc8d8c6cbcf


With the second one, I do not get any exception. But re-opening the file
for every column is not efficient.

*2nd:* https://gist.github.com/tispratik/f7a66f6a40b7ae3b98ad

Does anyone know what's going on here? I suspect a bug in the
ParquetFileReader class, where it stores some state that it is not able
to flush out completely between columns.

Any help is appreciated.

Thanks,
Pratik
