Hello, I have written the following two column readers for Parquet. The first one opens a Parquet file once and reads all columns; the second one re-opens the Parquet file for every column it reads.
With the first one, I get an exception while reading some columns:

Exception in thread "main" parquet.io.ParquetDecodingException: Can't read value in column [description] BINARY at value 44899 out of 57096, 44899 out of 57096 in currentPage. repetition level: 0, definition level: 1

*1st:* https://gist.github.com/tispratik/f0044dd84dc8d8c6cbcf

With the second one, I do not get any exception. But reading the columns this way, re-opening the file for every column, is not efficient.

*2nd:* https://gist.github.com/tispratik/f7a66f6a40b7ae3b98ad

Does anyone know what's going on here? I suspect a bug in the ParquetFileReader class, where it stores some state that it is not able to flush out completely.

Any help is appreciated.

Thanks,
Pratik
