A similar issue was reported here:
https://issues.apache.org/jira/browse/DRILL-827
I'm not quite sure what fix they made, though.


On Fri, Aug 29, 2014 at 8:09 AM, pratik khadloya <[email protected]>
wrote:

> Hello,
>
> I have written the following two column readers for Parquet. The first one
> opens a Parquet file once and reads all columns; the second one re-opens
> the Parquet file for every column it reads.
>
> With the first one, I get an exception while reading some columns.
>
> Exception in thread "main" parquet.io.ParquetDecodingException: Can't read
> value in column [description] BINARY at value 44899 out of 57096, 44899 out
> of 57096 in currentPage. repetition level: 0, definition level: 1
>
> *1st:* https://gist.github.com/tispratik/f0044dd84dc8d8c6cbcf
>
>
> With the second one, I do not get any exception. But reading the columns
> by re-opening the file for every column is not efficient.
>
> *2nd:* https://gist.github.com/tispratik/f7a66f6a40b7ae3b98ad
>
> Does anyone know what's going on here? I suspect a bug in the
> ParquetFileReader class, where it stores some state that it is not able
> to flush out completely.
>
> Any help is appreciated.
>
> Thanks,
> Pratik
>
