[
https://issues.apache.org/jira/browse/DRILL-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991481#comment-13991481
]
Jason Altekruse commented on DRILL-649:
---------------------------------------
I took a look at the file. I had only implemented dictionary encoding for
varchar fields. I had seen that they added dictionaries for other types, but I
thought the varchar dictionaries would be the only ones blocking us from reading
Impala-generated files. They currently use ints to index into the dictionary,
which makes a dictionary of floats or ints seem useless. With a cap on
dictionary sizes around 50,000, though, they can still save some space by bit
packing the dictionary keys so that each is stored in fewer than 4 bytes. We
will have to read each 'int' into memory, bit mask it to zero-fill the bits
that were packed away, and then use the result to look up the value in the
dictionary.
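
To make the mask-and-lookup step concrete, here is a minimal sketch in Java. It
is not Drill's or Parquet's reader code: the class and method names are
hypothetical, and it assumes keys are packed least-significant-bit first at a
fixed bit width, with no run-length encoding mixed in. If the cap is read as
roughly 50,000 entries, 16 bits per key would be enough.

```java
// Hypothetical sketch: decode bit-packed dictionary keys into float values.
// Assumes LSB-first packing at a fixed width; names are illustrative only.
final class DictionaryDecodeSketch {

    // Bits needed to address every entry of a dictionary of the given size.
    static int bitWidthFor(int dictionarySize) {
        return 32 - Integer.numberOfLeadingZeros(dictionarySize - 1);
    }

    // Read the index-th key from the packed buffer.
    static int readKey(byte[] packed, int index, int bitWidth) {
        long bitOffset = (long) index * bitWidth;
        int byteOffset = (int) (bitOffset >>> 3);
        int shift = (int) (bitOffset & 7);

        // Gather up to 5 bytes (enough for a 32-bit key starting at any
        // bit position) into one little-endian window.
        long window = 0;
        for (int i = 0; i < 5 && byteOffset + i < packed.length; i++) {
            window |= (packed[byteOffset + i] & 0xFFL) << (8 * i);
        }

        // Mask off the neighbouring keys so only bitWidth bits remain.
        long mask = (1L << bitWidth) - 1;
        return (int) ((window >>> shift) & mask);
    }

    // Materialize every value by looking each key up in the dictionary.
    static float[] decode(float[] dictionary, byte[] packed, int count, int bitWidth) {
        float[] out = new float[count];
        for (int i = 0; i < count; i++) {
            out[i] = dictionary[readKey(packed, i, bitWidth)];
        }
        return out;
    }
}
```

Each output value needs its own shift, mask and lookup, so the values have to
be materialized one at a time instead of being copied in bulk.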
This is going to kill our read performance on the other types, because we have
to materialize everything at read time and can no longer use vector copies. I'll
put together a fix before the end of the week so that we can at least read the
files. I'll try to get the dictionaries off heap for performance, but I will
focus first on just getting it working.
> Unable to read dictionary encoded parquet file generated from impala or avro
> ----------------------------------------------------------------------------
>
> Key: DRILL-649
> URL: https://issues.apache.org/jira/browse/DRILL-649
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Steven Phillips
> Assignee: Jason Altekruse
> Attachments: nation.parquet
>
>
> Support for dictionary encoding was recently added, but it looks like some
> dictionary encoded files are still unreadable by Drill. For example, the
> parquet file created from an avro file attached to DRILL-389 still fails.
> I also created a simple parquet file from Impala, which also fails to read.
> I will attach the file.
--
This message was sent by Atlassian JIRA
(v6.2#6252)