[ 
https://issues.apache.org/jira/browse/DRILL-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340825#comment-14340825
 ] 

Deneche A. Hakim commented on DRILL-2267:
-----------------------------------------

All unit tests pass, along with the functional, customer, and tpch100 tests.

> Parquet writer with dictionary encoding results in corrupted varchar columns
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-2267
>                 URL: https://issues.apache.org/jira/browse/DRILL-2267
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Ramana Inukonda Nagaraj
>            Assignee: Deneche A. Hakim
>             Fix For: 0.8.0
>
>         Attachments: 0_0_0.parquet, DRILL-2267.1.patch.txt
>
>
> Used CTAS in Drill to create a Parquet file with columns of the varchar datatype.
> The created Parquet file looks like this in parquet-tools:
> VARCHAR_col:         OPTIONAL BINARY O:UTF8 R:0 D:1
> VAR16CHAR_col:       OPTIONAL BINARY O:UTF8 R:0 D:1
> VARCHAR_col:          BINARY SNAPPY DO:0 FPO:894307 SZ:16344/231716/14.18 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
> VAR16CHAR_col:        BINARY SNAPPY DO:0 FPO:910651 SZ:25830/381493/14.77 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
> Querying the file returns several records with corrupted data in these fields:
> | VAR16CHAR_col |
> +---------------+
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> If dictionary encoding is turned off, the resulting file can be read without
> these issues.
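
As a rough way to quantify the corruption described above, the sketch below counts values that do not decode as valid UTF-8, which is what typically shows up as U+FFFD replacement characters in query output. It is only an illustration, not part of the patch: the helper names (looks_corrupted, count_corrupted) and the sample values are hypothetical, and it assumes the raw column bytes have already been fetched by some client. It will not catch corruption that happens to produce valid UTF-8, such as the blank rows in the output above.

```python
from typing import Iterable, Optional

# Character most clients print in place of bytes that cannot be decoded.
REPLACEMENT_CHAR = "\ufffd"


def looks_corrupted(value: Optional[bytes]) -> bool:
    """Return True if the raw bytes are not valid UTF-8 (NULLs are ignored)."""
    if value is None:
        return False
    # errors="replace" substitutes U+FFFD for every undecodable byte sequence.
    return REPLACEMENT_CHAR in value.decode("utf-8", errors="replace")


def count_corrupted(values: Iterable[Optional[bytes]]) -> int:
    """Count the values in a column that look corrupted."""
    return sum(looks_corrupted(v) for v in values)


# Hypothetical sample: two valid strings, one NULL, one invalid byte sequence.
sample = [b"hello", None, b"\xff\xfe\xfd", "caf\u00e9".encode("utf-8")]
print(count_corrupted(sample))
```

Running the same count on a file written with dictionary encoding on and on one written with it off would give a simple before/after comparison. Note that a value legitimately containing U+FFFD would also be flagged by this check.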



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
