[jira] [Resolved] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4048.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Parquet reader corrupts dictionary encoded binary columns
> -
>
> Key: DRILL-4048
> URL: https://issues.apache.org/jira/browse/DRILL-4048
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.3.0
> Reporter: Rahul Challapalli
> Assignee: Jason Altekruse
> Priority: Blocker
> Fix For: 1.4.0
>
> Attachments: lineitem_dic_enc.parquet
>
>
> git.commit.id.abbrev=04c01bd
> The query below returns corrupted data for the binary columns (some of the
> corrupted values do not even render here):
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
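> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct    | l_shipmode | l_comment               |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | 1          | 1552      | 93        | 1            | 17.0       | 24710.35        | 0.04       | 0.02  |              |              | 1996-03-13 | 1996-02-12   | 1996-03-22    | DELIVER IN PE     | T          | egular courts above the |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+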
> {code}
> The same query on an older build (git.commit.id.abbrev=839f8da) returns the expected values:
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
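> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct    | l_shipmode | l_comment               |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | 1          | 1552      | 93        | 1            | 17.0       | 24710.35        | 0.04       | 0.02  | N            | O            | 1996-03-13 | 1996-02-12   | 1996-03-22    | DELIVER IN PERSON | TRUCK      | egular courts above the |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+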
> {code}
> Below is the output of the parquet-meta command for this dataset
> {code}
> creator: parquet-mr 
> file schema: root 
> ---
> l_orderkey:      REQUIRED INT32 R:0 D:0
> l_partkey:       REQUIRED INT32 R:0 D:0
> l_suppkey:       REQUIRED INT32 R:0 D:0
> l_linenumber:    REQUIRED INT32 R:0 D:0
> l_quantity:      REQUIRED DOUBLE R:0 D:0
> l_extendedprice: REQUIRED DOUBLE R:0 D:0
> l_discount:      REQUIRED DOUBLE R:0 D:0
> l_tax:           REQUIRED DOUBLE R:0 D:0
> l_returnflag:    REQUIRED BINARY O:UTF8 R:0 D:0
> l_linestatus:    REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipdate:      REQUIRED INT32 O:DATE R:0 D:0
> l_commitdate:    REQUIRED INT32 O:DATE R:0 D:0
> l_receiptdate:   REQUIRED INT32 O:DATE R:0 D:0
> l_shipinstruct:  REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipmode:      REQUIRED BINARY O:UTF8 R:0 D:0
> l_comment:       REQUIRED BINARY O:UTF8 R:0 D:0
> row group 1: RC:60175 TS:3049610 
> --

[jira] [Resolved] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

2015-11-06 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-4048.
---
Resolution: Fixed

Fixed in a5a1aa6
