[jira] [Resolved] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

     [ https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Altekruse resolved DRILL-4048.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

> Parquet reader corrupts dictionary encoded binary columns
> ---------------------------------------------------------
>
>                 Key: DRILL-4048
>                 URL: https://issues.apache.org/jira/browse/DRILL-4048
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.3.0
>            Reporter: Rahul Challapalli
>            Assignee: Jason Altekruse
>            Priority: Blocker
>             Fix For: 1.4.0
>
>         Attachments: lineitem_dic_enc.parquet
>
>
> git.commit.id.abbrev=04c01bd
> The below query returns corrupted data (not even showing up here) for binary columns
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct    | l_shipmode | l_comment               |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | 1          | 1552      | 93        | 1            | 17.0       | 24710.35        | 0.04       | 0.02  |              |              | 1996-03-13 | 1996-02-12   | 1996-03-22    | DELIVER IN PE     | T          | egular courts above the |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> {code}
> The same query from an older build (git.commit.id.abbrev=839f8da)
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct    | l_shipmode | l_comment               |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | 1          | 1552      | 93        | 1            | 17.0       | 24710.35        | 0.04       | 0.02  | N            | O            | 1996-03-13 | 1996-02-12   | 1996-03-22    | DELIVER IN PERSON | TRUCK      | egular courts above the |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> {code}
> Below is the output of the parquet-meta command for this dataset
> {code}
> creator:         parquet-mr
> file schema:     root
> ---
> l_orderkey:      REQUIRED INT32 R:0 D:0
> l_partkey:       REQUIRED INT32 R:0 D:0
> l_suppkey:       REQUIRED INT32 R:0 D:0
> l_linenumber:    REQUIRED INT32 R:0 D:0
> l_quantity:      REQUIRED DOUBLE R:0 D:0
> l_extendedprice: REQUIRED DOUBLE R:0 D:0
> l_discount:      REQUIRED DOUBLE R:0 D:0
> l_tax:           REQUIRED DOUBLE R:0 D:0
> l_returnflag:    REQUIRED BINARY O:UTF8 R:0 D:0
> l_linestatus:    REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipdate:      REQUIRED INT32 O:DATE R:0 D:0
> l_commitdate:    REQUIRED INT32 O:DATE R:0 D:0
> l_receiptdate:   REQUIRED INT32 O:DATE R:0 D:0
> l_shipinstruct:  REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipmode:      REQUIRED BINARY O:UTF8 R:0 D:0
> l_comment:       REQUIRED BINARY O:UTF8 R:0 D:0
> row group 1:     RC:60175 TS:3049610
> --
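
As a reading aid, the two result rows quoted in the report can be compared mechanically. The sketch below is illustrative only: the values are transcribed by hand from the quoted query output (whitespace may have been lost in the mail archive), the column types come from the quoted parquet-meta output, and `diff_rows` is a hypothetical helper, not part of Drill.

```python
# Column names, in the order shown in the quoted query output.
COLUMNS = [
    "l_orderkey", "l_partkey", "l_suppkey", "l_linenumber", "l_quantity",
    "l_extendedprice", "l_discount", "l_tax", "l_returnflag", "l_linestatus",
    "l_shipdate", "l_commitdate", "l_receiptdate", "l_shipinstruct",
    "l_shipmode", "l_comment",
]

# Columns listed as BINARY O:UTF8 in the quoted parquet-meta output.
BINARY_COLUMNS = {
    "l_returnflag", "l_linestatus", "l_shipinstruct", "l_shipmode", "l_comment",
}

# Row returned by the older build (839f8da), transcribed from the report.
GOOD_ROW = [
    "1", "1552", "93", "1", "17.0", "24710.35", "0.04", "0.02", "N", "O",
    "1996-03-13", "1996-02-12", "1996-03-22", "DELIVER IN PERSON", "TRUCK",
    "egular courts above the",
]

# Row returned by the affected build (04c01bd), transcribed from the report.
# Empty strings stand in for cells that show nothing in the archived mail.
BAD_ROW = [
    "1", "1552", "93", "1", "17.0", "24710.35", "0.04", "0.02", "", "",
    "1996-03-13", "1996-02-12", "1996-03-22", "DELIVER IN PE", "T",
    "egular courts above the",
]


def diff_rows(columns, expected, observed):
    """Return (column, expected, observed) for every value that differs."""
    return [
        (column, exp, obs)
        for column, exp, obs in zip(columns, expected, observed)
        if exp != obs
    ]


if __name__ == "__main__":
    for column, exp, obs in diff_rows(COLUMNS, GOOD_ROW, BAD_ROW):
        kind = "BINARY" if column in BINARY_COLUMNS else "non-binary"
        print(f"{column} ({kind}): expected {exp!r}, got {obs!r}")
```

Run as a script, this lists the columns whose values differ between the two builds, which makes it easy to check them against the BINARY columns in the schema. Note that `l_comment` is also BINARY but reads the same in both rows.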
[jira] [Resolved] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

     [ https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau resolved DRILL-4048.
-----------------------------------
    Resolution: Fixed

Fixed in a5a1aa6

> Parquet reader corrupts dictionary encoded binary columns
> ---------------------------------------------------------
>
>                 Key: DRILL-4048
>                 URL: https://issues.apache.org/jira/browse/DRILL-4048
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.3.0
>            Reporter: Rahul Challapalli
>            Assignee: Jason Altekruse
>            Priority: Blocker
>         Attachments: lineitem_dic_enc.parquet