[ 
https://issues.apache.org/jira/browse/DRILL-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974617#comment-14974617
 ] 

ASF GitHub Bot commented on DRILL-3871:
---------------------------------------

GitHub user parthchandra opened a pull request:

    https://github.com/apache/drill/pull/219

    DRILL-3871: Off by one error while reading binary fields with one ter…

    …minal null in parquet.
    
    Changes -
      1) Rewrote the NullableColumnReader.processPages function to process runs 
of Null values and Non-Null values without needing to keeping track of whether 
the previous iteration in the while loop had encountered a null or not. A pair 
of loops now iterates over a run of nulls or a run of non-null values.
      2) Removed some redundant code.
      3) Renamed some variables. The indexInOutputVector is now replaced by two 
local variables, readCount and writeCount only for clarity.
     4) Adding tracing.
     5) Added unit tests for edge cases of nulls occurring on page boundaries. 
    
    For all the unit tests, tpch-h and tpc-ds test data sets, the state of the 
NullableColumnReader at the end of each iteration of processPages is identical 
to the old code. In addition the boundary conditions are taken care of.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/parthchandra/incubator-drill DRILL-3871

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/219.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #219
    
----
commit d23ceb2a4c32da9535f1e482c4c70fcc31b8b2b8
Author: Parth Chandra <par...@apache.org>
Date:   2015-10-05T17:25:56Z

    DRILL-3871: Off by one error while reading binary fields with one terminal 
null in parquet.

----


> Exception on inner join when join predicate is int96 field generated by impala
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-3871
>                 URL: https://issues.apache.org/jira/browse/DRILL-3871
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>            Assignee: Parth Chandra
>            Priority: Critical
>              Labels: int96
>             Fix For: 1.3.0
>
>         Attachments: tables.tar
>
>
> Both tables in the join where created by impala, with column c_timestamp 
> being parquet int96. 
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         max(t1.c_timestamp),
> . . . . . . . . . . . . >         min(t1.c_timestamp),
> . . . . . . . . . . . . >         count(t1.c_timestamp)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         imp_t1 t1
> . . . . . . . . . . . . >                 inner join
> . . . . . . . . . . . . >         imp_t2 t2
> . . . . . . . . . . . . > on      (t1.c_timestamp = t2.c_timestamp)
> . . . . . . . . . . . . > ;
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
> TProtocolException: Required field 'uncompressed_page_size' was not found in 
> serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
>         at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>         at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>         at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>         at sqlline.SqlLine.print(SqlLine.java:1583)
>         at sqlline.Commands.execute(Commands.java:852)
>         at sqlline.Commands.sql(Commands.java:751)
>         at sqlline.SqlLine.dispatch(SqlLine.java:738)
>         at sqlline.SqlLine.begin(SqlLine.java:612)
>         at sqlline.SqlLine.start(SqlLine.java:366)
>         at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> drillbit.log
> {code}
> 2015-09-30 21:15:45,710 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Time: 1ms total, 1.645381ms avg, 1ms max.
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Earliest start: 1.332000 μs, Latest start: 1.332000 μs, 
> Average start: 1.332000 μs .
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State to report: RUNNING
> 2015-09-30 21:15:45,925 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested RUNNING --> 
> FAILED
> 2015-09-30 21:15:45,930 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested FAILED --> 
> FINISHED
> 2015-09-30 21:15:45,931 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: TProtocolException: 
> Required field 'uncompressed_page_size' was not found in serialized data! 
> Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> TProtocolException: Required field 'uncompressed_page_size' was not found in 
> serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in 
> parquet record reader.
> Message:
> Hadoop path: 
> /drill/testdata/subqueries/imp_t2/bf4261140dac8d45-814d66b86bf960b8_853027779_data.0.parq
> Total records read: 10
> Mock records read: 0
> Records to read: 1
> Row group index: 0
> Records in row group: 10
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
>   optional binary c_varchar (UTF8);
>   optional int32 c_integer;
>   optional int64 c_bigint;
>   optional float c_float;
>   optional double c_double;
>   optional binary c_date (UTF8);
>   optional binary c_time (UTF8);
>   optional int96 c_timestamp;
>   optional boolean c_boolean;
>   optional double d9;
>   optional double d18;
>   optional double d28;
>   optional double d38;
> }
> , metadata: {}}, blocks: [BlockMetaData{10, 1507 [ColumnMetaData{SNAPPY 
> [c_varchar] BINARY  [PLAIN, PLAIN_DICTIONARY, RLE], 173}, 
> ColumnMetaData{SNAPPY [c_integer] INT32  [PLAIN, PLAIN_DICTIONARY, RLE], 
> 299}, ColumnMetaData{SNAPPY [c_bigint] INT64  [PLAIN, PLAIN_DICTIONARY, RLE], 
> 453}, ColumnMetaData{SNAPPY [c_float] FLOAT  [PLAIN, PLAIN_DICTIONARY, RLE], 
> 581}, ColumnMetaData{SNAPPY [c_double] DOUBLE  [PLAIN, PLAIN_DICTIONARY, 
> RLE], 747}, ColumnMetaData{SNAPPY [c_date] BINARY  [PLAIN, PLAIN_DICTIONARY, 
> RLE], 900}, ColumnMetaData{SNAPPY [c_time] BINARY  [PLAIN, PLAIN_DICTIONARY, 
> RLE], 1045}, ColumnMetaData{SNAPPY [c_timestamp] INT96  [PLAIN, 
> PLAIN_DICTIONARY, RLE], 1213}, ColumnMetaData{SNAPPY [c_boolean] BOOLEAN  
> [PLAIN, PLAIN_DICTIONARY, RLE], 1293}, ColumnMetaData{SNAPPY [d9] DOUBLE  
> [PLAIN, PLAIN_DICTIONARY, RLE], 1448}, ColumnMetaData{SNAPPY [d18] DOUBLE  
> [PLAIN, PLAIN_DICTIONARY, RLE], 1609}, ColumnMetaData{SNAPPY [d28] DOUBLE  
> [PLAIN, PLAIN_DICTIONARY, RLE], 1771}, ColumnMetaData{SNAPPY [d38] DOUBLE  
> [PLAIN, PLAIN_DICTIONARY, RLE], 1933}]}]}
>         at 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:346)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:448)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183) 
> ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:403)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:218)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:136)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) 
> ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) 
> ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) 
> ~[na:1.7.0_71]
>         at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         ... 4 common frames omitted
> Caused by: java.io.IOException: can not read class parquet.format.PageHeader: 
> Required field 'uncompressed_page_size' was not found in serialized data! 
> Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
>         at parquet.format.Util.read(Util.java:50) 
> ~[parquet-format-2.1.1-drill-r1.jar:na]
>         at parquet.format.Util.readPageHeader(Util.java:26) 
> ~[parquet-format-2.1.1-drill-r1.jar:na]
>         at 
> org.apache.drill.exec.store.parquet.ColumnDataReader.readPageHeader(ColumnDataReader.java:46)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:191)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:76)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:387)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:430)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         ... 43 common frames omitted
> Caused by: parquet.org.apache.thrift.protocol.TProtocolException: Required 
> field 'uncompressed_page_size' was not found in serialized data! Struct: 
> PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
>         at parquet.format.PageHeader.read(PageHeader.java:905) 
> ~[parquet-format-2.1.1-drill-r1.jar:na]
>         at parquet.format.Util.read(Util.java:47) 
> ~[parquet-format-2.1.1-drill-r1.jar:na]
>         ... 49 common frames omitted
> 2015-09-30 21:15:45,951 [BitServer-4] WARN  
> o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED 
> state as query is already at FAILED state (which is terminal).
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to