[jira] [Commented] (DRILL-8134) Regression: cannot query Parquet INT96 columns as timestamps

ASF GitHub Bot (Jira) Tue, 15 Feb 2022 05:52:04 -0800


    [ 
https://issues.apache.org/jira/browse/DRILL-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492620#comment-17492620
 ]


ASF GitHub Bot commented on DRILL-8134:
---------------------------------------

jnturton opened a new pull request #2460:
URL: https://github.com/apache/drill/pull/2460


   # [DRILL-8134](https://issues.apache.org/jira/browse/DRILL-8134): Cannot 
query Parquet INT96 columns as timestamps
   
   ## Description
   
   As of Drill 1.19, some Parquet column readers contained code to position the 
value vector write buffer index after a read pass, while others did not.  The 
Parquet v2 PR added write buffer positioning to the cases that were missing it, 
but failed to account for the fact that INT96 timestamps are downcast to 64-bit 
timestamps.  This PR removes all of this write buffer positioning (and 
mispositioning), since testing indicates that Drill's value vector write paths 
already advance the write buffer index to the correct place.
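
   To illustrate the width mismatch described above, here is a minimal sketch in Java. It is not Drill's reader code: the class and method names (`Int96TimestampSketch`, `readPage`, `encodeInt96`) are hypothetical. It assumes the INT96 layout commonly written by Impala and Spark (8 bytes of nanoseconds within the day followed by a 4-byte Julian day, both little-endian) and a 64-bit output holding milliseconds since the Unix epoch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; names and structure are assumptions, not Drill's code.
class Int96TimestampSketch {
    static final long JULIAN_DAY_OF_EPOCH = 2440588L; // Julian day number of 1970-01-01
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    static final int INT96_WIDTH = 12;    // bytes per value in the source page
    static final int TIMESTAMP_WIDTH = 8; // bytes per value after the downcast

    /** Builds a 12-byte INT96 value: nanos of day (8 bytes), then Julian day (4 bytes). */
    static byte[] encodeInt96(int julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(INT96_WIDTH)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    /** Downcasts one 12-byte INT96 value to milliseconds since the Unix epoch. */
    static long int96ToEpochMillis(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt();
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
                + nanosOfDay / NANOS_PER_MILLI;
    }

    /**
     * Reads count INT96 values from page and writes 8-byte timestamps to out.
     * Each putLong advances the output position by itself, so no extra
     * positioning step is needed afterwards. Returns the output write index.
     */
    static int readPage(ByteBuffer page, ByteBuffer out, int count) {
        byte[] value = new byte[INT96_WIDTH];
        for (int i = 0; i < count; i++) {
            page.get(value);
            out.putLong(int96ToEpochMillis(value));
        }
        return out.position();
    }

    public static void main(String[] args) {
        int count = 3;
        ByteBuffer page = ByteBuffer.allocate(count * INT96_WIDTH);
        for (int i = 0; i < count; i++) {
            page.put(encodeInt96(2440589, 1_000_000_000L)); // 1970-01-02 00:00:01
        }
        page.flip();
        ByteBuffer out = ByteBuffer.allocate(count * TIMESTAMP_WIDTH);

        int actualIndex = readPage(page, out, count);
        int indexFromSourceWidth = count * INT96_WIDTH; // what width-12 positioning would yield
        System.out.println("write index after read pass: " + actualIndex);
        System.out.println("index computed from INT96 width: " + indexFromSourceWidth);
    }
}
```

   Under these assumptions, three values leave the output write index at 24 bytes, whereas positioning it from the 12-byte source width would place it at 36, beyond the data actually written.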
    
   ## Documentation
   N/A
   
   ## Testing
   ParquetTestWriter#testSparkParquetBinaryAsTimeStamp_DictChange
   




> Regression: cannot query Parquet INT96 columns as timestamps
> ------------------------------------------------------------
>
>                 Key: DRILL-8134
>                 URL: https://issues.apache.org/jira/browse/DRILL-8134
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.20.0
>            Reporter: James Turton
>            Assignee: James Turton
>            Priority: Blocker
>              Labels: Regression
>             Fix For: 1.20.0
>
>         Attachments: result.tar.gz
>
>
> Set store.parquet.reader.int96_as_timestamp = true and then query a file with 
> an INT96 timestamp column, such as the one in the attachment.  INT96 columns 
> are downcast to 64-bit timestamps, a fact that is ignored by some buggy new 
> write buffer index positioning code that was merged in the 1.20 dev cycle.
> [^result.tar.gz]
>  
> {code:java}
> Caused by: java.lang.NullPointerException:
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:234)
>         at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:234)
>         at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:298)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:111)
>         at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:85)
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:170)
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103)
>         at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0(FragmentExecutor.java:321)
>         at .......(:0)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
