[
https://issues.apache.org/jira/browse/DRILL-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492620#comment-17492620
]
ASF GitHub Bot commented on DRILL-8134:
---------------------------------------
jnturton opened a new pull request #2460:
URL: https://github.com/apache/drill/pull/2460
# [DRILL-8134](https://issues.apache.org/jira/browse/DRILL-8134): Cannot query Parquet INT96 columns as timestamps
## Description
As of Drill 1.19 some Parquet column readers contained code to position the
value vector write buffer index after a read pass, while others did not. The
Parquet v2 PR added write buffer positioning to the cases that were missing it,
but failed to account for the fact that INT96 timestamps are downcast to 64-bit
timestamps. This PR removes all of this write buffer positioning (and
mispositioning), since testing indicates that Drill's value vector write paths
already advance the write buffer index to the correct place.
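
To illustrate why the downcast matters for index positioning, here is a minimal standalone sketch. It is not Drill's reader code: the class `Int96Downcast` and its methods are hypothetical names, and it assumes the usual INT96 timestamp layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian). Each value consumes 12 bytes of input but produces only 8 bytes of output, so a write index advanced by the source width would land in the wrong place.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; none of these names come from Drill.
final class Int96Downcast {
    // Julian day number of the Unix epoch, 1970-01-01.
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    static final int INT96_WIDTH = 12;     // bytes per value in the Parquet page
    static final int TIMESTAMP_WIDTH = 8;  // bytes per value after the downcast

    // Assumed layout: 8 bytes nanos-of-day, then 4 bytes Julian day, little-endian.
    static long int96ToEpochMillis(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
            + nanosOfDay / NANOS_PER_MILLI;
    }

    // Test helper that builds a 12-byte INT96 value in the assumed layout.
    static byte[] encode(long nanosOfDay, int julianDay) {
        return ByteBuffer.allocate(INT96_WIDTH).order(ByteOrder.LITTLE_ENDIAN)
            .putLong(nanosOfDay).putInt(julianDay).array();
    }

    public static void main(String[] args) {
        int valueCount = 3;
        ByteBuffer source = ByteBuffer.allocate(valueCount * INT96_WIDTH);
        for (int i = 0; i < valueCount; i++) {
            source.put(encode(0L, 2_440_588 + i));
        }
        source.flip();

        ByteBuffer dest = ByteBuffer.allocate(valueCount * TIMESTAMP_WIDTH);
        byte[] value = new byte[INT96_WIDTH];
        while (source.hasRemaining()) {
            source.get(value);                       // consumes 12 bytes
            dest.putLong(int96ToEpochMillis(value)); // writes 8 bytes
        }

        System.out.println("bytes read:    " + source.position()); // 36
        System.out.println("bytes written: " + dest.position());   // 24
    }
}
```

In the sketch the destination buffer's own `putLong` advances its position, which mirrors the observation above that the write path already leaves the index in the right place without any separate positioning step.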
## Documentation
N/A
## Testing
ParquetTestWriter#testSparkParquetBinaryAsTimeStamp_DictChange
> Regression: cannot query Parquet INT96 columns as timestamps
> ------------------------------------------------------------
>
> Key: DRILL-8134
> URL: https://issues.apache.org/jira/browse/DRILL-8134
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.20.0
> Reporter: James Turton
> Assignee: James Turton
> Priority: Blocker
> Labels: Regression
> Fix For: 1.20.0
>
> Attachments: result.tar.gz
>
>
> Set store.parquet.reader.int96_as_timestamp = true and then query a file with
> an INT96 timestamp such as in the attachment. INT96 columns get downcast to
> 64 bit timestamps, a fact that is ignored by some buggy new write buffer
> index positioning code that was merged in the 1.20 dev cycle.
> [^result.tar.gz]
>
> {code:java}
> Caused by: java.lang.NullPointerException:
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:234)
>     at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:234)
>     at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:298)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:111)
>     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:85)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:170)
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103)
>     at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0(FragmentExecutor.java:321)
>     at .......(:0)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> {code}