[
https://issues.apache.org/jira/browse/DRILL-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711761#comment-17711761
]
Peter Franzen commented on DRILL-8423:
--------------------------------------
The problem is caused by the column values being read as 32-bit values, not
64-bit values, in
{code:java}
org.apache.drill.exec.store.parquet.columnreaders.ParquetFixedWidthDictionaryReaders.DictionaryTimeMicrosReader::readField(long)
{code}
line 171:
{code:java}
valueVec.getMutator().setSafe(valuesReadInCurrentPass + i,
valReader.readInteger() / 1000);
{code}
and line 176:
{code:java}
int value = pageReader.pageData.getInt((int) readStartInBytes + i *
dataTypeLengthInBytes);
{code}
The bug is also present in
{code:java}
org.apache.drill.exec.store.parquet.columnreaders.NullableFixedByteAlignedReaders.NullableDictionaryTimeMicrosReader::readField(long)
{code}
The problem should be fixed by using the same read logic as for
TIMESTAMP_MICROS in {{DictionaryTimeStampMicrosReader}}.
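The wrap-around can be reproduced outside Drill. The following is only a minimal sketch, not the Drill code: the class and method names are hypothetical, a plain java.nio.ByteBuffer stands in for the page data, and it assumes the 64-bit values are stored little-endian and that a negative millisecond value is displayed modulo one day.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalTime;

// Hypothetical stand-alone demo; none of these names exist in Drill.
public class TimeMicrosWrapDemo {
    static final int MILLIS_PER_DAY = 86_400_000;

    // Narrow read: only the low 32 bits of the 64-bit value are used,
    // similar in spirit to the getInt()/readInteger() calls quoted above.
    public static int readNarrowMillis(ByteBuffer page, int offset) {
        int micros = page.getInt(offset);
        return micros / 1000;
    }

    // Wide read: the full 64-bit value is read before converting to millis.
    public static int readWideMillis(ByteBuffer page, int offset) {
        long micros = page.getLong(offset);
        return (int) (micros / 1000);
    }

    // Assumption: a negative millis-of-day value is shown as a time before midnight.
    public static String display(int millis) {
        long millisOfDay = Math.floorMod(millis, MILLIS_PER_DAY);
        return LocalTime.ofNanoOfDay(millisOfDay * 1_000_000L).toString();
    }

    public static void main(String[] args) {
        // Two 8-byte values, as in the two records of the report.
        ByteBuffer page = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        page.putLong(0, 2147483647L); // Integer.MAX_VALUE
        page.putLong(8, 2147483648L); // Integer.MAX_VALUE + 1

        for (int offset = 0; offset < 16; offset += 8) {
            int narrow = readNarrowMillis(page, offset);
            int wide = readWideMillis(page, offset);
            System.out.println("narrow: " + display(narrow) + "  wide: " + display(wide));
        }
    }
}
```

With the narrow read the second value becomes -2147483 ms, which the sketch renders as 23:24:12.517; with the wide read both values render as 00:35:47.483.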
> Parquet TIME_MICROS columns with values > Integer.MAX_VALUE are not displayed
> correctly
> ---------------------------------------------------------------------------------------
>
> Key: DRILL-8423
> URL: https://issues.apache.org/jira/browse/DRILL-8423
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.20.3
> Reporter: Peter Franzen
> Priority: Major
>
> Assume a parquet file in a directory "Test" with a column _timeCol_ having
> the type {{org.apache.parquet.schema.OriginalType.TIME_MICROS}}.
> Assume there are two records with the values 2147483647 and 2147483648,
> respectively, in that column (i.e. the times 00:35:47.483647 and
> 00:35:47.483648).
> Executing the query
> {code:java}
> SELECT timeCol FROM dfs.Test;{code}
> produces the result
> {code:java}
> timeCol
> -------
> 00:35:47.483
> 23:24:12.517{code}
> i.e. the microsecond value Integer.MAX_VALUE + 1 has wrapped around to a
> negative number when read from the parquet file (it is displayed as the same
> number of milliseconds before midnight as the time represented by
> Integer.MAX_VALUE is after midnight).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)