handmadecode opened a new issue, #2788:
URL: https://github.com/apache/drill/issues/2788
**Describe the bug**
We are using Drill with parquet files where the timestamp columns are in
microseconds. When those columns are displayed, Drill converts the microsecond
values to milliseconds. However, when using a timestamp column in WHERE clauses
it looks like the original microsecond value is used instead of the adjusted
millisecond value when filtering records.
**To Reproduce**
Assume a parquet file a directory "Test" with a column _timestampCol_ having
the type `org.apache.parquet.schema.OriginalType.TIMESTAMP_MICROS`.
Assume there are two records with the values 1673981999806149 and
1674759597743552, respectively, in that column (i.e. the UTC dates
2023-01-17T18:59:59.806149 and 2023-01-26T18:59:57.743552)
1. Execute the query
`SELECT timestampCol FROM dfs.Test;`
The result includes both records, as expected.
2. Execute the query
`SELECT timestampCol FROM dfs.Test WHERE timestampCol <
TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')`
This produces an empty result although both records have a value less than
the argument.
3. Execute
`SELECT timestampCol FROM dfs.Test WHERE timestampCol >
TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')`
The result includes both records although neither have a value greater than
the argument.
**Expected behavior**
The query in 2) above should produce a result with both records, and the
query in 3) should produce an empty result.
**Error detail, log output or screenshots**
No errors in the logs
**Drill version**
1.21.0 and master (as of commit 9c401c6).
**Additional context**
Even timestamps long into the future produce results with both records, e.g.:
`SELECT timestampCol FROM dfs.Test WHERE timestampCol >
TO_TIMESTAMP('2502-04-04 00:00:00', 'yyyy-MM-dd HH:mm:ss')`
Manually converting the timestamp column to milliseconds produces the
expected result:
`SELECT timestampCol FROM dfs.Test WHERE
TO_TIMESTAMP(CONVERT_FROM(CONVERT_TO(timestampCol, 'TIMESTAMP_EPOCH'),
'BIGINT')/1000) < TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')`
produces a result with both records.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]