[ https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845701#comment-17845701 ]
ASF GitHub Bot commented on DRILL-8492:
---------------------------------------

handmadecode commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2106221734

   > Awesome work. I can backport this too because you've left default behaviour unchanged (and it's self contained). My only question is about ParquetReaderConfig

> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values
> -------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8492
>                 URL: https://issues.apache.org/jira/browse/DRILL-8492
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.21.1
>            Reporter: Peter Franzen
>            Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and {{timestamp_micros}}, Drill truncates the microsecond values to milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 64-bit values, not SQL timestamps) through Drill.
> One way to allow reading the original 64-bit values is to add two options similar to "store.parquet.reader.int96_as_timestamp" to control whether microsecond times and timestamps are truncated to millisecond timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{ store.parquet.reader.time_micros_as_int64: false,}}
> {{ store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as {{store.parquet.reader.int96_as_timestamp}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the corresponding option is set to true.
> In addition to this, {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must be altered to _not_ truncate the min and max values for time_micros/timestamp_micros if the corresponding option is true. This class doesn't have a reference to an {{OptionManager}}, so the two new options must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT * FROM <file> WHERE <timestamp_micros_column> = 1705914906694751;}}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
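
For readers following the issue description, here is a minimal Java sketch of the difference between the two read modes. It is not Drill code. The class `MicrosReadSketch`, the method `readTimestampMicros`, and the boolean parameter are hypothetical stand-ins for the proposed `store.parquet.reader.timestamp_micros_as_int64` option. Truncation is modelled as plain integer division by 1000, which is an assumption about how the millisecond value is derived.

```java
public class MicrosReadSketch {

    /**
     * Hypothetical illustration only: returns the value a reader would surface
     * for a TIMESTAMP_MICROS cell, depending on the proposed option.
     */
    static long readTimestampMicros(long micros, boolean asInt64) {
        if (asInt64) {
            // Proposed behaviour: pass the raw 64-bit microsecond value through.
            return micros;
        }
        // Behaviour described in the issue: truncate microseconds to milliseconds.
        // Integer division is an assumption made for this sketch.
        return micros / 1000;
    }

    public static void main(String[] args) {
        long micros = 1705914906694751L; // value from the example query in the issue

        // Option false: the last three digits (751 microseconds) are lost.
        System.out.println(readTimestampMicros(micros, false)); // 1705914906694

        // Option true: the original value is preserved and can be filtered on directly.
        System.out.println(readTimestampMicros(micros, true));  // 1705914906694751
    }
}
```

With the option off, an equality filter on `1705914906694751` could not match, because the stored microseconds are no longer visible. This is why the issue proposes comparing 64-bit values when the option is on.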