Peter Franzen created DRILL-8492:
------------------------------------

             Summary: Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values
                 Key: DRILL-8492
                 URL: https://issues.apache.org/jira/browse/DRILL-8492
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Parquet
    Affects Versions: 1.21.1
            Reporter: Peter Franzen

When reading Parquet columns of type {{time_micros}} and {{timestamp_micros}}, Drill truncates the microsecond values to milliseconds in order to convert them to SQL timestamps. It is currently not possible to read the original microsecond values (as 64-bit values, not SQL timestamps) through Drill.

One solution for reading the original 64-bit values is to add two options, similar to {{store.parquet.reader.int96_as_timestamp}}, that control whether microsecond times and timestamps are truncated to millisecond timestamps or read as non-truncated 64-bit values.

These options would be added to {{org.apache.drill.exec.ExecConstants}} and {{org.apache.drill.exec.server.options.SystemOptionManager}}. They would also be added to "drill-module.conf":

{{    store.parquet.reader.time_micros_as_int64: false,}}
{{    store.parquet.reader.timestamp_micros_as_int64: false,}}

These options would then be used in the same places as {{store.parquet.reader.int96_as_timestamp}}:
 * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
 * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
 * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter

to create an int64 reader instead of a time/timestamp reader when the corresponding option is set to true.

In addition, {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must be altered to _not_ truncate the min and max values for time_micros/timestamp_micros if the corresponding option is true. This class doesn't have a reference to an {{OptionManager}}, so the two new options must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} instance is created.

Filtering on microsecond columns would be done using 64-bit values rather than TIME/TIMESTAMP values when the new options are true, e.g.

{{SELECT * FROM <file> WHERE <timestamp_micros_column> = 1705914906694751;}}
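
To illustrate the difference between the two read modes, here is a minimal, self-contained Java sketch. It does not use any Drill classes: the class {{MicrosReadSketch}}, the method {{readMicros}} and the flag {{microsAsInt64}} are hypothetical names chosen for illustration, and the use of plain integer division by 1000 for the truncation is an assumption, not a description of Drill's actual reader code.

```java
public class MicrosReadSketch {

    /**
     * Hypothetical helper: returns the value a reader would hand back for a
     * raw Parquet microsecond value.
     *
     * @param rawMicros     the 64-bit microsecond value stored in the file
     * @param microsAsInt64 stands in for the proposed *_micros_as_int64 option
     */
    static long readMicros(long rawMicros, boolean microsAsInt64) {
        if (microsAsInt64) {
            // Proposed behavior: pass the original 64-bit value through untouched.
            return rawMicros;
        }
        // Assumed current behavior: drop the sub-millisecond digits.
        return rawMicros / 1000L;
    }

    public static void main(String[] args) {
        long raw = 1705914906694751L; // value taken from the example query

        // Option false: microseconds truncated to milliseconds.
        System.out.println(readMicros(raw, false)); // prints 1705914906694

        // Option true: original microsecond value preserved.
        System.out.println(readMicros(raw, true));  // prints 1705914906694751
    }
}
```

With the option off, the last three digits (751) are lost, so an equality filter on the full microsecond value could never match; with the option on, the comparison is made against the stored 64-bit value.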