[ 
https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845816#comment-17845816
 ] 

ASF GitHub Bot commented on DRILL-8492:
---------------------------------------

jnturton commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2106890548

   > However, I could very well have overlooked some way to access the global 
configuration, and I'd be grateful for any pointers.
   
   We should existing find examples in the Parquet format plugin. E.g. the 
"old" reader is affected by the option store.parquet.reader.pagereader.async.




> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS  columns to be read as 64-bit 
> integer values
> -------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8492
>                 URL: https://issues.apache.org/jira/browse/DRILL-8492
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.21.1
>            Reporter: Peter Franzen
>            Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and 
> {{{}timestamp_micros{}}}, Drill truncates the microsecond values to 
> milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 
> 64-bit values, not SQL timestamps) through Drill.
> One solution for allowing reading the original 64-bit values is to add two 
> options similar to “store.parquet.reader.int96_as_timestamp" to control 
> whether microsecond
> times and timestamps are truncated to millisecond timestamps or read as 
> non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and
> {{{}org.apache.drill.exec.server.options.SystemOptionManager{}}}.
> They would also be added to "drill-module.conf":
> {{   store.parquet.reader.time_micros_as_int64: false,}}
> {{   store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as 
> {{{}store.parquet.reader.int96_as_timestamp{}}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the 
> correspondning option is set to true.
> In addition to this, 
> {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector }}must 
> be altered to _not_ truncate the min and max values for 
> time_micros/timestamp_micros if the corresponding option is true. This class 
> doesn’t have a reference to an {{{}OptionManager{}}}, so the two new options 
> must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} 
> instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather 
> than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT *  FROM <file> WHERE <timestamp_micros_column> = 1705914906694751;}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to