[ 
https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844315#comment-17844315
 ] 

ASF GitHub Bot commented on DRILL-8492:
---------------------------------------

handmadecode commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2098543073

   > LGTM +1 Thanks for the contribution! Can you please update the 
documentation for the Parquet reader to include this? Otherwise looks good!
   
   Happy to contribute!
   
   Do you mean the documentation in the `drill-site` repo? 
(https://github.com/apache/drill-site/blob/master/_docs/en/data-sources-and-file-formats/040-parquet-format.md)




> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS  columns to be read as 64-bit 
> integer values
> -------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8492
>                 URL: https://issues.apache.org/jira/browse/DRILL-8492
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.21.1
>            Reporter: Peter Franzen
>            Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and 
> {{{}timestamp_micros{}}}, Drill truncates the microsecond values to 
> milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 
> 64-bit values, not SQL timestamps) through Drill.
> One solution for allowing reading the original 64-bit values is to add two 
> options similar to “store.parquet.reader.int96_as_timestamp" to control 
> whether microsecond
> times and timestamps are truncated to millisecond timestamps or read as 
> non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and
> {{{}org.apache.drill.exec.server.options.SystemOptionManager{}}}.
> They would also be added to "drill-module.conf":
> {{   store.parquet.reader.time_micros_as_int64: false,}}
> {{   store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as 
> {{{}store.parquet.reader.int96_as_timestamp{}}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the 
> correspondning option is set to true.
> In addition to this, 
> {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector }}must 
> be altered to _not_ truncate the min and max values for 
> time_micros/timestamp_micros if the corresponding option is true. This class 
> doesn’t have a reference to an {{{}OptionManager{}}}, so the two new options 
> must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} 
> instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather 
> than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT *  FROM <file> WHERE <timestamp_micros_column> = 1705914906694751;}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to