[ 
https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847324#comment-17847324
 ] 

ASF GitHub Bot commented on DRILL-8492:
---------------------------------------

jnturton commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2117674516

   It's always bugged me that we don't have a globally accessible way of 
accessing at least one of DrillbitContext, QueryContext, FragmentContext or 
just OptionManager. We hardly want to have to spray these things through APIs 
everywhere in Drill. I'll take a look at whether something can be done...




> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit
> integer values
> -------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8492
>                 URL: https://issues.apache.org/jira/browse/DRILL-8492
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.21.1
>            Reporter: Peter Franzen
>            Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and
> {{timestamp_micros}}, Drill truncates the microsecond values to
> milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 
> 64-bit values, not SQL timestamps) through Drill.
> One way to allow reading the original 64-bit values is to add two
> options similar to "store.parquet.reader.int96_as_timestamp" to control
> whether microsecond times and timestamps are truncated to millisecond
> timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and
> {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{   store.parquet.reader.time_micros_as_int64: false,}}
> {{   store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as
> {{store.parquet.reader.int96_as_timestamp}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the
> corresponding option is set to true.
> In addition to this,
> {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must
> be altered to _not_ truncate the min and max values for
> time_micros/timestamp_micros if the corresponding option is true. This class
> doesn't have a reference to an {{OptionManager}}, so the two new options
> must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}}
> instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather 
> than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT * FROM <file> WHERE <timestamp_micros_column> = 1705914906694751;}}
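
The difference between the two read modes described above can be illustrated with a minimal Java sketch. This is not Drill's reader code: the class and method names are hypothetical, the boolean parameter merely stands in for the proposed store.parquet.reader.timestamp_micros_as_int64 option, and integer division by 1000 is an assumed model of the truncation to milliseconds.

```java
public class MicrosReadSketch {

    // Hypothetical helper: returns the value a reader would hand back for a
    // TIMESTAMP_MICROS column, depending on the (assumed) option value.
    static long readTimestampMicros(long rawMicros, boolean microsAsInt64) {
        if (microsAsInt64) {
            // Proposed behaviour: pass the stored 64-bit value through untouched.
            return rawMicros;
        }
        // Assumed model of the current behaviour: drop the sub-millisecond part.
        return rawMicros / 1000L;
    }

    public static void main(String[] args) {
        long raw = 1705914906694751L; // the value from the example query

        // Truncated to milliseconds: the trailing 751 microseconds are lost.
        System.out.println(readTimestampMicros(raw, false)); // 1705914906694

        // Read as int64: the original value survives, so an equality filter
        // on the microsecond value can match.
        System.out.println(readTimestampMicros(raw, true));  // 1705914906694751
    }
}
```

With the option off, the three low-order digits cannot be recovered from the result, which is why a filter on the full microsecond value only makes sense when the column is read as a 64-bit integer.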



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
