parthchandra commented on code in PR #15537:
URL: https://github.com/apache/datafusion/pull/15537#discussion_r2042746875
##########
datafusion/sqllogictest/test_files/information_schema.slt:
##########
@@ -296,6 +297,7 @@ datafusion.execution.parquet.bloom_filter_fpp NULL
(writing) Sets bloom filter f
datafusion.execution.parquet.bloom_filter_ndv NULL (writing) Sets bloom filter
number of distinct values. If NULL, uses default parquet writer setting
datafusion.execution.parquet.bloom_filter_on_read true (writing) Use any
available bloom filters when reading parquet files
datafusion.execution.parquet.bloom_filter_on_write false (writing) Write bloom
filters for all columns when creating parquet files
+datafusion.execution.parquet.coerce_int96 NULL (reading) If true, parquet
reader will read columns of physical type int96 as originating from a different
resolution than nanosecond. This is useful for systems like Spark which stores
its 64-bit timestamps as microsecond resolution, so it can write values with a
larger date range than 64-bit timestamps with nanosecond resolution.
Review Comment:
```suggestion
datafusion.execution.parquet.coerce_int96 NULL (reading) If true, parquet
reader will read columns of physical type int96 as originating from a different
resolution than nanosecond. This is useful for reading data from systems like
Spark, which stores microsecond resolution timestamps in an int96, allowing it
to write values with a larger date range than 64-bit timestamps with nanosecond
resolution.
```
##########
datafusion/datasource-parquet/src/source.rs:
##########
@@ -438,6 +438,22 @@ impl ParquetSource {
}
}
+/// Parses a datafusion.common.config.ParquetOptions.coerce_int96 String to an
+/// arrow_schema.datatype.TimeUnit
+fn parse_coerce_int96_string(str_setting: &str) -> datafusion_common::Result<TimeUnit> {
+    let str_setting_lower: &str = &str_setting.to_lowercase();
+
+    match str_setting_lower {
+        "ns" => Ok(TimeUnit::Nanosecond),
+        "us" => Ok(TimeUnit::Microsecond),
+        "ms" => Ok(TimeUnit::Millisecond),
+        "s" => Ok(TimeUnit::Second),
Review Comment:
nit: Do we really need the `millis` and `secs` resolutions?
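If only the `ns` and `us` arms were kept, the parser might look like the standalone sketch below. This is not the PR's actual code: the `TimeUnit` enum here is a local stand-in for `arrow_schema::TimeUnit`, and the plain `String` error replaces DataFusion's error type, purely so the snippet compiles on its own.

```rust
// Local stand-in for arrow_schema::TimeUnit (assumption, for a
// self-contained example).
#[derive(Debug, PartialEq)]
enum TimeUnit {
    Nanosecond,
    Microsecond,
}

// Sketch of the setting parser restricted to the two resolutions int96
// data realistically encodes, with an explicit error arm for anything else.
fn parse_coerce_int96_string(str_setting: &str) -> Result<TimeUnit, String> {
    match str_setting.to_lowercase().as_str() {
        "ns" => Ok(TimeUnit::Nanosecond),
        "us" => Ok(TimeUnit::Microsecond),
        other => Err(format!(
            "Unknown coerce_int96 resolution: {} (expected \"ns\" or \"us\")",
            other
        )),
    }
}

fn main() {
    // Case-insensitive match succeeds for the supported units.
    assert_eq!(parse_coerce_int96_string("US"), Ok(TimeUnit::Microsecond));
    // Unsupported resolutions are rejected rather than silently accepted.
    assert!(parse_coerce_int96_string("ms").is_err());
}
```

The error arm also answers the opposite nit: if the extra resolutions are kept, the current snippet as shown still needs a catch-all arm for the match to be exhaustive over `&str`.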
##########
datafusion/sqllogictest/test_files/information_schema.slt:
##########
@@ -296,6 +297,7 @@ datafusion.execution.parquet.bloom_filter_fpp NULL
(writing) Sets bloom filter f
datafusion.execution.parquet.bloom_filter_ndv NULL (writing) Sets bloom filter
number of distinct values. If NULL, uses default parquet writer setting
datafusion.execution.parquet.bloom_filter_on_read true (writing) Use any
available bloom filters when reading parquet files
datafusion.execution.parquet.bloom_filter_on_write false (writing) Write bloom
filters for all columns when creating parquet files
+datafusion.execution.parquet.coerce_int96 NULL (reading) If true, parquet
reader will read columns of physical type int96 as originating from a different
resolution than nanosecond. This is useful for systems like Spark which stores
its 64-bit timestamps as microsecond resolution, so it can write values with a
larger date range than 64-bit timestamps with nanosecond resolution.
Review Comment:
nit: This was somewhat confusing for me. I don't know if my suggestion is any
clearer, though.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
For additional commands, e-mail: [email protected]