This is an automated email from the ASF dual-hosted git repository.
comphead pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
new deaec6f92 docs: document datetime rebasing and V2 API limitations for DataFusion-based scans (#3259)
deaec6f92 is described below
commit deaec6f9271cb16b020bcf70d144465bec5b780c
Author: Andy Grove <[email protected]>
AuthorDate: Sun Jan 25 15:36:53 2026 -0700
docs: document datetime rebasing and V2 API limitations for DataFusion-based scans (#3259)
Add two new limitations to the shared limitations section for
native_datafusion and native_iceberg_compat scan implementations:

1. No support for datetime rebasing detection or the
   spark.comet.exceptionOnDatetimeRebase configuration. When reading
   Parquet files with dates/timestamps written before Spark 3.0
   (hybrid Julian/Gregorian calendar), these implementations cannot
   detect legacy values and may produce incorrect results for dates
   before October 15, 1582.
2. No support for Spark's Datasource V2 API. When V2 is enabled,
   Comet falls back to native_comet.
Co-authored-by: Claude Opus 4.5 <[email protected]>
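
To make the first limitation concrete, the following is a minimal sketch of why skipping the rebase matters. It is not Comet or Spark code: the helper names (`julian_calendar_to_jdn`, `legacy_days_since_epoch`, `read_without_rebase`) are hypothetical, and the model assumes a legacy writer stores a date as days since 1970-01-01 computed from its Julian-calendar label, while a reader that does no rebasing interprets that day count on the Proleptic Gregorian calendar (which is what Python's `datetime.date` uses).

```python
from datetime import date

# Julian Day Number of 1970-01-01 and the matching Python ordinal.
UNIX_EPOCH_JDN = 2440588
EPOCH_ORDINAL = date(1970, 1, 1).toordinal()


def julian_calendar_to_jdn(year: int, month: int, day: int) -> int:
    """Convert a Julian-calendar date to a Julian Day Number (standard formula)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def legacy_days_since_epoch(year: int, month: int, day: int) -> int:
    """Day count a hybrid-calendar writer would store for a label before
    1582-10-15 (assumption: such labels are Julian-calendar dates)."""
    return julian_calendar_to_jdn(year, month, day) - UNIX_EPOCH_JDN


def read_without_rebase(days: int) -> date:
    """Interpret a stored day count on the Proleptic Gregorian calendar,
    i.e. what a reader does when it performs no rebasing."""
    return date.fromordinal(EPOCH_ORDINAL + days)


# A value written as 1500-01-01 in the hybrid calendar comes back shifted.
stored = legacy_days_since_epoch(1500, 1, 1)
print(read_without_rebase(stored))  # 1500-01-10
```

Under this model the date label moves by nine days for values written as 1500-01-01, and by ten days just before the 1582 cutover. Dates on or after October 15, 1582 are unaffected because both calendars agree from that point on.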
---
docs/source/contributor-guide/parquet_scans.md | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/docs/source/contributor-guide/parquet_scans.md b/docs/source/contributor-guide/parquet_scans.md
index 3dcb78e87..e6e2d09dd 100644
--- a/docs/source/contributor-guide/parquet_scans.md
+++ b/docs/source/contributor-guide/parquet_scans.md
@@ -50,6 +50,15 @@ The `native_datafusion` and `native_iceberg_compat` scans share the following li
 `spark.comet.scan.unsignedSmallIntSafetyCheck=false`. Note that `ByteType` columns are always safe because they can
 only come from signed `INT8`, where truncation preserves the signed value.
 - No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported.
+- No support for datetime rebasing detection or the `spark.comet.exceptionOnDatetimeRebase` configuration. When reading
+ Parquet files containing dates or timestamps written before Spark 3.0 (which used a hybrid Julian/Gregorian calendar),
+ the `native_comet` implementation can detect these legacy values and either throw an exception or read them without
+ rebasing. The DataFusion-based implementations do not have this detection capability and will read all dates/timestamps
+ as if they were written using the Proleptic Gregorian calendar. This may produce incorrect results for dates before
+ October 15, 1582.
+- No support for Spark's Datasource V2 API. When `spark.sql.sources.useV1SourceList` does not include `parquet`,
+ Spark uses the V2 API for Parquet scans. The DataFusion-based implementations only support the V1 API, so Comet
+ will fall back to `native_comet` when V2 is enabled.
The `native_datafusion` scan has some additional limitations:
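
The V2 fallback rule in the second added bullet can be sketched as a small decision function. This is an illustration only, not Comet's implementation: `parquet_uses_v1` and `effective_scan_impl` are hypothetical helpers, and the default value shown for `spark.sql.sources.useV1SourceList` is an assumption that should be checked against the Spark version in use.

```python
# Assumed default for spark.sql.sources.useV1SourceList; verify for your Spark version.
ASSUMED_DEFAULT_V1_SOURCES = "avro,csv,json,kafka,orc,parquet,text"

# Scan implementations that, per the documented limitation, only support the V1 API.
DATAFUSION_BASED_SCANS = {"native_datafusion", "native_iceberg_compat"}


def parquet_uses_v1(use_v1_source_list: str = ASSUMED_DEFAULT_V1_SOURCES) -> bool:
    """Return True if `parquet` appears in the comma-separated V1 source list."""
    sources = {s.strip().lower() for s in use_v1_source_list.split(",") if s.strip()}
    return "parquet" in sources


def effective_scan_impl(requested: str,
                        use_v1_source_list: str = ASSUMED_DEFAULT_V1_SOURCES) -> str:
    """Model the documented behavior: a DataFusion-based scan falls back to
    `native_comet` when Parquet is read through the V2 API."""
    if requested in DATAFUSION_BASED_SCANS and not parquet_uses_v1(use_v1_source_list):
        return "native_comet"
    return requested


print(effective_scan_impl("native_datafusion"))              # native_datafusion
print(effective_scan_impl("native_datafusion", "avro,csv"))  # native_comet
```

The sketch treats the list as case-insensitive and whitespace-tolerant for convenience; that normalization is a choice made here, not something the commit states.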