(datafusion-comet) branch main updated: docs: document datetime rebasing and V2 API limitations for DataFusion-based scans (#3259)

comphead Sun, 25 Jan 2026 14:37:23 -0800

This is an automated email from the ASF dual-hosted git repository.

comphead pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git



The following commit(s) were added to refs/heads/main by this push:
     new deaec6f92 docs: document datetime rebasing and V2 API limitations for DataFusion-based scans (#3259)
deaec6f92 is described below

commit deaec6f9271cb16b020bcf70d144465bec5b780c
Author: Andy Grove <[email protected]>
AuthorDate: Sun Jan 25 15:36:53 2026 -0700

    docs: document datetime rebasing and V2 API limitations for DataFusion-based scans (#3259)
    
    Add two new limitations to the shared limitations section for
    native_datafusion and native_iceberg_compat scan implementations:
    
    1. No support for datetime rebasing detection or the
       spark.comet.exceptionOnDatetimeRebase configuration. When reading
       Parquet files with dates/timestamps written before Spark 3.0
       (hybrid Julian/Gregorian calendar), these implementations cannot
       detect legacy values and may produce incorrect results for dates
       before October 15, 1582.
    
    2. No support for Spark's Datasource V2 API. When V2 is enabled,
       Comet falls back to native_comet.
    
    Co-authored-by: Claude Opus 4.5 <[email protected]>
---
 docs/source/contributor-guide/parquet_scans.md | 9 +++++++++
 1 file changed, 9 insertions(+)
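
Illustration for limitation 1 (not part of the commit): a minimal sketch of why reading hybrid-calendar dates without rebasing can give wrong results before October 15, 1582. It assumes the hybrid calendar labels those dates with the Julian calendar, models the stored value as a Julian Day Number, and then labels the same day with Python's proleptic Gregorian `datetime.date`. The function names are hypothetical; nothing here reads Parquet or calls Comet.

```python
from datetime import date

# Julian Day Number (JDN) of proleptic Gregorian 0001-01-01 is 1721426 and its
# Python ordinal is 1, so subtracting this offset converts a JDN to an ordinal.
ORDINAL_OFFSET = 1721425


def julian_calendar_to_jdn(year: int, month: int, day: int) -> int:
    """Day number of a date labelled in the Julian calendar (standard formula)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def misread_as_proleptic_gregorian(year: int, month: int, day: int) -> date:
    """Label the same physical day in the proleptic Gregorian calendar.

    This models a reader that skips rebasing: the day count written for a
    Julian-labelled date is interpreted as a proleptic Gregorian date.
    """
    return date.fromordinal(julian_calendar_to_jdn(year, month, day) - ORDINAL_OFFSET)


if __name__ == "__main__":
    print(misread_as_proleptic_gregorian(1000, 1, 1))   # 1000-01-06
    print(misread_as_proleptic_gregorian(1582, 10, 4))  # 1582-10-14
```

Under these assumptions a legacy value written as 1000-01-01 would come back as 1000-01-06, and the last Julian day before the cutover (1582-10-04) would come back as 1582-10-14.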

diff --git a/docs/source/contributor-guide/parquet_scans.md b/docs/source/contributor-guide/parquet_scans.md
index 3dcb78e87..e6e2d09dd 100644
--- a/docs/source/contributor-guide/parquet_scans.md
+++ b/docs/source/contributor-guide/parquet_scans.md
@@ -50,6 +50,15 @@ The `native_datafusion` and `native_iceberg_compat` scans share the following li
   `spark.comet.scan.unsignedSmallIntSafetyCheck=false`. Note that `ByteType` columns are always safe because they can
   only come from signed `INT8`, where truncation preserves the signed value.
 - No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported.
+- No support for datetime rebasing detection or the `spark.comet.exceptionOnDatetimeRebase` configuration. When reading
+  Parquet files containing dates or timestamps written before Spark 3.0 (which used a hybrid Julian/Gregorian calendar),
+  the `native_comet` implementation can detect these legacy values and either throw an exception or read them without
+  rebasing. The DataFusion-based implementations do not have this detection capability and will read all dates/timestamps
+  as if they were written using the Proleptic Gregorian calendar. This may produce incorrect results for dates before
+  October 15, 1582.
+- No support for Spark's Datasource V2 API. When `spark.sql.sources.useV1SourceList` does not include `parquet`,
+  Spark uses the V2 API for Parquet scans. The DataFusion-based implementations only support the V1 API, so Comet
+  will fall back to `native_comet` when V2 is enabled.
 
 The `native_datafusion` scan has some additional limitations:
 

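Illustration for the V2 limitation in the diff above (not part of the commit): a minimal sketch of the fallback rule as the added text describes it. It is a plain-Python model, not Comet's actual selection logic. The function names are hypothetical, and the case-insensitive, whitespace-tolerant parsing of `spark.sql.sources.useV1SourceList` is an assumption.

```python
# Scan implementations that, per the documented limitation, only support the V1 API.
DATAFUSION_SCANS = {"native_datafusion", "native_iceberg_compat"}


def parquet_uses_v1(use_v1_source_list: str) -> bool:
    """True if `parquet` appears in the comma-separated V1 source list."""
    names = {name.strip().lower() for name in use_v1_source_list.split(",")}
    return "parquet" in names


def effective_scan(requested_scan: str, use_v1_source_list: str) -> str:
    """Model of the documented rule: DataFusion-based scans fall back to
    native_comet when Parquet is read through the V2 API."""
    if requested_scan in DATAFUSION_SCANS and not parquet_uses_v1(use_v1_source_list):
        return "native_comet"
    return requested_scan


if __name__ == "__main__":
    # `parquet` missing from the list: V2 is used, so the scan falls back.
    print(effective_scan("native_datafusion", "avro,csv,json"))  # native_comet
    # `parquet` present: V1 is used, so the requested scan is kept.
    print(effective_scan("native_datafusion", "orc,parquet"))    # native_datafusion
```

The sample list values are illustrative only; check the Spark configuration in use before relying on them.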

