andygrove opened a new pull request, #3259:
URL: https://github.com/apache/datafusion-comet/pull/3259

   ## Summary
   
   - Document that `native_datafusion` and `native_iceberg_compat` do not detect Parquet files that need datetime rebasing
   - Document that these implementations do not support Spark's Datasource V2 API
   
   ## Background
   
   While investigating `ParquetDatetimeRebaseSuite` tests that explicitly set `native_comet`, we found that the behavior they work around is an intentional limitation of the DataFusion-based scan implementations, not a problem with the tests.
   
   ### Datetime Rebasing
   
   Parquet files written before Spark 3.0 may contain dates/timestamps using the hybrid Julian/Gregorian calendar. The `native_comet` implementation:
   - Detects legacy datetime metadata in Parquet files
   - Throws `SparkException` on such files when `spark.comet.exceptionOnDatetimeRebase=true`
   - Otherwise reads the values without rebasing (CORRECTED mode)
   
   The DataFusion-based implementations (`native_datafusion`, `native_iceberg_compat`) have no such detection and read all dates/timestamps as proleptic Gregorian values, which may produce incorrect results for dates before October 15, 1582 in files written with the hybrid calendar.
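
To make the calendar mismatch concrete, here is a minimal, standalone sketch in plain Python (it does not use Spark or Comet). It assumes a legacy writer stores a date as a day count since 1970-01-01 computed in the Julian calendar, and that a reader without rebasing interprets that count as proleptic Gregorian. The helper names `julian_calendar_to_jdn`, `legacy_days_since_epoch`, and `read_without_rebase` are hypothetical and exist only for this illustration.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
EPOCH_JDN = 2440588  # Julian Day Number of 1970-01-01 (proleptic Gregorian)


def julian_calendar_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date expressed in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def legacy_days_since_epoch(year: int, month: int, day: int) -> int:
    """Day count a hybrid-calendar writer would store for a date before
    1582-10-15 (assumption: such dates are counted in the Julian calendar)."""
    return julian_calendar_to_jdn(year, month, day) - EPOCH_JDN


def read_without_rebase(days: int) -> date:
    """Interpret a stored day count as a proleptic Gregorian date, which is
    what Python's datetime.date does natively."""
    return EPOCH + timedelta(days=days)


stored = legacy_days_since_epoch(1582, 10, 4)   # last Julian day in the hybrid calendar
as_read = read_without_rebase(stored)
print(stored, as_read.isoformat())              # -141428 1582-10-14
```

In this model, a value written as 1582-10-04 comes back as 1582-10-14: a ten-day shift, which is the kind of incorrect result described above for files that are read without rebasing.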
   
   ### Datasource V2 API
   
   The DataFusion-based implementations only support Spark's V1 datasource API. 
When `spark.sql.sources.useV1SourceList` does not include `parquet`, Comet 
falls back to `native_comet`.
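
The fallback rule above can be modeled as a small decision function. This is only an illustrative sketch in plain Python, not Comet's actual code: `parquet_uses_v1` and `effective_scan_impl` are hypothetical names, and the comma-separated, case-insensitive parsing of `spark.sql.sources.useV1SourceList` is an assumption made for the example.

```python
# Scan implementations described in this PR as DataFusion-based.
DATAFUSION_BASED = {"native_datafusion", "native_iceberg_compat"}


def parquet_uses_v1(use_v1_source_list: str) -> bool:
    """Return True if 'parquet' appears in the comma-separated source list.

    Assumption: entries are trimmed and compared case-insensitively.
    """
    sources = {s.strip().lower() for s in use_v1_source_list.split(",") if s.strip()}
    return "parquet" in sources


def effective_scan_impl(requested: str, use_v1_source_list: str) -> str:
    """Model of the documented fallback: DataFusion-based scans need the V1
    Parquet source; otherwise the scan falls back to native_comet."""
    if requested in DATAFUSION_BASED and not parquet_uses_v1(use_v1_source_list):
        return "native_comet"
    return requested


print(effective_scan_impl("native_datafusion", "csv,json,parquet"))  # native_datafusion
print(effective_scan_impl("native_datafusion", "csv,json"))          # native_comet
```

The source-list strings here are example values chosen for the sketch, not a statement about Spark's defaults.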
   
   ## Test plan
   
   - [x] Documentation only change
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

