(datafusion-comet) branch asf-site updated: Publish built docs triggered by b8d8fbe047adb34c574a7e8a17f28356cb7f9db8

github-bot Wed, 18 Feb 2026 07:05:11 -0800

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new e73be6fe3 Publish built docs triggered by b8d8fbe047adb34c574a7e8a17f28356cb7f9db8
e73be6fe3 is described below

commit e73be6fe3be89819efdd5dca19c22b4ced1f895f
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Wed Feb 18 15:04:48 2026 +0000

    Publish built docs triggered by b8d8fbe047adb34c574a7e8a17f28356cb7f9db8
---
 _sources/contributor-guide/ffi.md.txt           |   7 +-
 _sources/contributor-guide/parquet_scans.md.txt | 120 +++++++++++-------------
 _sources/contributor-guide/roadmap.md.txt       |  14 ---
 _sources/user-guide/latest/compatibility.md.txt |   2 +-
 _sources/user-guide/latest/configs.md.txt       |   1 +
 contributor-guide/ffi.html                      |   5 +-
 contributor-guide/parquet_scans.html            | 118 ++++++++++-------------
 contributor-guide/roadmap.html                  |   9 --
 searchindex.js                                  |   2 +-
 user-guide/latest/compatibility.html            |  16 ++--
 user-guide/latest/configs.html                  |   4 +
 11 files changed, 127 insertions(+), 171 deletions(-)

diff --git a/_sources/contributor-guide/ffi.md.txt b/_sources/contributor-guide/ffi.md.txt
index b1a51ecb2..c40c189e9 100644
--- a/_sources/contributor-guide/ffi.md.txt
+++ b/_sources/contributor-guide/ffi.md.txt
@@ -177,9 +177,10 @@ message Scan {
 
 #### When ownership is NOT transferred to native:
 
-If the data originates from `native_comet` scan (deprecated, will be removed in a future release) or from
-`native_iceberg_compat` in some cases, then ownership is not transferred to native and the JVM may re-use the
-underlying buffers in the future.
+If the data originates from a scan that uses mutable buffers (such as Iceberg scans using the [hybrid Iceberg reader]),
+then ownership is not transferred to native and the JVM may re-use the underlying buffers in the future.
+
+[hybrid Iceberg reader]: https://datafusion.apache.org/comet/user-guide/latest/iceberg.html#hybrid-reader
 
 It is critical that the native code performs a deep copy of the arrays if the arrays are to be buffered by
 operators such as `SortExec` or `ShuffleWriterExec`, otherwise data corruption is likely to occur.
diff --git a/_sources/contributor-guide/parquet_scans.md.txt b/_sources/contributor-guide/parquet_scans.md.txt
index bbacff4d9..7df939488 100644
--- a/_sources/contributor-guide/parquet_scans.md.txt
+++ b/_sources/contributor-guide/parquet_scans.md.txt
@@ -19,71 +19,60 @@ under the License.
 
 # Comet Parquet Scan Implementations
 
-Comet currently has three distinct implementations of the Parquet scan 
operator. The configuration property
-`spark.comet.scan.impl` is used to select an implementation. The default 
setting is `spark.comet.scan.impl=auto`, and
-Comet will choose the most appropriate implementation based on the Parquet 
schema and other Comet configuration
-settings. Most users should not need to change this setting. However, it is 
possible to force Comet to try and use
-a particular implementation for all scan operations by setting this 
configuration property to one of the following
-implementations.
-
-| Implementation          | Description                                        
                                                                                
                                                                         |
-| ----------------------- | 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 |
-| `native_comet`          | **Deprecated.** This implementation provides 
strong compatibility with Spark but does not support complex types. This is the 
original scan implementation in Comet and will be removed in a future release. |
-| `native_iceberg_compat` | This implementation delegates to DataFusion's 
`DataSourceExec` but uses a hybrid approach of JVM and native code. This scan 
is designed to be integrated with Iceberg in the future.                        
|
-| `native_datafusion`     | This experimental implementation delegates to 
DataFusion's `DataSourceExec` for full native execution. There are known 
compatibility issues when using this scan.                                      
     |
-
-The `native_datafusion` and `native_iceberg_compat` scans provide the 
following benefits over the `native_comet`
-implementation:
-
-- Leverages the DataFusion community's ongoing improvements to `DataSourceExec`
-- Provides support for reading complex types (structs, arrays, and maps)
-- Delegates Parquet decoding to native Rust code rather than JVM-side decoding
-- Improves performance
-
-> **Note on mutable buffers:** Both `native_comet` and `native_iceberg_compat` 
use reusable mutable buffers
-> when transferring data from JVM to native code via Arrow FFI. The 
`native_iceberg_compat` implementation uses DataFusion's native Parquet reader 
for data columns, bypassing Comet's mutable buffer infrastructure entirely. 
However, partition columns still use `ConstantColumnReader`, which relies on 
Comet's mutable buffers that are reused across batches. This means native 
operators that buffer data (such as `SortExec` or `ShuffleWriterExec`) must 
perform deep copies to avoid data corruption.
-> See the [FFI documentation](ffi.md) for details on the `arrow_ffi_safe` flag 
and ownership semantics.
-
-The `native_datafusion` and `native_iceberg_compat` scans share the following 
limitations:
-
-- When reading Parquet files written by systems other than Spark that contain 
columns with the logical type `UINT_8`
-  (unsigned 8-bit integers), Comet may produce different results than Spark. 
Spark maps `UINT_8` to `ShortType`, but
-  Comet's Arrow-based readers respect the unsigned type and read the data as 
unsigned rather than signed. Since Comet
-  cannot distinguish `ShortType` columns that came from `UINT_8` versus signed 
`INT16`, by default Comet falls back to
-  Spark when scanning Parquet files containing `ShortType` columns. This 
behavior can be disabled by setting
-  `spark.comet.scan.unsignedSmallIntSafetyCheck=false`. Note that `ByteType` 
columns are always safe because they can
-  only come from signed `INT8`, where truncation preserves the signed value.
-- No support for default values that are nested types (e.g., maps, arrays, 
structs). Literal default values are supported.
-- No support for datetime rebasing detection or the 
`spark.comet.exceptionOnDatetimeRebase` configuration. When reading
-  Parquet files containing dates or timestamps written before Spark 3.0 (which 
used a hybrid Julian/Gregorian calendar),
-  the `native_comet` implementation can detect these legacy values and either 
throw an exception or read them without
-  rebasing. The DataFusion-based implementations do not have this detection 
capability and will read all dates/timestamps
-  as if they were written using the Proleptic Gregorian calendar. This may 
produce incorrect results for dates before
-  October 15, 1582.
-- No support for Spark's Datasource V2 API. When 
`spark.sql.sources.useV1SourceList` does not include `parquet`,
-  Spark uses the V2 API for Parquet scans. The DataFusion-based 
implementations only support the V1 API, so Comet
-  will fall back to `native_comet` when V2 is enabled.
-
-The `native_datafusion` scan has some additional limitations:
+Comet currently has two distinct implementations of the Parquet scan operator.
+
+| Scan Implementation     | Notes                  |
+| ----------------------- | ---------------------- |
+| `native_datafusion`     | Fully native scan      |
+| `native_iceberg_compat` | Hybrid JVM/native scan |
+
+The configuration property
+`spark.comet.scan.impl` is used to select an implementation. The default 
setting is `spark.comet.scan.impl=auto`, which
+currently always uses the `native_iceberg_compat` implementation. Most users 
should not need to change this setting.
+However, it is possible to force Comet to use a particular implementation for 
all scan operations by setting
+this configuration property to one of the following implementations. For 
example: `--conf spark.comet.scan.impl=native_datafusion`.
+
+The following features are not supported by either scan implementation, and 
Comet will fall back to Spark in these scenarios:
+
+- `ShortType` columns, by default. When reading Parquet files written by 
systems other than Spark that contain
+  columns with the logical type `UINT_8` (unsigned 8-bit integers), Comet may 
produce different results than Spark.
+  Spark maps `UINT_8` to `ShortType`, but Comet's Arrow-based readers respect 
the unsigned type and read the data as
+  unsigned rather than signed. Since Comet cannot distinguish `ShortType` 
columns that came from `UINT_8` versus
+  signed `INT16`, by default Comet falls back to Spark when scanning Parquet 
files containing `ShortType` columns.
+  This behavior can be disabled by setting 
`spark.comet.scan.unsignedSmallIntSafetyCheck=false`. Note that `ByteType`
+  columns are always safe because they can only come from signed `INT8`, where 
truncation preserves the signed value.
+- Default values that are nested types (e.g., maps, arrays, structs). Literal 
default values are supported.
+- Spark's Datasource V2 API. When `spark.sql.sources.useV1SourceList` does not 
include `parquet`, Spark uses the
+  V2 API for Parquet scans. The DataFusion-based implementations only support 
the V1 API.
+- Spark metadata columns (e.g., `_metadata.file_path`)
+- No support for Dynamic Partition Pruning (DPP)
+
+The following shared limitation may produce incorrect results without falling 
back to Spark:
+
+- No support for datetime rebasing detection or the 
`spark.comet.exceptionOnDatetimeRebase` configuration. When
+  reading Parquet files containing dates or timestamps written before Spark 
3.0 (which used a hybrid
+  Julian/Gregorian calendar), dates/timestamps will be read as if they were 
written using the Proleptic Gregorian
+  calendar. This may produce incorrect results for dates before October 15, 
1582.
+
+The `native_datafusion` scan has some additional limitations, mostly related 
to Parquet metadata. All of these
+cause Comet to fall back to Spark.
 
 - No support for row indexes
-- `PARQUET_FIELD_ID_READ_ENABLED` is not respected [#1758]
-- There are failures in the Spark SQL test suite [#1545]
-- Setting Spark configs `ignoreMissingFiles` or `ignoreCorruptFiles` to `true` 
is not compatible with Spark
+- No support for reading Parquet field IDs
+- No support for `input_file_name()`, `input_file_block_start()`, or 
`input_file_block_length()` SQL functions.
+  The `native_datafusion` scan does not use Spark's `FileScanRDD`, so these 
functions cannot populate their values.
+- No support for `ignoreMissingFiles` or `ignoreCorruptFiles` being set to 
`true`
 
-## S3 Support
-
-There are some differences in S3 support between the scan implementations.
-
-### `native_comet` (Deprecated)
+The `native_iceberg_compat` scan has the following additional limitation that 
may produce incorrect results
+without falling back to Spark:
 
-> **Note:** The `native_comet` scan implementation is deprecated and will be 
removed in a future release.
+- Some Spark configuration values are hard-coded to their defaults rather than 
respecting user-specified values.
+  This may produce incorrect results when non-default values are set. The 
affected configurations are
+  `spark.sql.parquet.binaryAsString`, `spark.sql.parquet.int96AsTimestamp`, 
`spark.sql.caseSensitive`,
+  `spark.sql.parquet.inferTimestampNTZ.enabled`, and 
`spark.sql.legacy.parquet.nanosAsLong`. See
+  [issue #1816](https://github.com/apache/datafusion-comet/issues/1816) for 
more details.
 
-The `native_comet` Parquet scan implementation reads data from S3 using the 
[Hadoop-AWS 
module](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html),
 which
-is identical to the approach commonly used with vanilla Spark. AWS credential 
configuration and other Hadoop S3A
-configurations works the same way as in vanilla Spark.
-
-### `native_datafusion` and `native_iceberg_compat`
+## S3 Support
 
 The `native_datafusion` and `native_iceberg_compat` Parquet scan 
implementations completely offload data loading
 to native code. They use the [`object_store` 
crate](https://crates.io/crates/object_store) to read data from S3 and
@@ -95,7 +84,8 @@ continue to work as long as the configurations are supported and can be translat
 
 #### Additional S3 Configuration Options
 
-Beyond credential providers, the `native_datafusion` implementation supports 
additional S3 configuration options:
+Beyond credential providers, the `native_datafusion` and 
`native_iceberg_compat` implementations support additional
+S3 configuration options:
 
 | Option                          | Description                                
                                                        |
 | ------------------------------- | 
--------------------------------------------------------------------------------------------------
 |
@@ -108,7 +98,8 @@ All configuration options support bucket-specific overrides using the pattern `f
 
 #### Examples
 
-The following examples demonstrate how to configure S3 access with the 
`native_datafusion` Parquet scan implementation using different authentication 
methods.
+The following examples demonstrate how to configure S3 access with the 
`native_datafusion` and `native_iceberg_compat`
+Parquet scan implementations using different authentication methods.
 
 **Example 1: Simple Credentials**
 
@@ -140,11 +131,8 @@ $SPARK_HOME/bin/spark-shell \
 
 #### Limitations
 
-The S3 support of `native_datafusion` has the following limitations:
+The S3 support of `native_datafusion` and `native_iceberg_compat` has the 
following limitations:
 
 1. **Partial Hadoop S3A configuration support**: Not all Hadoop S3A 
configurations are currently supported. Only the configurations listed in the 
tables above are translated and applied to the underlying `object_store` crate.
 
 2. **Custom credential providers**: Custom implementations of AWS credential 
providers are not supported. The implementation only supports the standard 
credential providers listed in the table above. We are planning to add support 
for custom credential providers through a JNI-based adapter that will allow 
calling Java credential providers from native code. See [issue 
#1829](https://github.com/apache/datafusion-comet/issues/1829) for more details.
-
-[#1545]: https://github.com/apache/datafusion-comet/issues/1545
-[#1758]: https://github.com/apache/datafusion-comet/issues/1758
diff --git a/_sources/contributor-guide/roadmap.md.txt b/_sources/contributor-guide/roadmap.md.txt
index ce9c41416..6d99ee545 100644
--- a/_sources/contributor-guide/roadmap.md.txt
+++ b/_sources/contributor-guide/roadmap.md.txt
@@ -51,20 +51,6 @@ with benchmarks that benefit from this feature like TPC-DS. This effort can be t
 [#3349]: https://github.com/apache/datafusion-comet/pull/3349
 [#3510]: https://github.com/apache/datafusion-comet/issues/3510
 
-### Removing the native_comet scan implementation
-
-The `native_comet` scan implementation is now deprecated and will be removed 
in a future release ([#2186], [#2177]).
-This is the original scan implementation that uses mutable buffers (which is 
incompatible with best practices around
-Arrow FFI) and does not support complex types.
-
-Now that the default `auto` scan mode uses `native_iceberg_compat` (which is 
based on DataFusion's `DataSourceExec`),
-we can proceed with removing the `native_comet` scan implementation, and then 
improve the efficiency of our use of
-Arrow FFI ([#2171]).
-
-[#2186]: https://github.com/apache/datafusion-comet/issues/2186
-[#2171]: https://github.com/apache/datafusion-comet/issues/2171
-[#2177]: https://github.com/apache/datafusion-comet/issues/2177
-
 ## Ongoing Improvements
 
 In addition to the major initiatives above, we have the following ongoing 
areas of work:
diff --git a/_sources/user-guide/latest/compatibility.md.txt b/_sources/user-guide/latest/compatibility.md.txt
index fffa05509..0163efc4f 100644
--- a/_sources/user-guide/latest/compatibility.md.txt
+++ b/_sources/user-guide/latest/compatibility.md.txt
@@ -117,7 +117,7 @@ Cast operations in Comet fall into three levels of support:
 | binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
 | boolean | N/A | - | C | N/A | C | C | C | C | C | C | C | U |
 | byte | C | C | - | N/A | C | C | C | C | C | C | C | U |
-| date | N/A | U | U | - | U | U | U | U | U | U | C | C |
+| date | N/A | C | C | - | C | C | C | C | C | C | C | C |
 | decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
 | double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
 | float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
diff --git a/_sources/user-guide/latest/configs.md.txt b/_sources/user-guide/latest/configs.md.txt
index 6ae0ec6fb..7a0ed1dc0 100644
--- a/_sources/user-guide/latest/configs.md.txt
+++ b/_sources/user-guide/latest/configs.md.txt
@@ -49,6 +49,7 @@ Comet provides the following configuration settings.
 | `spark.comet.parquet.read.parallel.io.enabled` | Whether to enable Comet's 
parallel reader for Parquet files. The parallel reader reads ranges of 
consecutive data in a  file in parallel. It is faster for large files and row 
groups but uses more resources. | true |
 | `spark.comet.parquet.read.parallel.io.thread-pool.size` | The maximum number 
of parallel threads the parallel reader will use in a single executor. For 
executors configured with a smaller number of cores, use a smaller number. | 16 
|
 | `spark.comet.parquet.respectFilterPushdown` | Whether to respect Spark's 
PARQUET_FILTER_PUSHDOWN_ENABLED config. This needs to be respected when running 
the Spark SQL test suite but the default setting results in poor performance in 
Comet when using the new native scans, disabled by default | false |
+| `spark.comet.scan.impl` | The implementation of Comet's Parquet scan to use. 
Available scans are `native_datafusion`, and `native_iceberg_compat`. 
`native_datafusion` is a fully native implementation, and 
`native_iceberg_compat` is a hybrid implementation that supports some 
additional features, such as row indexes and field ids. `auto` (default) 
chooses the best available scan based on the scan schema. It can be overridden 
by the environment variable `COMET_PARQUET_SCAN_IMPL`. | auto |
 <!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->
 
diff --git a/contributor-guide/ffi.html b/contributor-guide/ffi.html
index a03074a2d..fa6a7fb06 100644
--- a/contributor-guide/ffi.html
+++ b/contributor-guide/ffi.html
@@ -605,9 +605,8 @@ ownership is being transferred according to the Arrow C data interface specifica
 </div>
 <section id="when-ownership-is-not-transferred-to-native">
 <h4>When ownership is NOT transferred to native:<a class="headerlink" 
href="#when-ownership-is-not-transferred-to-native" title="Link to this 
heading">#</a></h4>
-<p>If the data originates from <code class="docutils literal 
notranslate"><span class="pre">native_comet</span></code> scan (deprecated, 
will be removed in a future release) or from
-<code class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code> in some cases, then ownership 
is not transferred to native and the JVM may re-use the
-underlying buffers in the future.</p>
+<p>If the data originates from a scan that uses mutable buffers (such as 
Iceberg scans using the <a class="reference external" 
href="https://datafusion.apache.org/comet/user-guide/latest/iceberg.html#hybrid-reader">hybrid
 Iceberg reader</a>),
+then ownership is not transferred to native and the JVM may re-use the 
underlying buffers in the future.</p>
 <p>It is critical that the native code performs a deep copy of the arrays if 
the arrays are to be buffered by
 operators such as <code class="docutils literal notranslate"><span 
class="pre">SortExec</span></code> or <code class="docutils literal 
notranslate"><span class="pre">ShuffleWriterExec</span></code>, otherwise data 
corruption is likely to occur.</p>
 </section>
diff --git a/contributor-guide/parquet_scans.html b/contributor-guide/parquet_scans.html
index 2bb8ffb82..52544b2ea 100644
--- a/contributor-guide/parquet_scans.html
+++ b/contributor-guide/parquet_scans.html
@@ -457,85 +457,70 @@ under the License.
 -->
 <section id="comet-parquet-scan-implementations">
 <h1>Comet Parquet Scan Implementations<a class="headerlink" 
href="#comet-parquet-scan-implementations" title="Link to this 
heading">#</a></h1>
-<p>Comet currently has three distinct implementations of the Parquet scan 
operator. The configuration property
-<code class="docutils literal notranslate"><span 
class="pre">spark.comet.scan.impl</span></code> is used to select an 
implementation. The default setting is <code class="docutils literal 
notranslate"><span class="pre">spark.comet.scan.impl=auto</span></code>, and
-Comet will choose the most appropriate implementation based on the Parquet 
schema and other Comet configuration
-settings. Most users should not need to change this setting. However, it is 
possible to force Comet to try and use
-a particular implementation for all scan operations by setting this 
configuration property to one of the following
-implementations.</p>
+<p>Comet currently has two distinct implementations of the Parquet scan 
operator.</p>
 <div class="pst-scrollable-table-container"><table class="table">
 <thead>
-<tr class="row-odd"><th class="head"><p>Implementation</p></th>
-<th class="head"><p>Description</p></th>
+<tr class="row-odd"><th class="head"><p>Scan Implementation</p></th>
+<th class="head"><p>Notes</p></th>
 </tr>
 </thead>
 <tbody>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code></p></td>
-<td><p><strong>Deprecated.</strong> This implementation provides strong 
compatibility with Spark but does not support complex types. This is the 
original scan implementation in Comet and will be removed in a future 
release.</p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code></p></td>
+<td><p>Fully native scan</p></td>
 </tr>
 <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code></p></td>
-<td><p>This implementation delegates to DataFusion’s <code class="docutils 
literal notranslate"><span class="pre">DataSourceExec</span></code> but uses a 
hybrid approach of JVM and native code. This scan is designed to be integrated 
with Iceberg in the future.</p></td>
-</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code></p></td>
-<td><p>This experimental implementation delegates to DataFusion’s <code 
class="docutils literal notranslate"><span 
class="pre">DataSourceExec</span></code> for full native execution. There are 
known compatibility issues when using this scan.</p></td>
+<td><p>Hybrid JVM/native scan</p></td>
 </tr>
 </tbody>
 </table>
 </div>
-<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> scans 
provide the following benefits over the <code class="docutils literal 
notranslate"><span class="pre">native_comet</span></code>
-implementation:</p>
+<p>The configuration property
+<code class="docutils literal notranslate"><span 
class="pre">spark.comet.scan.impl</span></code> is used to select an 
implementation. The default setting is <code class="docutils literal 
notranslate"><span class="pre">spark.comet.scan.impl=auto</span></code>, which
+currently always uses the <code class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code> implementation. Most users 
should not need to change this setting.
+However, it is possible to force Comet to use a particular implementation for 
all scan operations by setting
+this configuration property to one of the following implementations. For 
example: <code class="docutils literal notranslate"><span 
class="pre">--conf</span> <span 
class="pre">spark.comet.scan.impl=native_datafusion</span></code>.</p>
+<p>The following features are not supported by either scan implementation, and 
Comet will fall back to Spark in these scenarios:</p>
 <ul class="simple">
-<li><p>Leverages the DataFusion community’s ongoing improvements to <code 
class="docutils literal notranslate"><span 
class="pre">DataSourceExec</span></code></p></li>
-<li><p>Provides support for reading complex types (structs, arrays, and 
maps)</p></li>
-<li><p>Delegates Parquet decoding to native Rust code rather than JVM-side 
decoding</p></li>
-<li><p>Improves performance</p></li>
+<li><p><code class="docutils literal notranslate"><span 
class="pre">ShortType</span></code> columns, by default. When reading Parquet 
files written by systems other than Spark that contain
+columns with the logical type <code class="docutils literal notranslate"><span 
class="pre">UINT_8</span></code> (unsigned 8-bit integers), Comet may produce 
different results than Spark.
+Spark maps <code class="docutils literal notranslate"><span 
class="pre">UINT_8</span></code> to <code class="docutils literal 
notranslate"><span class="pre">ShortType</span></code>, but Comet’s Arrow-based 
readers respect the unsigned type and read the data as
+unsigned rather than signed. Since Comet cannot distinguish <code 
class="docutils literal notranslate"><span class="pre">ShortType</span></code> 
columns that came from <code class="docutils literal notranslate"><span 
class="pre">UINT_8</span></code> versus
+signed <code class="docutils literal notranslate"><span 
class="pre">INT16</span></code>, by default Comet falls back to Spark when 
scanning Parquet files containing <code class="docutils literal 
notranslate"><span class="pre">ShortType</span></code> columns.
+This behavior can be disabled by setting <code class="docutils literal 
notranslate"><span 
class="pre">spark.comet.scan.unsignedSmallIntSafetyCheck=false</span></code>. 
Note that <code class="docutils literal notranslate"><span 
class="pre">ByteType</span></code>
+columns are always safe because they can only come from signed <code 
class="docutils literal notranslate"><span class="pre">INT8</span></code>, 
where truncation preserves the signed value.</p></li>
+<li><p>Default values that are nested types (e.g., maps, arrays, structs). 
Literal default values are supported.</p></li>
+<li><p>Spark’s Datasource V2 API. When <code class="docutils literal 
notranslate"><span class="pre">spark.sql.sources.useV1SourceList</span></code> 
does not include <code class="docutils literal notranslate"><span 
class="pre">parquet</span></code>, Spark uses the
+V2 API for Parquet scans. The DataFusion-based implementations only support 
the V1 API.</p></li>
+<li><p>Spark metadata columns (e.g., <code class="docutils literal 
notranslate"><span class="pre">_metadata.file_path</span></code>)</p></li>
+<li><p>No support for Dynamic Partition Pruning (DPP)</p></li>
 </ul>
-<blockquote>
-<div><p><strong>Note on mutable buffers:</strong> Both <code class="docutils 
literal notranslate"><span class="pre">native_comet</span></code> and <code 
class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code> use reusable mutable buffers
-when transferring data from JVM to native code via Arrow FFI. The <code 
class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code> implementation uses 
DataFusion’s native Parquet reader for data columns, bypassing Comet’s mutable 
buffer infrastructure entirely. However, partition columns still use <code 
class="docutils literal notranslate"><span 
class="pre">ConstantColumnReader</span></code>, which relies on Comet’s mutable 
buffers that are reused across b [...]
-See the <a class="reference internal" href="ffi.html"><span class="std 
std-doc">FFI documentation</span></a> for details on the <code class="docutils 
literal notranslate"><span class="pre">arrow_ffi_safe</span></code> flag and 
ownership semantics.</p>
-</div></blockquote>
-<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> scans share 
the following limitations:</p>
+<p>The following shared limitation may produce incorrect results without 
falling back to Spark:</p>
 <ul class="simple">
-<li><p>When reading Parquet files written by systems other than Spark that 
contain columns with the logical type <code class="docutils literal 
notranslate"><span class="pre">UINT_8</span></code>
-(unsigned 8-bit integers), Comet may produce different results than Spark. 
Spark maps <code class="docutils literal notranslate"><span 
class="pre">UINT_8</span></code> to <code class="docutils literal 
notranslate"><span class="pre">ShortType</span></code>, but
-Comet’s Arrow-based readers respect the unsigned type and read the data as 
unsigned rather than signed. Since Comet
-cannot distinguish <code class="docutils literal notranslate"><span 
class="pre">ShortType</span></code> columns that came from <code 
class="docutils literal notranslate"><span class="pre">UINT_8</span></code> 
versus signed <code class="docutils literal notranslate"><span 
class="pre">INT16</span></code>, by default Comet falls back to
-Spark when scanning Parquet files containing <code class="docutils literal 
notranslate"><span class="pre">ShortType</span></code> columns. This behavior 
can be disabled by setting
-<code class="docutils literal notranslate"><span 
class="pre">spark.comet.scan.unsignedSmallIntSafetyCheck=false</span></code>. 
Note that <code class="docutils literal notranslate"><span 
class="pre">ByteType</span></code> columns are always safe because they can
-only come from signed <code class="docutils literal notranslate"><span 
class="pre">INT8</span></code>, where truncation preserves the signed 
value.</p></li>
-<li><p>No support for default values that are nested types (e.g., maps, 
arrays, structs). Literal default values are supported.</p></li>
-<li><p>No support for datetime rebasing detection or the <code class="docutils 
literal notranslate"><span 
class="pre">spark.comet.exceptionOnDatetimeRebase</span></code> configuration. 
When reading
-Parquet files containing dates or timestamps written before Spark 3.0 (which 
used a hybrid Julian/Gregorian calendar),
-the <code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code> implementation can detect these legacy 
values and either throw an exception or read them without
-rebasing. The DataFusion-based implementations do not have this detection 
capability and will read all dates/timestamps
-as if they were written using the Proleptic Gregorian calendar. This may 
produce incorrect results for dates before
-October 15, 1582.</p></li>
-<li><p>No support for Spark’s Datasource V2 API. When <code class="docutils 
literal notranslate"><span 
class="pre">spark.sql.sources.useV1SourceList</span></code> does not include 
<code class="docutils literal notranslate"><span 
class="pre">parquet</span></code>,
-Spark uses the V2 API for Parquet scans. The DataFusion-based implementations 
only support the V1 API, so Comet
-will fall back to <code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code> when V2 is enabled.</p></li>
+<li><p>No support for datetime rebasing detection or the <code class="docutils 
literal notranslate"><span 
class="pre">spark.comet.exceptionOnDatetimeRebase</span></code> configuration. 
When
+reading Parquet files containing dates or timestamps written before Spark 3.0 
(which used a hybrid
+Julian/Gregorian calendar), dates/timestamps will be read as if they were 
written using the Proleptic Gregorian
+calendar. This may produce incorrect results for dates before October 15, 
1582.</p></li>
 </ul>
-<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> scan has some additional 
limitations:</p>
+<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> scan has some additional 
limitations, mostly related to Parquet metadata. All of these
+cause Comet to fall back to Spark.</p>
 <ul class="simple">
 <li><p>No support for row indexes</p></li>
-<li><p><code class="docutils literal notranslate"><span 
class="pre">PARQUET_FIELD_ID_READ_ENABLED</span></code> is not respected <a 
class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/1758">#1758</a></p></li>
-<li><p>There are failures in the Spark SQL test suite <a class="reference 
external" 
href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a></p></li>
-<li><p>Setting Spark configs <code class="docutils literal notranslate"><span 
class="pre">ignoreMissingFiles</span></code> or <code class="docutils literal 
notranslate"><span class="pre">ignoreCorruptFiles</span></code> to <code 
class="docutils literal notranslate"><span class="pre">true</span></code> is 
not compatible with Spark</p></li>
+<li><p>No support for reading Parquet field IDs</p></li>
+<li><p>No support for <code class="docutils literal notranslate"><span 
class="pre">input_file_name()</span></code>, <code class="docutils literal 
notranslate"><span class="pre">input_file_block_start()</span></code>, or <code 
class="docutils literal notranslate"><span 
class="pre">input_file_block_length()</span></code> SQL functions.
+The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> scan does not use Spark’s <code 
class="docutils literal notranslate"><span 
class="pre">FileScanRDD</span></code>, so these functions cannot populate their 
values.</p></li>
+<li><p>No support for <code class="docutils literal notranslate"><span 
class="pre">ignoreMissingFiles</span></code> or <code class="docutils literal 
notranslate"><span class="pre">ignoreCorruptFiles</span></code> being set to 
<code class="docutils literal notranslate"><span 
class="pre">true</span></code></p></li>
+</ul>
+<p>The <code class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code> scan has the following 
additional limitation that may produce incorrect results
+without falling back to Spark:</p>
+<ul class="simple">
+<li><p>Some Spark configuration values are hard-coded to their defaults rather 
than respecting user-specified values.
+This may produce incorrect results when non-default values are set. The 
affected configurations are
+<code class="docutils literal notranslate"><span 
class="pre">spark.sql.parquet.binaryAsString</span></code>, <code 
class="docutils literal notranslate"><span 
class="pre">spark.sql.parquet.int96AsTimestamp</span></code>, <code 
class="docutils literal notranslate"><span 
class="pre">spark.sql.caseSensitive</span></code>,
+<code class="docutils literal notranslate"><span 
class="pre">spark.sql.parquet.inferTimestampNTZ.enabled</span></code>, and 
<code class="docutils literal notranslate"><span 
class="pre">spark.sql.legacy.parquet.nanosAsLong</span></code>. See
+<a class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/1816">issue #1816</a> 
for more details.</p></li>
 </ul>
 <section id="s3-support">
 <h2>S3 Support<a class="headerlink" href="#s3-support" title="Link to this 
heading">#</a></h2>
-<p>There are some differences in S3 support between the scan 
implementations.</p>
-<section id="native-comet-deprecated">
-<h3><code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code> (Deprecated)<a class="headerlink" 
href="#native-comet-deprecated" title="Link to this heading">#</a></h3>
-<blockquote>
-<div><p><strong>Note:</strong> The <code class="docutils literal 
notranslate"><span class="pre">native_comet</span></code> scan implementation 
is deprecated and will be removed in a future release.</p>
-</div></blockquote>
-<p>The <code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code> Parquet scan implementation reads data 
from S3 using the <a class="reference external" 
href="https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html">Hadoop-AWS
 module</a>, which
-is identical to the approach commonly used with vanilla Spark. AWS credential 
configuration and other Hadoop S3A
-configurations works the same way as in vanilla Spark.</p>
-</section>
-<section id="native-datafusion-and-native-iceberg-compat">
-<h3><code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code><a 
class="headerlink" href="#native-datafusion-and-native-iceberg-compat" 
title="Link to this heading">#</a></h3>
 <p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> Parquet scan 
implementations completely offload data loading
 to native code. They use the <a class="reference external" 
href="https://crates.io/crates/object_store"><code class="docutils literal 
notranslate"><span class="pre">object_store</span></code> crate</a> to read 
data from S3 and
 support configuring S3 access using standard <a class="reference external" 
href="https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#General_S3A_Client_configuration">Hadoop
 S3A configurations</a> by translating them to
@@ -543,8 +528,9 @@ the <code class="docutils literal notranslate"><span 
class="pre">object_store</s
 <p>This implementation maintains compatibility with existing Hadoop S3A 
configurations, so existing code will
 continue to work as long as the configurations are supported and can be 
translated without loss of functionality.</p>
 <section id="additional-s3-configuration-options">
-<h4>Additional S3 Configuration Options<a class="headerlink" 
href="#additional-s3-configuration-options" title="Link to this 
heading">#</a></h4>
-<p>Beyond credential providers, the <code class="docutils literal 
notranslate"><span class="pre">native_datafusion</span></code> implementation 
supports additional S3 configuration options:</p>
+<h3>Additional S3 Configuration Options<a class="headerlink" 
href="#additional-s3-configuration-options" title="Link to this 
heading">#</a></h3>
+<p>Beyond credential providers, the <code class="docutils literal 
notranslate"><span class="pre">native_datafusion</span></code> and <code 
class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code> implementations support 
additional
+S3 configuration options:</p>
 <div class="pst-scrollable-table-container"><table class="table">
 <thead>
 <tr class="row-odd"><th class="head"><p>Option</p></th>
@@ -570,8 +556,9 @@ continue to work as long as the configurations are 
supported and can be translat
 <p>All configuration options support bucket-specific overrides using the 
pattern <code class="docutils literal notranslate"><span 
class="pre">fs.s3a.bucket.{bucket-name}.{option}</span></code>.</p>
 </section>
 <section id="examples">
-<h4>Examples<a class="headerlink" href="#examples" title="Link to this 
heading">#</a></h4>
-<p>The following examples demonstrate how to configure S3 access with the 
<code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> Parquet scan implementation using 
different authentication methods.</p>
+<h3>Examples<a class="headerlink" href="#examples" title="Link to this 
heading">#</a></h3>
+<p>The following examples demonstrate how to configure S3 access with the 
<code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code>
+Parquet scan implementations using different authentication methods.</p>
 <p><strong>Example 1: Simple Credentials</strong></p>
 <p>This example shows how to access a private S3 bucket using an access key 
and secret key. The <code class="docutils literal notranslate"><span 
class="pre">fs.s3a.aws.credentials.provider</span></code> configuration can be 
omitted since <code class="docutils literal notranslate"><span 
class="pre">org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</span></code> 
is included in Hadoop S3A’s default credential provider chain.</p>
 <div class="highlight-shell notranslate"><div 
class="highlight"><pre><span></span><span 
class="nv">$SPARK_HOME</span>/bin/spark-shell<span class="w"> </span><span 
class="se">\</span>
@@ -596,15 +583,14 @@ continue to work as long as the configurations are 
supported and can be translat
 </div>
 </section>
 <section id="limitations">
-<h4>Limitations<a class="headerlink" href="#limitations" title="Link to this 
heading">#</a></h4>
-<p>The S3 support of <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> has the following limitations:</p>
+<h3>Limitations<a class="headerlink" href="#limitations" title="Link to this 
heading">#</a></h3>
+<p>The S3 support of <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> has the 
following limitations:</p>
 <ol class="arabic simple">
 <li><p><strong>Partial Hadoop S3A configuration support</strong>: Not all 
Hadoop S3A configurations are currently supported. Only the configurations 
listed in the tables above are translated and applied to the underlying <code 
class="docutils literal notranslate"><span 
class="pre">object_store</span></code> crate.</p></li>
 <li><p><strong>Custom credential providers</strong>: Custom implementations of 
AWS credential providers are not supported. The implementation only supports 
the standard credential providers listed in the table above. We are planning to 
add support for custom credential providers through a JNI-based adapter that 
will allow calling Java credential providers from native code. See <a 
class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/1829">issue #1829</a> 
for  [...]
 </ol>
 </section>
 </section>
-</section>
 </section>
 
 
diff --git a/contributor-guide/roadmap.html b/contributor-guide/roadmap.html
index be85cf9ef..9e5256a2b 100644
--- a/contributor-guide/roadmap.html
+++ b/contributor-guide/roadmap.html
@@ -480,15 +480,6 @@ Spark’s <code class="docutils literal notranslate"><span 
class="pre">PlanAdapt
 Execution requires a redesign of Comet’s plan translation. We are focused on 
implementing DPP to keep Comet competitive
 with benchmarks that benefit from this feature like TPC-DS. This effort can be 
tracked at <a class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/3510">#3510</a>.</p>
 </section>
-<section id="removing-the-native-comet-scan-implementation">
-<h3>Removing the native_comet scan implementation<a class="headerlink" 
href="#removing-the-native-comet-scan-implementation" title="Link to this 
heading">#</a></h3>
-<p>The <code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code> scan implementation is now deprecated 
and will be removed in a future release (<a class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/2186">#2186</a>, <a 
class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/2177">#2177</a>).
-This is the original scan implementation that uses mutable buffers (which is 
incompatible with best practices around
-Arrow FFI) and does not support complex types.</p>
-<p>Now that the default <code class="docutils literal notranslate"><span 
class="pre">auto</span></code> scan mode uses <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> (which is 
based on DataFusion’s <code class="docutils literal notranslate"><span 
class="pre">DataSourceExec</span></code>),
-we can proceed with removing the <code class="docutils literal 
notranslate"><span class="pre">native_comet</span></code> scan implementation, 
and then improve the efficiency of our use of
-Arrow FFI (<a class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/2171">#2171</a>).</p>
-</section>
 </section>
 <section id="ongoing-improvements">
 <h2>Ongoing Improvements<a class="headerlink" href="#ongoing-improvements" 
title="Link to this heading">#</a></h2>
diff --git a/searchindex.js b/searchindex.js
index 1d278c792..1985b0dbe 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Format Your Code": [[12, 
"format-your-code"]], "1. Install Comet": [[22, "install-comet"]], "1. Native 
Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2. 
Build and Verify": [[12, "build-and-verify"]], "2. Clone Spark and Apply Diff": 
[[22, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4, 
"sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4, 
"comet-jvm-operators"]], "3. Run Clippy (Recommended)": [[12 [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Format Your Code": [[12, 
"format-your-code"]], "1. Install Comet": [[22, "install-comet"]], "1. Native 
Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2. 
Build and Verify": [[12, "build-and-verify"]], "2. Clone Spark and Apply Diff": 
[[22, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4, 
"sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4, 
"comet-jvm-operators"]], "3. Run Clippy (Recommended)": [[12 [...]
\ No newline at end of file
diff --git a/user-guide/latest/compatibility.html 
b/user-guide/latest/compatibility.html
index 755148812..33c007450 100644
--- a/user-guide/latest/compatibility.html
+++ b/user-guide/latest/compatibility.html
@@ -610,15 +610,15 @@ Spark.</p></li>
 </tr>
 <tr class="row-odd"><td><p>date</p></td>
 <td><p>N/A</p></td>
-<td><p>U</p></td>
-<td><p>U</p></td>
+<td><p>C</p></td>
+<td><p>C</p></td>
 <td><p>-</p></td>
-<td><p>U</p></td>
-<td><p>U</p></td>
-<td><p>U</p></td>
-<td><p>U</p></td>
-<td><p>U</p></td>
-<td><p>U</p></td>
+<td><p>C</p></td>
+<td><p>C</p></td>
+<td><p>C</p></td>
+<td><p>C</p></td>
+<td><p>C</p></td>
+<td><p>C</p></td>
 <td><p>C</p></td>
 <td><p>C</p></td>
 </tr>
diff --git a/user-guide/latest/configs.html b/user-guide/latest/configs.html
index 465e3e6ba..fa21bce21 100644
--- a/user-guide/latest/configs.html
+++ b/user-guide/latest/configs.html
@@ -543,6 +543,10 @@ under the License.
 <td><p>Whether to respect Spark’s PARQUET_FILTER_PUSHDOWN_ENABLED config. This 
needs to be respected when running the Spark SQL test suite but the default 
setting results in poor performance in Comet when using the new native scans, 
disabled by default</p></td>
 <td><p>false</p></td>
 </tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">spark.comet.scan.impl</span></code></p></td>
+<td><p>The implementation of Comet’s Parquet scan to use. Available scans are 
<code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code>, and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code>. <code 
class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> is a fully native implementation, 
and <code class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</spa [...]
+<td><p>auto</p></td>
+</tr>
 </tbody>
 </table>
 </div>


