This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 20ed251ad Publish built docs triggered by 0fec0f56f251ca4acc81ee2b43c28227b6cac5ba
20ed251ad is described below

commit 20ed251adfc2cd38661473c6f93f031784e2207d
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Tue Dec 9 21:35:13 2025 +0000

    Publish built docs triggered by 0fec0f56f251ca4acc81ee2b43c28227b6cac5ba
---
 _sources/user-guide/latest/iceberg.md.txt | 62 ++++++++++++++++++++++++++++---
 searchindex.js                            |  2 +-
 user-guide/latest/iceberg.html            | 61 +++++++++++++++++++++++++++---
 3 files changed, 112 insertions(+), 13 deletions(-)

diff --git a/_sources/user-guide/latest/iceberg.md.txt b/_sources/user-guide/latest/iceberg.md.txt
index 3314cb692..3ecea45aa 100644
--- a/_sources/user-guide/latest/iceberg.md.txt
+++ b/_sources/user-guide/latest/iceberg.md.txt
@@ -19,10 +19,15 @@
 
 # Accelerating Apache Iceberg Parquet Scans using Comet (Experimental)
 
-**Note: Iceberg integration is a work-in-progress. It is currently necessary to build Iceberg from
-source rather than using available artifacts in Maven**
+**Note: Iceberg integration is a work-in-progress. Comet currently has two distinct Iceberg
+code paths: 1) a hybrid reader (native Parquet decoding, JVM otherwise) that requires
+building Iceberg from source rather than using available artifacts in Maven, and 2) a fully-native
+reader (based on [iceberg-rust](https://github.com/apache/iceberg-rust)). Directions for both
+designs are provided below.**
 
-## Build Comet
+## Hybrid Reader
+
+### Build Comet
 
 Run a Maven install so that we can compile Iceberg against the latest Comet:
 
@@ -42,7 +47,7 @@ Set `COMET_JAR` env var:
 export COMET_JAR=`pwd`/spark/target/comet-spark-spark3.5_2.12-0.13.0-SNAPSHOT.jar
 ```
 
-## Build Iceberg
+### Build Iceberg
 
 Clone the Iceberg repository and apply the code changes needed by Comet:
 
@@ -59,7 +64,7 @@ Perform a clean build
 ./gradlew clean build -x test -x integrationTest
 ```
 
-## Test
+### Test
 
 Set the `ICEBERG_JAR` environment variable.
 
@@ -140,7 +145,52 @@ scala> spark.sql(s"SELECT * from t1").explain()
 +- CometBatchScan spark_catalog.default.t1[c0#26, c1#27] spark_catalog.default.t1 (branch=null) [filters=, groupedBy=] RuntimeFilters: []
 ```
 
-## Known issues
+### Known issues
 
 - Spark Runtime Filtering isn't [working](https://github.com/apache/datafusion-comet/issues/2116)
   - You can bypass the issue by either setting `spark.sql.adaptive.enabled=false` or `spark.comet.exec.broadcastExchange.enabled=false` (see the sketch below)
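+
+A minimal sketch of the workaround (assumes a running `spark-shell` session with Comet enabled; both keys are taken from the issue above and are assumed to be settable per session):
+
+```scala
+// Either disable adaptive query execution for the session...
+spark.conf.set("spark.sql.adaptive.enabled", "false")
+// ...or disable Comet's broadcast exchange instead.
+spark.conf.set("spark.comet.exec.broadcastExchange.enabled", "false")
+```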
+
+## Native Reader
+
+Comet's fully-native Iceberg integration does not require modifying Iceberg source
+code. Instead, Comet relies on reflection to extract `FileScanTask`s from Iceberg, which are
+then serialized to Comet's native execution engine (see
+[PR #2528](https://github.com/apache/datafusion-comet/pull/2528)).
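+
+For intuition, here is a sketch (illustrative only, not Comet's internal code) of the per-file details a `FileScanTask` exposes through Iceberg's public scan-planning API:
+
+```scala
+import org.apache.iceberg.Table
+import scala.collection.JavaConverters._
+
+// Plan a table scan and print what a native reader needs from each task:
+// the data file location, the byte range to read, and any residual filter.
+def describeTasks(table: Table): Unit = {
+  val tasks = table.newScan().planFiles()
+  try {
+    tasks.asScala.foreach { task =>
+      println(s"file=${task.file.path} start=${task.start} length=${task.length} residual=${task.residual}")
+    }
+  } finally tasks.close()
+}
+```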
+
+The example below uses Spark's package downloader to retrieve Comet 0.12.0 and Iceberg
+1.8.1, but Comet has been tested with Iceberg 1.5, 1.7, 1.8, and 1.10. The key configuration
+to enable fully-native Iceberg is `spark.comet.scan.icebergNative.enabled=true`. This
+configuration should **not** be used with the hybrid Iceberg configuration
+`spark.sql.iceberg.parquet.reader-type=COMET` from above. Note that the Iceberg and Comet
+session extensions are registered in a single `spark.sql.extensions` entry, since repeating
+`--conf` with the same key keeps only the last value.
+
+```shell
+$SPARK_HOME/bin/spark-shell \
+    --packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.12.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \
+    --repositories https://repo1.maven.org/maven2/ \
+    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.apache.comet.CometSparkSessionExtensions \
+    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog \
+    --conf spark.sql.catalog.spark_catalog.type=hadoop \
+    --conf spark.sql.catalog.spark_catalog.warehouse=/tmp/warehouse \
+    --conf spark.plugins=org.apache.spark.CometPlugin \
+    --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
+    --conf spark.comet.scan.icebergNative.enabled=true \
+    --conf spark.comet.explainFallback.enabled=true \
+    --conf spark.memory.offHeap.enabled=true \
+    --conf spark.memory.offHeap.size=2g
+```
+
+The same sample queries from above can be used to test Comet's fully-native Iceberg integration;
+however, the scan node to look for is `CometIcebergNativeScan`.
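+
+For example, a minimal smoke test (the table name and sample data here are illustrative):
+
+```scala
+// Create and query a small Iceberg table, then inspect the plan.
+spark.sql("CREATE TABLE t1 (c0 INT, c1 STRING) USING iceberg")
+spark.sql("INSERT INTO t1 VALUES (1, 'a'), (2, 'b')")
+spark.sql("SELECT * FROM t1").explain()
+// The plan should show a CometIcebergNativeScan node rather than the hybrid
+// reader's CometBatchScan.
+```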
+
+### Current limitations
+
+The following scenarios are not yet supported, but support is in progress:
+
+- Iceberg table spec v3 scans will fall back.
+- Iceberg writes will fall back.
+- Iceberg table scans backed by Avro or ORC data files will fall back.
+- Iceberg table scans partitioned on `BINARY` or `DECIMAL` (with precision >28) columns will fall back.
+- Iceberg scans with residual filters (_i.e._, filter expressions that are not partition values,
+  and are evaluated on the column values at scan time) involving the `truncate`, `bucket`, `year`,
+  `month`, `day`, or `hour` transforms will fall back (see the sketch below).
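+
+A hypothetical illustration of the residual-filter case (table and predicate are invented for this sketch):
+
+```scala
+// A mid-day timestamp predicate on a days(ts)-partitioned table cannot be
+// answered by partition pruning alone, so the boundary partition carries a
+// residual filter on the `day` transform and the scan falls back to Spark.
+spark.sql("CREATE TABLE events (id BIGINT, ts TIMESTAMP) USING iceberg PARTITIONED BY (days(ts))")
+spark.sql("SELECT * FROM events WHERE ts >= TIMESTAMP '2025-01-01 12:00:00'").explain()
+// With spark.comet.explainFallback.enabled=true, Comet logs the fallback
+// reason instead of producing a CometIcebergNativeScan node.
+```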
diff --git a/searchindex.js b/searchindex.js
index 4e2b4749f..7148ba436 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Install Comet": [[19, "install-comet"]], "1. Native Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2. Clone Spark and Apply Diff": [[19, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4, "sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4, "comet-jvm-operators"]], "3. Run Spark SQL Tests": [[19, "run-spark-sql-tests"]], "ANSI Mode": [[22, "ansi-mode"], [35, "ansi-mode"], [48, "ansi-mode"], [88, "ans [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Install Comet": [[19, "install-comet"]], "1. Native Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2. Clone Spark and Apply Diff": [[19, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4, "sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4, "comet-jvm-operators"]], "3. Run Spark SQL Tests": [[19, "run-spark-sql-tests"]], "ANSI Mode": [[22, "ansi-mode"], [35, "ansi-mode"], [48, "ansi-mode"], [88, "ans [...]
\ No newline at end of file
diff --git a/user-guide/latest/iceberg.html b/user-guide/latest/iceberg.html
index 682978e0c..af6d18889 100644
--- a/user-guide/latest/iceberg.html
+++ b/user-guide/latest/iceberg.html
@@ -461,10 +461,15 @@ under the License.
 -->
 <section id="accelerating-apache-iceberg-parquet-scans-using-comet-experimental">
 <h1>Accelerating Apache Iceberg Parquet Scans using Comet (Experimental)<a class="headerlink" href="#accelerating-apache-iceberg-parquet-scans-using-comet-experimental" title="Link to this heading">#</a></h1>
-<p><strong>Note: Iceberg integration is a work-in-progress. It is currently necessary to build Iceberg from
-source rather than using available artifacts in Maven</strong></p>
+<p><strong>Note: Iceberg integration is a work-in-progress. Comet currently has two distinct Iceberg
+code paths: 1) a hybrid reader (native Parquet decoding, JVM otherwise) that requires
+building Iceberg from source rather than using available artifacts in Maven, and 2) a fully-native
+reader (based on <a class="reference external" href="https://github.com/apache/iceberg-rust">iceberg-rust</a>). Directions for both
+designs are provided below.</strong></p>
+<section id="hybrid-reader">
+<h2>Hybrid Reader<a class="headerlink" href="#hybrid-reader" title="Link to this heading">#</a></h2>
 <section id="build-comet">
-<h2>Build Comet<a class="headerlink" href="#build-comet" title="Link to this heading">#</a></h2>
+<h3>Build Comet<a class="headerlink" href="#build-comet" title="Link to this heading">#</a></h3>
 <p>Run a Maven install so that we can compile Iceberg against the latest Comet:</p>
 <div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>mvn<span class="w"> </span>install<span class="w"> </span>-DskipTests
 </pre></div>
@@ -479,7 +484,7 @@ source rather than using available artifacts in Maven</strong></p>
 </div>
 </section>
 <section id="build-iceberg">
-<h2>Build Iceberg<a class="headerlink" href="#build-iceberg" title="Link to this heading">#</a></h2>
+<h3>Build Iceberg<a class="headerlink" href="#build-iceberg" title="Link to this heading">#</a></h3>
 <p>Clone the Iceberg repository and apply the code changes needed by Comet:</p>
 <div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>git@github.com:apache/iceberg.git
 <span class="nb">cd</span><span class="w"> </span>iceberg
@@ -493,7 +498,7 @@ git<span class="w"> </span>apply<span class="w"> </span>../datafusion-comet/dev/
 </div>
 </section>
 <section id="test">
-<h2>Test<a class="headerlink" href="#test" title="Link to this heading">#</a></h2>
+<h3>Test<a class="headerlink" href="#test" title="Link to this heading">#</a></h3>
 <p>Set the <code class="docutils literal notranslate"><span class="pre">ICEBERG_JAR</span></code> environment variable.</p>
 <div class="highlight-shell notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">ICEBERG_JAR</span><span class="o">=</span><span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/spark/v3.5/spark-runtime/build/libs/iceberg-spark-runtime-3.5_2.12-1.9.0-SNAPSHOT.jar
 </pre></div>
@@ -563,7 +568,7 @@ git<span class="w"> </span>apply<span class="w"> </span>../datafusion-comet/dev/
 </div>
 </section>
 <section id="known-issues">
-<h2>Known issues<a class="headerlink" href="#known-issues" title="Link to this heading">#</a></h2>
+<h3>Known issues<a class="headerlink" href="#known-issues" title="Link to this heading">#</a></h3>
 <ul class="simple">
 <li><p>Spark Runtime Filtering isn’t <a class="reference external" href="https://github.com/apache/datafusion-comet/issues/2116">working</a></p>
 <ul>
@@ -572,6 +577,50 @@ git<span class="w"> </span>apply<span class="w"> </span>../datafusion-comet/dev/
 </li>
 </ul>
 </section>
+</section>
+<section id="native-reader">
+<h2>Native Reader<a class="headerlink" href="#native-reader" title="Link to this heading">#</a></h2>
+<p>Comet’s fully-native Iceberg integration does not require modifying Iceberg source
+code. Instead, Comet relies on reflection to extract <code class="docutils literal notranslate"><span class="pre">FileScanTask</span></code>s from Iceberg, which are
+then serialized to Comet’s native execution engine (see
+<a class="reference external" href="https://github.com/apache/datafusion-comet/pull/2528">PR #2528</a>).</p>
+<p>The example below uses Spark’s package downloader to retrieve Comet 0.12.0 and Iceberg
+1.8.1, but Comet has been tested with Iceberg 1.5, 1.7, 1.8, and 1.10. The key configuration
+to enable fully-native Iceberg is <code class="docutils literal notranslate"><span class="pre">spark.comet.scan.icebergNative.enabled=true</span></code>. This
+configuration should <strong>not</strong> be used with the hybrid Iceberg configuration
+<code class="docutils literal notranslate"><span class="pre">spark.sql.iceberg.parquet.reader-type=COMET</span></code> from above.
+Note that the Iceberg and Comet session extensions are registered in a single
+<code class="docutils literal notranslate"><span class="pre">spark.sql.extensions</span></code> entry, since repeating <code class="docutils literal notranslate"><span class="pre">--conf</span></code> with the same key keeps only the last value.</p>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span><span class="nv">$SPARK_HOME</span>/bin/spark-shell<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--packages<span class="w"> </span>org.apache.datafusion:comet-spark-spark3.5_2.12:0.12.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--repositories<span class="w"> </span>https://repo1.maven.org/maven2/<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--conf<span class="w"> </span>spark.sql.extensions<span class="o">=</span>org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.apache.comet.CometSparkSessionExtensions<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--conf<span class="w"> </span>spark.sql.catalog.spark_catalog<span class="o">=</span>org.apache.iceberg.spark.SparkCatalog<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--conf<span class="w"> </span>spark.sql.catalog.spark_catalog.type<span class="o">=</span>hadoop<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--conf<span class="w"> </span>spark.sql.catalog.spark_catalog.warehouse<span class="o">=</span>/tmp/warehouse<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--conf<span class="w"> </span>spark.plugins<span class="o">=</span>org.apache.spark.CometPlugin<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--conf<span class="w"> </span>spark.shuffle.manager<span class="o">=</span>org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--conf<span class="w"> </span>spark.comet.scan.icebergNative.enabled<span class="o">=</span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--conf<span class="w"> </span>spark.comet.explainFallback.enabled<span class="o">=</span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--conf<span class="w"> </span>spark.memory.offHeap.enabled<span class="o">=</span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--conf<span class="w"> </span>spark.memory.offHeap.size<span class="o">=</span>2g
+</pre></div>
+</div>
+<p>The same sample queries from above can be used to test Comet’s fully-native Iceberg integration;
+however, the scan node to look for is <code class="docutils literal notranslate"><span class="pre">CometIcebergNativeScan</span></code>.</p>
+<section id="current-limitations">
+<h3>Current limitations<a class="headerlink" href="#current-limitations" title="Link to this heading">#</a></h3>
+<p>The following scenarios are not yet supported, but support is in progress:</p>
+<ul class="simple">
+<li><p>Iceberg table spec v3 scans will fall back.</p></li>
+<li><p>Iceberg writes will fall back.</p></li>
+<li><p>Iceberg table scans backed by Avro or ORC data files will fall back.</p></li>
+<li><p>Iceberg table scans partitioned on <code class="docutils literal notranslate"><span class="pre">BINARY</span></code> or <code class="docutils literal notranslate"><span class="pre">DECIMAL</span></code> (with precision &gt;28) columns will fall back.</p></li>
+<li><p>Iceberg scans with residual filters (<em>i.e.</em>, filter expressions that are not partition values,
+and are evaluated on the column values at scan time) involving the <code class="docutils literal notranslate"><span class="pre">truncate</span></code>, <code class="docutils literal notranslate"><span class="pre">bucket</span></code>, <code class="docutils literal notranslate"><span class="pre">year</span></code>, <code class="docutils literal notranslate"><span class="pre">month</span></code>,
+<code class="docutils literal notranslate"><span class="pre">day</span></code>, or <code class="docutils literal notranslate"><span class="pre">hour</span></code> transforms will fall back.</p></li>
+</ul>
+</section>
+</section>
 </section>
 
 

