Re: [PR] docs: add documentation for fully-native Iceberg scans [datafusion-comet]

via GitHub Tue, 09 Dec 2025 12:39:37 -0800


comphead commented on code in PR #2868:
URL: https://github.com/apache/datafusion-comet/pull/2868#discussion_r2604284790



##########
docs/source/user-guide/latest/iceberg.md:
##########
@@ -140,7 +145,50 @@ scala> spark.sql(s"SELECT * from t1").explain()
 +- CometBatchScan spark_catalog.default.t1[c0#26, c1#27] 
spark_catalog.default.t1 (branch=null) [filters=, groupedBy=] RuntimeFilters: []
 ```
 
-## Known issues
+### Known issues
 
 - Spark Runtime Filtering isn't 
[working](https://github.com/apache/datafusion-comet/issues/2116)
   - You can bypass the issue by either setting 
`spark.sql.adaptive.enabled=false` or 
`spark.comet.exec.broadcastExchange.enabled=false`
+
+## Fully-Native Execution
+
+Comet's fully-native Iceberg integration does not require modifying Iceberg 
source
+code. Instead, Comet relies on reflection to extract `FileScanTask`s from 
Iceberg, which are
+then serialized to Comet's native execution engine (see
+[PR #2528](https://github.com/apache/datafusion-comet/pull/2528)).
+
+The example below uses Spark's package downloader to retrieve Comet 0.12.0 and 
Iceberg
+1.8.1, but Comet has been tested with Iceberg 1.5, 1.7, 1.8, and 1.10. The key 
configuration
+to enable fully-native Iceberg is 
`spark.comet.scan.icebergNative.enabled=true`. This
+configuration should **not** be used with the hybrid Iceberg configuration
+`spark.sql.iceberg.parquet.reader-type=COMET` from above.
+
+```shell
+$SPARK_HOME/bin/spark-shell \
+    --packages 
org.apache.datafusion:comet-spark-spark3.5_2.12:0.12.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1
 \
+    --repositories https://repo1.maven.org/maven2/ \
+    --conf 
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
 \
+    --conf 
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog \
+    --conf spark.sql.catalog.spark_catalog.type=hadoop \
+    --conf spark.sql.catalog.spark_catalog.warehouse=/tmp/warehouse \
+    --conf spark.plugins=org.apache.spark.CometPlugin \
+    --conf 
spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager
 \
+    --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
+    --conf spark.comet.scan.icebergNative.enabled=true \
+    --conf spark.comet.explainFallback.enabled=true \
+    --conf spark.memory.offHeap.enabled=true \
+    --conf spark.memory.offHeap.size=2g
+```
+
+The same sample queries from above can be used to test Comet's fully-native 
Iceberg integration,
+however the scan node to look for is `CometIcebergNativeScan`.
+
+### Current limitations
+
+- Iceberg table spec v3 scans will fall back.

Review Comment:
   perhaps lets name it as `not supported` or `work in progress`  ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] docs: add documentation for fully-native Iceberg scans [datafusion-comet]

Reply via email to