hsiang-c commented on code in PR #2868: URL: https://github.com/apache/datafusion-comet/pull/2868#discussion_r2604498864
##########
docs/source/user-guide/latest/iceberg.md:
##########
@@ -140,7 +145,52 @@ scala> spark.sql(s"SELECT * from t1").explain()
 +- CometBatchScan spark_catalog.default.t1[c0#26, c1#27] spark_catalog.default.t1 (branch=null) [filters=, groupedBy=] RuntimeFilters: []
 ```
 
-## Known issues
+### Known issues
 
 - Spark Runtime Filtering isn't [working](https://github.com/apache/datafusion-comet/issues/2116)
   - You can bypass the issue by either setting `spark.sql.adaptive.enabled=false` or `spark.comet.exec.broadcastExchange.enabled=false`
+
+## Native Reader
+
+Comet's fully-native Iceberg integration does not require modifying Iceberg source
+code. Instead, Comet relies on reflection to extract `FileScanTask`s from Iceberg, which are
+then serialized to Comet's native execution engine (see
+[PR #2528](https://github.com/apache/datafusion-comet/pull/2528)).
+
+The example below uses Spark's package downloader to retrieve Comet 0.12.0 and Iceberg
+1.8.1, but Comet has been tested with Iceberg 1.5, 1.7, 1.8, and 1.10. The key configuration
+to enable fully-native Iceberg is `spark.comet.scan.icebergNative.enabled=true`. This
+configuration should **not** be used with the hybrid Iceberg configuration
+`spark.sql.iceberg.parquet.reader-type=COMET` from above.
+
+```shell
+$SPARK_HOME/bin/spark-shell \
+  --packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.12.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \

Review Comment:
If I understand correctly, you're referring to the `core` or `api` classes used in [IcebergReflection](https://github.com/apache/datafusion-comet/blob/main/spark/src/main/scala/org/apache/comet/iceberg/IcebergReflection.scala#L36-L48).

```shell
jar tf iceberg-spark-runtime-3.5_2.12-1.10.0.jar | grep -E "org.apache.iceberg.ContentScanTask.class|org.apache.iceberg.FileScanTask.class|org.apache.iceberg.ContentFile.class|org.apache.iceberg.StructLike.class|org.apache.iceberg.PartitionScanTask.class|org.apache.iceberg.DeleteFile.class|org.apache.iceberg.expressions.Literal.class|org.apache.iceberg.SchemaParser.class|org.apache.iceberg.Schema.class|org.apache.iceberg.PartitionSpecParser.class|org.apache.iceberg.PartitionSpec.class|org.apache.iceberg.PartitionField.class|org/apache/iceberg/expressions/UnboundPredicate.class"
org/apache/iceberg/PartitionSpecParser.class
org/apache/iceberg/SchemaParser.class
org/apache/iceberg/ContentFile.class
org/apache/iceberg/ContentScanTask.class
org/apache/iceberg/DeleteFile.class
org/apache/iceberg/FileScanTask.class
org/apache/iceberg/PartitionField.class
org/apache/iceberg/PartitionScanTask.class
org/apache/iceberg/PartitionSpec.class
org/apache/iceberg/Schema.class
org/apache/iceberg/StructLike.class
org/apache/iceberg/expressions/Literal.class
org/apache/iceberg/expressions/UnboundPredicate.class
```

The jar can be downloaded from https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark-runtime-3.5_2.12/1.10.0
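
The same check can be phrased as "which of the expected classes are missing from the jar", which avoids a long `grep -E` alternation. The snippet below is only a sketch: `missing_classes` is a hypothetical helper (not part of Comet or Iceberg) that reads a jar listing on stdin and prints every class path passed as an argument that does not appear in it. The jar file name in the usage comment is an assumption taken from the command above.

```shell
# Hypothetical helper: reads a jar listing (one entry per line) on stdin
# and prints each expected class path that is absent from the listing.
missing_classes() {
  listing=$(cat)
  for cls in "$@"; do
    # -F: fixed string, -x: match the whole line, -q: no output
    printf '%s\n' "$listing" | grep -Fxq -- "$cls" || printf '%s\n' "$cls"
  done
}

# Example usage (assumes the jar is in the current directory):
#   jar tf iceberg-spark-runtime-3.5_2.12-1.10.0.jar | missing_classes \
#     org/apache/iceberg/FileScanTask.class \
#     org/apache/iceberg/expressions/Literal.class
```

Empty output would mean every listed class is present in the jar. Matching whole lines with `-Fx` also avoids the `.` characters in the class names being treated as regex wildcards.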
