This is an automated email from the ASF dual-hosted git repository.
timsaucer pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-python.git
The following commit(s) were added to refs/heads/main by this push:
new b7d3519d docs: add apache iceberg as datafusion data source (#1240)
b7d3519d is described below
commit b7d3519d395025183a06aad268ee30e61f8226df
Author: Kevin Liu <[email protected]>
AuthorDate: Tue Sep 16 14:32:35 2025 -0700
docs: add apache iceberg as datafusion data source (#1240)
* add iceberg as data source
* fix warning
---
docs/source/user-guide/data-sources.rst | 37 ++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/docs/source/user-guide/data-sources.rst b/docs/source/user-guide/data-sources.rst
index 7d07c67d..a9b119b9 100644
--- a/docs/source/user-guide/data-sources.rst
+++ b/docs/source/user-guide/data-sources.rst
@@ -172,10 +172,41 @@ which can lead to a significant performance difference.
df = ctx.table("my_delta_table")
df.show()
-Iceberg
--------
+Apache Iceberg
+--------------
-Coming soon!
+DataFusion 45.0.0 and later can register Apache Iceberg
tables as table providers through the Custom Table Provider interface.
+
+This requires either the `pyiceberg <https://pypi.org/project/pyiceberg/>`__
library (>=0.10.0) or the `pyiceberg-core
<https://pypi.org/project/pyiceberg-core/>`__ library (>=0.5.0).
+
+* The ``pyiceberg-core`` library exposes Iceberg Rust's implementation of the
Custom Table Provider interface as Python bindings.
+* The ``pyiceberg`` library uses the ``pyiceberg-core`` Python bindings
under the hood and provides a native way for Python users to interact with
DataFusion.
+
+.. code-block:: python
+
+    from datafusion import SessionContext
+    from pyiceberg.catalog import load_catalog
+    import pyarrow as pa
+
+    # Load catalog and create/load a table
+    catalog = load_catalog("catalog", type="in-memory")
+    catalog.create_namespace_if_not_exists("default")
+
+    # Create some sample data
+    data = pa.table({"x": [1, 2, 3], "y": [4, 5, 6]})
+    iceberg_table = catalog.create_table("default.test", schema=data.schema)
+    iceberg_table.append(data)
+
+    # Register the table with DataFusion
+    ctx = SessionContext()
+    ctx.register_table_provider("test", iceberg_table)
+
+    # Query the table using DataFusion
+    ctx.table("test").show()
+
+
+Note that the DataFusion integration relies on features from the `Iceberg Rust
<https://github.com/apache/iceberg-rust/>`_ implementation instead of the
`PyIceberg <https://github.com/apache/iceberg-python/>`_ implementation.
+Features that are available in PyIceberg but not yet in Iceberg Rust will not
be available when using DataFusion.
Custom Table Provider
---------------------