timsaucer commented on code in PR #1243:
URL: 
https://github.com/apache/datafusion-python/pull/1243#discussion_r2366188433


##########
docs/source/user-guide/io/table_provider.rst:
##########
@@ -39,20 +39,47 @@ A complete example can be found in the `examples folder 
<https://github.com/apac
         ) -> PyResult<Bound<'py, PyCapsule>> {
             let name = CString::new("datafusion_table_provider").unwrap();
 
-            let provider = Arc::new(self.clone())
-                .map_err(|e| PyRuntimeError::new_err(e.to_string()))?;
-            let provider = FFI_TableProvider::new(Arc::new(provider), false);
+            let provider = Arc::new(self.clone());
+            let provider = FFI_TableProvider::new(provider, false, None);
 
             PyCapsule::new_bound(py, provider, Some(name.clone()))
         }
     }
 
-Once you have this library available, in python you can register your table 
provider
-to the ``SessionContext``.
+Once you have this library available, you can construct a
+:py:class:`~datafusion.TableProvider` in Python and register it with the
+``SessionContext``.  Table providers can be created either from the PyCapsule 
exposed by
+your Rust provider or from an existing 
:py:class:`~datafusion.dataframe.DataFrame`.
+Call the provider's ``__datafusion_table_provider__()`` method to obtain the 
capsule
+before constructing a ``TableProvider``. The ``TableProvider.from_view()`` 
helper is
+deprecated; instead use ``TableProvider.from_dataframe()`` or 
``DataFrame.into_view()``.
+
+.. note::
+
+   :py:meth:`~datafusion.context.SessionContext.register_table_provider` is
+   deprecated. Use
+   :py:meth:`~datafusion.context.SessionContext.register_table` with the
+   resulting :py:class:`~datafusion.TableProvider` instead.
 
 .. code-block:: python
 
+    from datafusion import SessionContext, TableProvider
+
+    ctx = SessionContext()
     provider = MyTableProvider()
-    ctx.register_table_provider("my_table", provider)
 
-    ctx.table("my_table").show()
+    capsule = provider.__datafusion_table_provider__()
+    capsule_provider = TableProvider.from_capsule(capsule)
+
+    df = ctx.from_pydict({"a": [1]})
+    view_provider = TableProvider.from_dataframe(df)
+    # or: view_provider = df.into_view()
+
+    ctx.register_table("capsule_table", capsule_provider)
+    ctx.register_table("view_table", view_provider)
+
+    ctx.table("capsule_table").show()
+    ctx.table("view_table").show()

Review Comment:
   This example takes a bit of cognitive load to understand what we're 
demonstrating.
   
   First off, similar to my comments above I don't think we want our users to 
have to think about if they're using something that comes from a PyCapsule 
interface or not. Suppose I am a library user and I get a delta table object 
that implements PyCapsule. As a user of that library, I shouldn't have to 
understand how the interfacing works. I should just be able to use it directly. 
So I want to be able to just pass those objects directly to `TableProvider` or 
`register_table` without having to think about or understand these mechanics 
behind the scene.



##########
dev/changelog/49.0.0.md:
##########
@@ -25,6 +25,10 @@ This release consists of 16 commits from 7 contributors. See 
credits at the end
 
 - fix(build): Include build.rs in published crates 
[#1199](https://github.com/apache/datafusion-python/pull/1199) (colinmarc)
 
+**Deprecations:**
+
+- Document that `SessionContext.register_table_provider` is deprecated in 
favor of `SessionContext.register_table`.
+

Review Comment:
   These changelogs are automatically generated, so I don't think we want to 
make changes here. Regardless, these would go into the 51.0.0 release.



##########
docs/source/user-guide/data-sources.rst:
##########
@@ -152,13 +152,22 @@ as Delta Lake. This will require a recent version of
 .. code-block:: python
 
     from deltalake import DeltaTable
+    from datafusion import TableProvider
 
     delta_table = DeltaTable("path_to_table")
-    ctx.register_table_provider("my_delta_table", delta_table)
+    provider = 
TableProvider.from_capsule(delta_table.__datafusion_table_provider__())
+    ctx.register_table("my_delta_table", provider)

Review Comment:
   This feels like a worse experience than before. Why can we not just call 
`register_table("my_delta_table", delta_table)`?



##########
python/datafusion/__init__.py:
##########
@@ -21,24 +21,26 @@
 See https://datafusion.apache.org/python for more information.
 """
 
+# isort: skip_file  # Prevent import-sorting linter errors (I001)
+# ruff: noqa: I001

Review Comment:
   Is this ruff lint causing a problem?



##########
docs/source/conf.py:
##########
@@ -94,6 +97,10 @@ def autoapi_skip_member_fn(app, what, name, obj, skip, 
options) -> bool:  # noqa
     if (what, name) in skip_contents:
         skip = True
 
+    # Skip private members that start with underscore to avoid duplication
+    if name.split(".")[-1].startswith("_") and what in ("data", "variable"):
+        skip = True
+

Review Comment:
   So I can understand better, why do we need both this rule and the one above 
in lines 86-88?



##########
python/datafusion/dataframe.py:
##########
@@ -313,9 +315,29 @@ def __init__(self, df: DataFrameInternal) -> None:
         """
         self.df = df
 
-    def into_view(self) -> pa.Table:
-        """Convert DataFrame as a ViewTable which can be used in 
register_table."""
-        return self.df.into_view()
+    def into_view(self) -> TableProvider:
+        """Convert ``DataFrame`` into a ``TableProvider`` view for 
registration.
+
+        This is the preferred way to obtain a view for
+        :py:meth:`~datafusion.context.SessionContext.register_table`.

Review Comment:
   I don't understand this statement.



##########
python/datafusion/dataframe.py:
##########
@@ -313,9 +315,29 @@ def __init__(self, df: DataFrameInternal) -> None:
         """
         self.df = df
 
-    def into_view(self) -> pa.Table:
-        """Convert DataFrame as a ViewTable which can be used in 
register_table."""
-        return self.df.into_view()
+    def into_view(self) -> TableProvider:
+        """Convert ``DataFrame`` into a ``TableProvider`` view for 
registration.
+
+        This is the preferred way to obtain a view for
+        :py:meth:`~datafusion.context.SessionContext.register_table`.
+        ``datafusion.TableProvider.from_dataframe`` calls this method under 
the hood,
+        and the older ``TableProvider.from_view`` helper is deprecated.
+
+        The ``DataFrame`` remains valid after conversion, so it can still be 
used for
+        additional queries alongside the returned view.
+
+        Examples:
+            >>> from datafusion import SessionContext
+            >>> ctx = SessionContext()
+            >>> df = ctx.sql("SELECT 1 AS value")
+            >>> provider = df.into_view()

Review Comment:
   From an end user's perspective, they turn a dataframe into a view, which 
they then register so they can use it later. I don't think this end user needs 
to understand the concept of TableProvider at all. In the example I would 
change the variable name `provider` to `view`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to