[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

GitBox Mon, 28 Mar 2022 07:28:23 -0700


jorisvandenbossche commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r836329892




##########
File path: python/pyarrow/_compute.pyx
##########
@@ -2127,21 +2138,76 @@ cdef class Expression(_Weakrefable):
         return Expression._call("divide_checked", [self, other])
 
     def is_valid(self):
-        """Checks whether the expression is not-null (valid)"""
+        """
+        Check whether the expression is not-null (valid).
+
+        This creates a new expression equivalent to calling the
+        `is_valid` compute function on this expression.
+
+        Returns
+        -------
+        is_valid : Expression

Review comment:
       Not that it needs to be changed for this PR, but just a general future 
note: I think putting only "Expression" (i.e. the type) on this line is valid 
as well for numpydoc, and personally I find the repeating of the name 
"is_valid: " not giving any added value.

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -816,48 +843,25 @@ cdef class Fragment(_Weakrefable):
             Schema to use for scanning. This is used to unify a Fragment to
             it's Dataset's schema. If not specified this will use the
             Fragment's physical schema which might differ for each Fragment.
-        columns : list of str, default None

Review comment:
       For a separate JIRA: it might be nice to actually share this part of the 
docstring in both places (so users don't have to check another object's 
docstring to see the help), but for now it's good to deduplicate this and have 
a single up-to-date version.

##########
File path: python/pyarrow/table.pxi
##########
@@ -1966,9 +2085,12 @@ cdef class Table(_PandasConvertible):
 
         return pyarrow_wrap_table(c_table)
 
-    def to_batches(self, max_chunksize=None, **kwargs):
+    def to_batches(self, max_chunksize=None):
         """
-        Convert Table to list of (contiguous) RecordBatch objects.
+        Convert Table to a list of RecordBatch objects.
+
+        Note that this method is zero-copy, it merely exposes the same data
+        under a different API.

Review comment:
       This is not necessarily true, as the chunking of each column does not 
have to be the same? (in which case you will get some copies?)

##########
File path: python/pyarrow/_compute.pyx
##########
@@ -2127,21 +2138,76 @@ cdef class Expression(_Weakrefable):
         return Expression._call("divide_checked", [self, other])
 
     def is_valid(self):
-        """Checks whether the expression is not-null (valid)"""
+        """
+        Check whether the expression is not-null (valid).
+
+        This creates a new expression equivalent to calling the
+        `is_valid` compute function on this expression.
+
+        Returns
+        -------
+        is_valid : Expression

Review comment:
       Or did numpydoc complain about this? (seeing you changes this also in 
some existing docstrings)

##########
File path: python/pyarrow/table.pxi
##########
@@ -1966,9 +2085,12 @@ cdef class Table(_PandasConvertible):
 
         return pyarrow_wrap_table(c_table)
 
-    def to_batches(self, max_chunksize=None, **kwargs):
+    def to_batches(self, max_chunksize=None):
         """
-        Convert Table to list of (contiguous) RecordBatch objects.
+        Convert Table to a list of RecordBatch objects.
+
+        Note that this method is zero-copy, it merely exposes the same data
+        under a different API.

Review comment:
       Actually, I suppose we still only _slice_ data in that case, so still 
being zero-copy?

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -700,17 +700,43 @@ cdef class FileFormat(_Weakrefable):
         return self.wrapped
 
     def inspect(self, file, filesystem=None):
-        """Infer the schema of a file."""
+        """
+        Infer the schema of a file.
+
+        Parameters
+        ----------
+        file : file-like object, path-like or str
+            The file or file path to infer a schema from.
+        filesystem : Filesystem, optional
+            If `filesystem` is given, `file` must be a string and specifies
+            the path of the file to read from the filesystem.
+
+        Returns
+        -------
+        schema : Schema
+            The schema inferred from the file
+        """
         c_source = _make_file_source(file, filesystem)
         c_schema = GetResultValue(self.format.Inspect(c_source))
         return pyarrow_wrap_schema(move(c_schema))
 
     def make_fragment(self, file, filesystem=None,
                       Expression partition_expression=None):
         """
-        Make a FileFragment of this FileFormat. The filter may not reference
-        fields absent from the provided schema. If no schema is provided then
-        one will be inferred.
+        Make a FileFragment from a given file.
+
+        The filter may not reference fields absent from the provided schema.
+        If no schema is provided then one will be inferred.

Review comment:
       This was already in the original docstring, but the "if not schema is 
provided" seems wrong, given there is no way to provide a schema. 

##########
File path: python/pyarrow/io.pxi
##########
@@ -1257,6 +1283,10 @@ cdef class BufferReader(NativeFile):
     cdef:
         Buffer buffer
 
+    # XXX Needed to make numpydoc happy
+    def __init__(self, obj):

Review comment:
       Do you remember why this was needed? (which error did it give?)
   
   Aside, we should probably try to instruct sphinx/autodoc/numpydoc to not 
include `__init__` functions in the reference docs, as I don't think we have 
any case where this adds value (compared to the class docstring). See eg 
https://arrow.apache.org/docs/dev/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow.parquet.ParquetDataset.__init__

##########
File path: python/pyarrow/array.pxi
##########
@@ -849,19 +883,48 @@ cdef class Array(_PandasConvertible):
     def sum(self, **kwargs):
         """
         Sum the values in a numerical array.
+
+        See `pyarrow.compute.sum` for full usage.

Review comment:
       We could make this:
   
   ```suggestion
           See :func:`pyarrow.compute.sum` for full usage.
   ```
   
   which would give a proper link in the html version (and not too much visual 
noise in the text version)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Reply via email to