Re: [PR] [Python][Docs] Adds type checks for source in read_table [arrow]

via GitHub Fri, 09 May 2025 03:40:02 -0700


raulcd commented on code in PR #46330:
URL: https://github.com/apache/arrow/pull/46330#discussion_r2081406088



##########
python/pyarrow/parquet/core.py:
##########
@@ -1825,7 +1826,14 @@ def read_table(source, *, columns=None, use_threads=True,
         filesystem, path = _resolve_filesystem_and_path(source, filesystem)
         if filesystem is not None:
             source = filesystem.open_input_file(path)
-        # TODO test that source is not a directory or a list
+        if not (
+            isinstance(source, str)

Review Comment:
   The `# TODO` asked us to validate that `source` is not a directory, but
checking whether `source` is a string does not tell us whether it is a file
name or a directory.
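A minimal sketch of a check covering the two cases the `# TODO` names. It assumes `source` is either a local path or a list. `check_not_directory_or_list` is a hypothetical helper, not part of pyarrow, and remote filesystems are not handled:

```python
import os


def check_not_directory_or_list(source):
    """Hypothetical helper: reject the two inputs named in the TODO."""
    # A list of paths is dataset-style input, not a single file.
    if isinstance(source, (list, tuple)):
        raise ValueError("source should be a single file, not a list of paths")
    # isinstance(source, str) alone cannot distinguish a file from a
    # directory; for local paths the filesystem has to be consulted.
    if isinstance(source, (str, os.PathLike)) and os.path.isdir(source):
        raise ValueError(
            f"source {os.fspath(source)!r} is a directory, not a file"
        )
    return source
```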



##########
python/pyarrow/parquet/core.py:
##########
@@ -1825,7 +1826,14 @@ def read_table(source, *, columns=None, use_threads=True,
         filesystem, path = _resolve_filesystem_and_path(source, filesystem)
         if filesystem is not None:
             source = filesystem.open_input_file(path)
-        # TODO test that source is not a directory or a list
+        if not (
+            isinstance(source, str)
+            or isinstance(source, pa.NativeFile)
+            or hasattr(source, "read")
+        ):
+            raise ValueError(
+                "source should be a file name, a pyarrow.NativeFile or a file-like object"
+            )

Review Comment:
   Let's use the same format as the other `ValueError` messages in this code
block.
   ```suggestion
               raise ValueError(
                   "source should be a file name, a pyarrow.NativeFile or a file-like object "
                   "when the pyarrow.dataset module is not available"
               )
   ```
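A standalone sketch of the proposed check combined with the suggested message. `validate_source` is a hypothetical name. `pyarrow.NativeFile` is left out of the `isinstance` test so the sketch runs without pyarrow installed. Because `NativeFile` defines a `read()` method, the `hasattr(source, "read")` branch would still accept it:

```python
def validate_source(source):
    """Hypothetical stand-alone version of the check proposed in this PR."""
    # Accept a path string or any object exposing a read() method.
    if not (isinstance(source, str) or hasattr(source, "read")):
        # Adjacent string literals are joined at compile time, so the
        # trailing space before "when" keeps the message one sentence.
        raise ValueError(
            "source should be a file name, a pyarrow.NativeFile or a file-like object "
            "when the pyarrow.dataset module is not available"
        )
    return source
```

For example, `validate_source(io.BytesIO(b""))` returns the buffer unchanged, while `validate_source(42)` raises `ValueError` with the full two-part message.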


