Re: [I] Import of pyarrow.parquet and loading of non-existing file threw exception with incompatible pandas [arrow]

via GitHub Tue, 15 Apr 2025 23:20:27 -0700


AlenkaF commented on issue #46151:
URL: https://github.com/apache/arrow/issues/46151#issuecomment-2808500280

Thank you for opening up the issue @vadimkantorov.

Pandas is not a required dependency for PyArrow, but a lot of functionality
is designed to work seamlessly when both PyArrow and Pandas are installed.

The error you're encountering is actually coming from the use of the
`Expression` class and `scalar()` method in `dataset.py`, which is imported for
backward compatibility:

https://github.com/apache/arrow/blob/4937cf5721bd4912438964377361c4ec49fd5e80/python/pyarrow/dataset.py#L62-L63

In that part of the code, the Pandas API is used to check whether an object
is array-like.

You can confirm that Pandas is not a required dependency by running the
example you included in this issue without Pandas installed. In that case, you
should see the expected `FileNotFoundError: non-existing.parquet`, which is the
same error you'd get with an updated version of Pandas.
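
A minimal sketch of that check, in case it helps when comparing environments. It assumes the example in the issue boils down to a `pyarrow.parquet.read_table()` call on a missing path (that exact call is my assumption), and the helper names `installed_version` and `try_read_missing_file` are hypothetical, made up for illustration:

```python
import importlib.metadata
import importlib.util


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def try_read_missing_file(path="non-existing.parquet"):
    """Try to read a missing Parquet file and return the exception type name.

    Returns None when PyArrow itself is not installed, so the script still
    runs in a bare environment.
    """
    if importlib.util.find_spec("pyarrow") is None:
        return None
    try:
        # Assumption: the issue's example is equivalent to this call.
        import pyarrow.parquet as pq
        pq.read_table(path)
    except Exception as exc:
        # Report whatever surfaced, including errors raised during import.
        return type(exc).__name__
    return "no error"


if __name__ == "__main__":
    print("pandas: ", installed_version("pandas"))
    print("pyarrow:", installed_version("pyarrow"))
    print("result: ", try_read_missing_file())
```

Running it once in an environment without Pandas and once with the old Pandas version should show whether the exception type changes between the two.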

Regarding Pandas version compatibility, version 1.0 is currently listed
under PyArrow's [optional dependencies](https://arrow.apache.org/docs/python/install.html#dependencies).
So it may be worth revisiting this, or at least improving the warning message
in the situation you described.

cc @raulcd.

