[ 
https://issues.apache.org/jira/browse/ARROW-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264393#comment-17264393
 ] 

Joris Van den Bossche commented on ARROW-10264:
-----------------------------------------------

[~apitrou] in case this gives an indication: based on the printed output of the 
test run I started in the PR, the following call gets run:

{code}
pq.read_table("hdfs://impala:8020/tmp/pyarrow-test-419/multi-parquet-uri-ba686ecdea454d34b45bffc4756f0387")
{code}

which then fails with the following error (raised from inside the C++ dataset 
factory code):

{code}
pyarrow.lib.ArrowInvalid: Path '/tmp/pyarrow-test-419/multi-parquet-uri-ba686ecdea454d34b45bffc4756f0387/0.parquet' is not relative to '/user/root'
{code}

But I don't know, e.g., where this "/user/root" comes from, or whether the 
original URI is a valid URI (if not, that's a problem with our test code 
generating the URI).
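
As a quick sanity check on the URI question, here is a minimal sketch that uses only the Python standard library. It is not pyarrow's own URI handling, which may behave differently. It splits the URI from the failing call into its components and then compares the reported file path against both base directories. The helper `is_relative_to` is hypothetical: it only imitates the kind of prefix check the error message suggests and is not the actual check in the C++ code.

```python
import posixpath
from urllib.parse import urlsplit

# URI taken verbatim from the failing read_table call above.
uri = ("hdfs://impala:8020/tmp/pyarrow-test-419/"
       "multi-parquet-uri-ba686ecdea454d34b45bffc4756f0387")

# Split the URI into scheme, host, port and path.
parts = urlsplit(uri)
print(parts.scheme, parts.hostname, parts.port, parts.path)


def is_relative_to(path, base):
    """Hypothetical helper: True if `path` is `base` or lies underneath it."""
    base = base.rstrip("/") or "/"
    return path == base or path.startswith(base.rstrip("/") + "/")


# Path of the first file, as reported in the error message.
file_path = posixpath.join(parts.path, "0.parquet")

# The file lies under the directory named in the URI ...
print(is_relative_to(file_path, parts.path))
# ... but not under '/user/root', which matches the reported error.
print(is_relative_to(file_path, "/user/root"))
```

If the standard-library parse is any guide, the URI itself looks well-formed (scheme, host, port and an absolute path), which would point towards the base directory used in the relative-path check rather than towards the test code that generates the URI. This remains an assumption until confirmed against the C++ side.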


> [C++][Python] Parquet test failing with HadoopFileSystem URI
> ------------------------------------------------------------
>
>                 Key: ARROW-10264
>                 URL: https://issues.apache.org/jira/browse/ARROW-10264
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: filesystem, hdfs, pull-request-available
>             Fix For: 3.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Follow-up on ARROW-10175. In the HDFS integration tests, there is a test 
> using a URI failing if we use the new filesystem / dataset implementation:
> {code}
> FAILED opt/conda/envs/arrow/lib/python3.7/site-packages/pyarrow/tests/test_hdfs.py::TestLibHdfs::test_read_multiple_parquet_files_with_uri
> {code}
> fails with
> {code}
> pyarrow.lib.ArrowInvalid: Path '/tmp/pyarrow-test-838/multi-parquet-uri-48569714efc74397816722c9c6723191/0.parquet' is not relative to '/user/root'
> {code}
> while it is passing a URI (and not a filesystem object) to 
> {{parquet.read_table}}, and the new filesystems/dataset implementation should 
> be able to handle URIs.
> cc [~apitrou]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
