[ 
https://issues.apache.org/jira/browse/ARROW-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-8136.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 6643
[https://github.com/apache/arrow/pull/6643]

> [C++][Python] Creating dataset from relative path no longer working
> -------------------------------------------------------------------
>
>                 Key: ARROW-8136
>                 URL: https://issues.apache.org/jira/browse/ARROW-8136
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Joris Van den Bossche
>            Assignee: Joris Van den Bossche
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.17.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since https://github.com/apache/arrow/pull/6597, local relative paths don't 
> work anymore:
> {code}
> In [1]: import pyarrow.dataset as ds  
> In [2]: ds.dataset("test.parquet")  
> ---------------------------------------------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
> <ipython-input-2-23ecfce52d13> in <module>
> ----> 1 ds.dataset("test.parquet")
> ~/scipy/repos/arrow/python/pyarrow/dataset.py in dataset(paths_or_factories, filesystem, partitioning, format)
>     327 
>     328     if isinstance(paths_or_factories, str):
> --> 329         return factory(paths_or_factories, **kwargs).finish()
>     330 
>     331     if not isinstance(paths_or_factories, list):
> ~/scipy/repos/arrow/python/pyarrow/dataset.py in factory(path_or_paths, filesystem, partitioning, format)
>     246     factories = []
>     247     for path in path_or_paths:
> --> 248         fs, paths_or_selector = _ensure_fs_and_paths(path, filesystem)
>     249         factories.append(FileSystemDatasetFactory(fs, paths_or_selector,
>     250                                                   format, options))
> ~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs_and_paths(path, filesystem)
>     165     from pyarrow.fs import FileType, FileSelector
>     166 
> --> 167     filesystem, path = _ensure_fs(filesystem, _stringify_path(path))
>     168     infos = filesystem.get_target_infos([path])[0]
>     169     if infos.type == FileType.Directory:
> ~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs(filesystem, path)
>     158     if filesystem is not None:
>     159         return filesystem, path
> --> 160     return FileSystem.from_uri(path)
>     161 
>     162 
> ~/scipy/repos/arrow/python/pyarrow/_fs.pyx in pyarrow._fs.FileSystem.from_uri()
> ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
> ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: URI has empty scheme: 'test.parquet'
> {code}
> [~apitrou] Should this be fixed in {{FileSystemFromUriOrPath}}, or rather 
> on the Python side? ({{FileSystem.from_uri}} resolves pathlib objects to 
> absolute paths, but does not do so for plain strings.)
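
For illustration only, here is a minimal sketch of how a plain relative path string could be made absolute on the Python side before it reaches a URI-based resolver. The helper name resolve_local_path is hypothetical and is not part of pyarrow; it uses only the standard library and may differ from what the fix in pull request 6643 actually does.

```python
import os
from urllib.parse import urlparse


def resolve_local_path(path):
    """Return `path` unchanged if it looks like a URI, else make it absolute.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    path = os.fspath(path)  # accepts both str and pathlib.Path
    scheme = urlparse(path).scheme
    # Assumption: a one-letter "scheme" is a Windows drive letter (C:\...),
    # so only longer schemes such as "s3" or "file" are treated as URIs.
    if len(scheme) > 1:
        return path
    # No scheme: treat as a local path, resolved against the current directory.
    return os.path.abspath(path)
```

With this sketch, a bare "test.parquet" becomes an absolute path under the current working directory, while a string such as "s3://bucket/key" is passed through untouched.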



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
