[ 
https://issues.apache.org/jira/browse/ARROW-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5310:
--------------------------------
    Fix Version/s: 1.0.0

> [Python] better error message on creating ParquetDataset from empty directory
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-5310
>                 URL: https://issues.apache.org/jira/browse/ARROW-5310
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Joris Van den Bossche
>            Priority: Minor
>              Labels: dataset, dataset-parquet-read, parquet
>             Fix For: 1.0.0
>
>
> Currently, when {{path}} is an existing but empty directory, you get:
> {code:python}
> >>> dataset = pq.ParquetDataset(path)
> ---------------------------------------------------------------------------
> IndexError                                Traceback (most recent call last)
> <ipython-input-16-346f72ae525e> in <module>
> ----> 1 dataset = pq.ParquetDataset(path)
> ~/scipy/repos/arrow/python/pyarrow/parquet.py in __init__(self, path_or_paths, filesystem, schema, metadata, split_row_groups, validate_schema, filters, metadata_nthreads, memory_map)
>     989 
>     990         if validate_schema:
> --> 991             self.validate_schemas()
>     992 
>     993         if filters is not None:
> ~/scipy/repos/arrow/python/pyarrow/parquet.py in validate_schemas(self)
>    1025                 self.schema = self.common_metadata.schema
>    1026             else:
> -> 1027                 self.schema = self.pieces[0].get_metadata().schema
>    1028         elif self.schema is None:
>    1029             self.schema = self.metadata.schema
> IndexError: list index out of range
> {code}
> This should raise a clearer error message than a bare {{IndexError}}.
> Unless we actually want to allow this? (Although I am not sure there are good 
> use cases for supporting empty directories, because an empty directory 
> provides no schema or metadata information.) 
> The failure only occurs when validating the schemas: with 
> {{validate_schema=False}} the call returns a ParquetDataset object, just 
> with an empty list for {{pieces}} and no schema. So it would also be easy to 
> skip the error during schema validation for this empty-directory case.
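
The kind of check discussed above could look roughly like the following. This is only a sketch and not the pyarrow implementation: {{list_parquet_pieces}} and {{first_piece_or_error}} are hypothetical helper names, the file discovery is a simplified stand-in based on {{os.walk}}, and the choice of {{ValueError}} and the message wording are assumptions.

```python
import os
import tempfile


def list_parquet_pieces(path):
    """Hypothetical stand-in for dataset discovery: list *.parquet files under path."""
    return sorted(
        os.path.join(root, name)
        for root, _dirs, files in os.walk(path)
        for name in files
        if name.endswith(".parquet")
    )


def first_piece_or_error(path):
    """Return the first piece, or raise a descriptive error if none exist."""
    pieces = list_parquet_pieces(path)
    if not pieces:
        # Check explicitly instead of letting pieces[0] raise a bare IndexError.
        raise ValueError(
            "No Parquet files found in {!r}; cannot infer a schema "
            "from an empty directory".format(path)
        )
    return pieces[0]


if __name__ == "__main__":
    empty_dir = tempfile.mkdtemp()
    try:
        first_piece_or_error(empty_dir)
    except ValueError as exc:
        print(exc)
```

The point of the sketch is only that the emptiness check happens before indexing into the list of pieces, so the user sees which path was empty rather than an index error from inside schema validation.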



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
