Joris Van den Bossche created ARROW-5310:
--------------------------------------------

             Summary: [Python] better error message on creating ParquetDataset from empty directory
                 Key: ARROW-5310
                 URL: https://issues.apache.org/jira/browse/ARROW-5310
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Joris Van den Bossche


Currently, when {{path}} is an existing but empty directory, you get:

{code:python}
>>> dataset = pq.ParquetDataset(path)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-16-346f72ae525e> in <module>
----> 1 dataset = pq.ParquetDataset(path)

~/scipy/repos/arrow/python/pyarrow/parquet.py in __init__(self, path_or_paths, filesystem, schema, metadata, split_row_groups, validate_schema, filters, metadata_nthreads, memory_map)
    989 
    990         if validate_schema:
--> 991             self.validate_schemas()
    992 
    993         if filters is not None:

~/scipy/repos/arrow/python/pyarrow/parquet.py in validate_schemas(self)
   1025                 self.schema = self.common_metadata.schema
   1026             else:
-> 1027                 self.schema = self.pieces[0].get_metadata().schema
   1028         elif self.schema is None:
   1029             self.schema = self.metadata.schema

IndexError: list index out of range
{code}

The error message could be much more informative than a bare {{IndexError}}.

Or do we actually want to allow this? I am not sure there are good use cases for supporting empty directories, since we cannot get any schema or metadata information from an empty directory.
It is only the schema validation that fails: with {{validate_schema=False}}, you actually get a ParquetDataset object back, with an empty list for {{pieces}} and no schema. So it would be easy to also not raise an error when validating the schemas in this empty-directory case.
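If we decide to allow empty directories instead, the validation step could simply become a no-op when there are no pieces. A rough sketch of that alternative, assuming each piece is reduced to a comparable schema value; the function name and its signature are hypothetical and only illustrate the idea:

```python
# Hypothetical sketch: tolerate an empty directory during schema validation.
# Each element of `piece_schemas` stands in for one file's schema.


def validate_schemas(piece_schemas, schema=None):
    """Return the dataset schema, or `schema` unchanged for an empty dataset."""
    if not piece_schemas:
        # Empty directory: nothing to validate, so the schema stays unknown,
        # matching what validate_schema=False already returns.
        return schema
    if schema is None:
        # Otherwise take the first file's schema as the reference.
        schema = piece_schemas[0]
    for i, piece_schema in enumerate(piece_schemas):
        if piece_schema != schema:
            raise ValueError(
                "Schema of piece {} differs from the dataset schema".format(i))
    return schema
```

The trade-off is that callers then get a dataset whose schema is {{None}}, so any later code that relies on the schema would need to handle that case.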



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
