asfimport opened a new issue, #30800:
URL: https://github.com/apache/arrow/issues/30800

   Add a note to the docs that if both `partitioning` and `schema` are specified when opening a dataset, and the partition column names are not present in the data files, the schema must also include the partition column names (for directory or hive partitioning) if those columns will be used for filtering.
   
   Example:
   
   ```python
   
   import numpy as np
   import pyarrow as pa
   import pyarrow.parquet as pq
   import pyarrow.dataset as ds
   
   # Define the data
   table = pa.table({'one': [-1, np.nan, 2.5],
                      'two': ['foo', 'bar', 'baz'],
                      'three': [True, False, True]})
   
   # Write to partitioned dataset
   # The files will include columns "two" and "three"
   pq.write_to_dataset(table, root_path='dataset_name',
                       partition_cols=['one'])
   
   # Reading the partitioned dataset with a schema that does not include
   # the partition column names will error when filtering on them
   
   schema = pa.schema([("three", "bool")])  # "three" is a boolean column
   data = ds.dataset("dataset_name", partitioning="hive", schema=schema)
   subset = ds.field("one") == 2.5
   data.to_table(filter=subset)
   
   # It will not error if the partition column is included in the schema:
   schema = pa.schema([("three", "bool"), ("one", "double")])
   data = ds.dataset("dataset_name", partitioning="hive", schema=schema)
   subset = ds.field("one") == 2.5
   data.to_table(filter=subset)
   
   ```
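   A related option worth mentioning in the same note: instead of relying on `partitioning="hive"` type inference, the partition field and its type can be declared explicitly with `ds.partitioning`. The snippet below is a minimal, self-contained sketch of the example above (it writes to a temporary directory rather than `dataset_name`, and drops the `nan` row for simplicity):

   ```python
   import tempfile

   import pyarrow as pa
   import pyarrow.parquet as pq
   import pyarrow.dataset as ds

   root = tempfile.mkdtemp()

   table = pa.table({'one': [-1.0, 2.5],
                     'two': ['foo', 'baz'],
                     'three': [False, True]})
   pq.write_to_dataset(table, root_path=root, partition_cols=['one'])

   # Declare the partition column "one" and its type explicitly,
   # instead of relying on hive-style type inference
   part = ds.partitioning(pa.schema([("one", pa.float64())]), flavor="hive")

   # The dataset schema still needs to include "one" for filtering to work
   schema = pa.schema([("three", pa.bool_()), ("one", pa.float64())])
   data = ds.dataset(root, partitioning=part, schema=schema)
   result = data.to_table(filter=ds.field("one") == 2.5)
   ```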
   
   **Reporter**: [Alenka 
Frim](https://issues.apache.org/jira/browse/ARROW-15311) / @AlenkaF
   
   <sub>**Note**: *This issue was originally created as 
[ARROW-15311](https://issues.apache.org/jira/browse/ARROW-15311). Please see 
the [migration documentation](https://github.com/apache/arrow/issues/14542) for 
further details.*</sub>

