westonpace edited a comment on pull request #11014: URL: https://github.com/apache/arrow/pull/11014#issuecomment-908595122
> What would be a use case? (I don't see any test where it is actually used for something) I suppose as alternative for #11008, right?

@jorisvandenbossche Yes, but not as an alternative, this would be to enable #11008. I don't believe #11008 works today. `ds.partitioning(field_names=['a'])` returns a partitioning factory and not a partitioning. It also returns an error when specifying the hive format. A partitioning factory cannot be used for writing datasets (only reading)...

```
>>> import pyarrow
>>> import pyarrow as pa
>>> import pyarrow.dataset as ds
>>> table = pa.Table.from_pydict({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
>>> ds.partitioning(field_names=['a'])
<pyarrow._dataset.PartitioningFactory object at 0x7f2410d4c170>
>>> part = ds.partitioning(field_names=['a'])
>>> ds.write_dataset(table, '/tmp/new_dataset', format='parquet', partitioning=part)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pace/anaconda3/envs/arrow-release-5/lib/python3.9/site-packages/pyarrow/dataset.py", line 791, in write_dataset
    partitioning = _ensure_write_partitioning(partitioning)
  File "/home/pace/anaconda3/envs/arrow-release-5/lib/python3.9/site-packages/pyarrow/dataset.py", line 686, in _ensure_write_partitioning
    raise ValueError("partitioning needs to be actual Partitioning object")
ValueError: partitioning needs to be actual Partitioning object
```

I didn't add real world use cases (or python bindings) because I figured those would be covered by #11008.