[GitHub] [arrow] westonpace edited a comment on pull request #11014: ARROW-13775: [C++] Allow Partitioning objects to be created with a vector of field names

GitBox Mon, 30 Aug 2021 11:47:37 -0700


westonpace edited a comment on pull request #11014:
URL: https://github.com/apache/arrow/pull/11014#issuecomment-908595122



   > What would be a use case? (I don't see any test where it is actually used for something) I suppose as alternative for #11008, right?
   @jorisvandenbossche 
   
   Yes, but not as an alternative; this would enable #11008. I don't believe #11008 works today. `ds.partitioning(field_names=['a'])` returns a partitioning factory, not a partitioning, and it raises an error when the hive format is specified. A partitioning factory can only be used for reading datasets, not for writing them:
   
   ```
   >>> import pyarrow
   >>> import pyarrow as pa
   >>> import pyarrow.dataset as ds
   >>> table = pa.Table.from_pydict({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
   >>> ds.partitioning(field_names=['a'])
   <pyarrow._dataset.PartitioningFactory object at 0x7f2410d4c170>
   >>> part = ds.partitioning(field_names=['a'])
   >>> ds.write_dataset(table, '/tmp/new_dataset', format='parquet', partitioning=part)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/pace/anaconda3/envs/arrow-release-5/lib/python3.9/site-packages/pyarrow/dataset.py", line 791, in write_dataset
       partitioning = _ensure_write_partitioning(partitioning)
     File "/home/pace/anaconda3/envs/arrow-release-5/lib/python3.9/site-packages/pyarrow/dataset.py", line 686, in _ensure_write_partitioning
       raise ValueError("partitioning needs to be actual Partitioning object")
   ValueError: partitioning needs to be actual Partitioning object
   ```
   
   I didn't add real-world use cases (or Python bindings) because I figured those would be covered by #11008.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

