jorisvandenbossche commented on issue #11027:
URL: https://github.com/apache/arrow/issues/11027#issuecomment-909174498
Ah, ARROW-12644 indeed only implemented the _decoding_ when reading, not the
equivalent _encoding_ when writing. But if we can read such datasets, we
should probably also make it possible to write them? (I will open a JIRA about that.)
@wanx4910 To show that we can read values with an encoded `/` (illustrating
what @westonpace mentioned above), I created a small dataset with two
directories whose names are URL-encoded values (using a slash-separated
date format, e.g. 2012/01/01):
```
In [44]: !ls test_decoding.parquet/
2012%2F01%2F01  2012%2F01%2F02

In [45]: dataset = ds.dataset("test_decoding.parquet/",
    ...:                      partitioning=["date"], format="parquet")

In [46]: dataset
Out[46]: <pyarrow._dataset.FileSystemDataset at 0x7f110c345770>

In [47]: dataset.to_table().to_pandas()
Out[47]:
   b        date
0  1  2012/01/01
1  2  2012/01/02
```
So when reading, we can properly decode such values.
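
As a rough sketch of the round trip involved, using only Python's standard library: this is not Arrow's actual implementation, and `encode_segment` / `decode_segment` are hypothetical helper names chosen for illustration. It only shows how a value containing `/` maps to a single directory name and back, which is the step that is currently missing on the write side:

```python
from urllib.parse import quote, unquote


def encode_segment(value: str) -> str:
    """Percent-encode a partition value so it fits in one directory name.

    Hypothetical helper, not part of pyarrow.
    """
    # safe="" makes quote() escape "/" as well; by default "/" is left as-is.
    return quote(value, safe="")


def decode_segment(segment: str) -> str:
    """Reverse of encode_segment: turn a directory name back into the value."""
    return unquote(segment)


dates = ["2012/01/01", "2012/01/02"]

# What a writer would need to do to produce the directory names.
encoded = [encode_segment(d) for d in dates]
print(encoded)  # ['2012%2F01%2F01', '2012%2F01%2F02']

# What a reader does to recover the original partition values.
decoded = [decode_segment(s) for s in encoded]
print(decoded)  # ['2012/01/01', '2012/01/02']
```

The encoded names match the directory listing in the session above, and decoding them yields the values shown in the `date` column.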