[ https://issues.apache.org/jira/browse/ARROW-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-18269:
-----------------------------------
    Labels: good-first-issue pull-request-available  (was: good-first-issue)

> [C++] Slash character in partition value handling
> -------------------------------------------------
>
>                 Key: ARROW-18269
>                 URL: https://issues.apache.org/jira/browse/ARROW-18269
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 10.0.0
>            Reporter: Vadym Dytyniak
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>              Labels: good-first-issue, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The example below shows that pyarrow does not correctly handle a
> partition value that contains '/':
> {code:java}
> import pandas as pd
> import pyarrow as pa
> from pyarrow import dataset as ds
>
> df = pd.DataFrame({
>     'value': [1, 2],
>     'instrument_id': ['A/Z', 'B'],
> })
>
> # Write the table partitioned by instrument_id (hive-style key=value directories).
> ds.write_dataset(
>     data=pa.Table.from_pandas(df),
>     base_dir='data',
>     format='parquet',
>     partitioning=['instrument_id'],
>     partitioning_flavor='hive',
> )
>
> # Read the dataset back, recovering instrument_id from the directory names.
> table = ds.dataset(
>     source='data',
>     format='parquet',
>     partitioning='hive',
> ).to_table()
>
> tables = [table]
> df = pa.concat_tables(tables).to_pandas()
> print(df.head()){code}
> Result:
> {code:java}
>    value instrument_id
> 0      1             A
> 1      2             B {code}
> Expected behaviour:
> Option 1: The result should be:
> {code:java}
>    value instrument_id
> 0      1           A/Z
> 1      2             B {code}
> Option 2: An error should be raised when a partition value contains '/'.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
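
For illustration only, here is a minimal sketch of how Option 1 could round-trip a value such as 'A/Z', assuming the partition value is percent-encoded before it becomes a directory name and decoded again on read. It uses only Python's standard `urllib.parse`; the helpers `encode_segment` and `decode_segment` are hypothetical names, not pyarrow API, and this is not necessarily how the fix is implemented.

```python
from typing import Tuple
from urllib.parse import quote, unquote


def encode_segment(key: str, value: str) -> str:
    """Build a hive-style 'key=value' directory name (hypothetical helper)."""
    # safe='' makes quote() escape '/' too, so the value stays in one path segment.
    return f"{key}={quote(value, safe='')}"


def decode_segment(segment: str) -> Tuple[str, str]:
    """Split a 'key=value' directory name and undo the percent-encoding."""
    # Split on the first '=' only; any '=' inside the value was escaped as %3D.
    key, _, raw = segment.partition("=")
    return key, unquote(raw)


segment = encode_segment("instrument_id", "A/Z")
print(segment)                  # instrument_id=A%2FZ
print(decode_segment(segment))  # ('instrument_id', 'A/Z')
```

With this scheme 'A/Z' maps to a single directory, `instrument_id=A%2FZ`, rather than a nested `instrument_id=A/Z` path, so the reader can recover the full value.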