[ https://issues.apache.org/jira/browse/ARROW-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ludwik Bielczynski updated ARROW-7782:
--------------------------------------
    Description:
One cannot save the index when using {{pyarrow.parquet.write_to_dataset()}} with given partition_cols arguments. Here I have created a minimal example which shows the issue:
{code:java}
from pathlib import Path
import pandas as pd
from pyarrow import Table
from pyarrow.parquet import write_to_dataset

path = Path('/home/user/trials')
file_name = 'local_database.parquet'
df = pd.DataFrame({"A": [1, 2, 3], "B": ['a', 'a', 'b']},
                  index=pd.Index(['a', 'b', 'c'], name='idx'))

table = Table.from_pandas(df)
write_to_dataset(table,
                 str(path / file_name),
                 partition_cols=['B'])
{code}
The issue is rather important for pandas and dask users.

  was:
One cannot save the index when using {{pyarrow.parquet.write_to_dataset()}} with given partition_cols arguments. Here I have created a minimal example which shows the issue:
{code:java}
from pathlib import Path
import pandas as pd
from pyarrow import Table
from pyarrow.parquet import write_to_dataset

path = Path('/home/ludwik/Documents/YieldPlanet/research/trials')
file_name = 'trial_pq.parquet'
df = pd.DataFrame({"A": [1, 2, 3], "B": ['a', 'a', 'b']},
                  index=pd.Index(['a', 'b', 'c'], name='idx'))

table = Table.from_pandas(df)
write_to_dataset(table,
                 str(path / file_name),
                 partition_cols=['B'],
                 partition_filename_cb=None,
                 filesystem=None)
{code}
The issue is rather important for pandas and dask users.

> Losing index information when using write_to_dataset with partition_cols
> ------------------------------------------------------------------------
>
>                 Key: ARROW-7782
>                 URL: https://issues.apache.org/jira/browse/ARROW-7782
>             Project: Apache Arrow
>          Issue Type: Bug
>         Environment: pyarrow==0.15.1
>            Reporter: Ludwik Bielczynski
>            Priority: Major