[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962472#comment-16962472 ]
Joris Van den Bossche commented on ARROW-6999:
----------------------------------------------

That sounds like a decent enough workaround for now. Happy you found something!

> [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-6999
>                 URL: https://issues.apache.org/jira/browse/ARROW-6999
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>         Environment: pandas==0.23.4
> pyarrow==0.15.0  # Issue also with 0.14.0, 0.13.0 & 0.12.0, but not 0.11.0
>            Reporter: Tom Goodman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>         Attachments: test3.hdf
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> # Generate any DataFrame's pyarrow Schema using Table.from_pandas
> # Pass the generated schema as input into Table.from_pandas
> # Causes KeyError: '__index_level_0__'
> We did not have this issue with pyarrow==0.11.0, which we used to write many
> partitions across years. Our goal now is to use pyarrow==0.15.0 and produce
> schemas going forward that are *backwards compatible* (i.e. also have
> '__index_level_0__'), so we should not need to re-generate all prior years'
> partitions when we migrate to 0.15.0.
> We cannot set _preserve_index=False_, since that effectively deletes
> '__index_level_0__', causing inconsistent schemas across earlier partitions
> that had been written using pyarrow==0.11.0.
>
> {code:java}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame()
> schema = pa.Table.from_pandas(df).schema
> pa_table = pa.Table.from_pandas(df, schema=schema)
> {code}
> {noformat}
> Traceback (most recent call last):
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
>     return self._engine.get_loc(key)
>   File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: '__index_level_0__'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 408, in _get_columns_to_convert_given_schema
>     col = df[name]
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
>     return self._getitem_column(key)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
>     return self._get_item_cache(key)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
>     values = self._data.get(item)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
>     loc = self.items.get_loc(item)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
>     return self._engine.get_loc(self._maybe_cast_indexer(key))
>   File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: '__index_level_0__'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
>     exec(code_obj, self.user_global_ns, self.user_ns)
>   File "<ipython-input-36-6711a2fcec96>", line 5, in <module>
>     pa_table = pa.Table.from_pandas(df, schema=pa.Table.from_pandas(df).schema)
>   File "pyarrow/table.pxi", line 1057, in pyarrow.lib.Table.from_pandas
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 517, in dataframe_to_arrays
>     columns)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 337, in _get_columns_to_convert
>     return _get_columns_to_convert_given_schema(df, schema, preserve_index)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 426, in _get_columns_to_convert_given_schema
>     "in the columns or index".format(name))
> KeyError: "name '__index_level_0__' present in the specified schema is not found in the columns or index"
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)