Tom Goodman created ARROW-6999:
----------------------------------

Summary: KeyError: '__index_level_0__' passing Table.from_pandas its own schema
Key: ARROW-6999
URL: https://issues.apache.org/jira/browse/ARROW-6999
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.0, 0.14.0, 0.13.0, 0.12.0
Environment: pip freeze
certifi==2019.6.16
numpy==1.17.2
pandas==0.23.4
pyarrow==0.15.0  # Issue also seen in 0.14.0, 0.13.0, 0.12.0
python-dateutil==2.8.0
pytz==2019.2
six==1.12.0
Reporter: Tom Goodman


Steps to reproduce:
# Generate any DataFrame's pyarrow Schema using Table.from_pandas
# Pass the generated schema as input into Table.from_pandas
# Causes KeyError: '__index_level_0__'

We did not have this issue with pyarrow==0.11.0, which we used to write many partitions across years.

Our goal now is to use pyarrow==0.15.0 and produce schemas going forward that are *backwards compatible* (i.e. that also have '__index_level_0__'), so that we do not need to re-generate all prior years' partitions when we migrate to 0.15.0.

We cannot set _preserve_index=False_, since that effectively deletes '__index_level_0__', causing an inconsistent schema across earlier partitions that had been written using pyarrow==0.11.0.

{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame()
schema = pa.Table.from_pandas(df).schema
pa_table = pa.Table.from_pandas(df, schema=schema)
{code}

{noformat}
Traceback (most recent call last):
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '__index_level_0__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 408, in _get_columns_to_convert_given_schema
    col = df[name]
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '__index_level_0__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-36-6711a2fcec96>", line 5, in <module>
    pa_table = pa.Table.from_pandas(df, schema=pa.Table.from_pandas(df).schema)
  File "pyarrow/table.pxi", line 1057, in pyarrow.lib.Table.from_pandas
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 517, in dataframe_to_arrays
    columns)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 337, in _get_columns_to_convert
    return _get_columns_to_convert_given_schema(df, schema, preserve_index)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 426, in _get_columns_to_convert_given_schema
    "in the columns or index".format(name))
KeyError: "name '__index_level_0__' present in the specified schema is not found in the columns or index"
{noformat}
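
For readers trying to follow why the round trip fails, here is a minimal, dependency-free sketch of the name lookup that the traceback suggests. It is an assumption inferred only from the traceback and the final error message, not pyarrow's actual code: the function resolve_schema_names, its match_placeholders flag, and the use of plain lists for columns and index level names are all hypothetical. The idea is that a schema name is looked up first among the columns and then among the index level names, so the generated placeholder '__index_level_0__' never matches an index level whose name is None.

```python
import re

# Hypothetical, simplified model of the lookup suggested by the traceback.
# This is NOT pyarrow's code: plain lists stand in for the DataFrame's
# columns and for the names of its index levels.
_PLACEHOLDER = re.compile(r"^__index_level_(\d+)__$")


def resolve_schema_names(columns, index_names, schema_names,
                         match_placeholders=False):
    """Return (name, source) pairs, where source is 'column' or 'index'."""
    resolved = []
    for name in schema_names:
        if name in columns:
            resolved.append((name, "column"))
            continue
        if name in index_names:
            resolved.append((name, "index"))
            continue
        # Optional extra step (an assumption about how a round trip could
        # be supported): treat '__index_level_N__' as the N-th index level
        # when that level has no name of its own.
        match = _PLACEHOLDER.match(name)
        if match_placeholders and match:
            level = int(match.group(1))
            if level < len(index_names) and index_names[level] is None:
                resolved.append((name, "index"))
                continue
        raise KeyError(
            "name '{}' present in the specified schema is not found "
            "in the columns or index".format(name))
    return resolved


# Model of the empty DataFrame from the report: no columns, one unnamed
# index level, and a schema that carries the generated placeholder name.
columns, index_names = [], [None]
schema_names = ["__index_level_0__"]

try:
    resolve_schema_names(columns, index_names, schema_names)
    strict_result = "ok"
except KeyError:
    strict_result = "KeyError"

lenient_result = resolve_schema_names(
    columns, index_names, schema_names, match_placeholders=True)

print(strict_result)
print(lenient_result)
```

In this model the strict lookup reproduces the reported KeyError, while the lenient lookup resolves the placeholder to the unnamed index level. Whether pyarrow should behave like the lenient variant is exactly the question this issue raises.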