Tom Goodman created ARROW-6999:
----------------------------------

             Summary: KeyError: '__index_level_0__' passing Table.from_pandas its own schema
                 Key: ARROW-6999
                 URL: https://issues.apache.org/jira/browse/ARROW-6999
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.15.0, 0.14.0, 0.13.0, 0.12.0
         Environment: pip freeze
certifi==2019.6.16
numpy==1.17.2
pandas==0.23.4
pyarrow==0.15.0  # Issue also seen in 0.14.0, 0.13.0, 0.12.0
python-dateutil==2.8.0
pytz==2019.2
six==1.12.0

            Reporter: Tom Goodman


Steps to reproduce:
 # Generate any DataFrame's pyarrow Schema using Table.from_pandas
 # Pass the generated schema as input into Table.from_pandas
 # Causes KeyError: '__index_level_0__'

We did not have this issue with pyarrow==0.11.0, which we used to write many 
partitions across multiple years.  Our goal now is to use pyarrow==0.15.0 and 
produce schemas going forward that are *backwards compatible* (i.e. that also 
contain '__index_level_0__'), so that we do not need to regenerate all prior 
years' partitions when we migrate to 0.15.0.

We cannot set _preserve_index=False_, since that effectively deletes 
'__index_level_0__', making the new schema inconsistent with the earlier 
partitions that were written using pyarrow==0.11.0.

 
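{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame()
# Derive the schema from the DataFrame itself.
schema = pa.Table.from_pandas(df).schema
# Passing that same schema back in raises KeyError: '__index_level_0__'.
pa_table = pa.Table.from_pandas(df, schema=schema)
{code}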
{noformat}
Traceback (most recent call last):
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '__index_level_0__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 408, in _get_columns_to_convert_given_schema
    col = df[name]
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '__index_level_0__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-36-6711a2fcec96>", line 5, in <module>
    pa_table = pa.Table.from_pandas(df, schema=pa.Table.from_pandas(df).schema)
  File "pyarrow/table.pxi", line 1057, in pyarrow.lib.Table.from_pandas
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 517, in dataframe_to_arrays
    columns)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 337, in _get_columns_to_convert
    return _get_columns_to_convert_given_schema(df, schema, preserve_index)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 426, in _get_columns_to_convert_given_schema
    "in the columns or index".format(name))
KeyError: "name '__index_level_0__' present in the specified schema is not found in the columns or index"
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
