[jira] [Commented] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962353#comment-16962353 ]

Tom Goodman commented on ARROW-6999:

Thanks to your suggestions, we now have a work-around that should allow us to remain backwards-compatible! If we get a KeyError due to a missing '__index_level_0__', we set df.index.name = '__index_level_0__' and call the same from_pandas function again.

{code:java}
try:
    table = pa.Table.from_pandas(df, schema=schema)
except KeyError as e:
    if '__index_level_0__' in str(e):
        # Happens in pyarrow 0.15.0, not 0.11.0
        df.index.name = '__index_level_0__'
        table = pa.Table.from_pandas(df, schema=schema)
    else:
        raise e
{code}

_Thanks so much [~jorisvandenbossche]!_

> [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> ---
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0 # Issue also with 0.14.0, 0.13.0 & 0.12.0, but not 0.11.0
> Reporter: Tom Goodman
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Attachments: test3.hdf
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Steps to reproduce:
> # Generate any DataFrame's pyarrow Schema using Table.from_pandas
> # Pass the generated schema as input into Table.from_pandas
> # Causes KeyError: '__index_level_0__'
>
> We did not have this issue with pyarrow==0.11.0, which we used to write many partitions across years. Our goal now is to use pyarrow==0.15.0 and produce schemas going forward that are *backwards compatible* (i.e. also have '__index_level_0__'), so that we do not need to re-generate all prior years' partitions when we migrate to 0.15.0.
> We cannot set _preserve_index=False_, since that effectively deletes '__index_level_0__', causing inconsistent schema across earlier partitions that had been written using pyarrow==0.11.0.
>
> {code:java}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame()
> schema = pa.Table.from_pandas(df).schema
> pa_table = pa.Table.from_pandas(df, schema=schema)
> {code}
> {noformat}
> Traceback (most recent call last):
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
>     return self._engine.get_loc(key)
>   File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: '__index_level_0__'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 408, in _get_columns_to_convert_given_schema
>     col = df[name]
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
>     return self._getitem_column(key)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
>     return self._get_item_cache(key)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
>     values = self._data.get(item)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
>     loc = self.items.get_loc(item)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
>     return self._engine.get_loc(self._maybe_cast_indexer(key))
>   File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: '__
[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962353#comment-16962353 ]

Tom Goodman edited comment on ARROW-6999 at 10/29/19 7:43 PM:
--
Thanks to your suggestions, we now have a work-around that allows us to remain backwards-compatible! If we get a KeyError due to missing '__index_level_0__', we'll set df.index.name = '__index_level_0__' and re-call same from_pandas function.

{code:java}
try:
    table = pa.Table.from_pandas(df, schema=schema)
except KeyError as e:
    if '__index_level_0__' in str(e):
        # Happens in pyarrow 0.15.0, not 0.11.0
        df.index.name = '__index_level_0__'
        table = pa.Table.from_pandas(df, schema=schema)
    else:
        raise e
{code}

_Thanks so much [~jorisvandenbossche]!_
[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962254#comment-16962254 ]

Tom Goodman edited comment on ARROW-6999 at 10/29/19 5:26 PM:
--
[~jorisvandenbossche] thank you for the quick turn-around!

We store the partitions in parquet files, with directories defining partitions and _common_metadata file holding schema. This allows us to use the [ParquetDataset partition level filters|https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow-parquet-parquetdataset] like [[('mm', '=', 201909)]] ...

{noformat}
tree
.
|-- _common_metadata
|-- mm=201909
|   `-- e097411586b0460e860c331b63fecb2b.parquet
`-- mm=201910
    `-- b8de9aa413194cc4af6f4802b5c4923f.parquet
.
.
{noformat}
[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962254#comment-16962254 ]

Tom Goodman edited comment on ARROW-6999 at 10/29/19 5:25 PM:
--
[~jorisvandenbossche] thank you for the quick turn-around!

We store the partitions in parquet files, with directories defining partitions. This allows us to use the [ParquetDataset partition level filters|https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow-parquet-parquetdataset] like [[('mm', '=', 201909)]] ...

{noformat}
tree
.
|-- _common_metadata
|-- mm=201909
|   `-- e097411586b0460e860c331b63fecb2b.parquet
`-- mm=201910
    `-- b8de9aa413194cc4af6f4802b5c4923f.parquet
.
.
{noformat}
[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962254#comment-16962254 ]

Tom Goodman edited comment on ARROW-6999 at 10/29/19 5:23 PM:
--
[~jorisvandenbossche] thank you for the quick turn-around!

We store the partitions in parquet files, with directories defining partitions. This allows us to use the [ParquetDataset partition level filters|https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow-parquet-parquetdataset] ...

{noformat}
tree
.
|-- _common_metadata
|-- mm=201909
|   `-- e097411586b0460e860c331b63fecb2b.parquet
`-- mm=201910
    `-- b8de9aa413194cc4af6f4802b5c4923f.parquet
.
.
{noformat}
[jira] [Commented] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962254#comment-16962254 ]

Tom Goodman commented on ARROW-6999:

[~jorisvandenbossche] thank you for the quick turn-around!

We store the partitions in parquet files, with directories defining partitions. This allows us to use the [ParquetDataset filters|https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow-parquet-parquetdataset] ...

{noformat}
tree
.
|-- _common_metadata
|-- mm=201909
|   `-- e097411586b0460e860c331b63fecb2b.parquet
`-- mm=201910
    `-- b8de9aa413194cc4af6f4802b5c4923f.parquet
.
.
{noformat}
[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317 ]

Tom Goodman edited comment on ARROW-6999 at 10/28/19 7:24 PM:
--
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)

{code:java}
df2 = pd.read_hdf('test3.hdf','foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}

I still get KeyError: '__index_level_0__' (+without+ specifying preserve_index).

This may be because the index on test3.hdf is an Int64Index, and the [pyarrow docs|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_pandas] say the default behavior is to "store the index as a column", except for range indexes. This unfortunately makes the bug more prevalent.
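If the schema is already known to contain '__index_level_0__', the index could also be named up front instead of waiting for the KeyError. The sketch below assumes that the placeholder field name for an unnamed index level i is '__index_level_<i>__', which is consistent with the error reported in this thread. `fill_index_names` is a hypothetical helper, not a pyarrow function.

```python
def fill_index_names(index_names, schema_names):
    """Name unnamed index levels after the placeholder fields in a schema.

    A level keeps its name unless it is None and the schema lists the
    matching '__index_level_<i>__' field (assumed naming convention).
    """
    filled = []
    for i, name in enumerate(index_names):
        placeholder = "__index_level_{}__".format(i)
        if name is None and placeholder in schema_names:
            filled.append(placeholder)
        else:
            filled.append(name)
    return filled
```

A possible use is `df.index.names = fill_index_names(df.index.names, schema.names)` before calling `pa.Table.from_pandas(df, schema=schema)`. Note that this assignment changes the index names on `df` itself.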
[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317 ]

Tom Goodman edited comment on ARROW-6999 at 10/28/19 7:21 PM:
--
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}
I still get KeyError: '__index_level_0__' (+without+ specifying preserve_index). This may be because the index on test3.hdf is an Int64Index, and the [pyarrow docs|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_pandas] say the default behavior is to "store the index as a column", except for range indexes. This unfortunately makes the bug more prevalent.

was (Author: goodiegoodman):
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}
I still get KeyError: '__index_level_0__' (+without+ specifying preserve_index). This may be because the index on test3.hdf is an Int64Index, and the [pyarrow docs|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_pandas] say the default behavior is to "store the index as a column", except for range indexes.

> [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> ---
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0 # Issue also with 0.14.0, 0.13.0 & 0.12.0.
but not 0.11.0 >Reporter: Tom Goodman >Priority: Major > Fix For: 1.0.0 > > Attachments: test3.hdf > > > Steps to reproduce: > # Generate any DataFrame's pyarrow Schema using Table.from_pandas > # Pass the generated schema as input into Table.from_pandas > # Causes KeyError: '__index_level_0__' > We did not have this issue with pyarrow==0.11.0 which we used to write many > partitions across years. Our goal now is to use pyarrow==0.15.0 and produce > schema going forward that are *backwards compatible* (i.e. also have > '__index_level_0__'), so we should not need to re-generate all prior years' > partitions when we migrate to 0.15.0. > We cannot set _preserve_index=False_, since that effectively deletes > '__index_level_0__', causing inconsistent schema across earlier partitions > that had been written using pyarrow==0.11.0. > > {code:java} > import pandas as pd > import pyarrow as pa > df = pd.DataFrame() > schema = pa.Table.from_pandas(df).schema > pa_table = pa.Table.from_pandas(df, schema=schema) > {code} > {noformat} > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3078, in get_loc > return self._engine.get_loc(key) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in > pandas._libs.hashtable.PyObjectHashTable.get_item > File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in > pandas._libs.hashtable.PyObjectHashTable.get_item > KeyError: '__index_level_0__' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", > line 408, in _get_columns_to_convert_given_schema > col = df[name] > 
File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2688, in __getitem__ > return self._getitem_column(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2695, in _getitem_column > return self._get_item_cache(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", > line 2489, in _get_item_cache > values = self._data.get(item) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", > line 4115, in get > loc = self.items.get_loc(item) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3080, in get_loc > return self._engine.get_loc(self._maybe_cast_indexer(key)
[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317 ]

Tom Goodman edited comment on ARROW-6999 at 10/28/19 7:15 PM:
--
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}
I still get KeyError: '__index_level_0__' (+without+ specifying preserve_index). This may be because the index on test3.hdf is an Int64Index, and the [pyarrow docs|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_pandas] say the default behavior is to "store the index as a column", except for range indexes.

was (Author: goodiegoodman):
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}
I still get KeyError: '__index_level_0__' (+without+ specifying preserve_index), do you?

> [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> ---
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0 # Issue also with 0.14.0, 0.13.0 & 0.12.0. but not 0.11.0
> Reporter: Tom Goodman
> Priority: Major
> Fix For: 1.0.0
>
> Attachments: test3.hdf
>
>
> Steps to reproduce:
> # Generate any DataFrame's pyarrow Schema using Table.from_pandas
> # Pass the generated schema as input into Table.from_pandas
> # Causes KeyError: '__index_level_0__'
> We did not have this issue with pyarrow==0.11.0 which we used to write many
> partitions across years. Our goal now is to use pyarrow==0.15.0 and produce
> schema going forward that are *backwards compatible* (i.e.
also have > '__index_level_0__'), so we should not need to re-generate all prior years' > partitions when we migrate to 0.15.0. > We cannot set _preserve_index=False_, since that effectively deletes > '__index_level_0__', causing inconsistent schema across earlier partitions > that had been written using pyarrow==0.11.0. > > {code:java} > import pandas as pd > import pyarrow as pa > df = pd.DataFrame() > schema = pa.Table.from_pandas(df).schema > pa_table = pa.Table.from_pandas(df, schema=schema) > {code} > {noformat} > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3078, in get_loc > return self._engine.get_loc(key) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in > pandas._libs.hashtable.PyObjectHashTable.get_item > File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in > pandas._libs.hashtable.PyObjectHashTable.get_item > KeyError: '__index_level_0__' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", > line 408, in _get_columns_to_convert_given_schema > col = df[name] > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2688, in __getitem__ > return self._getitem_column(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2695, in _getitem_column > return self._get_item_cache(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", > line 2489, in 
_get_item_cache > values = self._data.get(item) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", > line 4115, in get > loc = self.items.get_loc(item) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3080, in get_loc > return self._engine.get_loc(self._maybe_cast_indexer(key)) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1
[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317 ]

Tom Goodman edited comment on ARROW-6999 at 10/28/19 6:32 PM:
--
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}
I still get KeyError: '__index_level_0__' (+without+ specifying preserve_index), do you?

was (Author: goodiegoodman):
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}
I still get KeyError: '__index_level_0__', do you?

> [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> ---
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0 # Issue also with 0.14.0, 0.13.0 & 0.12.0. but not 0.11.0
> Reporter: Tom Goodman
> Priority: Major
> Fix For: 1.0.0
>
> Attachments: test3.hdf
>
>
> Steps to reproduce:
> # Generate any DataFrame's pyarrow Schema using Table.from_pandas
> # Pass the generated schema as input into Table.from_pandas
> # Causes KeyError: '__index_level_0__'
> We did not have this issue with pyarrow==0.11.0 which we used to write many
> partitions across years. Our goal now is to use pyarrow==0.15.0 and produce
> schema going forward that are *backwards compatible* (i.e. also have
> '__index_level_0__'), so we should not need to re-generate all prior years'
> partitions when we migrate to 0.15.0.
> We cannot set _preserve_index=False_, since that effectively deletes > '__index_level_0__', causing inconsistent schema across earlier partitions > that had been written using pyarrow==0.11.0. > > {code:java} > import pandas as pd > import pyarrow as pa > df = pd.DataFrame() > schema = pa.Table.from_pandas(df).schema > pa_table = pa.Table.from_pandas(df, schema=schema) > {code} > {noformat} > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3078, in get_loc > return self._engine.get_loc(key) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in > pandas._libs.hashtable.PyObjectHashTable.get_item > File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in > pandas._libs.hashtable.PyObjectHashTable.get_item > KeyError: '__index_level_0__' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", > line 408, in _get_columns_to_convert_given_schema > col = df[name] > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2688, in __getitem__ > return self._getitem_column(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2695, in _getitem_column > return self._get_item_cache(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", > line 2489, in _get_item_cache > values = self._data.get(item) > File > 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", > line 4115, in get > loc = self.items.get_loc(item) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3080, in get_loc > return self._engine.get_loc(self._maybe_cast_indexer(key)) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in > pandas._libs.hashtable.PyObjectHashTable.get_item > File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in > pandas._libs.hashtable.PyObjectHashTable.get_item > KeyError: '__index_level_0__' > During handling of the above exception, another exception occurred: > Traceback (m
[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317 ]

Tom Goodman edited comment on ARROW-6999 at 10/28/19 6:13 PM:
--
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}
I still get KeyError: '__index_level_0__', do you?

was (Author: goodiegoodman):
[~jorisvandenbossche] please try this with the attached test3.hdf (not empty)
{code:java}
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}
I still get KeyError: '__index_level_0__', do you?

> [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> ---
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0 # Issue also with 0.14.0, 0.13.0 & 0.12.0. but not 0.11.0
> Reporter: Tom Goodman
> Priority: Major
> Fix For: 1.0.0
>
> Attachments: test3.hdf
>
>
> Steps to reproduce:
> # Generate any DataFrame's pyarrow Schema using Table.from_pandas
> # Pass the generated schema as input into Table.from_pandas
> # Causes KeyError: '__index_level_0__'
> We did not have this issue with pyarrow==0.11.0 which we used to write many
> partitions across years. Our goal now is to use pyarrow==0.15.0 and produce
> schema going forward that are *backwards compatible* (i.e. also have
> '__index_level_0__'), so we should not need to re-generate all prior years'
> partitions when we migrate to 0.15.0.
> We cannot set _preserve_index=False_, since that effectively deletes
> '__index_level_0__', causing inconsistent schema across earlier partitions
> that had been written using pyarrow==0.11.0.
> > {code:java} > import pandas as pd > import pyarrow as pa > df = pd.DataFrame() > schema = pa.Table.from_pandas(df).schema > pa_table = pa.Table.from_pandas(df, schema=schema) > {code} > {noformat} > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3078, in get_loc > return self._engine.get_loc(key) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in > pandas._libs.hashtable.PyObjectHashTable.get_item > File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in > pandas._libs.hashtable.PyObjectHashTable.get_item > KeyError: '__index_level_0__' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", > line 408, in _get_columns_to_convert_given_schema > col = df[name] > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2688, in __getitem__ > return self._getitem_column(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2695, in _getitem_column > return self._get_item_cache(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", > line 2489, in _get_item_cache > values = self._data.get(item) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", > line 4115, in get > loc = self.items.get_loc(item) > File > 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3080, in get_loc > return self._engine.get_loc(self._maybe_cast_indexer(key)) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in > pandas._libs.hashtable.PyObjectHashTable.get_item > File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in > pandas._libs.hashtable.PyObjectHashTable.get_item > KeyError: '__index_level_0__' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/GAAR/F
[jira] [Commented] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317 ]

Tom Goodman commented on ARROW-6999:

[~jorisvandenbossche] please try this with the attached test3.hdf (not empty)
{code:java}
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}
I still get KeyError: '__index_level_0__', do you?

> [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> ---
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0 # Issue also with 0.14.0, 0.13.0 & 0.12.0. but not 0.11.0
> Reporter: Tom Goodman
> Priority: Major
> Fix For: 1.0.0
>
> Attachments: test3.hdf
>
>
> Steps to reproduce:
> # Generate any DataFrame's pyarrow Schema using Table.from_pandas
> # Pass the generated schema as input into Table.from_pandas
> # Causes KeyError: '__index_level_0__'
> We did not have this issue with pyarrow==0.11.0 which we used to write many
> partitions across years. Our goal now is to use pyarrow==0.15.0 and produce
> schema going forward that are *backwards compatible* (i.e. also have
> '__index_level_0__'), so we should not need to re-generate all prior years'
> partitions when we migrate to 0.15.0.
> We cannot set _preserve_index=False_, since that effectively deletes
> '__index_level_0__', causing inconsistent schema across earlier partitions
> that had been written using pyarrow==0.11.0.
> > {code:java} > import pandas as pd > import pyarrow as pa > df = pd.DataFrame() > schema = pa.Table.from_pandas(df).schema > pa_table = pa.Table.from_pandas(df, schema=schema) > {code} > {noformat} > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3078, in get_loc > return self._engine.get_loc(key) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in > pandas._libs.hashtable.PyObjectHashTable.get_item > File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in > pandas._libs.hashtable.PyObjectHashTable.get_item > KeyError: '__index_level_0__' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", > line 408, in _get_columns_to_convert_given_schema > col = df[name] > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2688, in __getitem__ > return self._getitem_column(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2695, in _getitem_column > return self._get_item_cache(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", > line 2489, in _get_item_cache > values = self._data.get(item) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", > line 4115, in get > loc = self.items.get_loc(item) > File > 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3080, in get_loc > return self._engine.get_loc(self._maybe_cast_indexer(key)) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in > pandas._libs.hashtable.PyObjectHashTable.get_item > File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in > pandas._libs.hashtable.PyObjectHashTable.get_item > KeyError: '__index_level_0__' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/IPython/core/interactiveshell.py", > line 3326, in run_code > exec(code_obj, self.user_global_ns, self.user_ns) > File "", line 5, in > pa_table = pa.Table.from_pandas(df, > schema=pa.Table.from_pandas(df).schema) > File "pyarrow/table.pxi",
[jira] [Updated] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Goodman updated ARROW-6999:
---
Attachment: test3.hdf

> [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> ---
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0 # Issue also with 0.14.0, 0.13.0 & 0.12.0. but not 0.11.0
> Reporter: Tom Goodman
> Priority: Major
> Fix For: 1.0.0
>
> Attachments: test3.hdf
>
>
> Steps to reproduce:
> # Generate any DataFrame's pyarrow Schema using Table.from_pandas
> # Pass the generated schema as input into Table.from_pandas
> # Causes KeyError: '__index_level_0__'
> We did not have this issue with pyarrow==0.11.0 which we used to write many
> partitions across years. Our goal now is to use pyarrow==0.15.0 and produce
> schema going forward that are *backwards compatible* (i.e. also have
> '__index_level_0__'), so we should not need to re-generate all prior years'
> partitions when we migrate to 0.15.0.
> We cannot set _preserve_index=False_, since that effectively deletes
> '__index_level_0__', causing inconsistent schema across earlier partitions
> that had been written using pyarrow==0.11.0.
> > {code:java} > import pandas as pd > import pyarrow as pa > df = pd.DataFrame() > schema = pa.Table.from_pandas(df).schema > pa_table = pa.Table.from_pandas(df, schema=schema) > {code} > {noformat} > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3078, in get_loc > return self._engine.get_loc(key) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in > pandas._libs.hashtable.PyObjectHashTable.get_item > File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in > pandas._libs.hashtable.PyObjectHashTable.get_item > KeyError: '__index_level_0__' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", > line 408, in _get_columns_to_convert_given_schema > col = df[name] > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2688, in __getitem__ > return self._getitem_column(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", > line 2695, in _getitem_column > return self._get_item_cache(key) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", > line 2489, in _get_item_cache > values = self._data.get(item) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", > line 4115, in get > loc = self.items.get_loc(item) > File > 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3080, in get_loc > return self._engine.get_loc(self._maybe_cast_indexer(key)) > File "pandas/_libs/index.pyx", line 140, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/index.pyx", line 162, in > pandas._libs.index.IndexEngine.get_loc > File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in > pandas._libs.hashtable.PyObjectHashTable.get_item > File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in > pandas._libs.hashtable.PyObjectHashTable.get_item > KeyError: '__index_level_0__' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/IPython/core/interactiveshell.py", > line 3326, in run_code > exec(code_obj, self.user_global_ns, self.user_ns) > File "", line 5, in > pa_table = pa.Table.from_pandas(df, > schema=pa.Table.from_pandas(df).schema) > File "pyarrow/table.pxi", line 1057, in pyarrow.lib.Table.from_pandas > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", > line 517, in dataframe_to_arrays > columns) > File > "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sf
[jira] [Updated] (ARROW-6999) KeyError: '__index_level_0__' passing Table.from_pandas its own schema
[ https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Goodman updated ARROW-6999:
---
Environment:
pandas==0.23.4
pyarrow==0.15.0 # Issue also with 0.14.0, 0.13.0 & 0.12.0. but not 0.11.0

was:
pip freeze
certifi==2019.6.16
numpy==1.17.2
pandas==0.23.4
pyarrow==0.15.0 # Issue also seen in 0.14.0, 0.13.0, 0.12.0
python-dateutil==2.8.0
pytz==2019.2
six==1.12.0

> KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> --
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0 # Issue also with 0.14.0, 0.13.0 & 0.12.0. but not 0.11.0
> Reporter: Tom Goodman
> Priority: Major
>
> Steps to reproduce:
> # Generate any DataFrame's pyarrow Schema using Table.from_pandas
> # Pass the generated schema as input into Table.from_pandas
> # Causes KeyError: '__index_level_0__'
> We did not have this issue with pyarrow==0.11.0 which we used to write many
> partitions across years. Our goal now is to use pyarrow==0.15.0 and produce
> schema going forward that are *backwards compatible* (i.e. also have
> '__index_level_0__'), so we should not need to re-generate all prior years'
> partitions when we migrate to 0.15.0.
> We cannot set _preserve_index=False_, since that effectively deletes
> '__index_level_0__', causing inconsistent schema across earlier partitions
> that had been written using pyarrow==0.11.0.
[jira] [Created] (ARROW-6999) KeyError: '__index_level_0__' passing Table.from_pandas its own schema
Tom Goodman created ARROW-6999:
--
Summary: KeyError: '__index_level_0__' passing Table.from_pandas its own schema
Key: ARROW-6999
URL: https://issues.apache.org/jira/browse/ARROW-6999
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.0, 0.14.0, 0.13.0, 0.12.0
Environment: pip freeze
certifi==2019.6.16
numpy==1.17.2
pandas==0.23.4
pyarrow==0.15.0 # Issue also seen in 0.14.0, 0.13.0, 0.12.0
python-dateutil==2.8.0
pytz==2019.2
six==1.12.0
Reporter: Tom Goodman

Steps to reproduce:
# Generate any DataFrame's pyarrow Schema using Table.from_pandas
# Pass the generated schema as input into Table.from_pandas
# Causes KeyError: '__index_level_0__'

We did not have this issue with pyarrow==0.11.0, which we used to write many partitions across years. Our goal now is to use pyarrow==0.15.0 and produce schemas going forward that are *backwards compatible* (i.e. also have '__index_level_0__'), so we should not need to re-generate all prior years' partitions when we migrate to 0.15.0.

We cannot set _preserve_index=False_, since that effectively deletes '__index_level_0__', causing inconsistent schemas across earlier partitions that had been written using pyarrow==0.11.0.
{code:java}
import pandas as pd
import pyarrow as pa
df = pd.DataFrame()
schema = pa.Table.from_pandas(df).schema
pa_table = pa.Table.from_pandas(df, schema=schema)
{code}
{noformat}
Traceback (most recent call last):
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '__index_level_0__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 408, in _get_columns_to_convert_given_schema
    col = df[name]
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '__index_level_0__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 5, in
    pa_table = pa.Table.from_pandas(df, schema=pa.Table.from_pandas(df).schema)
  File "pyarrow/table.pxi", line 1057, in pyarrow.lib.Table.from_pandas
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 517, in dataframe_to_arrays
    columns)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 337, in _get_columns_to_convert
    return _get_columns_to_convert_given_schema(df, schema, preserve_index)
  File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 426, in _get_columns_to_convert_given_schema
    "in the columns or index".format(name))
KeyError: "name '__index_