[jira] [Commented] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-29 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962353#comment-16962353
 ] 

Tom Goodman commented on ARROW-6999:


Thanks to your suggestions, we now have a work-around that should allow us to remain backwards-compatible!
If we get a KeyError due to a missing '__index_level_0__' column, we set df.index.name = '__index_level_0__' and call the same from_pandas function again.
{code:python}
import pyarrow as pa

try:
    table = pa.Table.from_pandas(df, schema=schema)
except KeyError as e:
    if '__index_level_0__' in str(e):  # Happens in pyarrow 0.15.0, not 0.11.0
        # Give the unnamed index the name the schema expects, then retry
        df.index.name = '__index_level_0__'
        table = pa.Table.from_pandas(df, schema=schema)
    else:
        raise
{code}
_Thanks so much [~jorisvandenbossche]!_
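
The same work-around can also be written without matching on the exception text, by naming the index up front whenever the target schema contains an '__index_level_0__' column. This is a sketch, not code from the thread: the helper name from_pandas_legacy_index is hypothetical, and it assumes the DataFrame has a single unnamed index that the legacy column stands for. rename_axis returns a renamed copy, so the caller's DataFrame is left untouched.
{code:python}
import pandas as pd
import pyarrow as pa

# Column name that older pyarrow versions gave an unnamed index
LEGACY_INDEX_COL = '__index_level_0__'


def from_pandas_legacy_index(df, schema):
    """Hypothetical helper: convert df to a Table matching a legacy schema.

    Assumes a '__index_level_0__' field in the schema refers to df's
    single, unnamed index.
    """
    if LEGACY_INDEX_COL in schema.names and df.index.name is None:
        # rename_axis returns a copy, so the caller's index keeps its name
        df = df.rename_axis(LEGACY_INDEX_COL)
    return pa.Table.from_pandas(df, schema=schema, preserve_index=True)


# Made-up example: a non-Range integer index and a schema that expects it
frame = pd.DataFrame({'a': [1, 2]}, index=[10, 20])
legacy_schema = pa.schema([('a', pa.int64()), (LEGACY_INDEX_COL, pa.int64())])
table = from_pandas_legacy_index(frame, legacy_schema)
print(table.schema.names)
{code}
If the schema has no legacy column, the helper simply passes the DataFrame through, so the same call can be used for old and new partitions.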

> [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> ---
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0  # Issue also with 0.14.0, 0.13.0 & 0.12.0, but not 0.11.0
> Reporter: Tom Goodman
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Attachments: test3.hdf
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Steps to reproduce:
>  # Generate any DataFrame's pyarrow Schema using Table.from_pandas
>  # Pass the generated schema as input into Table.from_pandas
>  # This raises KeyError: '__index_level_0__'
> We did not have this issue with pyarrow==0.11.0, which we used to write many
> partitions spanning several years. Our goal now is to use pyarrow==0.15.0 and,
> going forward, produce schemas that are *backwards compatible* (i.e. that also
> include '__index_level_0__'), so that we do not need to regenerate all prior
> years' partitions when we migrate to 0.15.0.
> We cannot set _preserve_index=False_, since that effectively drops
> '__index_level_0__', making the schema inconsistent with earlier partitions
> that were written using pyarrow==0.11.0.
>  
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame() 
> schema = pa.Table.from_pandas(df).schema
> pa_table = pa.Table.from_pandas(df, schema=schema)
> {code}
> {noformat}
> Traceback (most recent call last):
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
>     return self._engine.get_loc(key)
>   File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: '__index_level_0__'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 408, in _get_columns_to_convert_given_schema
>     col = df[name]
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
>     return self._getitem_column(key)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
>     return self._get_item_cache(key)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
>     values = self._data.get(item)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
>     loc = self.items.get_loc(item)
>   File "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
>     return self._engine.get_loc(self._maybe_cast_indexer(key))
>   File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: '__index_level_0__'

[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-29 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962353#comment-16962353
 ] 

Tom Goodman edited comment on ARROW-6999 at 10/29/19 7:43 PM:
--

Thanks to your suggestions, we now have a work-around that allows us to remain backwards-compatible!
If we get a KeyError due to a missing '__index_level_0__' column, we set df.index.name = '__index_level_0__' and call the same from_pandas function again.
{code:python}
import pyarrow as pa

try:
    table = pa.Table.from_pandas(df, schema=schema)
except KeyError as e:
    if '__index_level_0__' in str(e):  # Happens in pyarrow 0.15.0, not 0.11.0
        # Give the unnamed index the name the schema expects, then retry
        df.index.name = '__index_level_0__'
        table = pa.Table.from_pandas(df, schema=schema)
    else:
        raise
{code}
_Thanks so much [~jorisvandenbossche]!_




[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-29 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962254#comment-16962254
 ] 

Tom Goodman edited comment on ARROW-6999 at 10/29/19 5:26 PM:
--

[~jorisvandenbossche] thank you for the quick turn-around!

We store the partitions in Parquet files, with directories defining the partitions and a _common_metadata file holding the schema. This allows us to use the [ParquetDataset partition-level filters|https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow-parquet-parquetdataset] such as [[('mm', '=', 201909)]] ...
{noformat}
tree
.
|-- _common_metadata
|-- mm=201909
|   `-- e097411586b0460e860c331b63fecb2b.parquet
`-- mm=201910
    `-- b8de9aa413194cc4af6f4802b5c4923f.parquet
.
.

{noformat}
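
The layout above (Hive-style mm=... directories plus a _common_metadata file holding the schema) can be reproduced and queried in a few lines. This is a sketch against a current pyarrow rather than the reporter's actual pipeline: the data and the temporary directory are made up, and it uses pyarrow.parquet.write_to_dataset, pyarrow.parquet.write_metadata and the filters argument of pyarrow.parquet.read_table in place of the ParquetDataset call mentioned in the comment.
{code:python}
import os
import tempfile

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Made-up monthly data; 'mm' becomes the partition directory name
df = pd.DataFrame(
    {'mm': [201909, 201909, 201910], 'value': [1.0, 2.0, 3.0]},
    index=[7, 8, 9],  # non-Range index, so it is stored as '__index_level_0__'
)
table = pa.Table.from_pandas(df)

root = tempfile.mkdtemp()
# Writes one sub-directory per value: root/mm=201909/ and root/mm=201910/
pq.write_to_dataset(table, root_path=root, partition_cols=['mm'])
# Stores the full schema (including pandas metadata) as _common_metadata
pq.write_metadata(table.schema, os.path.join(root, '_common_metadata'))

# Partition-level filter: only the files under mm=201909 are read
subset = pq.read_table(root, filters=[[('mm', '=', 201909)]])
print(subset.num_rows, subset.column('value').to_pylist())
{code}
Because every partition file and _common_metadata carry the '__index_level_0__' column, dropping it in new partitions (e.g. via preserve_index=False) would make them disagree with the stored schema, which is the inconsistency the report wants to avoid.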




[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-29 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962254#comment-16962254
 ] 

Tom Goodman edited comment on ARROW-6999 at 10/29/19 5:25 PM:
--

[~jorisvandenbossche] thank you for the quick turn-around!

We store the partitions in Parquet files, with directories defining the partitions. This allows us to use the [ParquetDataset partition-level filters|https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow-parquet-parquetdataset] such as [[('mm', '=', 201909)]] ...
{noformat}
tree
.
|-- _common_metadata
|-- mm=201909
|   `-- e097411586b0460e860c331b63fecb2b.parquet
`-- mm=201910
    `-- b8de9aa413194cc4af6f4802b5c4923f.parquet
.
.

{noformat}




[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-29 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962254#comment-16962254
 ] 

Tom Goodman edited comment on ARROW-6999 at 10/29/19 5:23 PM:
--

[~jorisvandenbossche] thank you for the quick turn-around!

We store the partitions in Parquet files, with directories defining the partitions. This allows us to use the [ParquetDataset partition-level filters|https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow-parquet-parquetdataset] ...
{noformat}
tree
.
|-- _common_metadata
|-- mm=201909
|   `-- e097411586b0460e860c331b63fecb2b.parquet
`-- mm=201910
    `-- b8de9aa413194cc4af6f4802b5c4923f.parquet
.
.

{noformat}




[jira] [Commented] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-29 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962254#comment-16962254
 ] 

Tom Goodman commented on ARROW-6999:


[~jorisvandenbossche] thank you for the quick turn-around!

We store the partitions in Parquet files, with directories defining the partitions. This allows us to use the [ParquetDataset filters|https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow-parquet-parquetdataset] ...
{noformat}
tree
.
|-- _common_metadata
|-- mm=201909
|   `-- e097411586b0460e860c331b63fecb2b.parquet
`-- mm=201910
    `-- b8de9aa413194cc4af6f4802b5c4923f.parquet
.
.

{noformat}


[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-28 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317
 ] 

Tom Goodman edited comment on ARROW-6999 at 10/28/19 7:24 PM:
--

[~jorisvandenbossche] please try this with the attached [^test3.hdf], which is not empty:
{code:python}
import pandas as pd
import pyarrow as pa

df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema)
{code}
I still get KeyError: '__index_level_0__' (+without+ specifying preserve_index).
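
The same check can be run without the attachment, on a synthetic DataFrame whose index is a plain integer index rather than a RangeIndex. A minimal sketch with made-up data, assuming a pyarrow release that contains the fix (the issue lists Fix For: 1.0.0); on the affected versions 0.12.0 to 0.15.0 the final from_pandas call is where the KeyError is expected. It also shows why preserve_index=False is not an option here: the '__index_level_0__' column disappears from the schema.
{code:python}
import pandas as pd
import pyarrow as pa

# Made-up data with a non-Range integer index, like the one in test3.hdf
df2 = pd.DataFrame({'value': [1.5, 2.5, 3.5]}, index=[3, 1, 2])

# Default preserve_index=None: the integer index is stored as a column
own_schema = pa.Table.from_pandas(df2).schema
print(own_schema.names)

# preserve_index=False drops the index, so the legacy column is gone
print(pa.Table.from_pandas(df2, preserve_index=False).schema.names)

# Pass the table's own schema back in: the step that failed in the report
roundtrip = pa.Table.from_pandas(df2, schema=own_schema)
print(roundtrip.column('__index_level_0__').to_pylist())
{code}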

This may be because the index on test3.hdf is an Int64Index, and the [pyarrow docs|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_pandas] say the default behavior is to "store the index as a column", except for range indexes. Unfortunately, this makes the bug more prevalent.
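
The default behavior quoted from the docs is visible in the schema's pandas metadata. A minimal sketch with made-up frames, assuming a pyarrow version whose preserve_index=None default matches the docs quoted above: a RangeIndex produces no column and is described only in the metadata, while a plain integer index becomes the '__index_level_0__' column.
{code:python}
import pandas as pd
import pyarrow as pa

# Made-up frames: one with the default RangeIndex, one with explicit labels
frames = {
    'RangeIndex': pd.DataFrame({'x': [1, 2, 3]}),
    'integer index': pd.DataFrame({'x': [1, 2, 3]}, index=[5, 6, 7]),
}

for label, frame in frames.items():
    schema = pa.Table.from_pandas(frame).schema
    # 'index_columns' holds column names, or a dict describing a RangeIndex
    print(label, schema.names, schema.pandas_metadata['index_columns'])
{code}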



> [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own 
> schema
> ---
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0  # Issue also with 0.14.0, 0.13.0 & 0.12.0. but not 0.11.0
>Reporter: Tom Goodman
>Priority: Major
> Fix For: 1.0.0
>
> Attachments: test3.hdf
>
>
> Steps to reproduce:
>  # Generate any DataFrame's pyarrow Schema using Table.from_pandas
>  # Pass the generated schema as input into Table.from_pandas
>  # Causes KeyError: '__index_level_0__'
> We did not have this issue with pyarrow==0.11.0 which we used to write many 
> partitions across years.  Our goal now is to use pyarrow==0.15.0 and produce 
> schema going forward that are *backwards compatible* (i.e. also have 
> '__index_level_0__'), so we should not need to re-generate all prior years' 
> partitions when we migrate to 0.15.0.
> We cannot set _preserve_index=False_, since that effectively deletes 
> '__index_level_0__', causing inconsistent schema across earlier partitions 
> that had been written using pyarrow==0.11.0.
>  
> {code:java}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame() 
> schema = pa.Table.from_pandas(df).schema
> pa_table = pa.Table.from_pandas(df, schema=schema)
> {code}
> {noformat}
> Traceback (most recent call last):
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py",
>  line 3078, in get_loc
> return self._engine.get_loc(key)
>   File "pandas/_libs/index.pyx", line 140, in 
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/index.pyx", line 162, in 
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in 
> pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in 
> pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: '__index_level_0__'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py",
>  line 408, in _get_columns_to_convert_given_schema
> col = df[name]
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py",
>  line 2688, in __getitem__
> return self._getitem_column(key)
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py",
>  line 2695, in _getitem_column
> return self._get_item_cache(key)
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py",
>  line 2489, in _get_item_cache
> values = self._data.get(item)
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py",
>  line 4115, in get
> loc = self.items.get_loc(item)
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py",
>  line 3080, in get_loc
> return se
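The traceback points at `col = df[name]` inside pyarrow's `_get_columns_to_convert_given_schema`: each field of the supplied schema is looked up as a DataFrame column, but `__index_level_0__` is an index level, not a column, so the lookup raises KeyError. The sketch below models that step with plain dicts. `naive_lookup` and `resolve_schema_fields` are hypothetical, and the index fallback only illustrates the kind of change a fix would need; it is not the actual patch.

```python
def naive_lookup(schema_names, columns):
    """Mimic the failing 'col = df[name]' step: only real columns are searched."""
    return {name: columns[name] for name in schema_names}


def resolve_schema_fields(schema_names, columns, index_levels):
    """Resolve each schema field from the columns, falling back to the index.

    columns and index_levels are plain dicts standing in for the DataFrame's
    columns and its stored index levels (a deliberate simplification).
    """
    resolved = {}
    for name in schema_names:
        if name in columns:
            resolved[name] = ("column", columns[name])
        elif name in index_levels:
            # Look in the index instead of failing the way df[name] does.
            resolved[name] = ("index", index_levels[name])
        else:
            raise KeyError(name)
    return resolved


columns = {"a": [1, 2]}
index_levels = {"__index_level_0__": [10, 20]}  # an unnamed, non-range index
schema_names = ["a", "__index_level_0__"]  # as a from_pandas(df).schema would list them

try:
    naive_lookup(schema_names, columns)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: '__index_level_0__'

print(resolve_schema_fields(schema_names, columns, index_levels)["__index_level_0__"])
# ('index', [10, 20])
```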

[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-28 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317
 ] 

Tom Goodman edited comment on ARROW-6999 at 10/28/19 7:21 PM:
--

[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
import pandas as pd
import pyarrow as pa
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema){code}
I still get KeyError: '__index_level_0__' (+without+ specifying 
preserve_index).

This may be because the index in test3.hdf is an Int64Index, and the [pyarrow docs|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_pandas] 
say the default behavior is to "store the index as a column", except for range 
indexes. Unfortunately, this makes the bug more prevalent.


was (Author: goodiegoodman):
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
import pandas as pd
import pyarrow as pa
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema){code}
I still get KeyError: '__index_level_0__' (+without+ specifying 
preserve_index).

This may be because the index in test3.hdf is an Int64Index, and the [pyarrow docs|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_pandas] 
say the default behavior is to "store the index as a column", except for range 
indexes.

[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-28 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317
 ] 

Tom Goodman edited comment on ARROW-6999 at 10/28/19 7:15 PM:
--

[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
import pandas as pd
import pyarrow as pa
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema){code}
I still get KeyError: '__index_level_0__' (+without+ specifying 
preserve_index).

This may be because the index in test3.hdf is an Int64Index, and the [pyarrow docs|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_pandas] 
say the default behavior is to "store the index as a column", except for range 
indexes.


was (Author: goodiegoodman):
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
import pandas as pd
import pyarrow as pa
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema){code}
I still get KeyError: '__index_level_0__' (+without+ specifying 
preserve_index), do you?

[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-28 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317
 ] 

Tom Goodman edited comment on ARROW-6999 at 10/28/19 6:32 PM:
--

[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
import pandas as pd
import pyarrow as pa
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema){code}
I still get KeyError: '__index_level_0__' (+without+ specifying 
preserve_index), do you?


was (Author: goodiegoodman):
[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
import pandas as pd
import pyarrow as pa
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema){code}
I still get KeyError: '__index_level_0__', do you?

[jira] [Comment Edited] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-28 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317
 ] 

Tom Goodman edited comment on ARROW-6999 at 10/28/19 6:13 PM:
--

[~jorisvandenbossche] please try this with the attached [^test3.hdf] (not empty)
{code:java}
import pandas as pd
import pyarrow as pa
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema){code}
I still get KeyError: '__index_level_0__', do you?


was (Author: goodiegoodman):
[~jorisvandenbossche] please try this with the attached test3.hdf (not empty)
{code:java}
import pandas as pd
import pyarrow as pa
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema){code}
I still get KeyError: '__index_level_0__', do you?

[jira] [Commented] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-28 Thread Tom Goodman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961317#comment-16961317
 ] 

Tom Goodman commented on ARROW-6999:


[~jorisvandenbossche] please try this with the attached test3.hdf (not empty)
{code:java}
import pandas as pd
import pyarrow as pa
df2 = pd.read_hdf('test3.hdf', 'foo')
pa.Table.from_pandas(df2, schema=pa.Table.from_pandas(df2).schema){code}
I still get KeyError: '__index_level_0__', do you?
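Since the issue's stated goal is that new partitions keep the same columns as those written with pyarrow 0.11.0, a cheap guard is to compare field names before writing. This is a minimal stdlib sketch: `missing_legacy_fields` is a hypothetical helper and the example field names are made up; for a real pyarrow `Schema`, the name list would come from `schema.names`.

```python
def missing_legacy_fields(legacy_names, new_names):
    """Return fields of a legacy schema that a new schema lacks, in order.

    Both arguments are plain lists of field names.
    """
    new_set = set(new_names)
    return [name for name in legacy_names if name not in new_set]


# Illustrative field names only; not taken from the reporter's data.
legacy = ["price", "volume", "__index_level_0__"]  # written with pyarrow 0.11.0
new = ["price", "volume"]  # e.g. after preserve_index=False drops the index
print(missing_legacy_fields(legacy, new))  # ['__index_level_0__']
```

An empty result means the new schema still carries every legacy field, including `__index_level_0__`.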

[jira] [Updated] (ARROW-6999) [Python] KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-28 Thread Tom Goodman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Goodman updated ARROW-6999:
---
Attachment: test3.hdf

[jira] [Updated] (ARROW-6999) KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-26 Thread Tom Goodman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Goodman updated ARROW-6999:
---
Environment: 
pandas==0.23.4
pyarrow==0.15.0  # Issue also with 0.14.0, 0.13.0 & 0.12.0, but not 0.11.0


  was:
pip freeze
certifi==2019.6.16
numpy==1.17.2
pandas==0.23.4
pyarrow==0.15.0  # Issue also seen in 0.14.0, 0.13.0, 0.12.0
python-dateutil==2.8.0
pytz==2019.2
six==1.12.0
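If the same code has to run against several pyarrow versions, the version list on this issue suggests a simple guard. This is a minimal sketch: `possibly_affected` is hypothetical, and in practice the version string would come from `pyarrow.__version__`. It encodes only what the issue reports (0.11.0 unaffected, 0.12.0 through 0.15.0 affected, fix targeted at 1.0.0) and conservatively flags any release in between.

```python
def possibly_affected(pyarrow_version):
    """Rough check based only on the versions listed on ARROW-6999.

    Assumptions: 0.11.x is unaffected, 0.12.0 through 0.15.x were reported
    as affected, and the fix is targeted at 1.0.0. Releases between 0.15 and
    1.0 are not mentioned on the issue, so they are flagged to be safe.
    """
    major, minor = (int(part) for part in pyarrow_version.split(".")[:2])
    return (0, 12) <= (major, minor) < (1, 0)


print(possibly_affected("0.11.0"))  # False
print(possibly_affected("0.15.0"))  # True
```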



> KeyError: '__index_level_0__' passing Table.from_pandas its own schema
> --
>
> Key: ARROW-6999
> URL: https://issues.apache.org/jira/browse/ARROW-6999
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
> Environment: pandas==0.23.4
> pyarrow==0.15.0  # Issue also with 0.14.0, 0.13.0 & 0.12.0. but not 0.11.0
>Reporter: Tom Goodman
>Priority: Major
>
> Steps to reproduce:
>  # Generate any DataFrame's pyarrow Schema using Table.from_pandas
>  # Pass the generated schema as input into Table.from_pandas
>  # Causes KeyError: '__index_level_0__'
> We did not have this issue with pyarrow==0.11.0, which we used to write many 
> partitions spanning several years. Our goal now is to use pyarrow==0.15.0 and produce 
> schemas going forward that are *backwards compatible* (i.e. that also contain 
> '__index_level_0__'), so that we do not need to regenerate all prior years' 
> partitions when we migrate to 0.15.0.
> We cannot set _preserve_index=False_, since that drops '__index_level_0__' 
> from the schema, making it inconsistent with the earlier partitions 
> that were written using pyarrow==0.11.0.
>  
> {code:java}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame() 
> schema = pa.Table.from_pandas(df).schema
> pa_table = pa.Table.from_pandas(df, schema=schema)
> {code}
> {noformat}
> Traceback (most recent call last):
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py",
>  line 3078, in get_loc
> return self._engine.get_loc(key)
>   File "pandas/_libs/index.pyx", line 140, in 
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/index.pyx", line 162, in 
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in 
> pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in 
> pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: '__index_level_0__'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py",
>  line 408, in _get_columns_to_convert_given_schema
> col = df[name]
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py",
>  line 2688, in __getitem__
> return self._getitem_column(key)
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py",
>  line 2695, in _getitem_column
> return self._get_item_cache(key)
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py",
>  line 2489, in _get_item_cache
> values = self._data.get(item)
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py",
>  line 4115, in get
> loc = self.items.get_loc(item)
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py",
>  line 3080, in get_loc
> return self._engine.get_loc(self._maybe_cast_indexer(key))
>   File "pandas/_libs/index.pyx", line 140, in 
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/index.pyx", line 162, in 
> pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in 
> pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in 
> pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: '__index_level_0__'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/IPython/core/interactiveshell.py",
>  line 3326, in run_code
> exec(code_obj, self.user_global_ns, self.user_ns)
>   File "", line 5, in 
> pa_table = pa.Table.from_pandas(df, 
> schema=pa.Table.from_pandas(df).schema)
>   File "pyarrow/table.pxi", line 1057, in pyarrow.lib.Table.from_pandas
>   File 
> "/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2

[jira] [Created] (ARROW-6999) KeyError: '__index_level_0__' passing Table.from_pandas its own schema

2019-10-26 Thread Tom Goodman (Jira)
Tom Goodman created ARROW-6999:
--

 Summary: KeyError: '__index_level_0__' passing Table.from_pandas 
its own schema
 Key: ARROW-6999
 URL: https://issues.apache.org/jira/browse/ARROW-6999
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.15.0, 0.14.0, 0.13.0, 0.12.0
 Environment: pip freeze
certifi==2019.6.16
numpy==1.17.2
pandas==0.23.4
pyarrow==0.15.0  # Issue also seen in 0.14.0, 0.13.0, 0.12.0
python-dateutil==2.8.0
pytz==2019.2
six==1.12.0

Reporter: Tom Goodman


Steps to reproduce:
 # Generate any DataFrame's pyarrow Schema using Table.from_pandas
 # Pass the generated schema as input into Table.from_pandas
 # Causes KeyError: '__index_level_0__'

We did not have this issue with pyarrow==0.11.0, which we used to write many 
partitions spanning several years. Our goal now is to use pyarrow==0.15.0 and produce 
schemas going forward that are *backwards compatible* (i.e. that also contain 
'__index_level_0__'), so that we do not need to regenerate all prior years' 
partitions when we migrate to 0.15.0.

We cannot set _preserve_index=False_, since that drops '__index_level_0__' 
from the schema, making it inconsistent with the earlier partitions that 
were written using pyarrow==0.11.0.

 
{code:java}
import pandas as pd
import pyarrow as pa
df = pd.DataFrame()  # empty frame with an unnamed index
schema = pa.Table.from_pandas(df).schema  # contains '__index_level_0__'
pa_table = pa.Table.from_pandas(df, schema=schema)  # KeyError on 0.12.0 to 0.15.0
{code}
{noformat}
Traceback (most recent call last):
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py",
 line 3078, in get_loc
return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in 
pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in 
pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in 
pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in 
pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '__index_level_0__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py",
 line 408, in _get_columns_to_convert_given_schema
col = df[name]
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py",
 line 2688, in __getitem__
return self._getitem_column(key)
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/frame.py",
 line 2695, in _getitem_column
return self._get_item_cache(key)
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/generic.py",
 line 2489, in _get_item_cache
values = self._data.get(item)
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/internals.py",
 line 4115, in get
loc = self.items.get_loc(item)
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pandas/core/indexes/base.py",
 line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in 
pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in 
pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in 
pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in 
pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '__index_level_0__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/IPython/core/interactiveshell.py",
 line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 5, in 
pa_table = pa.Table.from_pandas(df, schema=pa.Table.from_pandas(df).schema)
  File "pyarrow/table.pxi", line 1057, in pyarrow.lib.Table.from_pandas
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py",
 line 517, in dataframe_to_arrays
columns)
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py",
 line 337, in _get_columns_to_convert
return _get_columns_to_convert_given_schema(df, schema, preserve_index)
  File 
"/GAAR/FIAG/sandbox/software/miniconda3/envs/rc_sfi_2019.1/lib/python3.6/site-packages/pyarrow/pandas_compat.py",
 line 426, in _get_columns_to_convert_given_schema
"in the columns or index".format(name))
KeyError: "name '__index_