[ https://issues.apache.org/jira/browse/ARROW-436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17657470#comment-17657470 ]

Rok Mihevc commented on ARROW-436:
----------------------------------

This issue has been migrated to [issue 
#16083|https://github.com/apache/arrow/issues/16083] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] pandas-parquet roundtrip dtype mismatch
> ------------------------------------------------
>
>                 Key: ARROW-436
>                 URL: https://issues.apache.org/jira/browse/ARROW-436
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>
> As a follow up to ARROW-434, I observed the following odd failure:
> {code}
> @parquet
> def test_pandas_parquet_pyfile_failure(tmpdir):
>     filename = tmpdir.join('pandas_pyfile_roundtrip.parquet').strpath
>     size = 5
>     np.random.seed(0)
>     df = pd.DataFrame({
>         'uint8': np.arange(size, dtype=np.uint8),
>         'uint16': np.arange(size, dtype=np.uint16),
>         'uint32': np.arange(size, dtype=np.uint32),
>         'uint64': np.arange(size, dtype=np.uint64),
>         'int8': np.arange(size, dtype=np.int16),  # sic: int16, matching the dtypes below
>         'int16': np.arange(size, dtype=np.int16),
>         'int32': np.arange(size, dtype=np.int32),
>         'int64': np.arange(size, dtype=np.int64),
>         'float32': np.arange(size, dtype=np.float32),
>         'float64': np.arange(size, dtype=np.float64),
>         'bool': np.random.randn(size) > 0
>     })
>     arrow_table = A.from_pandas_dataframe(df)
>     with open(filename, 'wb') as f:
>         A.parquet.write_table(arrow_table, f, version="1.0")
>     data = io.BytesIO(open(filename, 'rb').read())
>     table_read = pq.read_table(data)
>     df_read = table_read.to_pandas()
>     pdt.assert_frame_equal(df, df_read)
> {code}
> Debugging locally, I see the following. Only the {{uint32}} column differs; it comes back as {{int64}}:
> {code}
> (Pdb) df.dtypes
> bool          bool
> float32    float32
> float64    float64
> int16        int16
> int32        int32
> int64        int64
> int8         int16
> uint16      uint16
> uint32      uint32
> uint64      uint64
> uint8        uint8
> dtype: object
> (Pdb) df_read.dtypes
> bool          bool
> float32    float32
> float64    float64
> int16        int16
> int32        int32
> int64        int64
> int8         int16
> uint16      uint16
> uint32       int64
> uint64      uint64
> uint8        uint8
> dtype: object
> {code}


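To pinpoint which columns changed type without eyeballing two {{dtypes}} listings, a small helper such as the hypothetical {{dtype_mismatches}} below compares two frames column by column. The frames are stand-ins that mirror the output in the report ({{uint32}} read back as {{int64}}); they are not data from the report.

```python
import numpy as np
import pandas as pd


def dtype_mismatches(expected: pd.DataFrame, actual: pd.DataFrame) -> dict:
    """Hypothetical helper: map each shared column whose dtype differs
    to a (expected_dtype, actual_dtype) pair of dtype names."""
    mismatches = {}
    for col in expected.columns:
        if col in actual.columns and expected[col].dtype != actual[col].dtype:
            mismatches[col] = (str(expected[col].dtype), str(actual[col].dtype))
    return mismatches


# Stand-in frames: the "read back" copy widens uint32 to int64.
df = pd.DataFrame({
    "uint32": np.arange(5, dtype=np.uint32),
    "int64": np.arange(5, dtype=np.int64),
})
df_read = df.astype({"uint32": np.int64})

print(dtype_mismatches(df, df_read))  # {'uint32': ('uint32', 'int64')}

# The values survive the widening, so a dtype-insensitive comparison passes
# even though the default comparison (check_dtype=True) would fail.
pd.testing.assert_frame_equal(df, df_read, check_dtype=False)
```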

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
