[ https://issues.apache.org/jira/browse/ARROW-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217904#comment-17217904 ]
Gert Hulselmans commented on ARROW-10056:
-----------------------------------------

I just tested the code you posted, and indeed this works:
{code:python}
import numpy as np
import pyarrow as pa
from pyarrow import feather

n_columns = 499999
table = pa.table([np.random.randn(1) for _ in range(n_columns)],
                 names=['col' + str(i) for i in range(n_columns)])
feather.write_feather(table, "test_wide.feather")
result = feather.read_table("test_wide.feather")
{code}
But with e.g. "n_columns = 599999" it fails again.

So it is not specific to the pandas-to-table conversion.

> [Python] PyArrow writes invalid Feather v2 file: OSError: Verification of flatbuffer-encoded Footer failed.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-10056
>                 URL: https://issues.apache.org/jira/browse/ARROW-10056
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 1.0.1
>         Environment: CentOS7
> conda environment with pyarrow 1.0.1, numpy 1.19.1 and pandas 1.1.1
>            Reporter: Gert Hulselmans
>            Priority: Major
>             Fix For: 3.0.0
>
>
> pyarrow writes an invalid Feather v2 file, which it can't read afterwards.
> {code:java}
> OSError: Verification of flatbuffer-encoded Footer failed.
> {code}
> The following code reproduces the problem for me:
> {code:python}
> import pyarrow as pa
> import pyarrow.feather as pf
> import numpy as np
> import pandas as pd
> nbr_regions = 1223024
> nbr_motifs = 4891
> # Create (big) dataframe.
> df = pd.DataFrame(
>     np.arange(nbr_regions * nbr_motifs, dtype=np.float32).reshape((nbr_regions, nbr_motifs)),
>     index=pd.Index(['region' + str(i) for i in range(nbr_regions)], name='regions'),
>     columns=pd.Index(['motif' + str(i) for i in range(nbr_motifs)], name='motifs')
> )
> # Transpose dataframe
> df_transposed = df.transpose()
> # Write transposed dataframe to Feather v2 format.
> pf.write_feather(df_transposed, 'df_transposed.feather')
> # Trying to read the transposed dataframe from Feather v2 format, results in this error:
> df_transposed_read = pf.read_feather('df_transposed.feather')
> {code}
> {code:python}
> ---------------------------------------------------------------------------
> OSError                                   Traceback (most recent call last)
> <ipython-input-64-b41ad5157e77> in <module>
> ----> 1 df_transposed_read = pf.read_feather('df_transposed.feather')
>
> /software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_feather(source, columns, use_threads, memory_map)
>     213     """
>     214     _check_pandas_version()
> --> 215     return (read_table(source, columns=columns, memory_map=memory_map)
>     216             .to_pandas(use_threads=use_threads))
>     217
>
> /software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_table(source, columns, memory_map)
>     235     """
>     236     reader = ext.FeatherReader()
> --> 237     reader.open(source, use_memory_map=memory_map)
>     238
>     239     if columns is None:
>
> /software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.open()
>
> /software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
>
> /software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
>
> OSError: Verification of flatbuffer-encoded Footer failed.
> {code}
> Later I discovered that it happens also if the original dataframe is created in the transposed order:
> {code:python}
> # Create (big) dataframe.
> df_without_transpose = pd.DataFrame(
>     np.arange(nbr_motifs * nbr_regions, dtype=np.float32).reshape((nbr_motifs, nbr_regions)),
>     index=pd.Index(['motif' + str(i) for i in range(nbr_motifs)], name='motifs'),
>     columns=pd.Index(['region' + str(i) for i in range(nbr_regions)], name='regions'),
> )
> pf.write_feather(df_without_transpose, 'df_without_transpose.feather')
> df_without_transpose_read = pf.read_feather('df_without_transpose.feather')
> ---------------------------------------------------------------------------
> OSError                                   Traceback (most recent call last)
> <ipython-input-91-3cdad1d58c35> in <module>
> ----> 1 df_without_transpose_read = pf.read_feather('df_without_transpose.feather')
>
> /software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_feather(source, columns, use_threads, memory_map)
>     213     """
>     214     _check_pandas_version()
> --> 215     return (read_table(source, columns=columns, memory_map=memory_map)
>     216             .to_pandas(use_threads=use_threads))
>     217
>
> /software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_table(source, columns, memory_map)
>     235     """
>     236     reader = ext.FeatherReader()
> --> 237     reader.open(source, use_memory_map=memory_map)
>     238
>     239     if columns is None:
>
> /software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.open()
>
> /software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
>
> /software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
>
> OSError: Verification of flatbuffer-encoded Footer failed.
> {code}
> Writing to Feather v1 format works:
> {code:python}
> pf.write_feather(df_transposed, 'df_transposed.v1.feather', version=1)
> df_transposed_read_v1 = pf.read_feather('df_transposed.v1.feather')
> # Now do the same, but also save the index in the Feather v1 file.
> df_transposed_reset_index = df_transposed.reset_index()
> pf.write_feather(df_transposed_reset_index, 'df_transposed_reset_index.v1.feather', version=1)
> df_transposed_reset_index_read_v1 = pf.read_feather('df_transposed_reset_index.v1.feather')
> # Returns True
> df_transposed_reset_index_read_v1.equals(df_transposed)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)