[ https://issues.apache.org/jira/browse/ARROW-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157231#comment-17157231 ]
Maarten Breddels commented on ARROW-9456:
-----------------------------------------

This file gives me the same problem
{code:java}
import vaex
df = vaex.example()[:10]
df.export_parquet('/tmp/crash.parquet')
{code}
Output:
{noformat}
$ python pyarrow/crash.py dev
terminate called after throwing an instance of 'parquet::ParquetException'
what(): The file only has 11 columns, requested metadata for column: 1228010192
[1] 1570024 abort (core dumped) python pyarrow/crash.py
{noformat}
However:
{code:java}
import vaex
df = vaex.example()[['x']][:10]
df.export_parquet('/tmp/crash.parquet')
{code}
Doesn't give me output, except in the debugger:
{noformat}
terminate called after throwing an instance of 'parquet::ParquetException'
what(): The file only has 1 columns, requested metadata for column: 61
{noformat}
git commit 658618ecd540bc6af76efa608cd1ff7b7938ba4c (2 days old)

Hope that helps

> [Python] Dataset segfault when not importing pyarrow.parquet
> -------------------------------------------------------------
>
> Key: ARROW-9456
> URL: https://issues.apache.org/jira/browse/ARROW-9456
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Maarten Breddels
> Priority: Major
> Fix For: 1.0.0
>
>
> To reproduce:
> # import pyarrow.parquet # if we skip this...
> import pyarrow as pa
> import pyarrow.dataset as ds
> import glob
> ds = pa.dataset.dataset('/data/taxi_parquet/data_0.parquet')
> ds.to_table() # this will crash
>
> $ python pyarrow/crash.py dev
> terminate called after throwing an instance of 'parquet::ParquetException'
> what(): The file only has 19 columns, requested metadata for column: 1049198736
> [1] 1559395 abort (core dumped) python pyarrow/crash.py
>
> When the import is there, it will work fine.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)