[ https://issues.apache.org/jira/browse/ARROW-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn updated ARROW-4267:
-------------------------------
    Fix Version/s: 0.12.1

> [Python/C++] Segfault when reading rowgroups with duplicated columns
> --------------------------------------------------------------------
>
>                 Key: ARROW-4267
>                 URL: https://issues.apache.org/jira/browse/ARROW-4267
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: Florian Jetter
>            Assignee: Uwe L. Korn
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.13.0, 0.12.1
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When reading a row group using duplicated columns I receive a segfault.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({
>     "col": ["A", "B"]
> })
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> parquet_file = pq.ParquetFile(buf.getvalue())
> parquet_file.read_row_group(0)
> parquet_file.read_row_group(0, columns=["col"])
> # boom
> parquet_file.read_row_group(0, columns=["col", "col"])
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)