Joris Van den Bossche created ARROW-14287:
---------------------------------------------

Summary: [R] Selecting columns while reading Parquet file with nested types can give wrong column
Key: ARROW-14287
URL: https://issues.apache.org/jira/browse/ARROW-14287
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Joris Van den Bossche

I created two small files (using Python for my convenience):

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"a": [1, 2], "b": [3, 4]})
pq.write_table(table, "test1.parquet")

table = pa.table({"a": [1, 2], "nested": [[{'f1': 1, 'f2': 3}, {'f1': 3, 'f2': 4}], None], "b": [3, 4]})
pq.write_table(table, "test2.parquet")
{code}

The first is a simple file; the second contains a column with a nested list-of-struct type.

Reading these in R with a column selection works in the first case, but in the second case it reads the second column ("nested") instead of the third ("b"):

{code:r}
> arrow::read_parquet("test1.parquet", col_select=c("b"))
  b
1 3
2 4
> arrow::read_parquet("test2.parquet", col_select=c("b"))
  nested
1   3, 4
2   NULL
{code}

This is due to the R code simply converting column names to integer indices based on their position among the top-level columns, while Parquet counts the individual fields of nested columns separately.
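
To illustrate the index mismatch, here is a minimal pure-Python sketch. It does not use the Arrow or Parquet APIs; the toy schema encoding and the helpers `count_leaves` and `leaf_indices` are hypothetical, introduced only for this example. It assumes that Parquet stores one leaf column per primitive field, so that "nested" in test2.parquet occupies two leaf columns (f1 and f2).

```python
# Toy description of the test2.parquet schema (hypothetical encoding):
# a string is a primitive type, a dict is a struct, a one-element list is a list type.
schema = [
    ("a", "int64"),
    ("nested", [{"f1": "int64", "f2": "int64"}]),
    ("b", "int64"),
]


def count_leaves(field_type):
    """Count the primitive (leaf) columns contained in a toy type description."""
    if isinstance(field_type, dict):   # struct: sum over its child fields
        return sum(count_leaves(child) for child in field_type.values())
    if isinstance(field_type, list):   # list: leaves of the element type
        return count_leaves(field_type[0])
    return 1                           # primitive type


def leaf_indices(schema, name):
    """Return the leaf column indices covered by the top-level field `name`."""
    start = 0
    for field_name, field_type in schema:
        n = count_leaves(field_type)
        if field_name == name:
            return list(range(start, start + n))
        start += n
    raise KeyError(name)


# Naive approach: position of the name among the top-level fields.
naive_index = [field_name for field_name, _ in schema].index("b")

# Leaf-aware approach: skip over the leaves of every preceding field.
b_leaves = leaf_indices(schema, "b")
nested_leaves = leaf_indices(schema, "nested")

print(naive_index, b_leaves, nested_leaves)
```

Under these assumptions the naive index for "b" falls inside the leaf range of "nested" rather than on the leaf that holds "b", which would be consistent with the output shown above, where only the f2 values (3, 4) of "nested" are returned.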