[ https://issues.apache.org/jira/browse/ARROW-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir updated ARROW-16641:
-----------------------------
    Description:
In the parquet data we have, there is a column with the array data type (list<array_element <string>>), which flags records that have different issues. For each record, multiple values could be stored in the column. For example, `[A, B, C]`.

I'm trying to perform a data filtering step and exclude some flagged records.
Filtering is trivial for the regular columns that contain just a single value. E.g.,
{code:java}
flags_to_exclude <- c("A", "B")
datt %>% filter(! col %in% flags_to_exclude)
{code}
Given the array column, is it possible to exclude records with at least one of the flags from `flags_to_exclude` using the arrow R package?

I really appreciate any advice you can provide!

> [R] How to filter array columns?
> --------------------------------
>
>                 Key: ARROW-16641
>                 URL: https://issues.apache.org/jira/browse/ARROW-16641
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: R
>            Reporter: Vladimir
>            Priority: Minor
>             Fix For: 8.0.0
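
For illustration only, here is a minimal base R sketch of the "exclude records with at least one flag" logic described in the question. It does not run the filter inside Arrow: it assumes the data has already been pulled into R (for example with `collect()`) and that the array column arrives as a list column whose elements are character vectors. The sample rows and the names `has_excluded_flag` and `kept` are hypothetical.

```r
# Stand-in for the collected data. Assumption: `col` is a list column in which
# each element is a character vector of flags (illustrative values only).
datt <- data.frame(id = 1:4)
datt$col <- list(c("A", "B", "C"), c("C"), character(0), c("D", "B"))

flags_to_exclude <- c("A", "B")

# TRUE for rows whose flag vector shares at least one value with
# flags_to_exclude; empty flag vectors yield FALSE.
has_excluded_flag <- vapply(
  datt$col,
  function(flags) any(flags %in% flags_to_exclude),
  logical(1)
)

# Keep only the rows that carry none of the excluded flags.
kept <- datt[!has_excluded_flag, , drop = FALSE]
```

Because the check runs row by row in R, this may be slow or memory-hungry on large datasets; whether the same filter can be pushed down into Arrow itself is the open question of this issue.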