[ 
https://issues.apache.org/jira/browse/ARROW-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir updated ARROW-16641:
-----------------------------
    Description: 
In the parquet data we have, there is a column with the array data type 
(list<array_element <string>>), which flags records that have different 
issues. For each record, multiple values could be stored in the column. For 
example, `[A, B, C]`.

I'm trying to perform a data filtering step and exclude some flagged records.

Filtering is trivial for the regular columns that contain just a single value. 
E.g.,
{code:java}
flags_to_exclude <- c("A", "B")
datt %>% filter(! col %in% flags_to_exclude)
{code}
Given the array column, is it possible to exclude records with at least one of 
the flags from `flags_to_exclude` using the arrow R package?
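
To illustrate the intended result, here is a minimal base R sketch that works on an in-memory data frame rather than on an Arrow Dataset. The names `datt_df`, `id`, and `flags` are made up for the example, and this is only an assumption about how the data would look after it has been pulled into R, not an arrow-native solution.

```r
# Hypothetical in-memory stand-in for the data; `id` and `flags` are assumed names.
datt_df <- data.frame(id = 1:4)
datt_df$flags <- list(c("A", "B", "C"), c("C"), character(0), c("D", "B"))

flags_to_exclude <- c("A", "B")

# TRUE for rows whose list element shares at least one value with flags_to_exclude.
# An empty element yields any(logical(0)), which is FALSE, so such rows are kept.
has_excluded_flag <- vapply(
  datt_df$flags,
  function(x) any(x %in% flags_to_exclude),
  logical(1)
)

# Keep only the rows that carry none of the excluded flags.
kept <- datt_df[!has_excluded_flag, ]
```

Here the rows with `id` 1 and 4 are dropped because they contain "A" or "B". Ideally the same row-wise check could be expressed on the Arrow side, without loading the whole column into memory.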

I really appreciate any advice you can provide!

  was:
In the parquet data we have, there is a column with the array data type 
(list<array_element <string>>), which flags records that have different 
issues. For each record, multiple values could be stored in the column. For 
example, `[A, B, C]`.

I'm trying to perform a data filtering step and exclude some flagged records.

Filtering is trivial for the regular columns that contain just a single value. 
E.g.,

 
{code:java}
flags_to_exclude <- c("A", "B")
datt %>% filter(! col %in% flags_to_exclude)
{code}
 

 

Given the array column, is it possible to exclude records with at least one of 
the flags from `flags_to_exclude` using the arrow R package?

I really appreciate any advice you can provide!


> [R] How to filter array columns?
> --------------------------------
>
>                 Key: ARROW-16641
>                 URL: https://issues.apache.org/jira/browse/ARROW-16641
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: R
>            Reporter: Vladimir
>            Priority: Minor
>             Fix For: 8.0.0
>
>
> In the parquet data we have, there is a column with the array data type 
> (list<array_element <string>>), which flags records that have different 
> issues. For each record, multiple values could be stored in the column. For 
> example, `[A, B, C]`.
> I'm trying to perform a data filtering step and exclude some flagged records.
> Filtering is trivial for the regular columns that contain just a single 
> value. E.g.,
> {code:java}
> flags_to_exclude <- c("A", "B")
> datt %>% filter(! col %in% flags_to_exclude)
> {code}
> Given the array column, is it possible to exclude records with at least one 
> of the flags from `flags_to_exclude` using the arrow R package?
> I really appreciate any advice you can provide!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
