[ https://issues.apache.org/jira/browse/ARROW-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441715#comment-17441715 ]
David Li commented on ARROW-14649:
----------------------------------
Somewhat related is ARROW-14177, where I'd like to unify dictionaries first
(instead of unifying on the fly).
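
As a rough illustration of what "unify first" means for factors, here is a sketch in base R. It is not Arrow's implementation, and coalesce_factors() is a hypothetical helper made up for this example: it builds the combined set of levels up front, re-encodes both inputs against it, and only then fills in the missing values.

```r
# Sketch only: coalesce_factors() is a hypothetical helper,
# not a function from arrow or dplyr.
coalesce_factors <- function(x, y) {
  # Unified "dictionary": levels of x first, then levels found only in y.
  unified <- union(levels(x), levels(y))
  # Re-encode both inputs against the unified levels.
  x <- factor(as.character(x), levels = unified)
  y <- factor(as.character(y), levels = unified)
  # Take the value from y wherever x is missing; the levels stay as unified.
  missing <- is.na(x)
  x[missing] <- y[missing]
  x
}

x <- factor(c("a", NA_character_))
y <- factor(c("b", "c"))
result <- coalesce_factors(x, y)
levels(result)
#> [1] "a" "b" "c"
```

Because the levels are fixed before any values are picked, the unused level "b" survives in the result, which matches the data frame behavior described in the issue.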
> [R] Include unused factor levels in coalesce() output
> -----------------------------------------------------
>
> Key: ARROW-14649
> URL: https://issues.apache.org/jira/browse/ARROW-14649
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Ian Cook
> Priority: Minor
>
> ARROW-14167 added support for factors in {{coalesce()}}, but the factors
> that are returned will not necessarily retain unused factor levels the way
> {{coalesce()}} does when used on an R data frame.
> For example, compare these, noticing the difference in the levels:
> {code:r}
> # R data frame
> library(dplyr)
> tibble(x = factor(c("a", NA_character_)), y = factor(c("b", "c"))) %>%
>   mutate(y = coalesce(x, y)) %>%
>   pull(y)
> #> [1] a c
> #> Levels: a b c
> {code}
> {code:r}
> # Arrow Table
> library(arrow)
> library(dplyr)
> tibble(x = factor(c("a", NA_character_)), y = factor(c("b", "c"))) %>%
>   Table$create() %>%
>   mutate(y = coalesce(x, y)) %>%
>   pull(y)
> #> [1] a c
> #> Levels: a c
> {code}
> I'm not sure if it is practical to make Arrow return the factors with the
> unused levels included like R does. If so, we should do it.
> See the test in {{test-dplyr-funcs-conditional.R}} that refers to this Jira.