[ https://issues.apache.org/jira/browse/ARROW-15838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510544#comment-17510544 ]
Joris Van den Bossche commented on ARROW-15838: ----------------------------------------------- As a reference, in {{dplyr}} this is controlled by a {{keep}} argument (https://dplyr.tidyverse.org/reference/mutate-joins.html) > [C++] Key column behavior in joins > ---------------------------------- > > Key: ARROW-15838 > URL: https://issues.apache.org/jira/browse/ARROW-15838 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ > Reporter: Jonathan Keane > Priority: Major > > By default in dplyr (and possibly in pandas too?) coalesces the key column > for full joins to be the (non-null) values from both key columns: > {code} > > left <- tibble::tibble( > key = c(1, 2), > A = c(0, 1), > ) > left_tab <- Table$create(left) > > right <- tibble::tibble( > key = c(2, 3), > B = c(0, 1), > ) > right_tab <- Table$create(right) > > left %>% full_join(right) > Joining, by = "key" > # A tibble: 3 × 3 > key A B > <dbl> <dbl> <dbl> > 1 1 0 NA > 2 2 1 0 > 3 3 NA 1 > > left_tab %>% full_join(right_tab) %>% collect() > # A tibble: 3 × 3 > key A B > <dbl> <dbl> <dbl> > 1 2 1 0 > 2 1 0 NA > 3 NA NA 1 > {code} > And right join, we would expect the key from the right table to be in the > result, but we get the key from the left instead: > {code} > > left <- tibble::tibble( > key = c(1, 2), > A = c(0, 1), > ) > left_tab <- Table$create(left) > > right <- tibble::tibble( > key = c(2, 3), > B = c(0, 1), > ) > right_tab <- Table$create(right) > > left %>% right_join(right) > Joining, by = "key" > # A tibble: 2 × 3 > key A B > <dbl> <dbl> <dbl> > 1 2 1 0 > 2 3 NA 1 > > left_tab %>% right_join(right_tab) %>% collect() > # A tibble: 2 × 3 > key A B > <dbl> <dbl> <dbl> > 1 2 1 0 > 2 NA NA 1 > {code} > Additionally, we should be able to keep both key columns with an option (cf > https://github.com/apache/arrow/blob/9719eae66dcf38c966ae769215d27020a6dd5550/r/R/dplyr-join.R#L32) > -- This message was sent by Atlassian Jira (v8.20.1#820001)