[ 
https://issues.apache.org/jira/browse/ARROW-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439427#comment-17439427
 ] 

Michal Nowakiewicz commented on ARROW-14519:
--------------------------------------------

We cannot easily support more types in the hash join right now. We transform 
and encode all input values, key and non-key alike (see row_encoder.h), so each 
additional type would need its own specialization.

But we can return an error instead of asserting. The place to do that is 
HashJoinSchema::ValidateSchemas, where we check the data types of the input 
schemas and keys.
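
A rough sketch of that idea, to show the shape of the check rather than the 
actual patch. Everything in it is a hypothetical stand-in written for this 
illustration: TypeId, Field, Status, IsSupportedByRowEncoder and 
ValidateJoinInputTypes are not Arrow's real classes or signatures, and the set 
of supported types is assumed, not copied from row_encoder.h.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not Arrow's real types.
enum class TypeId { kInt32, kInt64, kBool, kString, kList };

struct Field {
  std::string name;
  TypeId type;
};

struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status Invalid(std::string msg) { return {false, std::move(msg)}; }
};

// Assumed list of types the row encoder has a specialization for.
bool IsSupportedByRowEncoder(TypeId type) {
  switch (type) {
    case TypeId::kInt32:
    case TypeId::kInt64:
    case TypeId::kBool:
    case TypeId::kString:
      return true;
    default:
      return false;
  }
}

// Checks one input schema; returns an error for the first unsupported field.
Status CheckFields(const std::vector<Field>& fields) {
  for (const Field& field : fields) {
    if (!IsSupportedByRowEncoder(field.type)) {
      return Status::Invalid("Data type of field '" + field.name +
                             "' is not supported in join");
    }
  }
  return Status::OK();
}

// Validates both join inputs, key and non-key columns alike, and reports
// an error to the caller instead of asserting.
Status ValidateJoinInputTypes(const std::vector<Field>& left,
                              const std::vector<Field>& right) {
  Status st = CheckFields(left);
  if (!st.ok) return st;
  return CheckFields(right);
}
```

With a check of this kind, the list column in the report below would produce 
an error that the caller can surface to the R user, not a crash.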

> [C++] joins segfault when data contains list column
> ---------------------------------------------------
>
>                 Key: ARROW-14519
>                 URL: https://issues.apache.org/jira/browse/ARROW-14519
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Nicola Crane
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When I run the R code below, it results in a segfault if one of the tables 
> contains a list column.
> {code:r}
> library(arrow)
> library(dplyr)
> basic_tbl <- arrow_table(
>   tibble::tibble(
>     x = 1:3,
>     y = c("a", "b", "c")
>   )
> )
> basic_tbl2 <- arrow_table(
>   tibble::tibble(
>     x = 1:3,
>     z = c(T, F, T)
>   )
> )
> list_tbl <- arrow_table(
>   tibble::tibble(
>     z = list(c("first", "list", "col", "row"), c("second row ", "here")),
>     x = 1:2
>   )
> )
> # works
> left_join(basic_tbl, basic_tbl2) %>%
>   collect()
> # segfaults
> left_join(basic_tbl, list_tbl) %>%
>   collect()
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
