[ https://issues.apache.org/jira/browse/ARROW-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439307#comment-17439307 ]

David Li commented on ARROW-14519:
----------------------------------

This happens because joining on list columns is not supported, but the code path
hits an assertion instead of reporting an error. It also looks like the join
code pre-processes all columns, not just the join keys, so the presence of any
unsupported type, even in a non-key column, will cause this (as you found). We
should at least raise an error instead of crashing, but I'm not familiar enough
with the join code to know whether we can handle unsupported types in columns
that aren't used as keys.
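Until the join itself raises a proper error, a caller could guard against the crash on the R side. The following is a minimal sketch, assuming the inputs are still plain data frames or tibbles (i.e. checked before `arrow_table()` is called); `list_columns()` and `check_joinable()` are hypothetical helpers, not part of the arrow package.

```r
# Hypothetical helpers (not part of the arrow package): detect list columns
# up front and fail with a clear error instead of reaching the join code
# path that crashes.
list_columns <- function(df) {
  # TRUE for each column stored as an R list (e.g. a tibble list column)
  is_list_col <- vapply(df, is.list, logical(1))
  names(df)[is_list_col]
}

check_joinable <- function(x, y) {
  # Check every column of both inputs, since the join appears to
  # pre-process all columns, not only the keys
  bad <- union(list_columns(x), list_columns(y))
  if (length(bad) > 0) {
    stop("join inputs contain list columns, which are not supported: ",
         paste(bad, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Usage with base data frames mirroring the tables in the report
df1 <- data.frame(x = 1:3, y = c("a", "b", "c"))
df2 <- data.frame(x = 1:2)
df2$z <- list(c("first", "list", "col", "row"), c("second row ", "here"))

check_joinable(df1, df1)  # passes silently
# check_joinable(df1, df2) would stop with an error naming column "z"
```

Because the check inspects all columns rather than only the join keys, it errs on the side of rejecting inputs that the join code might one day be able to handle.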

> [C++] joins segfault when data contains list column
> ---------------------------------------------------
>
>                 Key: ARROW-14519
>                 URL: https://issues.apache.org/jira/browse/ARROW-14519
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Nicola Crane
>            Priority: Major
>
> When I run the R code below, it results in a segfault if one of the tables 
> contains a list column.
> {code:r}
> library(arrow)
> library(dplyr)
> basic_tbl <- arrow_table(
>   tibble::tibble(
>     x = 1:3,
>     y = c("a", "b", "c")
>   )
> )
> basic_tbl2 <- arrow_table(
>   tibble::tibble(
>     x = 1:3,
>     z = c(TRUE, FALSE, TRUE)
>   )
> )
> list_tbl <- arrow_table(
>   tibble::tibble(
>     z = list(c("first", "list", "col", "row"), c("second row ", "here")),
>     x = 1:2
>   )
> )
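> # works: neither table has a list column
> left_join(basic_tbl, basic_tbl2, by = "x") %>%
>   collect()
> # segfaults: list_tbl has a list column (z), even though the key is x
> left_join(basic_tbl, list_tbl, by = "x") %>%
>   collect()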
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
