[
https://issues.apache.org/jira/browse/ARROW-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439427#comment-17439427
]
Michal Nowakiewicz commented on ARROW-14519:
--------------------------------------------
We cannot easily support more types in hash join right now. All input values,
key and non-key alike, are transformed and encoded (see row_encoder.h), so each
additional type would need its own specialization.
What we can do is return an error instead of asserting. The place for that is
HashJoinSchema::ValidateSchemas, where we check the data types of the input
schemas and keys.
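To illustrate the "return an error instead of asserting" idea, here is a minimal, self-contained sketch. It does not use the real Arrow API: TypeId, Field, Status, IsSupportedByRowEncoder and ValidateJoinSchema are all hypothetical stand-ins, and the set of supported types is an assumption chosen only for the example.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Arrow's type ids, fields and Status.
// None of these are the real Arrow classes.
enum class TypeId { kInt32, kBool, kString, kList };

struct Field {
  std::string name;
  TypeId type;
};

struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status Invalid(std::string msg) { return {false, std::move(msg)}; }
};

// Assumption: the row encoder only has specializations for these types.
bool IsSupportedByRowEncoder(TypeId type) {
  switch (type) {
    case TypeId::kInt32:
    case TypeId::kBool:
    case TypeId::kString:
      return true;
    default:
      return false;
  }
}

// Checks every field (key and non-key) up front and reports the first
// unsupported one as an error, rather than asserting later during encoding.
Status ValidateJoinSchema(const std::vector<Field>& fields) {
  for (const Field& field : fields) {
    if (!IsSupportedByRowEncoder(field.type)) {
      return Status::Invalid("Data type of field '" + field.name +
                             "' is not supported in join");
    }
  }
  return Status::OK();
}
```

With a check like this, a schema containing a list column would yield a failed Status that the caller can surface to the user, instead of a crash once the join starts encoding rows.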
> [C++] joins segfault when data contains list column
> ---------------------------------------------------
>
> Key: ARROW-14519
> URL: https://issues.apache.org/jira/browse/ARROW-14519
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Nicola Crane
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When I run the R code below, it results in a segfault if one of the tables
> contains a list column.
> {code:r}
> library(arrow)
> library(dplyr)
> basic_tbl <- arrow_table(
>   tibble::tibble(
>     x = 1:3,
>     y = c("a", "b", "c")
>   )
> )
> basic_tbl2 <- arrow_table(
>   tibble::tibble(
>     x = 1:3,
>     z = c(TRUE, FALSE, TRUE)
>   )
> )
> list_tbl <- arrow_table(
>   tibble::tibble(
>     z = list(c("first", "list", "col", "row"), c("second row ", "here")),
>     x = 1:2
>   )
> )
> # works
> left_join(basic_tbl, basic_tbl2) %>%
>   collect()
> # segfaults
> left_join(basic_tbl, list_tbl) %>%
>   collect()
> {code}