[ https://issues.apache.org/jira/browse/ARROW-15731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Cook updated ARROW-15731:
-----------------------------
Description:
Currently, joining Arrow data that contains a list column fails with an error,
even when the list column is not a join key. Here's an example using the R bindings:
{code}
library(arrow)
library(dplyr)
jedi <- data.frame(name = c("C-3PO", "Luke Skywalker"),
jedi = c(FALSE, TRUE))
arrow_table(starwars) %>%
left_join(jedi) %>%
collect()
#> Error in `handle_csv_read_error()`:
#> ! Invalid: Data type list<item: string> is not supported in join non-key field
{code}
Supporting these joins would be a useful enhancement for tabular-data workflows,
where list columns can be common, and for geospatial workflows, where geometry
columns are stored as `list` or `fixed_size_list` (thanks [~paleolimbot] for
mentioning that use case).
Related discussion here: ARROW-14519
was:
Currently, joining Arrow data that contains a list column fails with an error,
even when the list column is not a join key:
{code}
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
jedi <- data.frame(name = c("C-3PO", "Luke Skywalker"),
jedi = c(FALSE, TRUE))
arrow_table(starwars) %>%
left_join(jedi) %>%
collect()
#> Error in `handle_csv_read_error()`:
#> ! Invalid: Data type list<item: string> is not supported in join non-key field
{code}
Supporting these joins would be a useful enhancement for tabular-data workflows,
where list columns can be common, and for geospatial workflows, where geometry
columns are stored as `list` or `fixed_size_list` (thanks [~paleolimbot] for
mentioning that use case).
Related discussion here: ARROW-14519
> [C++] Enable joins when data contains a list column
> ---------------------------------------------------
>
> Key: ARROW-15731
> URL: https://issues.apache.org/jira/browse/ARROW-15731
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Stephanie Hazlitt
> Priority: Major
> Labels: query-engine
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)