[ https://issues.apache.org/jira/browse/ARROW-16695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544517#comment-17544517 ]

Jonathan Keane commented on ARROW-16695:
----------------------------------------

Thanks for the reprex! 

cc [~westonpace]

> [R][C++] Extension types are not supported in joins
> ---------------------------------------------------
>
>                 Key: ARROW-16695
>                 URL: https://issues.apache.org/jira/browse/ARROW-16695
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>            Reporter: Dewey Dunnington
>            Priority: Major
>
> It looks like extension types are not supported in joins (even if the 
> underlying type is supported)! Reported by [~jonkeane] while making a demo 
> for Arrow + Query engine + geoarrow (R package), which uses extension types 
> liberally:
> {code:R}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> rb_non_ext <- record_batch(
>   a = 1:5, 
>   b = letters[1:5]
> )
> rb_ext_storage <- record_batch(
>   b = letters[1:5],
>   c = Array$create(list(as.raw(1:5)), type = binary())
> )
> rb_ext <- record_batch(
>   b = letters[1:5],
>   c = vctrs_extension_array(rb_ext_storage$c$as_vector())
> )
> rb_non_ext %>% 
>   left_join(rb_ext_storage) %>% 
>   collect()
> #> # A tibble: 5 × 3
> #>       a b                      c
> #>   <int> <chr>         <arrw_bnr>
> #> 1     1 a     01, 02, 03, 04, 05
> #> 2     2 b     01, 02, 03, 04, 05
> #> 3     3 c     01, 02, 03, 04, 05
> #> 4     4 d     01, 02, 03, 04, 05
> #> 5     5 e     01, 02, 03, 04, 05
> rb_non_ext %>% 
>   left_join(rb_ext) %>% 
>   collect()
> #> Error in `collect()`:
> #> ! Invalid: Data type <arrow_binary[0]> is not supported in join non-key field
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:121
>   ValidateSchemas(join_type, left_schema, left_keys, left_output, right_schema, right_keys, right_output, left_field_name_suffix, right_field_name_suffix)
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:499
>   schema_mgr->Init( join_options.join_type, left_schema, join_options.left_keys, join_options.left_output, right_schema, join_options.right_keys, join_options.right_output, join_options.filter, join_options.output_suffix_for_left, join_options.output_suffix_for_right)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)