[ https://issues.apache.org/jira/browse/ARROW-16695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544517#comment-17544517 ]
Jonathan Keane commented on ARROW-16695:
----------------------------------------

Thanks for the reprex! cc [~westonpace]

> [R][C++] Extension types are not supported in joins
> ---------------------------------------------------
>
>                 Key: ARROW-16695
>                 URL: https://issues.apache.org/jira/browse/ARROW-16695
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>            Reporter: Dewey Dunnington
>            Priority: Major
>
> It looks like extension types are not supported in joins (even if the
> underlying type is supported)! Reported by [~jonkeane] while making a demo
> for Arrow + Query engine + geoarrow (R package), which uses extension types
> liberally:
> {code:R}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
>
> rb_non_ext <- record_batch(
>   a = 1:5,
>   b = letters[1:5]
> )
>
> rb_ext_storage <- record_batch(
>   b = letters[1:5],
>   c = Array$create(list(as.raw(1:5)), type = binary())
> )
>
> rb_ext <- record_batch(
>   b = letters[1:5],
>   c = vctrs_extension_array(rb_ext_storage$c$as_vector())
> )
>
> rb_non_ext %>%
>   left_join(rb_ext_storage) %>%
>   collect()
> #> # A tibble: 5 × 3
> #>       a b                      c
> #>   <int> <chr>         <arrw_bnr>
> #> 1     1 a     01, 02, 03, 04, 05
> #> 2     2 b     01, 02, 03, 04, 05
> #> 3     3 c     01, 02, 03, 04, 05
> #> 4     4 d     01, 02, 03, 04, 05
> #> 5     5 e     01, 02, 03, 04, 05
>
> rb_non_ext %>%
>   left_join(rb_ext) %>%
>   collect()
> #> Error in `collect()`:
> #> ! Invalid: Data type <arrow_binary[0]> is not supported in join non-key field
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:121  ValidateSchemas(join_type, left_schema, left_keys, left_output, right_schema, right_keys, right_output, left_field_name_suffix, right_field_name_suffix)
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:499  schema_mgr->Init( join_options.join_type, left_schema, join_options.left_keys, join_options.left_output, right_schema, join_options.right_keys, join_options.right_output, join_options.filter, join_options.output_suffix_for_left, join_options.output_suffix_for_right)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)