[ https://issues.apache.org/jira/browse/ARROW-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536350#comment-17536350 ]
Ariana Villegas commented on ARROW-14314:
-----------------------------------------

[~apitrou] I have a couple of questions about this issue:
 * Currently, how are nulls handled in a dictionary array?
 * Can we do something like this?
 ** Given the following dictionary:
{code:java}
values: ['a', 'c', 'b']
indices: [0, 1, 2, 2, 0]{code}
 ** Get sort_indices of values, then replace indices and values:
{code:java}
values_sorted_idx: [0, 2, 1]
indices: [0, 2, 1, 1, 0]
values: ['a', 'b', 'c']{code}
 ** And finally, sort the indices:
{code:java}
indices: [0, 0, 1, 1, 2]{code}

And, if I remember correctly, the current sort_indices handles nulls and sends them to the end of the array.

Please let me know if I'm misunderstanding something.

> [C++] Sorting dictionary array not implemented
> ----------------------------------------------
>
>                 Key: ARROW-14314
>                 URL: https://issues.apache.org/jira/browse/ARROW-14314
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Neal Richardson
>            Priority: Major
>              Labels: kernel
>             Fix For: 9.0.0
>
>
> From R, taking the stock {{mtcars}} dataset and giving it a dictionary type column:
> {code}
> mtcars %>%
>   mutate(cyl = as.factor(cyl)) %>%
>   Table$create() %>%
>   arrange(cyl) %>%
>   collect()
> Error: Type error: Sorting not supported for type dictionary<values=string, indices=int8, ordered=0>
> ../src/arrow/compute/kernels/vector_array_sort.cc:427  VisitTypeInline(type, this)
> ../src/arrow/compute/kernels/vector_sort.cc:148  GetArraySorter(*physical_type_)
> ../src/arrow/compute/kernels/vector_sort.cc:1206  sorter.Sort()
> ../src/arrow/compute/api_vector.cc:259  CallFunction("sort_indices", {datum}, &options, ctx)
> ../src/arrow/compute/exec/order_by_impl.cc:53  SortIndices(table, options_, ctx_)
> ../src/arrow/compute/exec/sink_node.cc:292  impl_->DoFinish()
> ../src/arrow/compute/exec/exec_plan.cc:297  iterator_.Next()
> ../src/arrow/record_batch.cc:318  ReadNext(&batch)
> ../src/arrow/record_batch.cc:329  ReadAll(&batches)
> {code}
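
For reference, the remapping proposed in the comment above can be sketched in plain C++ without any Arrow types. This is only an illustration under stated assumptions: {{DictArray}} and {{SortIndicesOfDictionary}} are hypothetical names, nulls are modelled with {{std::optional}} and placed last, and nothing here is meant to reflect how the Arrow kernel is or will be implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary array; this is NOT an Arrow type.
struct DictArray {
  std::vector<std::string> values;              // dictionary values
  std::vector<std::optional<int32_t>> indices;  // std::nullopt models a null slot
};

// Returns the row order that sorts `arr` by its decoded values, nulls last.
std::vector<std::size_t> SortIndicesOfDictionary(const DictArray& arr) {
  // Step 1: argsort the dictionary values, e.g. ['a', 'c', 'b'] -> [0, 2, 1].
  std::vector<int32_t> values_sorted_idx(arr.values.size());
  std::iota(values_sorted_idx.begin(), values_sorted_idx.end(), 0);
  std::stable_sort(values_sorted_idx.begin(), values_sorted_idx.end(),
                   [&](int32_t l, int32_t r) {
                     return arr.values[l] < arr.values[r];
                   });

  // Step 2: invert that permutation, so rank[old_index] is the position of
  // the value in the sorted dictionary. This is the "replace indices" step.
  std::vector<int32_t> rank(arr.values.size());
  for (std::size_t pos = 0; pos < values_sorted_idx.size(); ++pos) {
    rank[values_sorted_idx[pos]] = static_cast<int32_t>(pos);
  }

  // Step 3: stable-sort the row positions by rank. Nulls get a rank larger
  // than any real value, so they end up at the end (an assumption here).
  const int32_t null_rank = static_cast<int32_t>(arr.values.size());
  auto key = [&](std::size_t row) -> int32_t {
    const std::optional<int32_t>& idx = arr.indices[row];
    if (!idx.has_value()) return null_rank;
    assert(*idx >= 0 && static_cast<std::size_t>(*idx) < rank.size());
    return rank[*idx];
  };
  std::vector<std::size_t> rows(arr.indices.size());
  std::iota(rows.begin(), rows.end(), std::size_t{0});
  std::stable_sort(rows.begin(), rows.end(),
                   [&](std::size_t l, std::size_t r) { return key(l) < key(r); });
  return rows;
}
```

With values ['a', 'c', 'b'] and indices [0, 1, 2, 2, 0], the remapped indices are [0, 2, 1, 1, 0] and the function returns the row order [0, 4, 2, 3, 1], which decodes to a, a, b, b, c. Only the dictionary values are compared as strings; the rows themselves are ordered by integer rank.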