[ https://issues.apache.org/jira/browse/ARROW-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537728#comment-17537728 ]
Ariana Villegas edited comment on ARROW-14314 at 5/16/22 6:52 PM:
------------------------------------------------------------------

Ok, I got it. [~apitrou]

In that case, I think we can do something like this:
 * Given the following dictionary:
{code:java}
values: ['c', 'a', 'b', 'b']
indices: [0, 1, 3, 2, 3, 0]
{code}
 * Get sort_idx from values and transform it to give the same idx to same values
{code:java}
values_sort_idx = [1, 2, 3, 0]
transformed_sort_idx = [3, 0, 1, 1]
{code}
 * Get sort_idx from transformed indices
{code:java}
transformed_indices = [3, 0, 1, 1, 1, 3]
sort_indices = [1, 2, 3, 4, 0, 5]
{code}

With nulls, it will work similarly:
 * Given the following dictionary:
{code:java}
values: ['a', null, 'b', 'c']
indices: [0, 1, null, 0, 2, 3]
{code}
 * Get sort_idx from values and transform it to give the same idx to same values
{code:java}
values_sort_idx = [0, 2, 3, 1]
transformed_sort_idx = [0, 3, 1, 2]
{code}
 * Get sort_idx from transformed indices
{code:java}
transformed_indices = [0, 3, null, 0, 1, 2]
sort_indices = [0, 3, 4, 5, 1, 2]
{code}

was (Author: JIRAUSER280694):
Ok, I got it.

In that case, I think we can do something like this:
 * Given the following dictionary:
{code:java}
values: ['c', 'a', 'b', 'b']
indices: [0, 1, 3, 2, 3, 0]
{code}
 * Get sort_idx from values and transform it to give the same idx to same values
{code:java}
values_sort_idx = [1, 2, 3, 0]
transformed_sort_idx = [3, 0, 1, 1]
{code}
 * Get sort_idx from transformed indices
{code:java}
transformed_indices = [3, 0, 1, 1, 1, 3]
sort_indices = [1, 2, 3, 4, 0, 5]
{code}

With nulls, it will work similarly:
 * Given the following dictionary:
{code:java}
values: ['a', null, 'b', 'c']
indices: [0, 1, null, 0, 2, 3]
{code}
 * Get sort_idx from values and transform it to give the same idx to same values
{code:java}
values_sort_idx = [0, 2, 3, 1]
transformed_sort_idx = [0, 3, 1, 2]
{code}
 * Get sort_idx from transformed indices
{code:java}
transformed_indices = [0, 3, null, 0, 1, 2]
sort_indices = [0, 3, 4, 5, 1, 2]
{code}

> [C++] Sorting dictionary array not implemented
> ----------------------------------------------
>
>                 Key: ARROW-14314
>                 URL: https://issues.apache.org/jira/browse/ARROW-14314
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Neal Richardson
>            Priority: Major
>              Labels: kernel
>             Fix For: 9.0.0
>
>
> From R, taking the stock {{mtcars}} dataset and giving it a dictionary type
> column:
> {code}
> mtcars %>%
>   mutate(cyl = as.factor(cyl)) %>%
>   Table$create() %>%
>   arrange(cyl) %>%
>   collect()
> Error: Type error: Sorting not supported for type dictionary<values=string, indices=int8, ordered=0>
> ../src/arrow/compute/kernels/vector_array_sort.cc:427  VisitTypeInline(type, this)
> ../src/arrow/compute/kernels/vector_sort.cc:148  GetArraySorter(*physical_type_)
> ../src/arrow/compute/kernels/vector_sort.cc:1206  sorter.Sort()
> ../src/arrow/compute/api_vector.cc:259  CallFunction("sort_indices", {datum}, &options, ctx)
> ../src/arrow/compute/exec/order_by_impl.cc:53  SortIndices(table, options_, ctx_)
> ../src/arrow/compute/exec/sink_node.cc:292  impl_->DoFinish()
> ../src/arrow/compute/exec/exec_plan.cc:297  iterator_.Next()
> ../src/arrow/record_batch.cc:318  ReadNext(&batch)
> ../src/arrow/record_batch.cc:329  ReadAll(&batches)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
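
For illustration only, the rank-and-remap idea from the edited comment can be sketched in plain Python. This is not the Arrow C++ kernel and makes no claim about how the fix was eventually implemented. The function names `rank_values` and `sort_indices_via_ranks` are hypothetical, `None` stands in for null, and the sketch assumes an ascending sort with nulls placed last, as in the comment's second example.

```python
from typing import List, Optional, Sequence


def _nulls_last_key(value):
    # Sort key that places None after every non-null value.
    return (value is None, value)


def rank_values(values: Sequence[Optional[str]]) -> List[int]:
    """Return, for each dictionary value, its position in sorted order.

    Equal values share the position of their first occurrence in the
    sorted order (the "transformed_sort_idx" of the comment).
    """
    # Stable argsort of the dictionary values ("values_sort_idx").
    order = sorted(range(len(values)), key=lambda i: _nulls_last_key(values[i]))
    ranks = [0] * len(values)
    for pos, i in enumerate(order):
        prev = order[pos - 1] if pos > 0 else None
        if prev is not None and values[i] == values[prev]:
            ranks[i] = ranks[prev]  # duplicate value: reuse the earlier rank
        else:
            ranks[i] = pos
    return ranks


def sort_indices_via_ranks(
    values: Sequence[Optional[str]], indices: Sequence[Optional[int]]
) -> List[int]:
    """Sort a dictionary array by remapping its indices to value ranks."""
    ranks = rank_values(values)
    # "transformed_indices": each index replaced by the rank of its value;
    # null indices stay null.
    transformed = [None if ix is None else ranks[ix] for ix in indices]
    # Stable argsort of the remapped indices, nulls last.
    return sorted(
        range(len(transformed)), key=lambda p: _nulls_last_key(transformed[p])
    )
```

Run on the two dictionaries from the comment, this sketch gives the same `transformed_sort_idx` and `sort_indices` as listed there. The only comparison of dictionary values happens in `rank_values`, so the final sort works on integers alone.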