[ https://issues.apache.org/jira/browse/ARROW-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537492#comment-17537492 ]
Antoine Pitrou commented on ARROW-14314:
----------------------------------------

[~ArianaVillegas] Nulls in a dictionary array can be represented in two different ways:

* nulls in the dictionary values, e.g.:
{code}
values: ['a', null, 'b', 'c']
indices: [0, 1, 1, 0, 2, 3]
{code}

* nulls in the dictionary indices, e.g.:
{code}
values: ['a', 'b', 'c']
indices: [0, null, null, 0, 1, 2]
{code}

Also, it can be a mixture of both, such as:
{code}
values: ['a', null, 'b', 'c']
indices: [0, 1, null, 0, 2, 3]
{code}

> [C++] Sorting dictionary array not implemented
> ----------------------------------------------
>
>                 Key: ARROW-14314
>                 URL: https://issues.apache.org/jira/browse/ARROW-14314
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Neal Richardson
>            Priority: Major
>              Labels: kernel
>             Fix For: 9.0.0
>
>
> From R, taking the stock {{mtcars}} dataset and giving it a dictionary type
> column:
> {code}
> mtcars %>%
>   mutate(cyl = as.factor(cyl)) %>%
>   Table$create() %>%
>   arrange(cyl) %>%
>   collect()
> Error: Type error: Sorting not supported for type dictionary<values=string,
> indices=int8, ordered=0>
> ../src/arrow/compute/kernels/vector_array_sort.cc:427  VisitTypeInline(type,
> this)
> ../src/arrow/compute/kernels/vector_sort.cc:148
> GetArraySorter(*physical_type_)
> ../src/arrow/compute/kernels/vector_sort.cc:1206  sorter.Sort()
> ../src/arrow/compute/api_vector.cc:259  CallFunction("sort_indices", {datum},
> &options, ctx)
> ../src/arrow/compute/exec/order_by_impl.cc:53  SortIndices(table, options_,
> ctx_)
> ../src/arrow/compute/exec/sink_node.cc:292  impl_->DoFinish()
> ../src/arrow/compute/exec/exec_plan.cc:297  iterator_.Next()
> ../src/arrow/record_batch.cc:318  ReadNext(&batch)
> ../src/arrow/record_batch.cc:329  ReadAll(&batches)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
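
As a rough illustration of the comment at the top of this message, the sketch below decodes the three example arrays in plain Python (no Arrow dependency) and shows that they all expand to the same logical values, which is why a sort has to treat both kinds of null alike. The helper names `decode_dictionary` and `sort_indices_nulls_last` are hypothetical, and placing nulls at the end is an assumption made for the example. None of this is Arrow's actual implementation.

```python
from typing import List, Optional


def decode_dictionary(
    values: List[Optional[str]], indices: List[Optional[int]]
) -> List[Optional[str]]:
    """Expand a dictionary-encoded array into its logical values (hypothetical helper)."""
    decoded: List[Optional[str]] = []
    for idx in indices:
        if idx is None:
            # Null in the indices: the slot is null regardless of the dictionary.
            decoded.append(None)
        else:
            # Valid index: the dictionary value itself may still be null.
            decoded.append(values[idx])
    return decoded


def sort_indices_nulls_last(decoded: List[Optional[str]]) -> List[int]:
    """Return a stable ordering of positions, with nulls placed last (an assumption)."""
    return sorted(
        range(len(decoded)),
        key=lambda i: (decoded[i] is None, decoded[i] if decoded[i] is not None else ""),
    )


# The three examples from the comment.
null_in_values = decode_dictionary(["a", None, "b", "c"], [0, 1, 1, 0, 2, 3])
null_in_indices = decode_dictionary(["a", "b", "c"], [0, None, None, 0, 1, 2])
mixed = decode_dictionary(["a", None, "b", "c"], [0, 1, None, 0, 2, 3])

print(null_in_values)
print(null_in_indices)
print(mixed)
print(sort_indices_nulls_last(mixed))
```

All three encodings decode to the same sequence, so a sort kernel that only checked the indices' validity bitmap would miss the nulls stored in the dictionary values.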