[ 
https://issues.apache.org/jira/browse/ARROW-18344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634877#comment-17634877
 ] 

Antoine Pitrou edited comment on ARROW-18344 at 11/16/22 3:11 PM:
------------------------------------------------------------------

We don't actually sort data in Arrow; we produce indices that would sort the 
(untouched, unsorted) data.
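
To make the "indices, not data" idea concrete, here is a minimal standalone sketch using only the C++ standard library. It is not Arrow's implementation or API: the names SortIndices and Take are hypothetical stand-ins chosen for illustration, and the values are plain ints rather than Arrow arrays.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not Arrow's API: returns the permutation that would
// sort `values` in ascending order. `values` itself is left untouched.
std::vector<std::size_t> SortIndices(const std::vector<int>& values) {
  std::vector<std::size_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});  // 0, 1, 2, ...
  // A stable sort keeps equal values in their original relative order.
  std::stable_sort(indices.begin(), indices.end(),
                   [&values](std::size_t a, std::size_t b) {
                     return values[a] < values[b];
                   });
  return indices;
}

// Hypothetical helper: materialises a sorted copy by gathering the values
// through the indices. The input is still not modified.
std::vector<int> Take(const std::vector<int>& values,
                      const std::vector<std::size_t>& indices) {
  std::vector<int> out;
  out.reserve(indices.size());
  for (std::size_t i : indices) out.push_back(values[i]);
  return out;
}
```

For the input {30, 10, 20}, SortIndices yields {1, 2, 0}, and passing those indices to Take gives {10, 20, 30}, while the original vector keeps its order.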

Here we should follow the same approach, which means it can't be part of 
Concatenate.

We probably want something like a "merge_indices" compute function, similar to 
"sort_indices". The building blocks required for implementation are already 
there, since that's how "sort_indices" is implemented for chunked inputs.

One limitation is that this would require physical chunking to be aligned with 
logical sortedness, unless we optionally allow the user to pass a vector of the 
boundaries between (logical) sorted chunks.
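
As a rough sketch of what such a function could compute, the standalone C++ below merges already-sorted runs of a single flat vector and returns indices instead of moving data. Everything here is an assumption made for illustration: the name MergeIndices, its signature, the use of plain ints, and the convention that "boundaries" holds the start offset of every run after the first. It uses a heap-based k-way merge, which is one possible technique and not necessarily what the existing chunked "sort_indices" code does.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sketch, not Arrow's API. `values` is a concatenation of runs
// that are each already sorted ascending; `boundaries` holds the start offset
// of every run after the first. Returns indices that would sort `values`.
std::vector<std::size_t> MergeIndices(
    const std::vector<int>& values,
    const std::vector<std::size_t>& boundaries) {
  // cursor[r] is the next unread position of run r, stop[r] is its end.
  std::vector<std::size_t> cursor{0};
  cursor.insert(cursor.end(), boundaries.begin(), boundaries.end());
  std::vector<std::size_t> stop(boundaries.begin(), boundaries.end());
  stop.push_back(values.size());

  // Min-heap of (current value, run number). Ties on value are broken by the
  // lower run number, so equal values keep their original relative order.
  using Entry = std::pair<int, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (std::size_t r = 0; r < cursor.size(); ++r) {
    assert(cursor[r] <= stop[r]);  // boundaries must be ascending and in range
    if (cursor[r] < stop[r]) heap.emplace(values[cursor[r]], r);
  }

  std::vector<std::size_t> indices;
  indices.reserve(values.size());
  while (!heap.empty()) {
    const std::size_t r = heap.top().second;
    heap.pop();
    indices.push_back(cursor[r]);  // emit the index, never move the data
    if (++cursor[r] < stop[r]) heap.emplace(values[cursor[r]], r);
  }
  return indices;
}
```

With values {1, 4, 7, 2, 5, 3, 6} and boundaries {3, 5}, the runs are {1, 4, 7}, {2, 5} and {3, 6}, and the result is {0, 3, 5, 1, 4, 6, 2}. For n values in k runs this does O(n log k) comparisons, compared with O(n log n) for a full sort. If the boundaries are simply the chunk offsets, this reduces to the case where physical chunking is aligned with logical sortedness.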






> [C++] Use input pre-sortedness to create sorted table with ConcatenateTables
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-18344
>                 URL: https://issues.apache.org/jira/browse/ARROW-18344
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Rok Mihevc
>            Priority: Major
>              Labels: kernel
>
> When concatenating large sorted tables (e.g. sorted timeseries data), 
> the resulting table is no longer sorted. However, the input sortedness can be 
> used to significantly speed up post-concatenation sorting. A potential API 
> could be to add ConcatenateTablesOptions.inputs_sorted and implement the 
> logic in ConcatenateTables.



