Hi,

Multi-column sort_indices on record batches has been implemented:
  https://github.com/apache/arrow/pull/8612

You'll be able to use it with Apache Arrow 3.0.0.
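
For reference, here is a minimal sketch of what a multi-column sort of
indices computes. It uses only the C++ standard library, not Arrow's API:
MultiColumnSortIndices is a hypothetical helper made up for illustration,
and plain std::vector<int64_t> columns stand in for Arrow arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not an Arrow API): returns the row indices that order
// the rows ascending by columns[0], then columns[1], and so on.
std::vector<std::size_t> MultiColumnSortIndices(
    const std::vector<std::vector<std::int64_t>>& columns) {
  const std::size_t num_rows = columns.empty() ? 0 : columns[0].size();
  for (const auto& column : columns) {
    assert(column.size() == num_rows);  // all columns must have equal length
  }

  // Start from the identity permutation 0, 1, ..., num_rows - 1.
  std::vector<std::size_t> indices(num_rows);
  std::iota(indices.begin(), indices.end(), 0);

  // Stable sort so that rows with equal tuples keep their original order.
  std::stable_sort(indices.begin(), indices.end(),
                   [&columns](std::size_t lhs, std::size_t rhs) {
                     // Compare column by column; the first difference decides.
                     for (const auto& column : columns) {
                       if (column[lhs] != column[rhs]) {
                         return column[lhs] < column[rhs];
                       }
                     }
                     return false;  // tuples are equal
                   });
  return indices;
}
```

With the columns {2, 1, 2, 1} and {15, 10, 10, 15}, this returns the
indices 1, 3, 2, 0, i.e. the rows (1, 10), (1, 15), (2, 10), (2, 15).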


Thanks,
--
kou

In <CALQ9KxD-t7YS5U7x_5=k6nm1owhu8dyotro-ipulxom4qae...@mail.gmail.com>
  "Sort int tuples across Arrow arrays in C++" on Thu, 3 Sep 2020 14:26:09 +0200,
  Rares Vernica <rvern...@gmail.com> wrote:

> Hello,
> 
> I have a set of integer tuples that need to be collected and sorted at a
> coordinator. Here is an example with tuples of length 2:
> 
> [(1, 10),
>  (1, 15),
>  (2, 10),
>  (2, 15)]
> 
> I am considering storing each column in an Arrow array, e.g., [1, 1, 2, 2]
> and [10, 15, 10, 15], and have the Arrow arrays grouped in a Record Batch.
> Then I would serialize, transfer, and deserialize each record batch. The
> coordinator would collect all the record batches and concatenate them.
> Finally, the coordinator needs to sort the tuples by value in the
> sequential order of the columns, e.g., (1, 10), (1, 15), (2, 10).
> 
> Could I accomplish the sort using the Arrow API? I looked at sort_indices
> but it does not work on record batches. With a set of sort indices for each
> array, sorting the tuples does not seem to be straightforward, right?
> 
> Thanks!
> Rares
