On Wed, Dec 3, 2025 at 3:22 PM Chao Li <[email protected]> wrote:
> I played with this again today and found an optimization that seems to
> dramatically improve the performance:
>
> ```
> +static void
> +radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *state)
> +{
> +	RadixPartitionInfo partitions[256] = {0};
> +	uint8_t remaining_partitions[256] = {0};
> ```
>
> Here partitions and remaining_partitions are just temporary buffers,
> allocating memory from stack and initialize them seems slow. By passing them
> as function parameters are much faster. See attached diff for my change.
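To make the hazard concrete, here is a minimal sketch of what can go wrong when a recursive radix sort shares one partition table across levels. This is not the patch's code and may not match the exact failure in the attached diff: `Part`, `partition_by_byte`, `sort_local`, `sort_shared`, `is_sorted` and `MAX_N` are hypothetical names, and the keys are plain 16-bit integers rather than SortTuples. It assumes the function recurses per partition, as the `level` parameter suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_N 64                    /* sketch only: small fixed scratch size */

typedef struct { size_t offset; size_t count; } Part;   /* hypothetical */

/* Stable partition of a[0..n) by one byte (level 0 = high byte, 1 = low
 * byte), recording each bucket's offset and count in parts[256]. */
static void partition_by_byte(uint16_t *a, size_t n, int level, Part *parts)
{
    uint16_t tmp[MAX_N];
    size_t next[256];
    int shift = 8 * (1 - level);
    size_t off = 0;

    assert(n <= MAX_N);
    memset(parts, 0, 256 * sizeof(Part));
    for (size_t i = 0; i < n; i++)
        parts[(a[i] >> shift) & 0xFF].count++;
    for (int b = 0; b < 256; b++) {
        parts[b].offset = next[b] = off;
        off += parts[b].count;
    }
    for (size_t i = 0; i < n; i++)
        tmp[next[(a[i] >> shift) & 0xFF]++] = a[i];
    memcpy(a, tmp, n * sizeof(uint16_t));
}

/* Each recursion level owns its table, like the stack arrays in the patch. */
static void sort_local(uint16_t *a, size_t n, int level)
{
    Part parts[256];

    partition_by_byte(a, n, level, parts);
    if (level == 1)
        return;
    for (int b = 0; b < 256; b++)
        if (parts[b].count > 1)
            sort_local(a + parts[b].offset, parts[b].count, level + 1);
}

/* Caller-supplied table shared by all levels: the child call overwrites
 * the entries the parent's loop still has to read. */
static void sort_shared(uint16_t *a, size_t n, int level, Part *parts)
{
    partition_by_byte(a, n, level, parts);
    if (level == 1)
        return;
    for (int b = 0; b < 256; b++)
        if (parts[b].count > 1)
            sort_shared(a + parts[b].offset, parts[b].count, level + 1, parts);
}

/* Returns 1 if a[0..n) is in nondecreasing order. */
static int is_sorted(const uint16_t *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}
```

With the input `{0x0201, 0x0200, 0x0102, 0x0101}`, `sort_local` sorts the array. `sort_shared` returns sooner but leaves the last two elements unsorted: after the first child call returns, the parent reads the child's counts and skips the remaining bucket.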
The lesson here is: you can make it as fast as you like if you
accidentally blow away the state that we needed for this to work
correctly.

--
John Naylor
Amazon Web Services
