On Sun, Apr 07, 2024 at 10:54:46AM +0800, Kuan-Wei Chiu wrote:
> On Sat, Apr 06, 2024 at 10:05:44PM -0400, Kent Overstreet wrote:
> > On Sun, Apr 07, 2024 at 09:33:49AM +0800, Kuan-Wei Chiu wrote:
> > > This optimization reduces the average number of comparisons required
> > > from 2*n*log2(n) - 3*n + o(n) to n*log2(n) + 0.37*n + o(n). When n is
> > > sufficiently large, it results in approximately 50% fewer comparisons.
> > > 
> > > Currently, eytzinger0_sort employs the textbook version of heapsort,
> > > where during the heapify process, each level requires two comparisons
> > > to determine the maximum among three elements. In contrast, the
> > > bottom-up heapsort, during heapify, only compares two children at each
> > > level until reaching a leaf node. Then, it backtracks from the leaf
> > > node to find the correct position. Since heapify typically continues
> > > until very close to the leaf node, the standard heapify requires about
> > > 2*log2(n) comparisons, while the bottom-up variant only needs log2(n)
> > > comparisons.
> > > 
> > > The experimental data presented below is based on an array generated
> > > by get_random_u32().
> > > 
> > > |   N   | comparisons(old) | comparisons(new) | time(old) | time(new) |
> > > |-------|------------------|------------------|-----------|-----------|
> > > | 10000 |     235381       |     136615       |  25545 us |  20366 us |
> > > | 20000 |     510694       |     293425       |  31336 us |  18312 us |
> > > | 30000 |     800384       |     457412       |  35042 us |  27386 us |
> > > | 40000 |    1101617       |     626831       |  48779 us |  38253 us |
> > > | 50000 |    1409762       |     799637       |  62238 us |  46950 us |
> > > | 60000 |    1721191       |     974521       |  75588 us |  58367 us |
> > > | 70000 |    2038536       |    1152171       |  90823 us |  68778 us |
> > > | 80000 |    2362958       |    1333472       | 104165 us |  78625 us |
> > > | 90000 |    2690900       |    1516065       | 116111 us |  89573 us |
> > > | 100000|    3019413       |    1699879       | 133638 us | 100998 us |
> > > 
> > > Refs:
> > >   BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
> > >   QUICKSORT (if n is not very small)
> > >   Ingo Wegener
> > >   Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
> > >   https://doi.org/10.1016/0304-3975(93)90364-Y
> > > 
> > > Signed-off-by: Kuan-Wei Chiu <[email protected]>
> > > ---
> > > 
> > > This is the same as the patch I submitted previously [1]. Since we
> > > decided not to move eytzinger.h to generic library code, I resubmitted
> > > this patch.
> > > 
> > > This patch has undergone unit testing and micro benchmarking using the
> > > following code [2].
> > > 
> > > [1]: 
> > > https://lore.kernel.org/lkml/[email protected]/
> > > [2]:
> > > static u64 cmp_count = 0;
> > > 
> > > static int mycmp(const void *a, const void *b)
> > > {
> > >   u32 _a = *(u32 *)a;
> > >   u32 _b = *(u32 *)b;
> > > 
> > >   cmp_count++;
> > >   if (_a < _b)
> > >           return -1;
> > >   else if (_a > _b)
> > >           return 1;
> > >   else
> > >           return 0;
> > > }
> > > 
> > > static int test(void)
> > > {
> > >   size_t N, i, L, R;
> > >   ktime_t start, end;
> > >   s64 delta;
> > >   u32 *arr;
> > > 
> > >   for (N = 10000; N <= 100000; N += 10000) {
> > >           arr = kmalloc_array(N, sizeof(u32), GFP_KERNEL);
> > >           cmp_count = 0;
> > > 
> > >           for (i = 0; i < N; i++)
> > >                   arr[i] = get_random_u32();
> > >           
> > >           start = ktime_get();
> > >           eytzinger0_sort(arr, N, sizeof(u32), mycmp, NULL);
> > >           end = ktime_get();
> > > 
> > >           delta = ktime_us_delta(end, start);
> > >           printk(KERN_INFO "time: %lld\n", delta);
> > >           printk(KERN_INFO "comparisons: %lld\n", cmp_count);
> > > 
> > >           for (int i = 0; i < N; i++) {
> > >                   L = 2 * i + 1;
> > >                   R = 2 * i + 2;
> > >                   if (L < N && arr[i] < arr[L])
> > >                           goto err;
> > >                   if (R < N && arr[i] > arr[R])
> > >                           goto err;
> > >           }
> > 
> > Use eytzinger0_for_each() to just compare every element against the
> > previous element, and check in the test code.
> >
> 
> I rewrote the testing part as follows:
> 
>       u32 prev = 0;
>       eytzinger0_for_each(i, N) {
>               if (prev > arr[i])
>                       goto err;
>               prev = arr[i];
>       }
> 
> And the test still passed.

Great, can you include that? I'd be fine with it #if 0'd out, I just
want it there.

Reply via email to