I need to order an native array. The comparison function needs some context to do the comparison.

For test purposes, I defined _GNU_SOURCE and used qsort_r. But since this is obviously not acceptable I looked for alternatives.

PHP includes at least two sorting functions:

zend_qsort (in Zend/zend_qsort.c)
php_mergesort (in main/mergesort.c), which is not used anywhere, including PECL:
http://lxr.php.net/search?q=&defs=&refs=php_mergesort&path=&hist=&project=PECL&project=PHP_TRUNK

Both take comparison functions with the type
int (*)(void *, void * TSRMLS_DC)

So I *could* pass the context through a global variable, but I'd rather not do it. I think it'd be good to have a zend_qsort_r function with a different comparison function type:
int (*)(void *, void * TSRMLS_DC, void *)

With this we could implement zend_qsort in terms of zend_qsort_r very simply, though with technically undefined behavior:
https://github.com/cataphract/php-src/commit/c6aa2faa83dc8787d7cefcb27faa7f07f4aae6f0

We could export zend_qsort_r starting in 5.5; I need it ext/standard so it need not be exported, which would be problematic on a stable branch. It's undefined behavior because the comparison function is being called with one extra pointer argument. This is the same technique glibc uses though, so it should be safe. I could not find any measurable performance penalty.

***

A different topic: the current zend_qsort function can be further optimized. I got these results (test script below, best of 5 runs) with the current implementation:
random: 0.6453
sorted:0.6091
reverse sorted: 0.6203

And bionic's quick sort implementation:
random: 0.5361
sorted:0.5769
reverse sorted: 0.5920

I also tested with musl's smooth sort implementation:
random: 1.3554
sorted:0.2584
reverse sorted: 0.6439

Bionic's implementation also has the advantage that it appears to be stable:
<?php
$a = [2,3,3,4,3,3,3];
uasort($a, function($a, $b) {return $a-$b;});
print_r(implode(' ', array_keys(array_filter($a, function ($e) {return $e === 3;}))));

prints "4 5 2 1 6" with the current implementation, "1 2 4 5 6" with bionic's.

What do you think about replacing the implementation for 5.5?

***

TEST SCRIPT:
<?php

$a = [];
function sep(&$a) {}
for ($i = 0; $i < 500000; $i++) {
    $a[] = rand(0,20000);
}

$t1 = $a; sep($t1);
$start = microtime( true );
sort($t1);
echo "random: " . number_format( microtime( true ) - $start, 4 ) . "\n";
unset($t1);

$t2 = $a; sort($t2);
$start = microtime( true );
sort($t2);
echo "sorted:" . number_format( microtime( true ) - $start, 4 ) . "\n";
unset($t2);

$t3 = $a; rsort($t3);
$start = microtime( true );
sort($t3);
echo "reverse sorted: " . number_format( microtime( true ) - $start, 4 ) . "\n";

--
Gustavo Lopes

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to