I need to order an native array. The comparison function needs some
context to do the comparison.
For test purposes, I defined _GNU_SOURCE and used qsort_r. But since
this is obviously not acceptable I looked for alternatives.
PHP includes at least two sorting functions:
zend_qsort (in Zend/zend_qsort.c)
php_mergesort (in main/mergesort.c), which is not used anywhere,
including PECL:
http://lxr.php.net/search?q=&defs=&refs=php_mergesort&path=&hist=&project=PECL&project=PHP_TRUNK
Both take comparison functions with the type
int (*)(void *, void * TSRMLS_DC)
So I *could* pass the context through a global variable, but I'd rather
not do it. I think it'd be good to have a zend_qsort_r function with a
different comparison function type:
int (*)(void *, void * TSRMLS_DC, void *)
With this we could implement zend_qsort in terms of zend_qsort_r very
simply, though with technically undefined behavior:
https://github.com/cataphract/php-src/commit/c6aa2faa83dc8787d7cefcb27faa7f07f4aae6f0
We could export zend_qsort_r starting in 5.5; I need it ext/standard so
it need not be exported, which would be problematic on a stable branch.
It's undefined behavior because the comparison function is being called
with one extra pointer argument. This is the same technique glibc uses
though, so it should be safe. I could not find any measurable
performance penalty.
***
A different topic: the current zend_qsort function can be further
optimized. I got these results (test script below, best of 5 runs) with
the current implementation:
random: 0.6453
sorted:0.6091
reverse sorted: 0.6203
And bionic's quick sort implementation:
random: 0.5361
sorted:0.5769
reverse sorted: 0.5920
I also tested with musl's smooth sort implementation:
random: 1.3554
sorted:0.2584
reverse sorted: 0.6439
Bionic's implementation also has the advantage that it appears to be
stable:
<?php
$a = [2,3,3,4,3,3,3];
uasort($a, function($a, $b) {return $a-$b;});
print_r(implode(' ', array_keys(array_filter($a, function ($e) {return
$e === 3;}))));
prints "4 5 2 1 6" with the current implementation, "1 2 4 5 6" with
bionic's.
What do you think about replacing the implementation for 5.5?
***
TEST SCRIPT:
<?php
$a = [];
function sep(&$a) {}
for ($i = 0; $i < 500000; $i++) {
$a[] = rand(0,20000);
}
$t1 = $a; sep($t1);
$start = microtime( true );
sort($t1);
echo "random: " . number_format( microtime( true ) - $start, 4 ) .
"\n";
unset($t1);
$t2 = $a; sort($t2);
$start = microtime( true );
sort($t2);
echo "sorted:" . number_format( microtime( true ) - $start, 4 ) . "\n";
unset($t2);
$t3 = $a; rsort($t3);
$start = microtime( true );
sort($t3);
echo "reverse sorted: " . number_format( microtime( true ) - $start, 4
) . "\n";
--
Gustavo Lopes
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php