Edit report at http://bugs.php.net/bug.php?id=52832&edit=1
 ID:                 52832
 Updated by:         ka...@php.net
 Reported by:        galaxy dot mipt at gmail dot com
 Summary:            unserialize() performance
-Status:             Assigned
+Status:             Closed
 Type:               Feature/Change Request
 Package:            Performance problem
 Operating System:   Linux
 PHP Version:        5.3.3
 Assigned To:        kalle
 Block user comment: N

Previous Comments:
------------------------------------------------------------------------
[2010-09-18 18:09:45] ka...@php.net

Implemented in trunk, thanks for your work.

------------------------------------------------------------------------
[2010-09-15 04:46:59] ka...@php.net

Hi, we cannot merge this into 5.3, as it changes a structure
(php_unserialize_data) that is exported to extensions, which would break
the ABI. But without a doubt it should go into trunk at least.

------------------------------------------------------------------------
[2010-09-14 18:36:53] galaxy dot mipt at gmail dot com

Added a patch against the latest SVN version; I did things in the way
that required the least code modification. Here is the test script:

<?php
ini_set('memory_limit', '512M');

$sizes = array(100000, 200000, 500000, 1000000);

foreach($sizes as $N) {
    $data = array();
    for($i=0; $i < $N; $i++)
        $data[] = mt_rand();

    $timeSerialize = 0;
    $timeUnserialize = 0;

    for($run=0; $run < 10; $run++) {
        $ts = microtime(1);
        $ser = serialize($data);
        $timeSerialize += microtime(1) - $ts;

        $ts = microtime(1);
        $unser = unserialize($ser);
        $timeUnserialize += microtime(1) - $ts;

        if (count($data) != count($unser))
            print "Error: array sizes mismatch\n";

        for($i=0; $i < $N; $i++)
            if (!isset($unser[$i]) || $data[$i] != $unser[$i])
                print "Error: array elements mismatch\n";

        unset($ser);
        unset($unser);
    }

    print "Size: $N\t\tSerialize: " . (floor(1000*$timeSerialize)) .
          "ms\t\tUnserialize: " . (floor(1000*$timeUnserialize)) . "ms\n\n";
}
?>

It is somewhat memory-hungry, so the array sizes might need to be
reduced depending on the available hardware.

My test results:

Original PHP:
Size: 100000      Serialize: 483ms      Unserialize: 470ms
Size: 200000      Serialize: 1047ms     Unserialize: 1308ms
Size: 500000      Serialize: 2638ms     Unserialize: 14360ms
Size: 1000000     Serialize: 6319ms     Unserialize: 72744ms

Patched PHP:
Size: 100000      Serialize: 500ms      Unserialize: 357ms
Size: 200000      Serialize: 870ms      Unserialize: 703ms
Size: 500000      Serialize: 2212ms     Unserialize: 1315ms
Size: 1000000     Serialize: 4898ms     Unserialize: 2823ms

------------------------------------------------------------------------
[2010-09-14 02:58:37] cataphr...@php.net

> In my tests doing so reduced the unserialize time from 7 secs to ~0.3
> sec on a 1000000-size array, and the size dependency apparently changed
> to something more like O(n*log(n))

Could you submit a patch with that modification and a test script that
exemplifies the speedup?

------------------------------------------------------------------------
[2010-09-14 02:46:32] galaxy dot mipt at gmail dot com

Description:
------------
Performance of the built-in unserializer degrades at an unexpectedly
high rate as the size of the unserialized data grows (or rather, as the
number of serialized items grows). Say, unserializing a plain array of
~1000000 integers might take around 10 secs on an average P4 machine,
and the worst part is that the time rises quadratically (O(n^2)) with
the array size, i.e. a ~2000000-element array would take 40 secs or so.

The main performance killer is the var_hash linked list onto which every
extracted variable is pushed. It is scanned sequentially from the very
beginning up to, in fact, the very end during every push operation
(var_push() in ext/standard/var_unserializer.c). It appears that looking
from the end (or just storing the last used element elsewhere) would
save a lot of cycles.
In my tests doing so reduced the unserialize time from 7 secs to ~0.3
sec on a 1000000-size array, and the size dependency apparently changed
to something more like O(n*log(n)).

------------------------------------------------------------------------
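
The push behaviour described in the report can be illustrated with a
small standalone C sketch. This is not PHP's source and not the submitted
patch: the names chunk, var_table, push_scan, push_tail, run_pushes and
the CHUNK_SLOTS value are all hypothetical, chosen only for illustration.
It assumes values are stored in fixed-size blocks chained into a singly
linked list, and it counts how many full blocks each strategy steps over.
With block size B, a push that scans from the head steps over floor(i/B)
blocks for the i-th item, so n pushes cost roughly n^2/(2B) hops in
total, while remembering the last block costs none.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_SLOTS 1024 /* hypothetical capacity of one block */

/* One fixed-size block of stored values; blocks form a singly linked list. */
typedef struct chunk {
    long slots[CHUNK_SLOTS];
    int used;
    struct chunk *next;
} chunk;

typedef struct {
    chunk *first;
    chunk *last;        /* maintained by push_tail() only */
    unsigned long hops; /* full blocks stepped over while pushing */
} var_table;

static chunk *new_chunk(void) {
    chunk *c = calloc(1, sizeof *c);
    if (c == NULL) abort();
    return c;
}

/* Strategy 1: walk from the head on every push until a free slot is found. */
static void push_scan(var_table *t, long v) {
    chunk *c = t->first, *prev = NULL;
    while (c != NULL && c->used == CHUNK_SLOTS) {
        prev = c;
        c = c->next;
        t->hops++;
    }
    if (c == NULL) { /* every block is full (or the list is empty) */
        c = new_chunk();
        if (prev != NULL) prev->next = c; else t->first = c;
    }
    c->slots[c->used++] = v;
}

/* Strategy 2: remember the last block, so no walk is needed. */
static void push_tail(var_table *t, long v) {
    chunk *c = t->last;
    if (c == NULL || c->used == CHUNK_SLOTS) {
        c = new_chunk();
        if (t->last != NULL) t->last->next = c; else t->first = c;
        t->last = c;
    }
    c->slots[c->used++] = v;
}

static long count_items(const var_table *t) {
    long n = 0;
    for (const chunk *c = t->first; c != NULL; c = c->next) n += c->used;
    return n;
}

static void free_table(var_table *t) {
    chunk *c = t->first;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    t->first = t->last = NULL;
}

/* Pushes n values with one strategy (never mixed on the same table)
 * and returns the number of full blocks stepped over. */
unsigned long run_pushes(long n, int use_tail) {
    var_table t = {NULL, NULL, 0};
    for (long i = 0; i < n; i++) {
        if (use_tail) push_tail(&t, i); else push_scan(&t, i);
    }
    assert(count_items(&t) == n); /* both strategies store every value */
    unsigned long hops = t.hops;
    free_table(&t);
    return hops;
}
```

Calling run_pushes(n, 0) and run_pushes(n, 1) for growing n should show
the hop count of the scanning variant approaching a fourfold increase
each time n doubles, while the tail-pointer variant stays at zero. The
extra pointer is a field added to the table structure, which is the kind
of layout change the 2010-09-15 comment cites as an ABI concern for 5.3.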