Edit report at http://bugs.php.net/bug.php?id=52832&edit=1

 ID:                 52832
 Updated by:         ka...@php.net
 Reported by:        galaxy dot mipt at gmail dot com
 Summary:            unserialize() performance
-Status:             Assigned
+Status:             Closed
 Type:               Feature/Change Request
 Package:            Performance problem
 Operating System:   Linux
 PHP Version:        5.3.3
 Assigned To:        kalle
 Block user comment: N



Previous Comments:
------------------------------------------------------------------------
[2010-09-18 18:09:45] ka...@php.net

Implemented in trunk, thanks for your work.

------------------------------------------------------------------------
[2010-09-15 04:46:59] ka...@php.net

Hi, we cannot merge this into 5.3, as it changes a structure
(php_unserialize_data) that is exported to extensions as a type, which
breaks the ABI. But without a doubt it should go into trunk at least.
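
To illustrate the ABI concern, here is a minimal C sketch. It is not the
actual PHP source: both struct layouts and their field names are
assumptions chosen for illustration. An extension compiled against the old
header reserves space for the old layout only, so core code expecting the
new layout would read or write memory the extension never set aside.

```c
#include <stddef.h>

/* Hypothetical layout that already-compiled extensions were built against. */
struct unser_data_old {
    void *first;
    void *first_dtor;
};

/* Hypothetical layout after adding tail pointers for cheap appends. */
struct unser_data_new {
    void *first;
    void *last;
    void *first_dtor;
    void *last_dtor;
};

/* Returns 1 if the two layouts are binary-incompatible: the size differs
 * or an existing field has moved to a different offset. */
static int abi_changed(void)
{
    return sizeof(struct unser_data_old) != sizeof(struct unser_data_new)
        || offsetof(struct unser_data_old, first_dtor)
           != offsetof(struct unser_data_new, first_dtor);
}
```

In this sketch abi_changed() reports an incompatibility, which is the kind
of change a stable branch has to avoid.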

------------------------------------------------------------------------
[2010-09-14 18:36:53] galaxy dot mipt at gmail dot com

Added a patch against the latest SVN version; it takes the approach that
requires the least code modification.

Here is the test script:

<?php
// The arrays are large, so raise the memory limit.
ini_set('memory_limit', '512M');

$sizes = array(100000, 200000, 500000, 1000000);

foreach ($sizes as $N) {
    // Build a plain array of $N random integers.
    $data = array();
    for ($i = 0; $i < $N; $i++) $data[] = mt_rand();

    $timeSerialize = 0;
    $timeUnserialize = 0;

    // Accumulate the timings over 10 runs.
    for ($run = 0; $run < 10; $run++) {
        $ts = microtime(true);
        $ser = serialize($data);
        $timeSerialize += microtime(true) - $ts;

        $ts = microtime(true);
        $unser = unserialize($ser);
        $timeUnserialize += microtime(true) - $ts;

        // Verify that the round trip preserved the data.
        if (count($data) != count($unser)) print "Error: array sizes mismatch\n";
        for ($i = 0; $i < $N; $i++)
            if (!isset($unser[$i]) || $data[$i] != $unser[$i])
                print "Error: array elements mismatch\n";

        unset($ser);
        unset($unser);
    }

    print "Size: $N\t\tSerialize: " . (floor(1000*$timeSerialize)) .
        "ms\t\tUnserialize: " . (floor(1000*$timeUnserialize)) . "ms\n\n";
}
?>

The script is fairly memory-hungry, so the array sizes may need to be
reduced depending on the available hardware.

My test results (each time is the total over the 10 runs for that size):

Original PHP:

Size: 100000     Serialize: 483ms     Unserialize: 470ms
Size: 200000     Serialize: 1047ms    Unserialize: 1308ms
Size: 500000     Serialize: 2638ms    Unserialize: 14360ms
Size: 1000000    Serialize: 6319ms    Unserialize: 72744ms

Patched PHP:

Size: 100000     Serialize: 500ms     Unserialize: 357ms
Size: 200000     Serialize: 870ms     Unserialize: 703ms
Size: 500000     Serialize: 2212ms    Unserialize: 1315ms
Size: 1000000    Serialize: 4898ms    Unserialize: 2823ms

------------------------------------------------------------------------
[2010-09-14 02:58:37] cataphr...@php.net

> In my tests doing so reduced the unserialize time from 7 secs to ~0.3
> sec on 1000000-size array and size dependency apparently changed to
> something more like O(n*log(n))

Could you submit a patch with that modification and a test script that
exemplifies the speedup?

------------------------------------------------------------------------
[2010-09-14 02:46:32] galaxy dot mipt at gmail dot com

Description:
------------
The performance of the built-in unserializer degrades at an unexpectedly
high rate as the size of the unserialized data grows (more precisely, as
the number of serialized items grows). For example, unserializing a plain
array of ~1000000 integers can take around 10 secs on an average P4
machine, and the worst part is that the time rises quadratically (O(n^2))
with the array size, i.e. a ~2000000-element array would take 40 secs or so.

The main performance killer is the var_hash linked list onto which every
extracted variable is pushed. Every push operation (var_push() in
ext/standard/var_unserializer.c) scans the list sequentially from the very
beginning to, in effect, the very end. Searching from the end (or simply
storing a pointer to the last used element) would save a lot of cycles.
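
The difference between the two strategies can be sketched in C as follows.
This is a simplified model, not the code from var_unserializer.c: the chunk
size, the type names and the visits counter are assumptions introduced for
illustration. push_scan() walks from the head on every push, while
push_tail() keeps a pointer to the last chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SLOTS 4 /* deliberately tiny; hypothetical value */

typedef struct chunk {
    int data[CHUNK_SLOTS];
    size_t used;
    struct chunk *next;
} chunk;

typedef struct {
    chunk *first;
    chunk *last;          /* maintained only by push_tail() */
    unsigned long visits; /* full chunks stepped over while pushing */
} var_list;

static chunk *new_chunk(void)
{
    chunk *c = calloc(1, sizeof *c);
    assert(c != NULL);
    return c;
}

/* Strategy 1: search from the head for the first chunk with a free slot. */
static void push_scan(var_list *l, int v)
{
    chunk *c = l->first, *prev = NULL;
    while (c && c->used == CHUNK_SLOTS) {
        l->visits++;
        prev = c;
        c = c->next;
    }
    if (!c) {
        c = new_chunk();
        if (prev) prev->next = c; else l->first = c;
    }
    c->data[c->used++] = v;
}

/* Strategy 2: remember the last chunk, so no traversal is needed. */
static void push_tail(var_list *l, int v)
{
    chunk *c = l->last;
    if (!c || c->used == CHUNK_SLOTS) {
        chunk *n = new_chunk();
        if (c) c->next = n; else l->first = n;
        l->last = n;
        c = n;
    }
    c->data[c->used++] = v;
}

/* Total number of stored items across all chunks. */
static size_t count_items(const var_list *l)
{
    size_t n = 0;
    for (const chunk *c = l->first; c; c = c->next) n += c->used;
    return n;
}

static void free_list(var_list *l)
{
    chunk *c = l->first;
    while (c) { chunk *next = c->next; free(c); c = next; }
    l->first = l->last = NULL;
}
```

With 4-slot chunks, pushing 16 values makes push_scan() step over 24 full
chunks in total, while push_tail() steps over none. The scan cost grows
quadratically with the number of pushes, which matches the behaviour
described above.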



In my tests, doing so reduced the unserialize time from 7 secs to ~0.3
sec on a 1000000-element array, and the size dependency apparently changed
to something more like O(n*log(n)).

------------------------------------------------------------------------


