Edit report at http://bugs.php.net/bug.php?id=53655&edit=1

 ID:                 53655
 Updated by:         rricha...@php.net
 Reported by:        olav dot morken at uninett dot no
 Summary:            Improve speed of DOMNode::C14N() on large XML
                     documents
-Status:             Open
+Status:             Assigned
 Type:               Feature/Change Request
 Package:            DOM XML related
 PHP Version:        5.3.4
-Assigned To:        
+Assigned To:        rrichards
 Block user comment: N
 Private report:     N



Previous Comments:
------------------------------------------------------------------------
[2011-01-05 08:33:02] olav dot morken at uninett dot no

Description:
------------
The C14N() function appears to have a runtime that is O(N^2) (or
possibly worse) in the size of the input, which means that it becomes
very slow as the input grows. For example, an input with around 196000
nodes takes about 290 seconds, while an input with 486000 nodes takes
2200 seconds: roughly 2.5 times the nodes costs about 7.6 times the
runtime.



Note that this problem only occurs when canonicalizing a subtree of the
document. If we canonicalize the whole document, it completes almost
immediately.
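
The contrast between whole-document and subtree canonicalization can be
reproduced with a small synthetic benchmark. The following is a minimal
sketch and not part of the original report: the helper names
buildDocument() and timeC14N() and the node count are made up for
illustration, and the timings will depend on the PHP and libxml2
versions in use.

```php
<?php
// Hypothetical helper: build a flat document with $count <item> children.
function buildDocument(int $count): DOMDocument
{
    $doc = new DOMDocument();
    $root = $doc->appendChild($doc->createElement('root'));
    for ($i = 0; $i < $count; $i++) {
        $root->appendChild($doc->createElement('item', (string) $i));
    }
    return $doc;
}

// Hypothetical helper: return [seconds, canonical XML] for one C14N() call
// (non-exclusive, without comments, as in the report's test script).
function timeC14N(DOMNode $node): array
{
    $start = microtime(true);
    $xml = $node->C14N(false, false);
    return [microtime(true) - $start, $xml];
}

$doc = buildDocument(2000); // small on purpose; raise it to watch the growth

[$wholeTime] = timeC14N($doc);                    // whole document
[$subtreeTime] = timeC14N($doc->documentElement); // subtree of the root

printf("whole document: %.4f s\n", $wholeTime);
printf("root subtree:   %.4f s\n", $subtreeTime);
```

Doubling the argument to buildDocument() a few times should show how
each of the two timings scales on a given installation.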



The problem is that canonicalization uses an XPath expression to find
the nodeset that should be canonicalized. Evaluating the XPath
expression takes a lot of time as the input size grows, and the libxml2
xmlC14NDocSaveTo() function then also has to do a lookup in the nodeset
returned by the XPath expression for every node it encounters.



I believe a better solution would be to do this the way it is done in
the xmlsec library. That library uses the xmlC14NExecute() function
instead, which accepts a callback that determines whether a node should
be included in the result. This should make the speed of
canonicalization linear in the input size.
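
PHP userland cannot call xmlC14NExecute() directly, so the callback idea
can only be modelled here. The sketch below illustrates a per-node
visibility callback and is not libxml2's or xmlsec's actual code:
isInSubtree() and countVisible() are hypothetical names, and the
ancestor walk is just one simple way to write such a predicate.

```php
<?php
// Hypothetical predicate: is $node the subtree root or one of its descendants?
function isInSubtree(DOMNode $node, DOMNode $subtreeRoot): bool
{
    for ($cur = $node; $cur !== null; $cur = $cur->parentNode) {
        if ($cur->isSameNode($subtreeRoot)) {
            return true;
        }
    }
    return false;
}

// Hypothetical driver: visit every node exactly once and ask the callback
// whether it belongs in the output, instead of searching a prebuilt nodeset.
function countVisible(DOMNode $node, callable $isVisible): int
{
    $count = $isVisible($node) ? 1 : 0;
    foreach ($node->childNodes as $child) {
        $count += countVisible($child, $isVisible);
    }
    return $count;
}

$doc = new DOMDocument();
$doc->loadXML('<root><a><b/><c/></a><d/></root>');
$subtree = $doc->getElementsByTagName('a')->item(0);

$visible = countVisible($doc, function (DOMNode $n) use ($subtree) {
    return isInSubtree($n, $subtree);
});
echo $visible, "\n";
```

In this model the cost per node depends on the depth of the tree rather
than on the total number of nodes, which is what keeps the traversal
close to linear for documents of bounded depth.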



Test script:
---------------
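<?php
// Load a large XML document (the file name is a placeholder).
$doc = new DOMDocument();
$doc->load('some-large-xml-file.xml');

// Time canonicalization of the root element's subtree
// (non-exclusive, without comments).
$start = microtime(TRUE);
$doc->documentElement->C14N(FALSE, FALSE);
echo "Done in " . (microtime(TRUE) - $start) . " seconds.\n";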

------------------------------------------------------------------------


