Hello,

I want to remove the avl_tree-related stuff from DOM unit. The reasons are:

1) It is targeted to optimize one particular type of documents, "configs",
   that require all node names within a single parent to be unique (strictly
   speaking, this isn't xml, it is more like ini-files or Windows registry).

   For all other xml documents (which have repeating names) it simply doesn't 
work.

2) It is a severe performance and memory bottleneck. This maybe wasn't too 
noticeable
   in the past, but things had improved over time. Here are some numbers:

 - Speed:  here is the sample calltree of TDOMNode_WithChildren.InsertBefore
  (the main DOM tree building procedure):

  650,963  * 
DOM_TDOMNODE_WITHCHILDREN_$__INSERTBEFORE$TDOMNODE$TDOMNODE$$TDOMNODE
   25,890  >   DOM_TDOMNODE_WITHCHILDREN_$__GETFIRSTCHILD$$TDOMNODE (2589x)
7,671,657  >   DOM_TDOMNODE_WITHCHILDREN_$__ADDTOCHILDNODETREE$TDOMNODE (8471x)
   76,239  >   DOM_TDOMNODE_$__CHANGING (8471x)
        8  >   DOM_TDOMDOCUMENT_$__GETNODETYPE$$LONGINT (1x)
   51,616  >   DOM_TDOMELEMENT_$__GETNODETYPE$$LONGINT (6452x)
   47,056  >   DOM_TDOMTEXT_$__GETNODETYPE$$LONGINT (5882x)
   36,856  >   DOM_TDOMATTR_$__GETNODETYPE$$LONGINT (4607x)

As one may see, the avl_tree consumes more than 90% of total CPU time.

 - Memory (numbers are for i386-linux):

TDOMAttr.InstanceSize = 56
TDOMElement.InstanceSize = 60
TDOMText.InstanceSize = 32

TAVLTree.InstanceSize = 20
TAVLTreeNode.InstanceSize = 24

Since each textnode has an associated TAVLTreeNode, and each Element/Attribute has both TAVLTreeNode and TAVLTree, the overall memory usage grows by about 75%.

3) It is known to have yet unresolved multithreading-related problems (Mantis 
12984).


Given the above, I see no sense in further keeping this feature. However, the 
configs likely
need some sort of optimization anyway. I see the following options:

1) Since all names in DOM document are now stored in a hashtable, it is easy to 
modify
   FindNode() so it compares pointers rather than strings.

2) The new xmlconf unit has the OpenKey/CloseKey methods which improve 
performance by limiting
   the search range to a specific element. Utilizing this feature, however, 
need modifications to
   the user code so it actually calls OpenKey/CloseKey (note that is works 
perfectly without the
   modifications, it's only the matter of performance).

3) It is possible to create even faster xmlconfig class, consisting of 
TXMLDocument + THashTable,
   offering O(1) access to any element. The problem is that it is somewhat 
mutually exclusive
   to the previous solution (OpenKey/CloseKey).

4) Finally, the xpath, being sufficiently mature, can supersede it all and 
provide
   a much more powerful solution.

Well... Thank you for reading this long message :-). I'd be happy to hear 
other's comment on it.


Regards,
Sergei
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel

Reply via email to