Petr Pajas wrote:
> Depends on what you call slow...

About 10 seconds to process a document with 72318 nodes, or 40500 nodes
after you skip the non-element, non-attribute nodes that don't require a
schema check.


> ...and it does not make much sense using XPath lookup for this.

What would you suggest then?


> If your XPath expressions involve sorting of largish node sets (.//
> may trigger sorting)....

Looks like I am using .// in a couple spots, but for relatively small
fragments of the schema, not the whole document. Here are the
expressions I am using:

/xs:schema/xs:eleme...@name='$name']
./xs:complexType | ./xs:simpleType
/xs:schema/xs:complexty...@name='$type_name'] |
  /xs:schema/xs:simplety...@name='$type_name']
../xs:complexType/xs:simpleContent/xs:extension | ../xs:simpleType
/xs:schema/xs:simplety...@name='$base_name']
../xs:simpleType/xs:restriction/xs:enumeration/@value
.//xs:attribu...@name='$att_name']
.//xs:restriction/xs:enumeration/@value


> then calling $doc->indexElements() once for the
> WSD schema file could help...

Makes sense, but it made no appreciable difference when looping over my test
document 10 times (the second run is with indexElements()):

timethis 10: 110 wallclock secs (107.84 usr +  0.25 sys = 108.09 CPU) @  0.09/s (n=10)

timethis 10: 109 wallclock secs (108.11 usr +  0.17 sys = 108.28 CPU) @  0.09/s (n=10)


> Traversing the tree in Perl and using $node->attributes may or may not
> be faster (note that the Perl-XS-Perl transitions are surprisingly
> expensive even if the function you call in C is a noop, so introducing
> more such calls may slow your program down as well).

If the schema is traversed only once and cached in a Perl data structure,
there should be fewer XS calls overall. (The schema is small, about 350
nodes, relative to the document.)
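Here is a minimal sketch of the traverse-once idea. It is written in Python with the standard library's ElementTree only so that it runs stand-alone; a Perl version would build the same hashes. The schema fragment, the names SCHEMA_XML and index_schema, and the type names are all hypothetical. A single pass over the top-level children of xs:schema stands in for the repeated /xs:schema/...[@name=...] lookups. The enumeration values of each named simpleType are collected up front.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment standing in for the real (about 350-node) schema.
SCHEMA_XML = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:attribute name="status" type="StatusType"/>
  </xs:complexType>
  <xs:simpleType name="StatusType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="open"/>
      <xs:enumeration value="closed"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}
XS = "{" + NS["xs"] + "}"


def index_schema(root):
    """Walk the top-level schema children once and build lookup tables."""
    elements = {}  # element name -> <xs:element> node
    types = {}     # type name -> <xs:complexType>/<xs:simpleType> node
    enums = {}     # simpleType name -> list of allowed enumeration values
    for child in root:
        name = child.get("name")
        if name is None:
            continue
        if child.tag == XS + "element":
            elements[name] = child
        elif child.tag in (XS + "complexType", XS + "simpleType"):
            types[name] = child
            if child.tag == XS + "simpleType":
                # Same path as .//xs:restriction/xs:enumeration/@value
                enums[name] = [
                    e.get("value")
                    for e in child.findall(".//xs:restriction/xs:enumeration", NS)
                ]
    return elements, types, enums


root = ET.fromstring(SCHEMA_XML)
elements, types, enums = index_schema(root)

# Per-node checks are now dictionary lookups instead of XPath queries.
type_name = elements["order"].get("type")
print(type_name)                                   # OrderType
print(types[type_name].tag == XS + "complexType")  # True
print(enums["StatusType"])                         # ['open', 'closed']
```

The indexing cost is paid once per schema. Each of the roughly 40,500 checked nodes then costs a hash lookup rather than an XPath evaluation.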


> I'd suggest: if the above don't help (and even if they do), look at
> the cases where you use the XPath and for stuff like the one you
> mentioned consider scanning once through all xs:attribute using @name
> and $type_def as keys.

Yeah, I think some sort of native Perl caching is going to be the way to go.
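Petr's suggestion was to scan all xs:attribute declarations once, keyed by @name and the enclosing type definition. That might look roughly like the sketch below. It uses Python's standard-library ElementTree only to stay self-contained. The function index_attributes and the schema fragment are hypothetical. The sketch assumes attributes are declared inside named top-level xs:complexType elements and does not handle attribute groups or ref= attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment; only the attribute declarations matter here.
SCHEMA_XML = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="OrderType">
    <xs:attribute name="id" type="xs:string" use="required"/>
    <xs:attribute name="status" type="StatusType"/>
  </xs:complexType>
  <xs:complexType name="ItemType">
    <xs:attribute name="id" type="xs:integer"/>
  </xs:complexType>
</xs:schema>
"""

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}


def index_attributes(root):
    """Scan every named complexType once; key each xs:attribute by (type, name)."""
    attrs = {}
    for type_def in root.findall("xs:complexType[@name]", NS):
        type_name = type_def.get("name")
        # Equivalent of .//xs:attribute, run once per type instead of per node.
        for att in type_def.iterfind(".//xs:attribute", NS):
            att_name = att.get("name")
            if att_name is not None:  # skip ref= attributes (not handled here)
                attrs[(type_name, att_name)] = att
    return attrs


attrs = index_attributes(ET.fromstring(SCHEMA_XML))

# The same attribute name can mean different things on different types.
print(attrs[("OrderType", "id")].get("type"))  # xs:string
print(attrs[("ItemType", "id")].get("type"))   # xs:integer
print(("ItemType", "status") in attrs)         # False
```

The composite key is what lets one flat table replace the per-type .//xs:attribute[@name=...] query. A Perl equivalent could use nested hashes, $attrs{$type_name}{$att_name}.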

Thanks for the reply. (I forwarded a copy to the list.)

 -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/

_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm
