XML Digital Signature requires a rigorous solution to the
canonicalization problem in order to make hashing work.  (See
http://www.w3.org/TR/2008/REC-xmldsig-core-20080610/ and
http://www.w3.org/TR/2001/REC-xml-c14n-20010315.)  One implementation is
Apache Santuario (http://santuario.apache.org/cindex.html).  It might be
useful.

If you decide to do your own thing, it's worth reviewing the DSig spec
to make sure you handle all the cases.

You'll need to do some sort of serialization in order to compute a hash.
"Write it out" sounds like you mean to write to disk, which is not
necessary: the document can be serialized into a memory buffer and the
digest computed over those bytes.
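
As a rough illustration of hashing serialized bytes entirely in memory,
here is a minimal sketch. Everything in it is an assumption for the sake
of the example: the DocumentStore class and fnv1a64 function are
hypothetical names, FNV-1a is just one cheap non-cryptographic hash, and
the std::string arguments stand in for whatever bytes your serializer
produces (in Xerces-C 3.x that could be, for example, a DOMLSSerializer
writing to a MemBufFormatTarget). It only detects byte-identical
serializations, so it relies on the serialization step being consistent
or canonical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a over a byte buffer. Fast, but NOT cryptographic; use a
// real digest (e.g. SHA-256) if deliberate collisions are a concern.
std::uint64_t fnv1a64(const std::string& bytes) {
    std::uint64_t h = 0xcbf29ce484222325ULL;   // FNV offset basis
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 0x100000001b3ULL;                 // FNV prime
    }
    return h;
}

// Hypothetical store keyed by digest. Each bucket keeps the serialized
// bytes so that a hash collision cannot be mistaken for a duplicate.
class DocumentStore {
public:
    // Returns true if the document was new, false if it was a duplicate.
    bool insert(const std::string& serialized) {
        std::vector<std::string>& bucket = docs_[fnv1a64(serialized)];
        for (const std::string& existing : bucket) {
            if (existing == serialized) return false;
        }
        bucket.push_back(serialized);
        ++count_;
        return true;
    }

    // Returns true if a byte-identical document has been inserted.
    bool contains(const std::string& serialized) const {
        auto it = docs_.find(fnv1a64(serialized));
        if (it == docs_.end()) return false;
        for (const std::string& existing : it->second) {
            if (existing == serialized) return true;
        }
        return false;
    }

    std::size_t size() const { return count_; }

private:
    std::unordered_map<std::uint64_t, std::vector<std::string>> docs_;
    std::size_t count_ = 0;
};
```

With a few hundred insertions and many more finds, the cost is dominated
by serializing and hashing the candidate on each lookup, so it may be
worth caching the digest alongside each document instead of recomputing
it.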

-----Original Message-----
From: Ben Griffin [mailto:[email protected]] 
Sent: Friday, May 06, 2011 9:03 AM
To: [email protected]
Subject: A xercesc api access for a digest ?

Within any of the DOM/etc frameworks that Xercesc implements, is
there a digest of a DOMDocument available, or will I have to write the
document out and then digest it myself?
Primarily, I am looking for a means of being able to identify if a
particular DOMDocument is the same as another as a part of a
rapid-access hashmap - so I need something that is fast.
Typically, there will be not more than a few hundred hashmap insertions,
of which 80% will be insertion clashes (duplicate documents), but there
will be hundreds of thousands of finds.

So, my current implementation involves digesting each hashmap candidate,
which entails having to write it out.  (This is necessary so as to
ensure that the encoding is consistent - the sources use inconsistent
encodings, and they cannot be preprocessed, as some of them are
available via e.g. URLs.)

