Hello,
Although I haven't tried to puzzle through all the details, it appears
to me that there is a serious problem with calculating the md5 sums of
long documents. Namely, the calls that compute the sums look like
    md5(bhash, doc.Contents(), doc.Length(), &ddate, debug);
but doc.Length() returns doc.document_length, which I believe reflects
the true length of the document rather than the length of the part of
the document that is in memory. And the whole document is not stored
if its length exceeds _max_document_size. Therefore, the md5 routines
could read memory far beyond the end of the data actually available.
Aside from giving false negatives on md5 comparisons, this could cause
core dumps.
Am I wrong?
The obvious fix is to compute the checksum based on the truncated
contents that are actually in memory.
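
To make the idea concrete, here is a minimal sketch of clamping the length
before it is handed to the checksum routine. I have not checked this against
the real classes: DocSketch, BufferedLength() and ChecksumLength() are
hypothetical names made up for illustration, and only Contents() and Length()
mirror the calls quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the document object; NOT the real ht://Dig class.
struct DocSketch {
    std::string buffered;         // the part of the document held in memory
    std::size_t document_length;  // the full length reported for the document

    const char *Contents() const { return buffered.data(); }
    std::size_t Length() const { return document_length; }
    // Hypothetical accessor: how many bytes Contents() really points at.
    std::size_t BufferedLength() const { return buffered.size(); }
};

// Number of bytes that can safely be passed to the checksum routine:
// never more than what is actually in memory.
std::size_t ChecksumLength(const DocSketch &doc) {
    const std::size_t n = std::min(doc.Length(), doc.BufferedLength());
    assert(n <= doc.BufferedLength());  // guards against reading past the buffer
    return n;
}
```

The call quoted above would then pass ChecksumLength(doc) where it now passes
doc.Length(). Two truncated documents that differ only after the cutoff would
get the same checksum, which seems acceptable since that is all the data we
have anyway.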
Michael
--
Michael Haggerty
[EMAIL PROTECTED]