Hi,

Marjolein noted a bug in the Document code. You can see it in action by
searching on htdig.org: search for any attribute, say pdf_parser, and look
at the result for attrs.html. When a document has been trimmed, its size is
reported as max_doc_size rather than its real size. In this case,
attrs.html is reported as 100K when it is actually 155+K.
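
To make the symptom concrete, here is a minimal sketch of the two ways of
computing the reported size. It is not the actual Document.cc code: the
Retrieved struct and the retrieve() and reported_size_*() helpers are
hypothetical names made up for illustration, and only document_length,
contentLength and max_doc_size come from the real code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a retrieved document; not ht://Dig's real class.
struct Retrieved {
    std::string contents;       // body, trimmed to at most max_doc_size bytes
    std::size_t contentLength;  // size reported by the server
};

// Simulate a retrieval that trims the body to max_doc_size bytes.
Retrieved retrieve(const std::string &body, std::size_t max_doc_size) {
    Retrieved r;
    r.contents = body.substr(0, std::min(body.size(), max_doc_size));
    r.contentLength = body.size();
    return r;
}

// Old behaviour: the size is whatever survived the trimming.
std::size_t reported_size_old(const Retrieved &r) {
    return r.contents.length();
}

// Patched behaviour: prefer the server's figure when it is larger.
std::size_t reported_size_new(const Retrieved &r) {
    std::size_t document_length = r.contents.length();
    if (document_length < r.contentLength)
        document_length = r.contentLength;
    return document_length;
}
```

With a 155000-byte body and a 100000-byte limit, the old calculation
yields 100000 and the patched one yields 155000.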

I'm not sure this is the best fix, but it seems to work. The document size
is now reported as the size sent by the server (if available), or as the
st_size returned by stat() when retrieving locally. One thing I'm unsure
about, since I don't know much about the library calls: is st_size a field
of struct stat on all platforms?

-Geoff

Index: Document.cc
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdig/Document.cc,v
retrieving revision 1.34
diff -r1.34 Document.cc
447a448,450
>
>     if (document_length < contentLength)
>       document_length = contentLength;
598,599c601,602
<     document_length = contents.length();
<     contentLength = document_length;
---
>     document_length = stat_buf.st_size;
>     contentLength = contents.length();

