According to Stefan Seiz:
> On 17.9.2002 20:21 Uhr, Geoff Hutchison <[EMAIL PROTECTED]> wrote:
> > 2) Where do you determine that ref->DocTime() is returning 0?
> > I ask, in part because htdump is going to access this as well:
> >           fprintf(fl, "\tm:%d", (int) ref->DocTime());
> I added a trace print to Retriever.cc inside the Retriever::parse_url(URLRef
> &urlRef) routine like so:
> 
> --- snip ---
> if (ref)
>     {
>         //
>         // We already have an entry for this document in our database.
>         // This means we can get the document ID and last modification
>         // time from there.
>         //
>         current_id = ref->DocID();
>         date = ref->DocTime();
>         if (debug > 2)
>           {
>             cout << "\nDOC MATCHED DB!!! \n" << endl;
>             cout << "DocTime Date is: " << date << endl;
>           }
> --- snap ---

I think you might need a cast up there, i.e.:

            cout << "DocTime Date is: " << (int) date << endl;

I don't know if there's a (ostream) << (time_t) operator defined, and it
might not be automatically casting date to (int) on its own.  See if that
makes a difference.
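
To illustrate the cast outside of htdig, here is a minimal standalone sketch. The helper name seconds_to_string() is made up for this example and is not part of the htdig sources. It only shows that an explicit cast to a known integer type gives the same decimal output no matter how the platform defines time_t.

```cpp
#include <ctime>
#include <sstream>
#include <string>

// Hypothetical helper (not part of htdig): render a time_t as a plain
// decimal count of seconds. The explicit cast to a fixed, known integer
// type means the output does not depend on how time_t is defined locally.
std::string seconds_to_string(std::time_t t)
{
    std::ostringstream out;
    out << static_cast<long long>(t);
    return out.str();
}
```

Calling seconds_to_string() on a non-zero time_t should give the same digits that htdump writes after "m:". If a print with the cast in place still shows 0, the value itself really is 0 at that point.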

> > 3) Are you sure the server is returning a Last-Modified header for files?
> Yes, I snooped the wire ;-)
> 
> > 4) Does the server properly handle the If-Modified-Since header?
> > (To see that this header is sent, check in Document.cc line 525 or so for
> > the output sent by htdig.)
> It's Apache 1.3.26, so I guess it should. But I think htdig only sends the
> If-Modified-Since header if it finds a date for a URL in the current
> database, and as I assume that doesn't happen, the If-Modified-Since
> header never makes its way out.
> 
> Here's an example url from my htdump file to prove a date is in there:
> 
> 0       u:http://www.CENSORED.com/YADDA.html    t:CENSORED        a:0
> m:873819058     s:280   H:  CENSORED         h:      l:1031854616
> L:0     b:1     c:0     g:0     e:      n:      S:      d:      A:

Well, that "m:" value is definitely non-zero, and definitely not as large
as the current time, so it does seem to be getting, parsing, and storing
Last-Modified header dates.  But in your reply to my message on [htdig],
you said...

According to Stefan Seiz:
> On 17.9.2002 20:23 Uhr, Gilles Detillieux <[EMAIL PROTECTED]> wrote:
> > If that's the part you suspect is failing, then you should be able
> > to confirm that by running htdig -vvv.  Look for the messages where
> > it outputs the Last-Modified header, and then says something like
> > "Converted ... to ...", which shows the original and regenerated date
> > string after parsing.  If the second one is wrong, then you are right in
> > that the problem is somewhere in the parsing.  In that case, try adding
> > trace prints in the parsedate() function in htdig/Document.cc (minimal
> > programming skills required, just look at how other debug output is done).
> 
> I already tested this but unfortunately I don't get any Dates output when
> running with -vvv

That doesn't add up.  If you are indeed running htdig version 3.1.6
with -vvv, then it MUST be showing the Last-Modified headers if your
server is returning them.  Are you sure you're running a vanilla 3.1.6
installation, and not some severely modified variant of this, or another
version altogether?  Can you show us a complete excerpt of htdig -vvv
output for one file, from one "Retrieval command for ..." message to
the next?

If you're not seeing those either, is it possible you're retrieving
via local_urls?  In this case, there's no date parsing involved, as
htdig gets the modtime as a time_t already from the local filesystem.
But you did say you snooped the wire, so I'd guess this isn't the case.
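
For comparison, this is roughly what getting a modtime straight from the filesystem looks like. It is only a sketch using the POSIX stat() call, not htdig's actual local_urls code, and the helper name local_mod_time() is invented for the example. The point is that the value arrives as a time_t, with no date string to parse.

```cpp
#include <sys/stat.h>
#include <ctime>

// Hypothetical helper (not htdig code): return a file's last modification
// time as a time_t, or (time_t)-1 if stat() fails. No date string is
// involved, so no parsing step can go wrong here.
std::time_t local_mod_time(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return static_cast<std::time_t>(-1);
    return st.st_mtime;
}
```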

> > If the second date string is fine, it could be a problem related to
> > refetching this info from the database, or some memory leak somewhere.
> > I thought you mentioned that an htdump showed correct, non-zero modtimes.
> > Such a problem would be harder to track down.
> 
> Yes, datestamps (seconds since epoch) are in the dumped file.
> 
> I'll add debug prints (already did some and always got a date of 0) to the
> files you mentioned and report back.
> 
> Could you tell me which subroutine is responsible for parsing the timestamp
> from the local database (I guess reading and parsing that one is the
> problem)?

It's already parsed by the time it gets into the database.  It starts out
as a date string from the server, in RFC850 or RFC1123 format, and the
parsedate() function in htdig/Document.cc converts that to a time_t, i.e.
an integer (32 bits on most current systems) representing seconds since
Jan 1, 1970, 00:00:00 GMT.  It goes through some encoding and decoding as
it gets stored in the database (see DocumentRef::Serialize() and
DocumentRef::Deserialize() in htcommon/DocumentRef.cc), but then it goes
through those same routines when you get the number via htdump.  It doesn't
get converted back into a date string until htsearch processes it, using
strftime().
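
As a quick sanity check on a stored value, a time_t can be turned back into a readable date with gmtime() and strftime(). The sketch below is standalone and is not htsearch's actual code: the helper name format_http_date() and the RFC1123-style format string are assumptions chosen for illustration.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper (not htsearch's actual code): format a time_t as an
// RFC1123-style date string in GMT. Returns an empty string if the value
// cannot be converted.
std::string format_http_date(std::time_t t)
{
    std::tm *gmt = std::gmtime(&t);   // break the seconds count down in GMT
    if (gmt == 0)
        return std::string();
    char buf[64];
    std::size_t n = std::strftime(buf, sizeof buf,
                                  "%a, %d %b %Y %H:%M:%S GMT", gmt);
    return std::string(buf, n);
}
```

Fed the "m:" value from the htdump line quoted earlier, format_http_date(873819058) gives "Tue, 09 Sep 1997 15:30:58 GMT" in the default C locale, which is a plausible Last-Modified date for a web page.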

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
