On Tue, 3 Aug 1999, Geoff Hutchison wrote:
> Date: Tue, 3 Aug 1999 22:08:49 -0400 (EDT)
> From: Geoff Hutchison <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig3-dev] 3.2 goals (was On vacation this weekend)
>
>
> On Tue, 3 Aug 1999, Gilles Detillieux wrote:
>
> > When I e-mailed Mike Grommet about this a couple weeks ago, he said the
> > last patch he posted to the list was complete enough for his needs. I
>
> OK, it seems like there are some patches that need to be dug up and dusted
> off. I'm glad to merge in whatever is sent to me, but the patch queue is
> pretty slim right now. (That means if you haven't seen it so far, send it
> again.)
The forwarded message was sent by Mike Grommet in April.
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah [EMAIL PROTECTED]
---------- Forwarded message ----------
Date: Wed, 7 Apr 1999 15:04:51 -0500
From: mike grommet <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: RE: [htdig3-dev] either a bug, or my ignorance, both are definitely possible :)
Geoff, I've run into some interesting details here. I'm going to forward a
copy of this to the list as well.
Sorry for the long message, but I wasn't quite sure how else to get the info
across.
I thought I might fill you in with some details...
As the original note below indicates, I was getting some really funky
date information.
The major problem I am having occurs with the addition of
Retriever::got_time(char *time) in the patch file you sent...
Here is the routine you sent me:
void
Retriever::got_time(char *time)
{
    time_t new_time;
    struct tm tm;

    if (debug > 1)
        cout << "\ntime: " << time << endl;

    //
    // As defined by the Dublin Core, this should be YYYY-MM-DD
    // In the future, we'll need to deal with the scheme portion
    // in case someone picks a different format.
    //
    if (mystrptime(time, "%Y-%m-%d", &tm))
    {
#if HAVE_TIMEGM
        new_time = timegm(&tm);
#else
        new_time = mytimegm(&tm);
#endif
        current_time = new_time;
    }

    // If we can't convert it, current_time stays the same and we get
    // the default--the date returned by the server...
}
OK, I hacked in some debug output that is echoed when the debug options are
on...
Here are the details (note that the date in the meta tag is 2001-04-05).
I am printing out the individual fields of tm, followed by the resulting
time value:
time: 2001-04-05 - this is from your original code
hour: 1 - why 1? doesn't really matter, though
min: 0 - checks out fine here
sec: 134799364 - WOW! that's a lot of seconds.
month: 3 - this one is right (tm_mon counts from 0)
day: 5 - here too
year: 101 - just fine here (tm_year counts from 1900)
othertime: 1121145364 ---- translates to sometime in July 2005, I think
OK, the seconds value just screams out at me, so I've hacked the code
a bit to initialize the fields of tm... Here is my final routine (so far),
with my debug code too:
void
Retriever::got_time(char *time)
{
    time_t new_time;
    struct tm tm;

    // added by me
    tm.tm_hour = 0;
    tm.tm_min = 0;
    tm.tm_sec = 0;
    tm.tm_mon = 0;
    tm.tm_mday = 1;
    tm.tm_year = 0;

    if (debug > 1)
        cout << "\ntime: " << time << endl;

    //
    // As defined by the Dublin Core, this should be YYYY-MM-DD
    // In the future, we'll need to deal with the scheme portion
    // in case someone picks a different format.
    //
    if (mystrptime(time, "%Y-%m-%d", &tm))
    {
        if (debug > 1)
        {
            cout << "\nhour: " << tm.tm_hour << endl;
            cout << "\nmin: " << tm.tm_min << endl;
            cout << "\nsec: " << tm.tm_sec << endl;
            cout << "\nmonth: " << tm.tm_mon << endl;
            cout << "\nday: " << tm.tm_mday << endl;
            cout << "\nyear: " << tm.tm_year << endl;
        }
//#if HAVE_TIMEGM
        new_time = timegm(&tm);
//#else
//      new_time = mytimegm(&tm);
//#endif
        current_time = new_time;
        if (debug > 1)
            cout << "\nothertime: " << current_time << endl;
    }

    // If we can't convert it, current_time stays the same and we get
    // the default--the date returned by the server...
}
Ok, and now here is the new output:
time: 2001-04-05
hour: 0
min: 0
sec: 0
month: 3
day: 5
year: 101
othertime: 986428800
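The change in output is consistent with the parser storing only the fields named in the format string and leaving the rest of the struct untouched, so whatever garbage was in tm_hour and tm_sec ended up in the result. A minimal sketch of clearing the whole struct before parsing follows. It assumes the POSIX strptime and the non-standard (glibc/BSD) timegm rather than ht://Dig's mystrptime and mytimegm, whose behavior is not assumed here, and parse_dc_date is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <time.h>

// Hypothetical helper (not part of ht://Dig): parse a "YYYY-MM-DD" date
// and convert it to a time_t for midnight UTC on that day.
// Returns true on success and stores the result in *out.
bool parse_dc_date(const char *text, time_t *out)
{
    assert(text != nullptr && out != nullptr);

    struct tm tm;
    // Zero every field, so anything the parser does not set
    // (hour, minute, second, DST flag, ...) starts from a known value.
    std::memset(&tm, 0, sizeof(tm));

    // strptime returns a null pointer when the text does not match.
    if (strptime(text, "%Y-%m-%d", &tm) == nullptr)
        return false;

    // timegm interprets the broken-down time as UTC.
    *out = timegm(&tm);
    return *out != (time_t)-1;
}
```

Clearing the struct in one call also covers fields such as tm_isdst, which a field-by-field initialization can easily miss.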
Note that in my code above I have disabled mytimegm. Strangely enough,
timegm and mytimegm do NOT return the same values; they are exactly
24 hours off from one another.
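One way to decide which of two conversions is off by a day is to compute the expected value independently. The sketch below uses a well-known days-since-epoch calculation for the proleptic Gregorian calendar; the function names are hypothetical and nothing here is taken from the ht://Dig sources. For 2001-04-05 it gives 11417 days, that is 11417 * 86400 = 986428800 seconds, which matches the othertime value printed above.

```cpp
#include <cassert>

// Number of days between 1970-01-01 and the given calendar date
// (proleptic Gregorian calendar, month 1..12, day 1..31).
long long days_from_civil(int y, int m, int d)
{
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);

    // Treat January and February as months 11 and 12 of the previous
    // year, so the leap day falls at the very end of the shifted year.
    y -= (m <= 2);
    const long long era = (y >= 0 ? y : y - 399) / 400;   // 400-year cycle
    const long long yoe = y - era * 400;                  // year of era, 0..399
    const long long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    const long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;                   // shift to 1970-01-01
}

// Seconds since the epoch for 00:00:00 UTC on the given date.
long long utc_midnight_seconds(int y, int m, int d)
{
    return days_from_civil(y, m, d) * 86400LL;
}
```

Comparing the output of each conversion routine against utc_midnight_seconds for a handful of dates should show which one carries the 24-hour offset.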
Which brings me to my second and more minor problem:
when the list of results is displayed, instead of treating the time as UTC,
as the rest of my code for search ranges already does, the code for
outputting date information on search results:
if (t)
{
    struct tm *tm = localtime(&t);
//  strftime(buffer, sizeof(buffer), "%e-%h-%Y", tm);
    if (config.Boolean("iso_8601"))
    {
        strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S %Z", tm);
    }
    else
    {
        strftime(buffer, sizeof(buffer), "%D", tm);
    }
    *str << buffer;
}
is using localtime instead. I'm just wondering whether that should be the
default or not... It's definitely a problem with my particular
implementation, since date ranges are searched using UTC time, so the date
appears short by one day. If you think I should change my current search
routines so that time zones are taken into account, I can probably do that,
but I'm not sure exactly where to begin.
What do you think?
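The one-day difference can be reproduced in isolation. The sketch below formats the same time_t once with gmtime_r and once with localtime_r (both POSIX); format_date is a hypothetical helper, not the ht://Dig display code. A timestamp for midnight UTC falls on the previous calendar day in any zone west of UTC, which is the effect described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <time.h>

// Hypothetical helper: format a time_t as YYYY-MM-DD, either in UTC
// or in the process's local time zone.
std::string format_date(time_t t, bool use_utc)
{
    assert(t != (time_t)-1);

    struct tm parts;
    if (use_utc)
        gmtime_r(&t, &parts);      // broken-down time in UTC
    else
        localtime_r(&t, &parts);   // broken-down time in the local zone

    char buffer[32];
    const std::size_t n = strftime(buffer, sizeof(buffer), "%Y-%m-%d", &parts);
    return std::string(buffer, n);
}
```

As a usage example, with the TZ environment variable set to a zone six hours behind UTC (for instance the POSIX string CST6, an assumption made purely for illustration) and tzset() called afterwards, format_date(986428800, true) yields 2001-04-05 while format_date(986428800, false) yields 2001-04-04. Whichever convention is chosen, using the same one for storing, searching and displaying avoids the mismatch.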
-----Original Message-----
From: Mike Grommet [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 06, 1999 4:22 PM
To: 'Geoff Hutchison'
Subject: RE: [htdig3-dev] Search by date ranges: some success, a few more
Geoff, I patched my source, and am getting some really, really weird
results...
For instance, look at this link:
http://www.weaselweb.com/viewarchivenewssports.php3?db=newsarchive&idnum=1
If you view the source, you will see the meta tag for the date.
Piece of cake. Now, when I run htdig with debug output, it echoes out the
right date, but that is only the raw content of the meta name="date" tag;
nothing has been performed on it yet. The content is in the proper format,
as far as I can tell.
Now that you have seen this, go to this address:
http://omega.insolwwb.net/htdig
Search the news archive (on the left) and, for keywords, enter arkansas.
You don't have to bother with a search range.
It will bring up 1 document. Look at the date on the document:
http://www.weaselweb.com/viewarchivenewssports.php3?db=newsarchive&idnum=1
07/12/05, 6965 bytes
2005-07-12?????
Where in the world is this coming from?
-----Original Message-----
From: Geoff Hutchison [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 06, 1999 2:31 PM
To: mike grommet
Subject: RE: [htdig3-dev] Search by date ranges: some success, a few more
On Tue, 6 Apr 1999, mike grommet wrote:
> My thoughts are to take a meta tag, named something like "Document-date" and
> a value just like the standard GMT time returned by a web server for a Last
> Modification
There is already a standard for this, specified by the Dublin Core.
The tag is named "DATE" and has the ISO-8601 format YYYY-MM-DD.
> Would you happen to have this code handy? It would be useful to me at least
Here you go... I should probably make this an option with something like
'use_doc_date' when I commit it.
Index: htdig/HTML.cc
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdig/HTML.cc,v
retrieving revision 1.39
diff -c -3 -p -r1.39 HTML.cc
*** htdig/HTML.cc 1999/03/23 20:09:22 1.39
--- htdig/HTML.cc 1999/04/02 01:37:20
*************** HTML::do_tag(Retriever &retriever, Strin
*** 841,846 ****
--- 841,850 ----
{
retriever.got_meta_email(conf["content"]);
}
+ else if (mystrcasecmp(cache, "date") == 0)
+ {
+ retriever.got_time(conf["content"]);
+ }
else if (mystrcasecmp(cache, "htdig-notification-date") == 0)
{
retriever.got_meta_notification(conf["content"]);
Index: htdig/Retriever.cc
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdig/Retriever.cc,v
retrieving revision 1.39
diff -c -3 -p -r1.39 Retriever.cc
*** htdig/Retriever.cc 1999/03/16 02:04:28 1.39
--- htdig/Retriever.cc 1999/04/02 01:37:20
*************** Retriever::RetrievedDocument(Document &d
*** 543,548 ****
--- 543,549 ----
current_ref = ref;
current_anchor_number = 0;
current_title = 0;
+ current_time = 0;
current_head = 0;
current_meta_dsc = 0;
*************** Retriever::RetrievedDocument(Document &d
*** 565,571 ****
//
ref->DocHead(current_head);
ref->DocMetaDsc(current_meta_dsc);
! ref->DocTime(doc.ModTime());
ref->DocTitle(current_title);
ref->DocSize(doc.Length());
ref->DocAccessed(time(0));
--- 566,575 ----
//
ref->DocHead(current_head);
ref->DocMetaDsc(current_meta_dsc);
! if (current_time == 0)
! ref->DocTime(doc.ModTime());
! else
! ref->DocTime(current_time);
ref->DocTitle(current_title);
ref->DocSize(doc.Length());
ref->DocAccessed(time(0));
*************** Retriever::got_title(char *title)
*** 891,896 ****
--- 895,930 ----
current_title = title;
}
+ //*****************************************************************************
+ // void Retriever::got_time(char *time)
+ //
+ void
+ Retriever::got_time(char *time)
+ {
+ time_t new_time;
+ struct tm tm;
+
+ if (debug > 1)
+ cout << "\ntime: " << time << endl;
+
+ //
+ // As defined by the Dublin Core, this should be YYYY-MM-DD
+ // In the future, we'll need to deal with the scheme portion
+ // in case someone picks a different format.
+ //
+ if (mystrptime(time, "%Y-%m-%d", &tm))
+ {
+ #if HAVE_TIMEGM
+ new_time = timegm(&tm);
+ #else
+ new_time = mytimegm(&tm);
+ #endif
+ current_time = new_time;
+ }
+
+ // If we can't convert it, current_time stays the same and we get
+ // the default--the date returned by the server...
+ }
//*****************************************************************************
// void Retriever::got_anchor(char *anchor)
Index: htdig/Retriever.h
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdig/Retriever.h,v
retrieving revision 1.9
diff -c -3 -p -r1.9 Retriever.h
*** htdig/Retriever.h 1999/03/16 02:04:29 1.9
--- htdig/Retriever.h 1999/04/02 01:37:20
*************** public:
*** 50,55 ****
--- 50,56 ----
void got_word(char *word, int location, int heading);
void got_href(URL &url, char *description);
void got_title(char *title);
+ void got_time(char *time);
void got_head(char *head);
void got_meta_dsc(char *md);
void got_anchor(char *anchor);
*************** private:
*** 75,80 ****
--- 76,82 ----
String current_title;
String current_head;
String current_meta_dsc;
+ time_t current_time;
int current_id;
DocumentRef *current_ref;
int current_anchor_number;