On Fri, 5 Feb 1999, U.O. Telematica Municipale - Comune di Prato wrote:
> I think I have fixed the problem I described earlier with the iso-8601
> date format in htnotify (Gilles was right about the '-').
>
> I have introduced a function called parse_date() that scans the date string
> and returns the year, the month and the day. I have set it to allow these
> tokens for the date:
>
> / - . and space too
>
> So I could write (with iso-8601 option set):
>
> <META NAME="htdig-notification-date" CONTENT="1999-02-10">
>
> or
>
> <META NAME="htdig-notification-date" CONTENT="1999 02 10">
>
> and so on ...
>
> I made some tests. I include the htnotify.cc source (zipped). Please try it.
>
> It also reports (if verbose is set) every document found with an
> htdig-notification meta tag specified, and sends a notification message
> for malformed date parameters.
I liked the test and notification for malformed dates. However, I was
working in parallel with Gabriele on this problem, and I think I have a
solution that's more flexible. It allows any punctuation and/or white
space as separators, and doesn't care about the number of digits in
each field of the date. It also does some checking to see if the user
placed fields in an order other than what was expected, and tries to do
something sensible. Gabriele gave me the idea of allowing more than just
"/" & "-" as separators, but I decided to take it a step further.
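
As a rough illustration of the separator handling described above (a sketch, not the patch itself): turning every punctuation character into a space lets a single sscanf format read the three fields, whatever separator was used. The helper name scan_three_fields is hypothetical, and std::string stands in here for htdig's own String class.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of htnotify: blank out punctuation, then
// read three integers in whatever order they appear. Returns the number
// of fields sscanf managed to convert (3 on success).
int scan_three_fields(const std::string &date, int &a, int &b, int &c)
{
    std::string s = date;
    for (std::string::size_type i = 0; i < s.size(); i++)
        if (std::ispunct(static_cast<unsigned char>(s[i])))
            s[i] = ' ';
    // %d skips leading white space and has no fixed width, so
    // "1999-02-10", "1999.2.10" and "99 2 10" all scan the same way.
    return std::sscanf(s.c_str(), "%d%d%d", &a, &b, &c);
}
```

Which of the three values is the year, month or day is decided afterwards; this step only separates the fields.
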
> I'm gonna make further changes (for example I want to add the possibility
> to specify the month by string: April, Apr, etc ...).
Hmmm. Then we get into the whole internationalization issue, don't we? ;-)

According to Geoff Hutchison:
> I haven't looked at the source. Does it assume Y-M-D based on iso_8601?
> Otherwise, how do you figure out month v. date? What about short years?
> "01 02 03"? Here's a place where we could have some sort of format
> specifier (while we don't with servers returning odd formats).
The handling of short years was still there. At least it still worked on
my system. Gabriele's code used %4d to scan the year, but a scanf field
width is only a maximum, so shorter years still scan. I've also changed the
short year handling to make a different assumption: years less than 70 (the
Unix epoch) are treated as falling in the 2000s, not the 1900s. As for the
order of y m d or m d y, the initial guess is based on iso_8601. I stuck to
that as a starting point, but the code tries to do something sensible with
unambiguous dates that don't follow the selected standard. Ambiguous dates
that don't follow the selected standard are a lost cause, though.
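
To make the guessing concrete, here is a small sketch that replays the same swap checks and short-year rule on fields that have already been scanned in the expected order. The function name fix_order_and_year is hypothetical; this is an illustration of the rules, not the patch code itself.

```cpp
#include <utility>

// Hypothetical illustration of the reordering and short-year rules.
// yy, mm, dd hold the fields as scanned in the *expected* order.
// Returns false if no valid date could be made from them.
bool fix_order_and_year(int &yy, int &mm, int &dd)
{
    if (dd > 31 && yy <= 31) std::swap(yy, dd);  // "day" can only be a year
    if (mm > 31 && yy <= 31) std::swap(yy, mm);  // "month" can only be a year
    if (mm > 12 && dd <= 12) std::swap(dd, mm);  // "month" can only be a day
    if (yy < 0 || mm < 1 || mm > 12 || dd < 1 || dd > 31)
        return false;

    if (yy < 70)
        yy += 2000;        // 0..69 -> 2000..2069
    else if (yy < 1900)
        yy += 1900;        // 70..1899 -> add 1900, so 99 -> 1999
    return true;
}
```

With iso_8601 unset (m d y expected), "1999-02-25" is first read as mm=1999, dd=2, yy=25, and the checks recover 1999-02-25. "1999-02-10" read the same way comes out as 1999-10-02: once the year has been identified, nothing tells 02 and 10 apart as month and day, so dates like that need to be given in the expected format.
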
> Maybe:
> <meta name="htdig-notification-date-format" content="%y %m %d">
> <meta name="htdig-notification-date" content="99 02 05">
That would require changing the database format (again), and htdig.
It's not a bad idea, but I wonder how much it would get used in practice.
I think the average user of this feature would rather stick to a standard
format (especially if it's flexible) than to have to explicitly specify
the format (and have to learn about the format of format specifications).
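
For reference, a per-document format like the one Geoff suggests would map fairly directly onto strptime() on POSIX systems. The sketch below only illustrates that idea: the htdig-notification-date-format tag does not exist in htdig, parse_with_format is a hypothetical name, and strptime is a POSIX function rather than part of standard C or C++.

```cpp
#include <string.h>
#include <time.h>

// Hypothetical helper: parse `date` according to a strptime-style `format`.
// Returns true if the input matched the format; trailing characters after
// the matched part are not checked.
bool parse_with_format(const char *date, const char *format, struct tm &out)
{
    memset(&out, 0, sizeof out);
    return strptime(date, format, &out) != NULL;
}
```

Called with "99 02 05" and "%y %m %d", this fills in tm_mon = 1 and tm_mday = 5, following the same zero-based month convention that htnotify applies by hand with month--.
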
> > I'm very proud to have contributed to htdig, even if in a very very very
> > little part.
>
> No part is "little."
That's how it starts! You get your feet wet with a small patch or two, and
before you know it, you're on the developers' list, spending way too much
time diving into places you never thought you'd venture into before. ;-)
It is a great feeling, though, to contribute to the development of
something you and thousands of others will use. I agree with Geoff that
all contributions are worthwhile, whether it's a little bug fix or a
big new feature, or even just a good idea!
Anyway, here's my patch, to be applied to the 020399 snapshot. I ended up
gutting parse_date (sorry, Gabriele!), and rewriting it. I also changed
the year, month and day back to int, from unsigned, because it's consistent
with the type usually used for the fields of the tm structure against which
we're comparing. I also threw in a bit of unrelated cleaning up...
--- ./htnotify/htnotify.cc.datebug2 Tue Jan 26 14:10:22 1999
+++ ./htnotify/htnotify.cc Fri Feb 5 14:07:36 1999
@@ -5,6 +5,10 @@
// Send e-mail to addresses mentioned in documents if the doc
// has "expired"
//
+// Added function parse_date() for correct date parsing (1999/02/05)
+// Gabriele Bartolini - U.O. Telematica Municipale Comune di Prato - ITALIA
+// and Gilles Detillieux
+//
// $Log: htnotify.cc,v $
// Revision 1.18 1999/01/25 05:13:22 ghutchis
// Fix comiler errors.
@@ -71,17 +75,18 @@
static char RCSid[] = "$Id: htnotify.cc,v 1.18 1999/01/25 05:13:22 ghutchis Exp $";
#endif
-#include <Configuration.h>
-#include <DocumentDB.h>
-#include <DocumentRef.h>
-#include <defaults.h>
+#include "Configuration.h"
+#include "DocumentDB.h"
+#include "DocumentRef.h"
+#include "defaults.h"
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <fstream.h>
#include <time.h>
#include <stdio.h>
-#include <HtURLCodec.h>
+#include <ctype.h>
+#include "HtURLCodec.h"
// If we have this, we probably want it.
#ifdef HAVE_GETOPT_H
@@ -91,6 +96,7 @@
void htnotify(DocumentRef &);
void usage();
void send_notification(char *date, char *email, char *url, char *subject);
+int parse_date(char *date, int &year, int &month, int &day);
int verbose = 0;
@@ -109,7 +115,7 @@
int main(int ac, char **av)
{
int c;
- extern char *optarg;
+ extern char *optarg;
String base;
String configFile = DEFAULT_CONFIG_FILE;
@@ -177,7 +183,8 @@
while ((str = (String *) docs->Get_Next()))
{
ref = docdb[str->get()];
- htnotify(*ref);
+ if (ref)
+ htnotify(*ref);
delete ref;
}
delete docs;
@@ -207,17 +214,29 @@
}
int month, day, year;
- if (config.Boolean("iso_8601"))
- {
- sscanf(date, "%d-%d-%d", &year, &month, &day);
- }
- else
- {
- sscanf(date, "%d/%d/%d", &month, &day, &year);
- }
+ if (!parse_date(date, year, month, day))
+ {
+ // Parsing Failed
+ if (verbose > 1)
+ {
+ cout << "Malformed date: " << date << endl;
+ }
+
+ send_notification(date, email, ref.DocURL(), "Malformed Date");
- if (year > 1900)
- year -= 1900;
+ if (verbose)
+ {
+ cout << "Message sent." << endl;
+ cout << "Date: " << date << endl;
+ cout << "URL: " << ref.DocURL() << endl;
+ cout << "Subject: Malformed Date" << endl;
+ cout << "Email: " << email << endl;
+ cout << endl;
+ }
+ return;
+ }
+
+ year -= 1900;
month--;
//
@@ -243,6 +262,16 @@
cout << endl;
}
}
+ else
+ {
+ // Page not yet expired
+ if (verbose)
+ {
+ cout << "htnotify: URL " << ref.DocURL()
+ << " (" << year+1900 << "-" << month+1
+ << "-" << day << ")" << endl;
+ }
+ }
}
}
@@ -315,7 +344,7 @@
//*****************************************************************************
-// Display usage information for the htdig program
+// Display usage information for the htnotify program
//
void usage()
{
@@ -333,3 +362,59 @@
}
+//*****************************************************************************
+// Parse the notification date string from the user's document
+//
+int parse_date(char *date, int &year, int &month, int &day)
+{
+ int mm = -1, dd = -1, yy = -1, t;
+ String scandate = date;
+
+ for (char *s = scandate.get(); *s; s++)
+ if (ispunct(*s))
+ *s = ' ';
+
+ if (config.Boolean("iso_8601"))
+ {
+ // conf file specified ISO standard, so expect [yy]yy mm dd.
+ sscanf(scandate.get(), "%d%d%d", &yy, &mm, &dd);
+ }
+ else
+ {
+ // Default to American standard when not specified in conf,
+ // so expect mm dd [yy]yy.
+ sscanf(scandate.get(), "%d%d%d", &mm, &dd, &yy);
+ }
+
+ // OK, we took our best guess at the order the y, m & d should be.
+ // Now let's see if we guessed wrong, and fix it. This won't work
+ // for ambiguous dates (e.g. 01/02/03), which must be given in the
+ // expected format.
+ if (dd > 31 && yy <= 31)
+ {
+ t = yy; yy = dd; dd = t;
+ }
+ if (mm > 31 && yy <= 31)
+ {
+ t = yy; yy = mm; mm = t;
+ }
+ if (mm > 12 && dd <= 12)
+ {
+ t = dd; dd = mm; mm = t;
+ }
+ if (yy < 0 || mm < 1 || mm > 12 || dd < 1 || dd > 31)
+ return 0; // Invalid date
+
+ if (yy < 70) // before UNIX Epoch
+ yy += 2000;
+ else if (yy < 1900) // before computer age
+ yy += 1900;
+ if (verbose > 1)
+ cout << "Date used (y-m-d): " << yy << '-' << mm << '-' << dd << endl;
+
+ year = yy;
+ month = mm;
+ day = dd;
+
+ return 1;
+}
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930