A standard format for delivering archived discussions (with metadata) would allow mailing lists to be searchable in conjunction with newsgroups, thus creating integrated search services (imagine Google Groups with more than just newsgroups).
Let's start by making the archives follow existing standards, like the robots meta tag and the description meta tag.
Indexes to messages should have NOINDEX, FOLLOW.
Messages should have a title which reflects both the mailing list and the subject of the message. The beginning of the message (or the first section without "> ") should be the description meta tag. The author should be in a dc.creator meta tag. The date should be in dc.date, in the ISO 8601 web profile if possible, otherwise in RFC 822 format. The message ID probably should be dc.identifier. The mailing list address/name could be in dc.publisher.
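To make the tagging above concrete, here is a rough sketch in Python of what an archiver might emit. The function names, the "List: Subject" title layout, and the 200-character description limit are my own assumptions for illustration, not part of any existing archiver.

```python
from datetime import datetime, timezone
from html import escape


def first_unquoted_text(body: str, limit: int = 200) -> str:
    """Return the start of the first section that is not quoted with ">"."""
    lines = []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            if lines:
                break  # a quoted block after real text ends the section
            continue   # skip quoted lines that precede the real text
        if stripped:
            lines.append(stripped)
    return " ".join(lines)[:limit]  # limit is an arbitrary assumption


def meta(name: str, content: str) -> str:
    """Render one meta tag with HTML-escaped content."""
    return f'<meta name="{name}" content="{escape(content)}">'


def message_head(list_name, list_address, subject, author,
                 date: datetime, message_id, body) -> str:
    """Build the <head> contents for one archived message."""
    parts = [
        # Title layout "List: Subject" is an assumed convention.
        f"<title>{escape(list_name)}: {escape(subject)}</title>",
        meta("description", first_unquoted_text(body)),
        meta("dc.creator", author),
        # isoformat() on a timezone-aware datetime gives an ISO 8601 string.
        meta("dc.date", date.isoformat()),
        meta("dc.identifier", message_id),
        meta("dc.publisher", f"{list_name} <{list_address}>"),
    ]
    return "\n".join(parts)


def index_head(title: str) -> str:
    """Build the <head> contents for an index page: NOINDEX, FOLLOW."""
    return "\n".join([
        f"<title>{escape(title)}</title>",
        meta("robots", "noindex,follow"),
    ])
```

Calling message_head() with a timezone-aware datetime keeps the offset in dc.date; a naive datetime would produce a date with no zone, which is less useful to a spider.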
The navcrap and message header fields should be marked off with all of the noindex sectional tags we can find.
Ultraseek already has an NNTP spider. It works OK, and parsing the mail/news format directly gives better results than trying to reverse engineer it from the pretty-printed HTML. It also allows sane handling of attachments. But it gives NNTP URLs, which surprise some users.
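As an illustration of why parsing the raw mail/news format is easier than scraping HTML, here is a short sketch using Python's standard email package. This is not how Ultraseek does it; parse_raw_message and the shape of the returned dict are made up for the example.

```python
from email import message_from_string
from email.policy import default
from email.utils import parsedate_to_datetime


def parse_raw_message(raw: str) -> dict:
    """Pull the indexable fields straight out of an RFC 822 style message."""
    msg = message_from_string(raw, policy=default)

    # Prefer the plain-text body; fall back to an empty string if none exists.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""

    # Attachments are separate MIME parts, so they can be listed or
    # indexed on their own instead of being guessed at from HTML links.
    attachments = [part.get_filename() for part in msg.iter_attachments()]

    date_header = msg["Date"]
    return {
        "subject": str(msg["Subject"]),
        "author": str(msg["From"]),
        # Convert the RFC 822 date to ISO 8601 when a Date header is present.
        "date": (parsedate_to_datetime(str(date_header)).isoformat()
                 if date_header is not None else None),
        "message_id": str(msg["Message-ID"]),
        "body": body,
        "attachments": attachments,
    }
```

The headers come out as clean fields with no guessing about which part of a page is the author or the date, which is the main point.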
Update: Verity Ultraseek is the product formerly known as Inktomi Enterprise Search and Inktomi Search/Enterprise. That product was earlier known as Ultraseek Server. The name remains the same, eh?
wunder -- Walter Underwood Software Architect Verity Ultraseek
_______________________________________________ Robots mailing list [EMAIL PROTECTED] http://www.mccmedia.com/mailman/listinfo/robots