At 11:28 AM +0100 9/29/99, Rzepa, Henry wrote:
>Actually, on this final point, can someone let me know whether htdig
>exactly conforms to any W3 spec of HTML?  For example, does it
>track TITLE attributes in the various elements that have them, eg
><object> etc etc.

Sadly, no. If anything, it's probably pretty close to HTML 2.0. For 
one, no one has made much of an effort to check which tags need to be 
added to the HTML parser. For another, the later standards (esp. 4.0) 
are very flexible in terms of metadata. This is nice, but it makes 
full compliance very hard for an indexer like htdig.

For example, you could include metadata about a document in another 
URL entirely:

<LINK REL="author" HREF="author.html">

To be compliant, htdig would need to download that linked document 
as well in order to pick up the metadata it holds.
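
As a rough illustration of the first step that would involve, here is a 
minimal sketch in Python using the standard html.parser module. It is 
not htdig's parser (which is C++); the class name LinkCollector and the 
example.com base URL are made up for the example. It only collects the 
LINK targets; actually fetching and indexing them is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (rel, absolute URL) pairs from <LINK> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page being parsed (assumed known)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag != "link":
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if rel and href:
            # Resolve the relative HREF against the page's own URL.
            self.links.append((rel.lower(), urljoin(self.base_url, href)))


collector = LinkCollector("http://example.com/docs/page.html")
collector.feed('<LINK REL="author" HREF="author.html">')
print(collector.links)
```

Each collected URL would then be one more document to download per 
page, which is where the extra cost for an indexer comes from.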

It would obviously be very useful to go through the HTML parser and 
compare it to the HTML 4.0 standard. I would recommend doing this in 
the 3.2 code, since the parser there is a little cleaner and it is 
easier to add new tags to it.

(To answer your question, it would be pretty easy to add TITLE 
parsing for <a href> tags.)
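
To give an idea of how little is involved, here is a minimal sketch in 
Python with the standard html.parser module. Again, this is not htdig 
code; the class name AnchorTitleCollector and the sample markup are 
assumptions for illustration only.

```python
from html.parser import HTMLParser


class AnchorTitleCollector(HTMLParser):
    """Hypothetical helper: record (href, title) for anchors that carry a TITLE."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Only keep anchors that have both a target and a non-empty TITLE.
        if attr.get("href") and attr.get("title"):
            self.titles.append((attr["href"], attr["title"]))


parser = AnchorTitleCollector()
parser.feed('<a href="faq.html" title="Frequently asked questions">FAQ</a>')
print(parser.titles)
```

An indexer could then treat the TITLE text as extra descriptive words 
for the linked URL, much like the link text itself.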

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

