According to Geoff Hutchison:
> A few weeks ago, someone mentioned that we don't index <img alt="...">
> text. I figured it would be a pretty easy addition to the HTML parser.
> Along the way, I think we might be able to significantly clean up the
> do_tag method in the HTML parser.
>
> So here's how we do meta tags:
>
> case 20: // "meta"
> { position += length;
> Configuration conf;
> conf.NameValueSeparators("=");
> conf.Add(position);
>
> So this seems like a really good way to parse the tags in general. After
> all, what are tag attributes but key-value pairs. Thus, can't we just use
> this for most of the tags where we want the attributes? Then I could get
> the alt text like this:
>
> Configuration attrs;
> attrs.NameValueSeparators("=");
> attrs.Add(position);
> ...
> // "img"
> got_word(attrs["alt"]...);
>
> Are there any hitches I'm ignoring? Since the configuration files deal
> with quoted values, shouldn't this work for even src attributes?
Here are a few hitches I can think of. First, do_tag() should strip the
final ">" before handing the text over, so that an unquoted value at the
end of the tag is picked up correctly. Second, Configuration::Add(char *)
doesn't parse values quoted with single quotes, whereas the handling of
quoted values in do_tag() currently allows either single or double. If
single quotes are valid in HTML, then perhaps Configuration::Add(char *)
should be rewritten to allow them as well. Finally, the alt attribute on
the img tag may contain much more than just one word, so its value will
need to be split into separate words, which should also be added to the
head variable if appropriate.

This was on my to-do list, but I doubt I'll find the time in the next
couple of weeks to tackle it, so if you want to beat me to it, more power
to you! :)
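
To make the three hitches concrete, here is a rough standalone sketch of
the kind of attribute parsing involved. It is not ht://Dig code and it
does not use the Configuration class: parse_attributes(), split_words()
and example() are hypothetical names made up for illustration, and the
input is assumed to be the text that follows the tag name (what
"position" points at above). Treating only alphanumeric runs as words is
also a simplifying assumption.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of ht://Dig): parse the text following a
// tag name, e.g. ` src='a.gif' alt="Site logo" width=10>`, into a map.
std::map<std::string, std::string> parse_attributes(std::string tag)
{
    // Hitch 1: drop the closing '>' so a trailing unquoted value stays clean.
    if (!tag.empty() && tag.back() == '>')
        tag.pop_back();

    auto is_space = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };

    std::map<std::string, std::string> attrs;
    const std::size_t n = tag.size();
    std::size_t i = 0;
    while (i < n) {
        while (i < n && is_space(tag[i])) ++i;           // skip whitespace
        std::size_t start = i;
        while (i < n && tag[i] != '=' && !is_space(tag[i])) ++i;
        if (i == start) { ++i; continue; }               // stray '=' or end
        std::string name = tag.substr(start, i - start);
        for (char& c : name)                             // names are case-insensitive
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

        while (i < n && is_space(tag[i])) ++i;
        std::string value;                               // empty for bare attributes
        if (i < n && tag[i] == '=') {
            ++i;
            while (i < n && is_space(tag[i])) ++i;
            if (i < n && (tag[i] == '"' || tag[i] == '\'')) {
                // Hitch 2: accept either quote character.
                const char quote = tag[i++];
                start = i;
                while (i < n && tag[i] != quote) ++i;
                value = tag.substr(start, i - start);
                if (i < n) ++i;                          // skip closing quote
            } else {
                start = i;
                while (i < n && !is_space(tag[i])) ++i;  // unquoted value
                value = tag.substr(start, i - start);
            }
        }
        attrs[name] = value;
    }
    return attrs;
}

// Hitch 3: an alt value can hold several words, so split it before indexing.
// Assumption: a word is a run of alphanumeric characters.
std::vector<std::string> split_words(const std::string& text)
{
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}

// Usage sketch: pull the alt text out of an img tag and split it into words.
void example()
{
    auto attrs = parse_attributes(" src='a b.gif' alt=\"Site logo\" width=10>");
    assert(attrs["src"] == "a b.gif");   // single-quoted value
    assert(attrs["width"] == "10");      // unquoted value just before '>'
    std::vector<std::string> words = split_words(attrs["alt"]);
    assert(words.size() == 2 && words[0] == "Site" && words[1] == "logo");
}
```

In the real parser each word from split_words() would presumably go
through got_word(), and whether Configuration::Add(char *) can be
extended the same way depends on how its own quoting code is written.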
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930