I'm new to Lucene and I'm trying to index an HTML file parsed with
NekoHTML.
With text between HTML tags, it's easy enough to have an overloaded
getText() method which either recursively indexes all text, or which
accepts the name of a tag (like "title") and only finds text between
<title></title> tags.
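For reference, here is a minimal sketch of that kind of overloaded getText(). It assumes the parsed page is available as a standard W3C DOM (org.w3c.dom), which is what a DOM-building HTML parser like NekoHTML hands back. So that the example runs on a bare JDK, the hypothetical parseXhtml() helper uses the JDK's XML parser on well-formed XHTML as a stand-in for NekoHTML. The tag-name match ignores case because NekoHTML upper-cases element names by default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextExtractor {

    // Returns all text beneath node, recursing through every child element.
    public static String getText(Node node) {
        StringBuilder sb = new StringBuilder();
        collectText(node, sb);
        return normalize(sb);
    }

    // Returns only the text inside elements named tagName (e.g. "title").
    // Case-insensitive because NekoHTML upper-cases element names by default.
    public static String getText(Node node, String tagName) {
        StringBuilder sb = new StringBuilder();
        collectTagText(node, tagName, sb);
        return normalize(sb);
    }

    private static void collectText(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue()).append(' ');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), sb);
        }
    }

    private static void collectTagText(Node node, String tagName, StringBuilder sb) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && node.getNodeName().equalsIgnoreCase(tagName)) {
            collectText(node, sb); // take all text under the matching tag
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectTagText(children.item(i), tagName, sb);
        }
    }

    // Collapses whitespace runs so adjacent text nodes read naturally.
    private static String normalize(StringBuilder sb) {
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    // Hypothetical stand-in for NekoHTML: parses well-formed XHTML with the JDK parser.
    public static Document parseXhtml(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseXhtml("<html><head><title>My Page</title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>");
        System.out.println(getText(doc));          // My Page Hello world
        System.out.println(getText(doc, "title")); // My Page
    }
}
```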
Unfortunately I'm trying to index URLs, image names, and ALT text,
all of which live inside the tag itself as attributes (href, src, alt),
and I can't figure out how to access that data. I realize this is more
of a NekoHTML question than
a Lucene question, but I know Lucene is often used for indexing web
content and was hoping someone on this list could help.
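One possible approach is sketched below. Attribute values are not text nodes, so a text walk never sees them. Instead, the walk has to stop at each matching element, cast it to org.w3c.dom.Element, and read the attribute with hasAttribute()/getAttribute(). This is a minimal sketch assuming a W3C DOM. With NekoHTML that DOM should come from org.cyberneko.html.parsers.DOMParser: call parse(...) on it, then getDocument(). The hypothetical getAttributeValues() and parseXhtml() helpers are illustrative, and the JDK XML parser again stands in for NekoHTML so the example runs on its own. Attribute names are matched exactly, which assumes NekoHTML's default of lower-casing attribute names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AttributeExtractor {

    // Returns the value of attrName on every element named tagName, in document order.
    public static List<String> getAttributeValues(Node root, String tagName, String attrName) {
        List<String> values = new ArrayList<>();
        collect(root, tagName, attrName, values);
        return values;
    }

    private static void collect(Node node, String tagName, String attrName, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && node.getNodeName().equalsIgnoreCase(tagName)) {
            Element el = (Element) node;
            // getAttribute() returns "" for a missing attribute, so check first.
            if (el.hasAttribute(attrName)) {
                out.add(el.getAttribute(attrName));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), tagName, attrName, out);
        }
    }

    // Hypothetical stand-in for NekoHTML: parses well-formed XHTML with the JDK parser.
    public static Document parseXhtml(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseXhtml("<html><body>"
                + "<a href=\"http://example.com/\">a link</a>"
                + "<img src=\"logo.png\" alt=\"Company logo\"/>"
                + "<img src=\"spacer.gif\"/>"
                + "</body></html>");
        System.out.println(getAttributeValues(doc, "a", "href"));  // [http://example.com/]
        System.out.println(getAttributeValues(doc, "img", "src")); // [logo.png, spacer.gif]
        System.out.println(getAttributeValues(doc, "img", "alt")); // [Company logo]
    }
}
```

Each returned list can then be joined into a string and added to whatever Lucene field you use for links or image text. The exact Field construction API differs between Lucene versions, so it is left out here.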
Cheers.
Mike