[lucy-user] Indexing HTML documents

Grant McLean Sun, 10 Jul 2011 20:28:58 -0700

Hi all

I'm just getting started with trying out Lucy. Installation went without
a hitch and I've successfully worked my way through the tutorials.
Congratulations on getting the project to this level of quality.


My main interest is indexing HTML documents for web sites.  It seems
that if I feed the HTML file contents to the Lucy indexer, all the
markup (tags and attributes) ends up in the index and consequently comes
back out in the highlighted excerpts. Is it my responsibility to strip
the tags out before passing the text to the indexer? Or is there a
simple option I can enable somewhere to have this happen automatically?

Regards
Grant
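
For reference, a minimal sketch of the "strip the tags yourself" approach mentioned above. It is not Lucy's API and does not settle whether Lucy offers a built-in option. It uses Python's standard-library html.parser only to illustrate the preprocessing step. The names TextExtractor and html_to_text are hypothetical, and the resulting plain text is assumed to be what gets passed to the indexer in place of the raw HTML.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, dropping tags, attributes, and script/style bodies."""

    SKIP = {"script", "style"}  # elements whose contents should not be indexed

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the text of an HTML string with all markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = '<p class="intro">Hello <b>world</b> &amp; friends</p>'
    print(html_to_text(sample))
```

Joining chunks with a space means text split by inline tags (for example "<b>foo</b>bar") comes out as separate words. That is usually acceptable for indexing, but it is a design choice worth revisiting if exact excerpts matter.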
