Hello, This is probably due to some bad HTML. The application you are using is just a demo, and uses a JavaCC-based HTML parser, which may not be resilient to invalid HTML. For Lucene in Action we developed a little extensible indexing framework, and for HTML indexing we used 2 tools to handle HTML parsing: JTidy and NekoHTML. Since the code for the book is freely available....... http://www.manning.com. NekoHTML knows how to deal with some bad HTML, that's why I'm suggesting this. The indexing framework could come handy for those working on various 'desktop search' applications (Roosster, LDesktop (if that's really happening), Lucidity, etc.)
Otis --- Hetan Shah <[EMAIL PROTECTED]> wrote: > java org.apache.lucene.demo.IndexHTML -create -index > /source/workarea/hs152827/newIndex .. > adding ../0/10037.html > adding ../0/10050.html > adding ../0/1006132.html > adding ../0/1013223.html > Parse Aborted: Encountered "\"" at line 5, column 1. > Was expecting one of: > <ArgName> ... > "=" ... > <TagEnd> ... > > And then the indexing hangs on this line. Earlier it used to go on > and > index remaining pages in the directory. Any idea why would the > indexer > stop at this error. > > Pointers are much needed and appreciated. > -H > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]