RE: Parsing issue
I use it and have yet to have a problem with it. It uses the Xerces API, so you parse and access HTML files just like XML files. Very cool.

Chuck

> -----Original Message-----
> From: Otis Gospodnetic
> Sent: Tuesday, January 04, 2005 2:05 PM
> To: Lucene Users List
> Subject: Re: Parsing issue
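To make the "just like XML files" remark concrete, here is a minimal sketch of pulling the text out of a parsed document through the standard W3C DOM interfaces. It is an illustration under assumptions, not NekoHTML's own sample code: the class name `DomTextExtractor` and the sample page are hypothetical, and the JDK's built-in (strict) XML parser stands in for NekoHTML so the example runs without extra JARs. With NekoHTML on the CLASSPATH, the `Document` would presumably come from its `org.cyberneko.html.parsers.DOMParser` instead, and the traversal code would stay the same.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: walks a DOM tree and collects its text nodes,
// e.g. to feed the text of a page into a Lucene field.
public class DomTextExtractor {

    /** Parses well-formed markup into a DOM tree and returns its text, space-separated. */
    public static String extractText(String markup) throws Exception {
        // Assumption: the JDK parser stands in for NekoHTML here, so the
        // input must be well-formed. NekoHTML would tolerate messier HTML.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        StringBuilder text = new StringBuilder();
        collectText(doc.getDocumentElement(), text);
        return text.toString().trim();
    }

    /** Depth-first walk that appends every non-blank text node to out. */
    private static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String value = node.getNodeValue().trim();
            if (!value.isEmpty()) {
                out.append(value).append(' ');
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>Demo</title></head>"
                + "<body><p>Hello <b>Lucene</b> users</p></body></html>";
        System.out.println(extractText(page)); // prints "Demo Hello Lucene users"
    }
}
```

The traversal only touches `org.w3c.dom` types, which is why swapping the parser should not require changing it.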
Re: Parsing issue
That's the correct place to look, and it includes code samples. Yes, it's a JAR file that you add to the CLASSPATH and use ... hm, normally programmatically, yes :).

Otis
Re: Parsing issue
Has anyone used NekoHTML? If so, how do I use it? Is it a standalone JAR file that I include in my classpath and start using just like IndexHTML? Can someone share syntax and/or code if it is supposed to be used programmatically? I am looking at http://www.apache.org/~andyc/neko/doc/html/ for more information; is that the correct place to look?

Thanks,
-H
Re: Parsing issue
Sure... clean up your HTML and it'll parse fine :) Perhaps use JTidy to clean up the HTML. Or switch to using a more forgiving parser like NekoHTML.

Erik

On Jan 4, 2005, at 3:59 PM, Hetan Shah wrote:

> Hello All,
>
> Does anyone know how to handle the following parsing error?
>
> Thanks for pointers/code snippets.
>
> -H
>
> While trying to parse an HTML file using IndexHTML I get
>
> Parse Aborted: Encountered "\"" at line 8, column 1162.
> Was expecting one of:
> ...
> "=" ...
> ...
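The error quoted above reports a line and column, which is the quickest handle for cleaning up the HTML by hand. As a rough illustration only, the sketch below shows a strict parser rejecting a stray double quote after an attribute value and reporting where it stopped. The assumptions are significant: it uses the JDK's XML parser as a stand-in rather than the parser behind IndexHTML, and the class name `StrictParseCheck` and the sample markup are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports where a strict parser gives up on a document.
public class StrictParseCheck {

    /** Returns null if the markup parses, otherwise "line L, column C: message". */
    public static String firstError(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Well-formed: the attribute value is closed exactly once.
        System.out.println(firstError("<p><a href=\"x.html\">link</a></p>"));
        // Malformed: a stray double quote follows the attribute value.
        System.out.println(firstError("<p><a href=\"x.html\"\">link</a></p>"));
    }
}
```

Once the position is known, the options are the ones already suggested in the thread: fix the markup at that spot, run it through JTidy first, or switch to a parser such as NekoHTML that tolerates this kind of damage.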