Re: Parsing issue

2005-01-04 Thread Erik Hatcher
Sure... clean up your HTML and it'll parse fine :)   Perhaps use JTidy 
to clean up the HTML.  Or switch to using a more forgiving parser like 
NekoHTML.

Erik
On Jan 4, 2005, at 3:59 PM, Hetan Shah wrote:
Hello All,
Does anyone know how to handle the following parsing error?
Thanks for any pointers/code snippets.
-H
While trying to parse an HTML file using IndexHTML, I get:
Parse Aborted: Encountered \ at line 8, column 1162.
Was expecting one of:
ArgName ...
= ...
TagEnd ...

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing issue

2005-01-04 Thread Hetan Shah
Has anyone used NekoHTML? If so, how do I use it? Is it a standalone
jar file that I include in my classpath and start using just like
IndexHTML?
Can someone share syntax and/or code if it is supposed to be used
programmatically? I am looking at
http://www.apache.org/~andyc/neko/doc/html/ for more information. Is that
the correct place to look?

Thanks,
-H


Re: Parsing issue

2005-01-04 Thread Otis Gospodnetic
That's the correct place to look and it includes code samples.
Yes, it's a Jar file that you add to the CLASSPATH and use ... hm,
normally programmatically, yes :).

Otis




RE: Parsing issue

2005-01-04 Thread Chuck Williams
I use it and have yet to have a problem with it. It uses the Xerces API,
so you parse and access HTML files just like XML files. Very cool.

Chuck
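Because NekoHTML hands back a standard DOM, the code that pulls text out of a page for indexing does not need to care which parser built the tree. The sketch below illustrates that point and is not code from this thread. The class and method names (`DomTextExtractor`, `collectText`, `extractText`, `parseWellFormed`) are hypothetical. So that it runs with only the JDK, it builds the DOM with the JDK's own XML parser, which accepts only well-formed markup. For malformed pages like the one in the original question, the commented lines show where NekoHTML's `org.cyberneko.html.parsers.DOMParser` would supply the `Document` instead.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration; not part of Lucene or NekoHTML.
public class DomTextExtractor {

    // Walks any org.w3c.dom tree depth-first and appends the text it finds,
    // skipping <script> and <style> contents. equalsIgnoreCase is used because
    // NekoHTML reports element names in upper case by default.
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName();
            if (name.equalsIgnoreCase("script") || name.equalsIgnoreCase("style")) {
                return;
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            // Trailing space keeps words from adjacent elements apart.
            out.append(node.getNodeValue()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    // Returns the document's text with whitespace runs collapsed to single spaces.
    public static String extractText(Document doc) {
        StringBuilder out = new StringBuilder();
        collectText(doc, out);
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Builds a DOM with the JDK's XML parser, which only accepts well-formed markup.
    // With NekoHTML on the classpath, malformed HTML could be parsed instead with:
    //   org.cyberneko.html.parsers.DOMParser parser = new org.cyberneko.html.parsers.DOMParser();
    //   parser.parse(new InputSource(new StringReader(markup)));
    //   Document doc = parser.getDocument();
    public static Document parseWellFormed(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Parsing issue</title>"
                + "<script>var x = 1;</script></head>"
                + "<body><p>Hello <b>Lucene</b> users</p></body></html>";
        // prints "Parsing issue Hello Lucene users"
        System.out.println(extractText(parseWellFormed(page)));
    }
}
```

Appending a space after every text node keeps title and body words from running together, at the cost of splitting a word that is broken up by inline tags (e.g. `Luc<b>ene</b>`). A more careful extractor would insert separators only at block-level elements.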
