RE: Parsing issue

2005-01-04 Thread Chuck Williams
I use NekoHTML and have yet to have a problem with it.  It uses the Xerces API,
so you parse and access HTML files just like XML files.  Very cool.

Chuck
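
A minimal sketch of what "just like XML files" can look like in practice. The names `DomTextExtractor`, `extractText` and `parseWellFormed` are made up for illustration, and the JDK's own XML parser stands in for NekoHTML so the example runs without extra jars. With the NekoHTML jar on the classpath, the commented lines show how its `org.cyberneko.html.parsers.DOMParser` could supply the `Document` instead; treat that part as an untested assumption about your setup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTextExtractor {

    // Recursively append the trimmed content of every text node under 'node'.
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    // Walk a DOM tree and return its text content, separated by single spaces.
    public static String extractText(Node root) {
        StringBuilder out = new StringBuilder();
        collectText(root, out);
        return out.toString();
    }

    // Stand-in parser (JDK only): accepts well-formed markup, rejects sloppy HTML.
    // Assumption: with NekoHTML available, one would build the Document like this instead:
    //   org.cyberneko.html.parsers.DOMParser parser = new org.cyberneko.html.parsers.DOMParser();
    //   parser.parse(new InputSource(new StringReader(html)));
    //   Document doc = parser.getDocument();
    public static Document parseWellFormed(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWellFormed("<html><body><p>Hello</p><p>Lucene</p></body></html>");
        System.out.println(extractText(doc)); // prints: Hello Lucene
    }
}
```

The extracted string is the kind of thing one might put into a Lucene field; how it gets indexed is up to the application.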

  > -Original Message-
  > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
  > Sent: Tuesday, January 04, 2005 2:05 PM
  > To: Lucene Users List
  > Subject: Re: Parsing issue
  > 
  > That's the correct place to look and it includes code samples.
  > Yes, it's a Jar file that you add to the CLASSPATH and use ... hm,
  > normally programmatically, yes :).
  > 
  > Otis


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing issue

2005-01-04 Thread Otis Gospodnetic
That's the correct place to look and it includes code samples.
Yes, it's a Jar file that you add to the CLASSPATH and use ... hm,
normally programmatically, yes :).

Otis




Re: Parsing issue

2005-01-04 Thread Hetan Shah
Has anyone used NekoHTML? If so, how do I use it? Is it a standalone
jar file that I include in my classpath and start using just like
IndexHTML?
Can someone share syntax and/or code if it is supposed to be used
programmatically? I am looking at
http://www.apache.org/~andyc/neko/doc/html/ for more information. Is that
the correct place to look?

Thanks,
-H


Re: Parsing issue

2005-01-04 Thread Erik Hatcher
Sure... clean up your HTML and it'll parse fine :)   Perhaps use JTidy 
to clean up the HTML.  Or switch to using a more forgiving parser like 
NekoHTML.

Erik

On Jan 4, 2005, at 3:59 PM, Hetan Shah wrote:

> Hello All,
>
> Does anyone know how to handle the following parsing error?
>
> Thanks for pointers/code snippets.
>
> -H
>
> While trying to parse an HTML file using IndexHTML I get
>
> Parse Aborted: Encountered "\"" at line 8, column 1162.
> Was expecting one of:
>  ...
> "=" ...
>  ...
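
The difference between a strict parser that aborts and a "more forgiving" one can be sketched with JDK classes only. This is not NekoHTML or JTidy: the JDK's Swing HTML parser (`ParserDelegator`) is swapped in as a lenient stand-in so the example needs no extra jars, and the JDK's XML parser plays the strict role. The class and method names (`ForgivingParseDemo`, `parsesStrictly`, `extractTextLeniently`) and the sample markup are hypothetical; the actual cause of the error at line 8, column 1162 is not known from the message.

```java
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ForgivingParseDemo {

    // Returns true if the JDK's strict XML parser accepts the markup,
    // false if it aborts with a parse error.
    public static boolean parsesStrictly(String markup) throws Exception {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Collects text with the JDK's lenient HTML parser, which recovers from
    // unclosed tags instead of aborting.
    public static String extractTextLeniently(String html) throws Exception {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(data);
            }
        };
        // 'true' tells the parser to ignore any charset declared in the document.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sloppy input: <p> and <br> are never closed.
        String sloppy = "<html><body><p>Hello<br>world</body></html>";
        System.out.println("strict parser accepts it: " + parsesStrictly(sloppy));
        System.out.println("lenient text: " + extractTextLeniently(sloppy));
    }
}
```

The same pattern (try the strict route, fall back to a tolerant parser or clean the markup first) is roughly what switching to NekoHTML or running JTidy would achieve for IndexHTML input.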
