Hi,
I recommend CyberNeko too.

You can use it as described here:
http://www.mail-archive.com/j-users@xerces.apache.org/msg00631.html

It's pretty simple.

Regards,
--
Alessio Pace.
http://www.jroller.com/page/alessiopace


On 8/7/06, Alfredo Ledezma Melendez <[EMAIL PROTECTED]>
wrote:

Using an XML parser to process HTML won't work unless the HTML is
well-formed.
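
As a rough illustration (a minimal sketch, not taken from the original
thread), the snippet below feeds the HTML fragment from the question to the
JDK's standard SAX parser, the same kind of parser Digester relies on. The
class name WellFormedCheck and the helper isWellFormed are hypothetical names
chosen for this example only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for this illustration.
class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors, so unclosed or
            // mismatched tags end up here as a SAXParseException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The fragment from the original question: <ol>, <p> and <br> are never closed.
        String tagSoup = "<li>\n"
                + "    <ol>Item X\n"
                + "</li>\n"
                + "<p>This is the text I want to insert into a database\n"
                + "<br>\n";
        String wellFormed = "<ul><li>Item X</li></ul>";

        System.out.println(isWellFormed(tagSoup));    // prints false
        System.out.println(isWellFormed(wellFormed)); // prints true
    }
}
```

The first call fails because the parser reaches </li> while <ol> is still
open, which is why the HTML has to be cleaned up before Digester sees it.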

To process this data with Digester, first convert your HTML into a
well-formed document (there are Java libraries that do this). The same topic
came up recently (last week, I think), and some list members recommended such
tools.

NekoHTML (CyberNeko HTML Parser)
http://java-source.net/open-source/html-parsers/nekohtml


Regards,
____________________________________________
Alfredo Ledezma Meléndez.
Gerencia Implantación S.A.P.
Supervisor Técnico WEB-ABAP
Radiomóvil DIPSA, S. A. de C. V.
Lago Alberto No. 366, Col. Anáhuac, C.P. 11320
México D.F.

> -----Original Message-----
> From: Fabian Sergio de Rosa [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 07, 2006 03:41 p.m.
> To: Jakarta Commons Users List
> Subject: Re: Use Digester against a htm file
>
> I don't know whether HTML is compatible with SAX, and Digester uses SAX to
> parse XML. If you try it, you will find out.
> I recommend using XML instead, because the HTML format isn't strict and
> is mostly oriented toward displaying information.
>
> 2006/8/7, Marcos Hass W <[EMAIL PROTECTED]>:
> >
> > Hi all,
> >
> > I've been using Digester for regular XML files, and now I have a different
> > use case: I need to feed a database from an .htm file.
> > Is it possible to use Digester against an .htm file? I mean a file that
> > doesn't have all its tags closed, for example:
> >
> > <li>
> >     <ol>Item X
> > </li>
> > <p>This is the text I want to insert into a database
> > <br>
> >
> >
> > Thank you very much
> > Marcos
> >
> >


