Terry,

Check out the contributions section of the Lucene site. It lists a few XML document parsers.
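For what it's worth, since XHTML is well-formed XML, one option is to pull the text out with the SAX parser that ships with the JDK (Xerces exposes the same interfaces) and hand the resulting string to Lucene. Below is a minimal sketch, not something taken from the contributions area: the class name XhtmlText and the method extract are made up for illustration, it assumes the input really is well-formed, and it deliberately skips loading the DTD, so named entities such as &nbsp; are not handled.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the character data of an XHTML document
// so that it can be indexed as plain text.
public class XhtmlText {

    public static String extract(String xhtml) {
        final StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append as-is.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Separate the text of adjacent elements. This also splits a word
                // that has inline markup in the middle of it, which is a
                // simplification of this sketch.
                text.append(' ');
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Assumption: return an empty source instead of fetching the
                // XHTML DTD over the network.
                return new InputSource(new StringReader(""));
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            // Covers malformed input as well as parser configuration problems.
            throw new IllegalArgumentException("could not parse input as XML", e);
        }

        // Collapse runs of whitespace left over from markup and line breaks.
        return text.toString().replaceAll("\\s+", " ").trim();
    }
}
```

The returned string can then be added to a Lucene document as a text field. Because this reads the content rather than looking at the file name, the files can keep their .XHTML extension.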
--Peter

On 3/5/02 9:08 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> Terry,
>
> These are really not Lucene questions. Lucene will let you index text,
> but you need to figure out how to parse your XHTML files.
> Take a look at JTidy on sf.net; I think JTidy can help you with parsing
> XHTML, or perhaps Xerces from xml.apache.org can.
>
> Otis
>
> --- Terry McGregor <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> I'm new to Lucene, and I was wondering how I should parse XHTML
>> files.
>> Should I name them with the .HTML file extension and use
>> org.apache.lucene.demo.IndexHTML, or name them with the .XML file
>> extension and use an XML parser?
>>
>> Also, I would like to keep my XHTML files with a .XHTML file
>> extension, if possible, but that's not so important.
>>
>> Thanks,
>> Terry.

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>