Terry,

Check out the contributions section of the Lucene site. It has a few XML
document parsers.
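
If you'd rather start from scratch: XHTML is well-formed XML, so any XML
parser can pull the text out before you hand it to Lucene. Below is a rough
sketch using only the parser that ships with a current JDK. The class and
method names (XhtmlText, extractText) are made up for illustration, and
swapping in an empty DTD is my assumption to keep the parser off the network;
it also means named entities such as &nbsp; will not resolve.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XhtmlText {

    // Hypothetical helper: returns the text of an XHTML document with
    // runs of whitespace collapsed to single spaces.
    public static String extractText(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Assumption: substitute an empty DTD instead of fetching the
            // XHTML DTD named in the DOCTYPE from the network.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            // getTextContent() concatenates every descendant text node.
            String text = doc.getDocumentElement().getTextContent();
            return text.replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Hello</title></head> "
            + "<body><p>Indexing <b>XHTML</b> files</p></body></html>";
        System.out.println(extractText(xhtml));
    }
}
```

The string that comes back is what you would put into a Lucene text field.
Since the parser works on the content rather than the file name, the files
can keep their .XHTML extension.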

--Peter

On 3/5/02 9:08 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> Terry,
> 
> These are really not Lucene questions.  Lucene will let you index text,
> but you need to figure out how to parse your XHTML files.
> Take a look at JTidy on sf.net; I think JTidy can help you with parsing
> XHTML, or perhaps Xerces from xml.apache.org can.
> 
> Otis
> 
> --- Terry McGregor <[EMAIL PROTECTED]> wrote:
>> 
>> Hi,
>> 
>> I'm new to Lucene, and I was wondering how I should parse XHTML files.
>> Should I name them with the .HTML file extension and use
>> org.apache.lucene.demo.IndexHTML, or name them with the .XML file
>> extension and use an XML parser?
>> 
>> Also, I would like to keep my XHTML files with a .XHTML file extension,
>> if possible, but that's not so important.
>> 
>> Thanks,
>> Terry.
>> 
>> --
>> To unsubscribe, e-mail:
>> <mailto:[EMAIL PROTECTED]>
>> For additional commands, e-mail:
>> <mailto:[EMAIL PROTECTED]>
>> 
> 
> 

