Thanks a lot Andy. -Pradeep On Wednesday, February 13, 2002, at 09:50 PM, Andrew Libby wrote:
> > Pradeep, > Currently Lucene does not provide the ability to convert documents > to text for indexing. There is talk of adding this kind of thing to the > goal of the project, along with providing crawlers to traverse web, > local disk, ftp, and RDBMS sources of data. > > The problem with indexining irrespective of file type is that each > document > format contains embedded information that must be stripped out (or > ignored) > and the text needs to be retrieved for indexing. An extreeme example is > a PDF which has a considerably complicated document format. > > On the contributions page there are some pointers that may provide > information > about processing the types of documents you're interested in. > > http://jakarta.apache.org/lucene/docs/contributions.html > > If you've not taken the time to do so, look at the FAQs, they are very > informative: > > http://www.lucene.com/cgi-bin/faq/faqmanager.cgi > http://jakarta.apache.org/lucene/docs/gettingstarted.html > http://www.jguru.com/faq/Lucene > > Good luck! > > Andy > > > On Wed, Feb 13, 2002 at 09:24:33PM +0530, Pradeep Kumar K wrote: >> Hi Lucene friends! >> >> How the files of different format can be indexed and searched? >> ( As I >> know lucene is having HTML indexer and searcher, which comes along with >> it and also XML indexer, but is there any way to index files >> irrespective of the file type) >> Any suggestions will be greatly appreciated.. >> >> Thanks in advance. >> Pradeep >> >> >> -------------------------------------------------------------- >> Robosoft Technologies, Mangalore, India >> >> >> >> -- >> To unsubscribe, e-mail: <mailto:lucene-user- >> [EMAIL PROTECTED]> >> For additional commands, e-mail: <mailto:lucene-user- >> [EMAIL PROTECTED]> >> > > -- > -------------------------------------------------- > Andrew Libby > CommNav, Inc > [EMAIL PROTECTED] > > > -- > To unsubscribe, e-mail: <mailto:lucene-user- > [EMAIL PROTECTED]> > For additional commands, e-mail: <mailto:lucene-user- > [EMAIL PROTECTED]> > -------------------------------------------------------------- Robosoft Technologies, Mangalore, India -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>