Pradeep, Currently Lucene does not provide the ability to convert documents to text for indexing. There is talk of adding this kind of thing to the goal of the project, along with providing crawlers to traverse web, local disk, ftp, and RDBMS sources of data.
The problem with indexining irrespective of file type is that each document format contains embedded information that must be stripped out (or ignored) and the text needs to be retrieved for indexing. An extreeme example is a PDF which has a considerably complicated document format. On the contributions page there are some pointers that may provide information about processing the types of documents you're interested in. http://jakarta.apache.org/lucene/docs/contributions.html If you've not taken the time to do so, look at the FAQs, they are very informative: http://www.lucene.com/cgi-bin/faq/faqmanager.cgi http://jakarta.apache.org/lucene/docs/gettingstarted.html http://www.jguru.com/faq/Lucene Good luck! Andy On Wed, Feb 13, 2002 at 09:24:33PM +0530, Pradeep Kumar K wrote: > Hi Lucene friends! > > How the files of different format can be indexed and searched? ( As I > know lucene is having HTML indexer and searcher, which comes along with > it and also XML indexer, but is there any way to index files > irrespective of the file type) > Any suggestions will be greatly appreciated.. > > Thanks in advance. > Pradeep > > > -------------------------------------------------------------- > Robosoft Technologies, Mangalore, India > > > > -- > To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> > For additional commands, e-mail: <mailto:[EMAIL PROTECTED]> > -- -------------------------------------------------- Andrew Libby CommNav, Inc [EMAIL PROTECTED] -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>