I am new to PyLucene but have worked with Java Lucene 1.4.3.
How can I index the following types of files using PyLucene?
[Word files, PDF, Excel, XSL, XML, OpenOffice files]
Is there any support for third-party libraries in PyLucene as well?

(Third-party libraries are available for Java Lucene.)

As part of the effort to port the "Lucene in Action" samples and test cases, I got some non-plain-text formats working with PyLucene:

  - html, via Python's HTMLParser module
  - xml, via Python's xml.sax parser module
  - pdf, via the pdftotext and pdfinfo programs available from the xpdf
    package at http://www.foolabs.com/xpdf
  - msword, via the antiword program available from the antiword package at
    http://www.winfield.demon.nl
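
To illustrate the first two entries, here is a minimal sketch of pulling indexable text out of HTML and XML with the standard library only. It is not the code from the PyLucene samples. It assumes Python 3, where the HTMLParser module is available as html.parser, and the names extract_html_text and extract_xml_text are hypothetical helpers made up for this example. The returned string is what you would then store in a Lucene document field.

```python
import xml.sax
from html.parser import HTMLParser


class _HtmlTextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_html_text(html):
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = _HtmlTextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


class _XmlTextHandler(xml.sax.ContentHandler):
    """Collects character data from every element."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX may deliver one text node in several pieces, so keep them raw.
        self.chunks.append(content)

    def endElement(self, name):
        # Separate the text of adjacent elements with a space.
        self.chunks.append(" ")


def extract_xml_text(xml_bytes):
    """Hypothetical helper: return all character data of an XML document."""
    handler = _XmlTextHandler()
    xml.sax.parseString(xml_bytes, handler)
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(handler.chunks).split())
```

Both helpers discard markup and attribute values; if you want attributes or specific elements in separate fields, the handlers would need to track element names as well.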

For examples of how to use these with PyLucene, please refer to the
samples/LuceneInAction/FileIndexer.py sample and the
samples/LuceneInAction/lia/handlingtypes code tree.
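
For the formats handled by external programs, the general idea can be sketched as below. This is an illustration under stated assumptions, not the code from the samples: it assumes Python 3, assumes pdftotext and antiword are installed and on the PATH, and the names build_command and extract_text are hypothetical. It relies only on the documented behaviour that pdftotext writes to standard output when the output file is given as "-", and that antiword prints the converted text to standard output.

```python
import os
import subprocess

# File extension -> argv template; "{path}" is replaced by the file name.
_COMMANDS = {
    ".pdf": ["pdftotext", "{path}", "-"],  # "-" sends the text to stdout
    ".doc": ["antiword", "{path}"],        # antiword prints to stdout
}


def build_command(path):
    """Hypothetical helper: return the converter argv for path, or None."""
    extension = os.path.splitext(path)[1].lower()
    template = _COMMANDS.get(extension)
    if template is None:
        return None
    return [path if arg == "{path}" else arg for arg in template]


def extract_text(path):
    """Hypothetical helper: run the converter and return its output as text."""
    command = build_command(path)
    if command is None:
        raise ValueError("no converter registered for %r" % path)
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the tool fails
    )
    # Assumption: the tool emits UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("report.pdf") would return the plain text of the file, which can then be indexed like any other string. Formats without an entry in the table (Excel or OpenOffice files, for example) are not covered by this sketch.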


Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
