On Thu, 7 Sep 2006, Tomi NA wrote:
> On 9/7/06, Venkateshprasanna <[EMAIL PROTECTED]> wrote:
>> Is there any filter available for extracting text from MS PowerPoint files
>> and indexing them?
>> The Lucene website suggests the POI project, which, it seems, does not
>> support PPT files as of now.
>>
>> http://jakarta.apache.org/poi/hslf/index.html
>
> It doesn't say POI doesn't support PPT. It just says support is limited. I don't know exactly how limited, but it is certainly not useless for indexing purposes.

Support for editing and adding things to PowerPoint files is limited, as is getting out the finer points of fonts and positioning.

Getting text out should "just work" for you. The only thing you'll need to decide is if you want hslf.PowerPointExtractor to give you slide and notes text, or just slide text :)
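
To make that choice concrete, here is a minimal sketch in Java. It is an illustration only, not POI code: PptIndexText and textToIndex are hypothetical names, and the extractor is passed in as a plain function shaped like a getText(slideText, notesText) call, so the example runs without POI on the classpath. With POI available, the assumption is that you would pass in the two-boolean getText of hslf.PowerPointExtractor instead of the fake; check the Javadoc for your POI version before relying on that.

```java
import java.util.function.BiFunction;

// Hypothetical helper, not part of POI: picks the text to hand to the indexer.
class PptIndexText {

    /**
     * @param extractor    stand-in for a call shaped like getText(slideText, notesText)
     * @param includeNotes true to index slide and notes text, false for slide text only
     */
    static String textToIndex(BiFunction<Boolean, Boolean, String> extractor,
                              boolean includeNotes) {
        // Slide text is always requested; notes text is the optional part.
        String text = extractor.apply(true, includeNotes);
        return text == null ? "" : text.trim();
    }

    public static void main(String[] args) {
        // Fake extractor standing in for a real PPT file (assumption for the demo).
        BiFunction<Boolean, Boolean, String> fake = (slides, notes) ->
                (slides ? "Slide title\n" : "") + (notes ? "Speaker notes\n" : "");

        System.out.println(textToIndex(fake, false)); // slide text only
        System.out.println(textToIndex(fake, true));  // slide text plus notes
    }
}
```

Indexing the notes as well makes a presentation findable by what the speaker wrote down, not just by what is on the slides, at the cost of a somewhat noisier index.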

Nick
