On Sun, 17 Feb 2008, Jukka Zitting wrote:
I'm currently looking at enhancing the Word and PowerPoint parsing functionality in Apache Tika, and I was wondering about the current status of the relevant code in POI scratchpad. Is anyone working on the code, and are there plans to release the code at some point?

As Yegor has mentioned, HSLF is undergoing development, but HWPF is not. You'll find both of them in the scratchpad jar of the recent 3.0.2 release.


The snag with HWPF is that there was only one main developer working on it (Ryan Ackley), and when he moved to a new firm that had licensed the file format documentation from Microsoft, he was no longer able to contribute.

However, as of this weekend, Microsoft have publicly released the file format docs for Word, Excel and PowerPoint. While Ryan still wouldn't be able to contribute (I suspect his NDA will still apply even though most of the information is now public), it makes it much easier for someone else to pick up the code and improve it. Don't suppose you're interested? :)


Oh, and since you're an ASF member, I'm sure we could sort you out with svn commit access if you're interested in working on HSLF or HWPF.

Nick

