On Tue, 9 Jan 2007, Joerg Hohwiller wrote:
Besides, I used the official POI release, which is very old. I did NOT try the
HEAD from svn.

You should probably try with the svn head; you will generally have more luck with HWPF and HSLF from there.

I did NOT even open most of the documents. The constructor threw an exception, something like an illegal file format or a bad magic number.

I use hslf for a web spider that tries lots of random documents, and it's OK on almost all of them, so it's odd that you're having such problems.

(Normally you want to catch CorruptPowerPointFileException and
EncryptedPowerPointFileException, and skip over them, and catch
ArrayIndexOutOfBoundsException, and report bugs for those)

If an ArrayIndexOutOfBoundsException is thrown by a method where the user did not supply an index as a parameter, the implementation looks like a hack to me. The same applies to NullPointerExceptions.

These two are caused by PowerPoint files containing things that we didn't know they might, and which our test documents don't. If you report bugs for them, and include the problem document, we can try to figure out which of our assumptions about the file format are wrong, and work to fix them.
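
For what it's worth, here is a rough sketch of that triage for a spider-style loop: skip corrupt and encrypted files, and log the unexpected ones so they can be reported along with the problem document. To keep it compilable without POI on the classpath, the two exception classes below are local stand-ins that only borrow the hslf names (making them RuntimeExceptions is my assumption), and HslfTriage, Outcome and tryExtract are made-up names. In real code you would import the hslf exception types and put your actual extraction call inside the Callable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

class HslfTriage {
    // Stand-ins for the hslf exception types of the same name (assumption:
    // unchecked). Replace with the real imports when POI is available.
    static class CorruptPowerPointFileException extends RuntimeException {
        CorruptPowerPointFileException(String msg) { super(msg); }
    }
    static class EncryptedPowerPointFileException extends RuntimeException {
        EncryptedPowerPointFileException(String msg) { super(msg); }
    }

    enum Outcome { OK, SKIPPED, BUG }

    // Runs one extraction attempt and classifies the result.
    // 'extractor' is a placeholder for whatever opens the file and pulls text.
    static Outcome tryExtract(String name, Callable<String> extractor,
                              List<String> bugReports) {
        try {
            extractor.call();
            return Outcome.OK;
        } catch (CorruptPowerPointFileException | EncryptedPowerPointFileException e) {
            // Expected on bad or protected input: nothing to report, move on.
            return Outcome.SKIPPED;
        } catch (ArrayIndexOutOfBoundsException | NullPointerException e) {
            // Points at a wrong assumption about the file format:
            // remember the file so it can be attached to a bug report.
            bugReports.add(name + ": " + e);
            return Outcome.BUG;
        } catch (Exception e) {
            // Anything else (I/O trouble and so on) is treated as a skip here.
            return Outcome.SKIPPED;
        }
    }

    public static void main(String[] args) {
        List<String> bugs = new ArrayList<>();
        System.out.println(tryExtract("fine.ppt", () -> "some text", bugs));
        System.out.println(tryExtract("odd.ppt", () -> {
            int[] a = new int[1];
            return "" + a[2]; // simulates an unexpected parser failure
        }, bugs));
        System.out.println(bugs.size() + " file(s) to report");
    }
}
```

The point of keeping the two groups apart is that the first means "bad input, carry on", while the second means "we got the format wrong", and those are the ones worth a bug report.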

My problem is that I extract many parts of the text twice from the file. It seems to me that they really are in there twice, even though that is not visible to the PowerPoint user.

Yup, that's to be expected on quicksaved files. QuickButCruddyTextExtractor will do something similar.

Your only option if you want to avoid that is to implement all the PersistPtr stuff, then parse SlideListWithTexts, and DoTheRightThing(tm) with it all. At which point, you've re-implemented most of hslf....
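
If the duplicates are the only annoyance and you only need the text for indexing, a much cruder workaround is to drop repeated text runs after extraction. To be clear, this is not what hslf does and it is no substitute for the PersistPtr route: it will also throw away text that legitimately appears twice in a presentation. TextRunDedup and dropRepeatedRuns are names made up for this sketch, and it assumes you already have the extracted runs as a list of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class TextRunDedup {
    // Keeps the first occurrence of each non-empty run (compared after
    // trimming) and preserves the original order of first appearance.
    static List<String> dropRepeatedRuns(List<String> runs) {
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (String run : runs) {
            String key = run.trim();
            if (!key.isEmpty()) {
                seen.add(key);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("Title", "First bullet", "Title", "First bullet");
        System.out.println(dropRepeatedRuns(runs));
    }
}
```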

Nick
