I've been playing with extracting images.

I've found a few 'weirdnesses' (I know, that's not a real word) in the 
org.apache.pdfbox.ExtractText class, and if I can clear some time, I'll try to 
submit something on that.

Ignoring the 'weirdnesses' (which have more to do with options parsing and 
file naming), it does successfully extract images to separate files.

However, the color table is apparently not being handled properly.

All the images end up displaying with the default Windows palette, which tells 
me that they are probably missing their own.

My guess is that the color space needs to be rebuilt and reset on each image 
object before the image is written out to a file, but I'm not entirely certain 
how to proceed with that.
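
To make the idea concrete, here is a rough sketch of what "rebuilding the 
palette" could look like using only the standard java.awt.image classes. It 
does not call any PDFBox API. It assumes the image is 8-bit indexed and that 
the raw palette indices and the RGB lookup table (packed R,G,B triples, as in 
a PDF Indexed color space) have already been read from the image object. The 
names PaletteSketch, applyPalette and toRgb are made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.awt.image.IndexColorModel;

public class PaletteSketch {

    /**
     * Wraps raw 8-bit palette indices in an image that carries its own palette.
     * Assumption: lookup holds packed R,G,B triples, one triple per palette entry.
     */
    static BufferedImage applyPalette(byte[] indices, int width, int height, byte[] lookup) {
        int entries = lookup.length / 3;
        // Build a color model from the lookup table (no alpha channel).
        IndexColorModel palette = new IndexColorModel(8, entries, lookup, 0, false);
        BufferedImage image =
                new BufferedImage(width, height, BufferedImage.TYPE_BYTE_INDEXED, palette);
        // Copy the indices straight into the image's backing byte array.
        byte[] target = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
        System.arraycopy(indices, 0, target, 0, width * height);
        return image;
    }

    /** Expands any image to plain RGB so the output no longer depends on a palette. */
    static BufferedImage toRgb(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < source.getHeight(); y++) {
            for (int x = 0; x < source.getWidth(); x++) {
                // getRGB resolves each index through the source's color model.
                rgb.setRGB(x, y, source.getRGB(x, y));
            }
        }
        return rgb;
    }
}
```

Either image could then be handed to javax.imageio.ImageIO.write. Expanding 
to RGB first avoids relying on the output format preserving the palette, at 
the cost of a larger file.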

Does anybody have any familiarity with the PDXObjectImage and its related APIs?

If someone can point me in the right direction, I don't mind doing the work of 
fixing this.

Mel

Dr. Mel Martinez
[email protected]


