Re: Extracting Images

Daniel Wilson Tue, 15 Sep 2009 16:33:33 -0700

I've done battle with the PDXObjectImage, but it has usually defeated me!
Sections 4.7 and 4.8 of the PDF spec address it.


Daniel

On Tue, Sep 15, 2009 at 6:01 PM, Martinez, Mel <[email protected]> wrote:

> I've been playing with extracting images.
>
> I've found a few 'weirdnesses' (I know, that's not a real word) in the
> org.apache.pdfbox.ExtractImages class, and if I can clear some time, I'll try
> to submit something for them.
>
> Ignoring the 'weirdnesses' (which have more to do with option parsing and
> file naming), it does successfully extract images to separate files.
>
> However, the color table is apparently not being handled properly.
>
> All the images end up displaying with the default Windows palette, which
> suggests that they are missing their own palettes.
>
> I assume that what probably needs to be done is that the color space needs
> to be rebuilt and reset on each image object prior to writing the image out
> to file, but I'm not entirely certain how to proceed with that.
>
> Does anybody have any familiarity with the PDXObjectImage and its related
> APIs?
>
> If someone can point me in the right direction, I don't mind doing the work
> of fixing this.
>
> Mel
>
> Dr. Mel Martinez
> [email protected]
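Mel's idea of rebuilding the color table and attaching it to each image before writing it out can be sketched with plain java.awt.image classes. This is a minimal sketch, not PDFBox code: it assumes the image uses an /Indexed color space with a DeviceRGB base and 8 bits per index, and that the raw index samples, the lookup bytes ((hival + 1) x 3 bytes for an RGB base) and hival have already been read from the PDF. How to obtain those through PDXObjectImage is not shown. The class IndexedPaletteSketch and its method toIndexedImage are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.awt.image.IndexColorModel;
import java.awt.image.WritableRaster;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of PDFBox: rebuilds a PDF /Indexed palette
// (assumed DeviceRGB base, 8 bits per index) as a Java IndexColorModel.
public class IndexedPaletteSketch {

    public static BufferedImage toIndexedImage(byte[] samples, int width, int height,
                                               byte[] lookup, int hival) {
        int entries = hival + 1;
        if (entries < 1 || entries > 256 || lookup.length < 3 * entries) {
            throw new IllegalArgumentException("bad hival/lookup: " + hival);
        }
        if (samples.length < width * height) {
            throw new IllegalArgumentException("not enough samples");
        }
        // Split the interleaved R,G,B lookup bytes into separate channel arrays.
        byte[] r = new byte[entries];
        byte[] g = new byte[entries];
        byte[] b = new byte[entries];
        for (int i = 0; i < entries; i++) {
            r[i] = lookup[3 * i];
            g[i] = lookup[3 * i + 1];
            b[i] = lookup[3 * i + 2];
        }
        // The image carries its own palette instead of falling back to a default one.
        IndexColorModel palette = new IndexColorModel(8, entries, r, g, b);
        BufferedImage image =
                new BufferedImage(width, height, BufferedImage.TYPE_BYTE_INDEXED, palette);
        WritableRaster raster = image.getRaster();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Treat each byte as an unsigned index, clamped into the palette range.
                int index = Math.min(samples[y * width + x] & 0xFF, hival);
                raster.setSample(x, y, 0, index);
            }
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        // Two-entry palette: index 0 = red, index 1 = blue.
        byte[] lookup = {(byte) 255, 0, 0, 0, 0, (byte) 255};
        byte[] samples = {0, 1, 1, 0};
        BufferedImage image = toIndexedImage(samples, 2, 2, lookup, 1);
        File out = File.createTempFile("indexed-sketch", ".png");
        // PNG supports palette-based images, so the rebuilt palette is written too.
        ImageIO.write(image, "png", out);
        System.out.println("wrote " + out);
    }
}
```

Images whose base color space is not DeviceRGB, or that pack fewer than 8 bits per index, would need extra handling that this sketch does not attempt.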
