Thank you for all your work. I haven't had the time to test much, but
the few things I tried looked very promising. I did already notice one
"bad" file that is now rendered properly.
I suggest that you post the (very good) text below in the JIRA issue.
Tilman
On 20.02.2014 at 11:17, John Hewson wrote:
Hi All
I have just committed a significant refactoring of color spaces to trunk. The
main purpose of the change is to encapsulate all color space handling code
within PDColorSpace and its subclasses. Until now there was color handling code
in many different places, including separate code for each image format. Due to
the close link between images, color, and performance it has been necessary to
rewrite much of the image reading code.
Here's a summary of the changes:
- PDCcitt has been removed; its reading capability has moved to CCITTFaxFilter
and its writing capability has moved to CCITTFactory.
- PDJpeg has been removed. JPEG reading is now done by new code in DCTFilter
which correctly handles CMYK/YCCK color. This fixes various files where images
appeared like negatives. JPEG writing is done by new code in JPEGFactory.
- cleaned up JBIG2Filter
- cleaned up JPXFilter; in particular, calling decode() caused the stream
dictionary to be updated, which was unsafe. I've also added a special
JPXColorSpace which wraps the embedded AWT color space of a JPX BufferedImage;
this removes the need for the awkward mapping of ColorSpace to PDColorSpace.
- Added better error messages for missing JAI plugins (JPX, JBIG2). A special
exception, MissingImageReaderException, is now thrown.
- PDXObjectForm has been renamed to PDFormXObject to match the PDF spec.
- PDXObjectImage has been renamed in the same manner.
- PDInlinedImage has been renamed to PDInlineImage for the same reason.
- CCITTFaxDecodeFilter has been renamed to CCITTFaxFilter for consistency with
the other filters.
- ImageParameters has been removed; it was used to represent inline image
parameters, which are now simply members of PDInlineImage.
- added PDColor, which represents a color value, including patterns; it is
immutable for ease of use.
- removed PDColorState, which was a container for both a color and a color
space; in almost every case it was used to represent a color and so has been
replaced by PDColor and occasionally PDColorSpace.
- moved most of the functionality of PDXObject into its subclasses
- rewrote almost all color handling code in all PDColorSpace subclasses,
including fixing the calculations for L*a*b*, DeviceN, and indexed color spaces.
- all color spaces now implement a toRGB(float[]) function for color
conversion, so external consumers of color spaces no longer have to know about
internals such as tint transforms.
- image color conversion is now performed in one operation, using
ColorConvertOp, rather than pixel-by-pixel; this speeds up ICC transforms by
many orders of magnitude. Color spaces now expose a special method
toImageRGB(Raster) for this purpose. This fixes some known performance issues
with certain files.
- updated Type1, Axial, Radial, and Gouraud shading contexts to call the new
toRGB functions. This is an interim measure; for better performance the color
conversion should instead be done using toImageRGB after the entire gradient is
drawn to the raster.
- creation of AWT Paint has been moved inside color spaces, hiding the details
from the caller. It is no longer possible to get an AWT Color from a color
space, only a Paint may be obtained.
- removed PDColorSpaceFactory and moved its functionality into PDColorSpace.
- moved some of the new shading and tiling pattern code to PDPattern so that
toPaint() is encapsulated in the color space.
- new PDImage interface which is implemented by both PDInlineImage and
PDImageXObject
- Image XObject image reading, masking and stencilling code has been
rewritten, resulting in the removal of CompositeImage.
- new SampledImageReader performs image reading for all formats, including JPEG
and CCITT. The format itself is simply a filter, as is the case in the PDF
spec. New image reading handles decode arrays, interpolation, and conversion of
all image types to efficient 8bpp rasters. This replaces PDPixelMap as well as
reading code from PDJpeg and PDCcitt. Handling of decode arrays fixes various
issues where images were inverted, especially inline images in Type 3 fonts.
- removed SetNonStrokingICCBasedColor, SetNonStrokingIndexed,
SetNonStrokingPattern, SetNonStrokingSeparation, SetStrokingICCBasedColor,
SetStrokingIndexed, SetStrokingPattern, SetStrokingSeparation, and replaced
them with SetColor.
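
To illustrate the one-operation image conversion mentioned in the list above,
here is a minimal sketch using the standard java.awt.image.ColorConvertOp. It
is not the PDFBox code: the class BulkColorConvertSketch and the method
convertToRGB are hypothetical names made up for this example, and the source
image simply uses AWT's built-in gray color space rather than an ICC profile
taken from a PDF.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

// Hypothetical helper, for illustration only (not part of PDFBox).
class BulkColorConvertSketch {

    // Converts the whole image to an RGB image in a single operation,
    // instead of looping over the pixels and converting each one.
    static BufferedImage convertToRGB(BufferedImage src) {
        BufferedImage dest = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        // With only rendering hints given (null here), the op converts from
        // the source image's color space to the destination image's.
        ColorConvertOp op = new ColorConvertOp(null);
        op.filter(src, dest);
        return dest;
    }

    public static void main(String[] args) {
        // A small grayscale test image; its color space is not sRGB.
        BufferedImage gray = new BufferedImage(4, 3, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage rgb = convertToRGB(gray);
        boolean isRgb = rgb.getColorModel().getColorSpace().getType()
                == ColorSpace.TYPE_RGB;
        System.out.println(rgb.getWidth() + "x" + rgb.getHeight()
                + ", RGB color space: " + isRgb);
    }
}
```

The point of the sketch is only that the conversion is handed to AWT as one
bulk call, which lets the color management code process the raster as a whole.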
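
The decode-array handling mentioned in the list above can be sketched roughly
as follows. This assumes the linear mapping that the PDF specification
describes for an image's Decode array and is only an illustration, not the
SampledImageReader code; DecodeArraySketch, decode and to8Bit are hypothetical
names. It shows why a Decode array of [1 0] inverts an image if it is ignored.

```java
// Hypothetical helper, for illustration only (not part of PDFBox).
class DecodeArraySketch {

    // Maps a raw sample in 0..(2^bitsPerComponent - 1) linearly onto the
    // range dMin..dMax given by one pair of the Decode array.
    static float decode(int sample, int bitsPerComponent, float dMin, float dMax) {
        int maxSample = (1 << bitsPerComponent) - 1;
        return dMin + sample * (dMax - dMin) / maxSample;
    }

    // Scales a decoded component to 8 bits. Assumes the component range is
    // 0..1, as for DeviceGray or DeviceRGB; other color spaces differ.
    static int to8Bit(float component) {
        return Math.round(component * 255f);
    }

    public static void main(String[] args) {
        // Default Decode [0 1]: samples keep their meaning.
        System.out.println(to8Bit(decode(0, 8, 0f, 1f)));
        // Inverted Decode [1 0]: the same sample maps to the opposite end.
        System.out.println(to8Bit(decode(0, 8, 1f, 0f)));
        // A 1-bit sample under Decode [1 0].
        System.out.println(to8Bit(decode(1, 1, 1f, 0f)));
    }
}
```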
There will no doubt be some regressions; please post a comment on PDFBOX-1893
to let me know.
Thanks
-- John