[ https://issues.apache.org/jira/browse/PDFBOX-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-1893:
------------------------------------

    Attachment: jbig2test.pdf-1.png
                jbig2test.pdf

The attached jbig2 file is without the jbig2 encoded part.

> Refactor color spaces
> ---------------------
>
>                 Key: PDFBOX-1893
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1893
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: John Hewson
>              Labels: color
>             Fix For: 2.0.0
>
>         Attachments: jbig2test.pdf, jbig2test.pdf-1.png
>
>
> I'm currently working on this, so I wanted to open an issue to let everyone
> know.
> Color spaces need to be refactored in 2.0.0. Tilman noticed slowness in
> PDFBOX-1851 due to using ICC profiles and calling ColorSpace#toRGB for every
> pixel. For example, the file from PDFBOX-1851 went from rendering in 4
> seconds to taking over 60 seconds.
> The solution is to use ColorConvertOp to convert an entire BufferedImage in
> one go, taking advantage of AWT's native color management module. Color
> conversions done this way are almost instantaneous, even for large images.
> The current design of color spaces within PDFBox depends upon conversions
> being done on a per-pixel basis, so a significant refactoring is needed in
> order to convert images using ColorConvertOp without having to resort to
> per-pixel calls in cases such as a Separation color space which uses a CMYK
> alternate color space via a tint-transform.
> The color space handling code is also tightly coupled to image handling. The
> various classes which read images each have their own color handling code
> which relies on per-pixel conversions. For this reason any color space
> refactoring must also include a significant refactoring of image handling
> code.
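
The difference between per-pixel conversion and a single ColorConvertOp pass can be sketched with plain AWT, independent of PDFBox. This is only an illustration under stated assumptions: the class and method names below are hypothetical, and the built-in gray color space stands in for the ICC-based color spaces discussed in the issue.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;
import java.awt.image.Raster;

// Hypothetical demo class for illustration; it is not part of PDFBox.
final class BulkColorConversionSketch {

    // Per-pixel path: one ColorSpace#toRGB call for every pixel.
    // Assumes an 8-bit, single-band source such as TYPE_BYTE_GRAY.
    static BufferedImage perPixel(BufferedImage gray) {
        ColorSpace cs = gray.getColorModel().getColorSpace();
        Raster raster = gray.getRaster();
        BufferedImage rgb = new BufferedImage(
                gray.getWidth(), gray.getHeight(), BufferedImage.TYPE_INT_RGB);
        float[] component = new float[1];
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                component[0] = raster.getSample(x, y, 0) / 255f;
                float[] c = cs.toRGB(component);
                int r = Math.round(c[0] * 255f);
                int g = Math.round(c[1] * 255f);
                int b = Math.round(c[2] * 255f);
                rgb.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return rgb;
    }

    // Whole-image path: ColorConvertOp converts from the source image's
    // color space to the destination image's color space in one call.
    static BufferedImage wholeImage(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        new ColorConvertOp(null).filter(src, rgb);
        return rgb;
    }

    // Builds a horizontal black-to-white ramp to use as test input.
    static BufferedImage grayRamp(int width, int height) {
        BufferedImage img = new BufferedImage(
                width, height, BufferedImage.TYPE_BYTE_GRAY);
        int steps = Math.max(1, width - 1);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.getRaster().setSample(x, y, 0, x * 255 / steps);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage src = grayRamp(256, 256);
        BufferedImage slow = perPixel(src);
        BufferedImage fast = wholeImage(src);
        System.out.printf("per-pixel last pixel:   %06x%n", slow.getRGB(255, 0) & 0xFFFFFF);
        System.out.printf("whole-image last pixel: %06x%n", fast.getRGB(255, 0) & 0xFFFFFF);
    }
}
```

Both methods produce an RGB image of the same size; the second one hands the whole raster to AWT's color management in a single call, which is the approach the issue proposes.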
> This is an opportunity to refactor all color handling so that it is
> encapsulated within the color space classes, allowing downstream users to
> call toRGB(float[]) or toRGB(BufferedImage) and not need to worry about tint
> transforms and the like.
> ===========
> Here's a summary of the changes:
> - PDCcitt has been removed, its reading capability has moved to
> CCITTFaxFilter and writing capability has moved to CCITTFactory.
> - PDJpeg has been removed. JPEG reading is now done by new code in DCTFilter
> which correctly handles CMYK/YCCK color. This fixes various files where
> images appeared like negatives. JPEG writing is done by new code in
> JPEGFactory.
> - cleaned up JBIG2Filter
> - cleaned up JPXFilter, in particular calling decode() caused the stream
> dictionary to be updated, which was unsafe. I've also added a special
> JPXColorSpace which wraps the embedded AWT color space of a JPX
> BufferedImage, this replaces the need for the awkward mapping of ColorSpace
> to PDColorSpace.
> - Added better error messages for missing JAI plugins (JPX, JBIG2). A special
> exception, MissingImageReaderException, is now thrown.
> - PDXObjectForm has been renamed to PDFormXObject to match the PDF spec.
> - PDXObjectImage has been renamed to PDImageXObject in the same manner.
> - PDInlinedImage has been renamed to PDInlineImage for the same reason.
> - CCITTFaxDecodeFilter has been renamed to CCITTFaxFilter for consistency
> with the other filters.
> - ImageParameters has been removed, it was used to represent inline image
> parameters which are now simply members of PDInlineImage.
> - added PDColor which represents a color value, including patterns, it is
> immutable for ease of use.
> - removed PDColorState which was a container for both a color and a color
> space, in almost every case it was used to represent a color and so has been
> replaced by PDColor and occasionally PDColorSpace.
> - moved most of the functionality of PDXObject into its subclasses
> - rewrote almost all color handling code in all PDColorSpace subclasses,
> including fixing the calculations for L*a*b*, DeviceN, and indexed color
> spaces.
> - all color spaces now implement a toRGB(float[]) function for color
> conversion, so external consumers of color spaces no longer have to know
> about internals such as tint transforms.
> - image color conversion is now performed in one operation, using
> ColorConvertOp, rather than pixel-by-pixel, this speeds up ICC transforms by
> many orders of magnitude. Color spaces now expose a special method
> toImageRGB(Raster) for this purpose. This fixes some known performance issues
> with certain files.
> - updated Type1, Axial, Radial, and Gouraud shading contexts to call the new
> toRGB functions. This is an interim measure, for better performance the color
> conversion should instead be done using toImageRGB after the entire gradient
> is drawn to the raster.
> - creation of AWT Paint has been moved inside color spaces, hiding the
> details from the caller. It is no longer possible to get an AWT Color from a
> color space, only a Paint may be obtained.
> - removed PDColorSpaceFactory and moved its functionality into PDColorSpace.
> - moved some of the new shading and tiling pattern code to PDPattern so that
> toPaint() is encapsulated in the color space.
> - new PDImage interface which is implemented by both PDInlineImage and
> PDImageXObject
> - Image XObject image reading, masking and stencilling code has been
> rewritten, resulting in the removal of CompositeImage.
> - new SampledImageReader performs image reading for all formats, including
> JPEG and CCITT. The format itself is simply a filter, as is the case in the
> PDF spec. New image reading handles decode arrays, interpolation, and
> conversion of all image types to efficient 8bpp rasters. This replaces
> PDPixelMap as well as reading code from PDJpeg and PDCcitt.
> Handling of decode arrays fixes various issues where images were inverted,
> especially inline images in Type 3 fonts.
> - removed SetNonStrokingICCBasedColor, SetNonStrokingIndexed,
> SetNonStrokingPattern, SetNonStrokingSeparation, SetStrokingICCBasedColor,
> SetStrokingIndexed, SetStrokingPattern, SetStrokingSeparation, and replaced
> them with SetColor.
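
The point in the summary about toRGB(float[]) hiding tint transforms from callers can be illustrated roughly as follows. This is a sketch under assumptions, not the PDFBox implementation: every type name here (SketchColorSpace, SketchDeviceCMYK, SketchSeparation) is hypothetical, and the CMYK conversion is a naive formula rather than the color-managed one the issue describes.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical interface: callers only ever see toRGB(float[]).
interface SketchColorSpace {
    float[] toRGB(float[] components);
}

// Hypothetical stand-in for a CMYK space. The formula is a naive,
// non-color-managed approximation used only to keep the sketch short.
final class SketchDeviceCMYK implements SketchColorSpace {
    @Override
    public float[] toRGB(float[] c) {
        float k = c[3];
        return new float[] {
            (1 - c[0]) * (1 - k),
            (1 - c[1]) * (1 - k),
            (1 - c[2]) * (1 - k)
        };
    }
}

// Hypothetical Separation-like space: a single tint value is mapped into
// the alternate space by a tint transform, then converted to RGB there.
final class SketchSeparation implements SketchColorSpace {
    private final SketchColorSpace alternate;
    private final Function<float[], float[]> tintTransform;

    SketchSeparation(SketchColorSpace alternate,
                     Function<float[], float[]> tintTransform) {
        this.alternate = alternate;
        this.tintTransform = tintTransform;
    }

    @Override
    public float[] toRGB(float[] tint) {
        // The tint transform is an internal detail; callers never invoke it.
        return alternate.toRGB(tintTransform.apply(tint));
    }
}

final class TintTransformDemo {
    public static void main(String[] args) {
        // Assumed spot color: full tint maps to CMYK (0, 1, 1, 0).
        SketchColorSpace spot = new SketchSeparation(
                new SketchDeviceCMYK(),
                t -> new float[] {0f, t[0], t[0], 0f});
        System.out.println(Arrays.toString(spot.toRGB(new float[] {1f})));
        // prints [1.0, 0.0, 0.0]
    }
}
```

The caller in main works with the Separation-like space and the CMYK space through the same method, which is the encapsulation the refactoring aims for.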