[ 
https://issues.apache.org/jira/browse/PDFBOX-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1893:
--------------------------------

    Description: 
I'm currently working on this, so I wanted to open an issue to let everyone 
know.

Color spaces need to be refactored in 2.0.0. Tilman noticed slowness in 
PDFBOX-1851 due to using ICC profiles and calling ColorSpace#toRGB for every 
pixel. For example, the file from PDFBOX-1851 went from rendering in 4 seconds 
to taking over 60 seconds.

The solution is to use ColorConvertOp to convert an entire BufferedImage in one 
go, taking advantage of AWT's native color management module. Color conversions 
done this way are almost instantaneous, even for large images.
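
To make the difference concrete, here is a rough sketch using only standard AWT classes. It is not PDFBox code: the class and method names are made up for illustration, and the per-pixel path assumes 8-bit samples with no alpha channel.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;
import java.awt.image.Raster;

// Hypothetical demo class, not part of PDFBox.
final class BulkConversionSketch {

    // Slow path: one ColorSpace#toRGB call per pixel.
    // Assumes 8-bit samples and no alpha channel in the source image.
    static BufferedImage convertPerPixel(BufferedImage src) {
        ColorSpace cs = src.getColorModel().getColorSpace();
        Raster raster = src.getRaster();
        BufferedImage dst = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        float[] samples = new float[raster.getNumBands()];
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                raster.getPixel(x, y, samples);
                for (int i = 0; i < samples.length; i++) {
                    samples[i] /= 255f; // normalize 0..255 to 0..1
                }
                float[] rgb = cs.toRGB(samples);
                dst.setRGB(x, y,
                        (to8Bit(rgb[0]) << 16) | (to8Bit(rgb[1]) << 8) | to8Bit(rgb[2]));
            }
        }
        return dst;
    }

    // Fast path: the whole image is converted by a single ColorConvertOp call.
    // With a null RenderingHints argument the op converts from the color space
    // of src to the color space of dst (sRGB for TYPE_INT_RGB).
    static BufferedImage convertInOneGo(BufferedImage src) {
        BufferedImage dst = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        new ColorConvertOp(null).filter(src, dst);
        return dst;
    }

    // Clamps a 0..1 float to an 8-bit integer value.
    private static int to8Bit(float v) {
        return Math.max(0, Math.min(255, Math.round(v * 255f)));
    }
}
```

Both methods return an sRGB image of the same size; the second one hands the whole conversion to the color management module in one call instead of crossing into it once per pixel.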

The current design of color spaces within PDFBox depends upon conversions being 
done on a per-pixel basis, so a significant refactoring is needed in order to 
convert images using ColorConvertOp without having to resort to per-pixel calls 
in cases such as a Separation color space which uses a CMYK alternate color 
space via a tint-transform.

The color space handling code is also tightly coupled to image handling. The 
various classes which read images each have their own color handling code, 
which relies on per-pixel conversions. For this reason, any color space 
refactoring must also include a significant refactoring of the image handling 
code. This is an opportunity to refactor all color handling so that it is 
encapsulated within the color space classes, allowing downstream users to call 
toRGB(float[]) or toRGB(BufferedImage) without needing to worry about tint 
transforms and the like.
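
As a minimal sketch of what that encapsulation could look like: the types below are hypothetical and are not the PDFBox classes, and the CMYK formula is a naive one chosen only to keep the example short. The point is that the caller passes a tint to toRGB(float[]) and never sees the tint transform or the alternate color space.

```java
import java.util.function.UnaryOperator;

// Hypothetical types for illustration only; these are not the PDFBox classes.
interface ColorSpaceSketch {
    // Converts color components in this space to RGB values in the range 0..1.
    float[] toRGB(float[] components);
}

final class DeviceCmykSketch implements ColorSpaceSketch {
    // Naive CMYK to RGB conversion (no ICC profile), for brevity only.
    @Override
    public float[] toRGB(float[] cmyk) {
        float k = 1f - cmyk[3];
        return new float[] {
            (1f - cmyk[0]) * k,
            (1f - cmyk[1]) * k,
            (1f - cmyk[2]) * k
        };
    }
}

final class SeparationSketch implements ColorSpaceSketch {
    private final ColorSpaceSketch alternate;
    private final UnaryOperator<float[]> tintTransform;

    SeparationSketch(ColorSpaceSketch alternate, UnaryOperator<float[]> tintTransform) {
        this.alternate = alternate;
        this.tintTransform = tintTransform;
    }

    // The tint transform and the alternate color space stay internal:
    // tint -> alternate components -> RGB.
    @Override
    public float[] toRGB(float[] tint) {
        return alternate.toRGB(tintTransform.apply(tint));
    }
}
```

For example, a Separation which maps its tint straight onto the K channel of CMYK would be built as new SeparationSketch(new DeviceCmykSketch(), t -> new float[] {0f, 0f, 0f, t[0]}), and callers would only ever call toRGB on it.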

===========

Here's a summary of the changes:

- PDCcitt has been removed; its reading capability has moved to CCITTFaxFilter 
and its writing capability has moved to CCITTFactory.

- PDJpeg has been removed. JPEG reading is now done by new code in DCTFilter 
which correctly handles CMYK/YCCK color. This fixes various files where images 
appeared like negatives. JPEG writing is done by new code in JPEGFactory.

- cleaned up JBIG2Filter

- cleaned up JPXFilter; in particular, calling decode() caused the stream 
dictionary to be updated, which was unsafe. I've also added a special 
JPXColorSpace which wraps the embedded AWT color space of a JPX BufferedImage; 
this removes the need for the awkward mapping of ColorSpace to PDColorSpace.

- Added better error messages for missing JAI plugins (JPX, JBIG2). A special 
exception, MissingImageReaderException, is now thrown.

- PDXObjectForm has been renamed to PDFormXObject to match the PDF spec.
- PDXObjectImage has been renamed in the same manner.
- PDInlinedImage has been renamed to PDInlineImage for the same reason.
- CCITTFaxDecodeFilter has been renamed to CCITTFaxFilter for consistency with 
the other filters.

- ImageParameters has been removed; it was used to represent inline image 
parameters, which are now simply members of PDInlineImage.

- added PDColor, which represents a color value, including patterns; it is 
immutable for ease of use.

- removed PDColorState, which was a container for both a color and a color 
space; in almost every case it was used to represent a color, and so it has 
been replaced by PDColor and occasionally PDColorSpace.

- moved most of the functionality of PDXObject into its subclasses

- rewrote almost all color handling code in all PDColorSpace subclasses, 
including fixing the calculations for L*a*b*, DeviceN, and Indexed color spaces.

- all color spaces now implement a toRGB(float[]) function for color 
conversion, so external consumers of color spaces no longer have to know about 
internals such as tint transforms.

- image color conversion is now performed in one operation, using 
ColorConvertOp, rather than pixel-by-pixel; this speeds up ICC transforms by 
many orders of magnitude. Color spaces now expose a special method, 
toImageRGB(Raster), for this purpose. This fixes some known performance issues 
with certain files.

- updated Type1, Axial, Radial, and Gouraud shading contexts to call the new 
toRGB functions. This is an interim measure; for better performance, the color 
conversion should instead be done using toImageRGB after the entire gradient is 
drawn to the raster.

- creation of AWT Paint has been moved inside color spaces, hiding the details 
from the caller. It is no longer possible to get an AWT Color from a color 
space; only a Paint may be obtained.

- removed PDColorSpaceFactory and moved its functionality into PDColorSpace.

- moved some of the new shading and tiling pattern code to PDPattern so that 
toPaint() is encapsulated in the color space.

- new PDImage interface which is implemented by both PDInlineImage and 
PDImageXObject

- Image XObject image reading, masking and stencilling code has been 
rewritten, resulting in the removal of CompositeImage.

- new SampledImageReader performs image reading for all formats, including JPEG 
and CCITT. The format itself is simply a filter, as is the case in the PDF 
spec. New image reading handles decode arrays, interpolation, and conversion of 
all image types to efficient 8bpp rasters. This replaces PDPixelMap as well as 
reading code from PDJpeg and PDCcitt. Handling of decode arrays fixes various 
issues where images were inverted, especially inline images in Type 3 fonts.

- removed SetNonStrokingICCBasedColor, SetNonStrokingIndexed, 
SetNonStrokingPattern, SetNonStrokingSeparation, SetStrokingICCBasedColor, 
SetStrokingIndexed, SetStrokingPattern, SetStrokingSeparation, and replaced 
them with SetColor.
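
On the decode array point above, a small sketch of why ignoring the array inverts images. This is a hypothetical helper, not the SampledImageReader code: it applies the linear mapping which the PDF spec defines for one [Dmin Dmax] pair of a Decode array, and assumes the sample is an unsigned integer of the given bit depth.

```java
// Hypothetical helper for illustration; not PDFBox code.
final class DecodeArraySketch {

    // Maps a raw sample (0 .. 2^bitsPerComponent - 1) linearly onto the
    // range [dMin, dMax] given by one pair of the image's Decode array.
    static float decode(int sample, int bitsPerComponent, float dMin, float dMax) {
        int maxSample = (1 << bitsPerComponent) - 1;
        return dMin + sample * (dMax - dMin) / maxSample;
    }
}
```

With a 1-bit image and a Decode array of [1 0], decode(0, 1, 1f, 0f) gives 1.0 and decode(1, 1, 1f, 0f) gives 0.0, so a reader which skips the array renders the image as a negative.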


  was:
I'm currently working on this, so I wanted to open an issue to let everyone 
know.

Color spaces need to be refactored in 2.0.0. Tilman noticed slowness in 
PDFBOX-1851 due to using ICC profiles and calling ColorSpace#toRGB for every 
pixel. For example, the file from PDFBOX-1851 went from rendering in 4 seconds 
to taking over 60 seconds.

The solution is to use ColorConvertOp to convert an entire BufferedImage in one 
go, taking advantage of AWT's native color management module. Color conversions 
done this way are almost instantaneous, even for large images.

The current design of color spaces within PDFBox depends upon conversions being 
done on a per-pixel basis, so a significant refactoring is needed in order to 
convert images using ColorConvertOp without having to resort to per-pixel calls 
in cases such as a Separation color space which uses a CMYK alternate color 
space via a tint-transform.

The color space handling code is also tightly coupled to image handling. The 
various classes which read images each have their own color handling code, 
which relies on per-pixel conversions. For this reason, any color space 
refactoring must also include a significant refactoring of the image handling 
code. This is an opportunity to refactor all color handling so that it is 
encapsulated within the color space classes, allowing downstream users to call 
toRGB(float[]) or toRGB(BufferedImage) without needing to worry about tint 
transforms and the like.


> Refactor color spaces
> ---------------------
>
>                 Key: PDFBOX-1893
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1893
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: John Hewson
>              Labels: color
>             Fix For: 2.0.0



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
