[
https://issues.apache.org/jira/browse/PDFBOX-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
John Hewson updated PDFBOX-1893:
--------------------------------
Description:
I'm currently working on this, so I wanted to open an issue to let everyone
know.
Color spaces need to be refactored in 2.0.0. Tilman noticed slowness in
PDFBOX-1851 due to using ICC profiles and calling ColorSpace#toRGB for every
pixel. For example, the file from PDFBOX-1851 went from rendering in 4 seconds
to taking over 60 seconds.
The solution is to use ColorConvertOp to convert an entire BufferedImage in one
go, taking advantage of AWT's native color management module. Color conversions
done this way are almost instantaneous, even for large images.
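To illustrate the idea, here is a minimal sketch of a whole-image conversion
using only the JDK's ColorConvertOp. The class and method names
(BulkColorConversion, toSRGB) are made up for this example and are not PDFBox
code; the sketch assumes the source image carries its own color space and that
an sRGB TYPE_INT_RGB destination is wanted.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

// Hypothetical helper for illustration only; not part of PDFBox.
final class BulkColorConversion {

    /**
     * Converts the whole image to sRGB in a single filter() call instead of
     * calling ColorSpace#toRGB once per pixel.
     */
    static BufferedImage toSRGB(BufferedImage src) {
        // TYPE_INT_RGB images use the default sRGB color space.
        BufferedImage dest = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);

        // With the hints-only constructor, the conversion goes from the
        // source image's color space to the destination image's color space,
        // so the destination must be supplied (it cannot be null).
        ColorConvertOp op = new ColorConvertOp(null);
        return op.filter(src, dest);
    }
}
```

The same call works for any source image whose ColorModel describes its color
space, for example one built from an ICC profile via ICC_ColorSpace.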
The current design of color spaces within PDFBox depends upon conversions being
done on a per-pixel basis, so a significant refactoring is needed in order to
convert images using ColorConvertOp without having to resort to per-pixel calls
in cases such as a Separation color space which uses a CMYK alternate color
space via a tint-transform.
The color space handling code is also tightly coupled to image handling. The
various classes which read images each have their own color handling code, which
relies on per-pixel conversions. For this reason, any color space refactoring
must also include a significant refactoring of image handling code. This is an
opportunity to refactor all color handling so that it is encapsulated within
the color space classes, allowing downstream users to call toRGB(float[]) or
toRGB(BufferedImage) and not need to worry about tint transforms and the like.
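As a rough sketch of what that encapsulation could look like, the example
below hides a tint transform behind a single toRGB(float[]) call. All of the
types here (SketchColorSpace, SketchDeviceCMYK, SketchSeparation) are
hypothetical and are not the PDFBox classes, and the CMYK to RGB formula is the
naive uncalibrated one, used only to keep the example short.

```java
import java.util.function.UnaryOperator;

// Hypothetical types for illustration only; not the PDFBox API.
interface SketchColorSpace {
    /** Converts one color value in this color space to RGB in [0, 1]. */
    float[] toRGB(float[] components);
}

final class SketchDeviceCMYK implements SketchColorSpace {
    @Override
    public float[] toRGB(float[] c) {
        // Naive, uncalibrated CMYK -> RGB; a real implementation would
        // typically go through an ICC profile instead.
        float k = c[3];
        return new float[] {
            (1 - c[0]) * (1 - k),
            (1 - c[1]) * (1 - k),
            (1 - c[2]) * (1 - k)
        };
    }
}

final class SketchSeparation implements SketchColorSpace {
    private final SketchColorSpace alternate;
    private final UnaryOperator<float[]> tintTransform;

    SketchSeparation(SketchColorSpace alternate,
                     UnaryOperator<float[]> tintTransform) {
        this.alternate = alternate;
        this.tintTransform = tintTransform;
    }

    @Override
    public float[] toRGB(float[] tint) {
        // The caller never sees the tint transform or the alternate space.
        return alternate.toRGB(tintTransform.apply(tint));
    }
}
```

A caller would then only write something like
new SketchSeparation(new SketchDeviceCMYK(), t -> new float[] {0, 0, 0, t[0]})
.toRGB(new float[] {1f}) and get RGB back, without knowing that a CMYK
alternate is involved.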
===========
Here's a summary of the changes:
- PDCcitt has been removed; its reading capability has moved to CCITTFaxFilter
and its writing capability has moved to CCITTFactory.
- PDJpeg has been removed. JPEG reading is now done by new code in DCTFilter
which correctly handles CMYK/YCCK color. This fixes various files where images
appeared like negatives. JPEG writing is done by new code in JPEGFactory.
- cleaned up JBIG2Filter
- cleaned up JPXFilter. In particular, calling decode() caused the stream
dictionary to be updated, which was unsafe. I've also added a special
JPXColorSpace which wraps the embedded AWT color space of a JPX BufferedImage;
this removes the need for the awkward mapping of ColorSpace to PDColorSpace.
- Added better error messages for missing JAI plugins (JPX, JBIG2). A special
exception, MissingImageReaderException, is now thrown.
- PDXObjectForm has been renamed to PDFormXObject to match the PDF spec.
- PDXObjectImage has been renamed in the same manner.
- PDInlinedImage has been renamed to PDInlineImage for the same reason.
- CCITTFaxDecodeFilter has been renamed to CCITTFaxFilter for consistency with
the other filters.
- ImageParameters has been removed. It was used to represent inline image
parameters, which are now simply members of PDInlineImage.
- added PDColor, which represents a color value, including patterns. It is
immutable for ease of use.
- removed PDColorState, which was a container for both a color and a color
space. In almost every case it was used to represent a color, so it has been
replaced by PDColor and occasionally PDColorSpace.
- moved most of the functionality of PDXObject into its subclasses
- rewrote almost all color handling code in all PDColorSpace subclasses,
including fixing the calculations for L*a*b*, DeviceN, and Indexed color spaces.
- all color spaces now implement a toRGB(float[]) function for color
conversion, so external consumers of color spaces no longer have to know about
internals such as tint transforms.
- image color conversion is now performed in one operation, using
ColorConvertOp, rather than pixel-by-pixel. This speeds up ICC transforms by
many orders of magnitude. Color spaces now expose a special method,
toImageRGB(Raster), for this purpose. This fixes some known performance issues
with certain files.
- updated Type1, Axial, Radial, and Gouraud shading contexts to call the new
toRGB functions. This is an interim measure; for better performance the color
conversion should instead be done using toImageRGB after the entire gradient is
drawn to the raster.
- creation of AWT Paint has been moved inside color spaces, hiding the details
from the caller. It is no longer possible to get an AWT Color from a color
space, only a Paint may be obtained.
- removed PDColorSpaceFactory and moved its functionality into PDColorSpace.
- moved some of the new shading and tiling pattern code to PDPattern so that
toPaint() is encapsulated in the color space.
- new PDImage interface which is implemented by both PDInlineImage and
PDImageXObject.
- Image XObject image reading, masking and stencilling code has been
rewritten, resulting in the removal of CompositeImage.
- new SampledImageReader performs image reading for all formats, including JPEG
and CCITT. The format itself is simply a filter, as is the case in the PDF
spec. New image reading handles decode arrays, interpolation, and conversion of
all image types to efficient 8bpp rasters. This replaces PDPixelMap as well as
reading code from PDJpeg and PDCcitt. Handling of decode arrays fixes various
issues where images were inverted, especially inline images in Type 3 fonts.
- removed SetNonStrokingICCBasedColor, SetNonStrokingIndexed,
SetNonStrokingPattern, SetNonStrokingSeparation, SetStrokingICCBasedColor,
SetStrokingIndexed, SetStrokingPattern, SetStrokingSeparation, and replaced
them with SetColor.
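For readers unfamiliar with decode arrays, here is a small sketch of the linear
mapping the PDF spec defines for them, and of why a Decode array of [1 0]
inverts a 1-bit image. The class and method names (DecodeSketch, applyDecode,
to8Bit) are invented for this example and are not the SampledImageReader code;
the 8-bit scaling assumes a component range of 0 to 1.

```java
// Hypothetical helper for illustration only; not part of PDFBox.
final class DecodeSketch {

    /**
     * Maps a raw sample to a color component value using one
     * [dMin, dMax] pair from a Decode array:
     * value = dMin + sample * (dMax - dMin) / (2^bitsPerComponent - 1)
     */
    static float applyDecode(int sample, int bitsPerComponent,
                             float dMin, float dMax) {
        int maxSample = (1 << bitsPerComponent) - 1;
        return dMin + (sample * (dMax - dMin)) / maxSample;
    }

    /** Scales a component in the range 0..1 to an 8-bit value. */
    static int to8Bit(float value) {
        return Math.round(value * 255f);
    }
}
```

With the default Decode [0 1], a 1-bit sample of 1 maps to 1.0 (255 as 8-bit).
With Decode [1 0], the same sample maps to 0.0, so ignoring the array renders
the image as a negative.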
was:
I'm currently working on this, so I wanted to open an issue to let everyone
know.
Color spaces need to be refactored in 2.0.0. Tilman noticed slowness in
PDFBOX-1851 due to using ICC profiles and calling ColorSpace#toRGB for every
pixel. For example, the file from PDFBOX-1851 went from rendering in 4 seconds
to taking over 60 seconds.
The solution is to use ColorConvertOp to convert an entire BufferedImage in one
go, taking advantage of AWT's native color management module. Color conversions
done this way are almost instantaneous, even for large images.
The current design of color spaces within PDFBox depends upon conversions being
done on a per-pixel basis, so a significant refactoring is needed in order to
convert images using ColorConvertOp without having to resort to per-pixel calls
in cases such as a Separation color space which uses a CMYK alternate color
space via a tint-transform.
The color space handling code is also tightly coupled to image handling. The
various classes which read images each have their own color handling code, which
relies on per-pixel conversions. For this reason, any color space refactoring
must also include a significant refactoring of image handling code. This is an
opportunity to refactor all color handling so that it is encapsulated within
the color space classes, allowing downstream users to call toRGB(float[]) or
toRGB(BufferedImage) and not need to worry about tint transforms and the like.
> Refactor color spaces
> ---------------------
>
> Key: PDFBOX-1893
> URL: https://issues.apache.org/jira/browse/PDFBOX-1893
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.0
> Reporter: John Hewson
> Assignee: John Hewson
> Labels: color
> Fix For: 2.0.0
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)