[jira] [Commented] (PDFBOX-1996) PDSeparation optimization
[ https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945463#comment-13945463 ]

Tilman Hausherr commented on PDFBOX-1996:
-----------------------------------------

While I'm not the one who will commit your patch (I don't know enough of that topic), do you have a non-confidential PDF that would use your patch, so that we can see that the result is the same before and after?

> PDSeparation optimization
> -------------------------
>
> Key: PDFBOX-1996
> URL: https://issues.apache.org/jira/browse/PDFBOX-1996
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.0
> Reporter: Dave Smith
> Attachments: pdfbox.patch
>
> I have a 4 page black and white pdf that takes 32 seconds (8 seconds a page)
> to render. It uses a Separation color space and it has to run numerous
> functions per pixel that is causing the slow down. I have a patch where I pre
> calculate the black and white pixels and cache them instead of calculating
> them every time. This optimization gets the page rendering down to less than
> a second a page. I will attach my patch. I could see going forward caching
> all calculated colours, but floats in hash maps are tricky.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1996) PDSeparation optimization
[ https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945852#comment-13945852 ]

John Hewson commented on PDFBOX-1996:
-------------------------------------

Which type of function does your PDF use for the tint transform? (i.e. which subclass of PDFunction is used?) It might be possible to speed up the underlying function instead so that RGB images will be faster too.
[jira] [Commented] (PDFBOX-1996) PDSeparation optimization
[ https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946020#comment-13946020 ]

Dave Smith commented on PDFBOX-1996:
------------------------------------

The pdf is not public. I can send it to you off list. My first thought was to optimize the function; however, there is more than one.

dup, 0, mul, exch, dup, 0, mul, exch, dup, 0, mul, exch, 1, mul

> Priority: Minor
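For readers following the thread, here is a rough sketch of what the operator sequence quoted above computes. It is a tiny stand-alone stack evaluator, not PDFBox code; the class name TintProgramSketch and the method run are made up for illustration, and it only understands the three operators that appear in the quote. It assumes the program is started with the tint value as the only item on the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of PDFBox: evaluates only dup, mul, exch
// and numeric literals, which is all the quoted sequence uses.
final class TintProgramSketch {
    // Operator sequence exactly as quoted in the comment above.
    static final String[] PROGRAM = {
        "dup", "0", "mul", "exch", "dup", "0", "mul", "exch",
        "dup", "0", "mul", "exch", "1", "mul"
    };

    // Runs the program with the tint as the initial stack content and
    // returns the final stack, bottom element first.
    static float[] run(float tint) {
        Deque<Float> stack = new ArrayDeque<>();
        stack.push(tint);
        for (String op : PROGRAM) {
            switch (op) {
                case "dup":
                    stack.push(stack.peek());
                    break;
                case "mul": {
                    float b = stack.pop();
                    float a = stack.pop();
                    stack.push(a * b);
                    break;
                }
                case "exch": {
                    float top = stack.pop();
                    float below = stack.pop();
                    stack.push(top);
                    stack.push(below);
                    break;
                }
                default:
                    // Anything else is treated as a numeric literal.
                    stack.push(Float.parseFloat(op));
            }
        }
        float[] out = new float[stack.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = stack.pop();
        }
        return out;
    }

    public static void main(String[] args) {
        // For tint 0.5 the final stack is 0.0 0.0 0.0 0.5.
        System.out.println(java.util.Arrays.toString(run(0.5f)));
    }
}
```

For any tint t the program leaves four values, 0 0 0 t, so the work per pixel is a handful of stack operations whose result depends only on t. That is what makes the result a natural candidate for caching.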
[jira] [Commented] (PDFBOX-1996) PDSeparation optimization
[ https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947162#comment-13947162 ]

Tilman Hausherr commented on PDFBOX-1996:
-----------------------------------------

What would happen if an image has e.g. 300dpi (about 3500 x 2500 pixels) and many different colors? Wouldn't this make a huge memory footprint? Or is the Separation color space not used for such images?

> Attachments: pdfbox.patch, pdfbox.patch
[jira] [Commented] (PDFBOX-1996) PDSeparation optimization
[ https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947408#comment-13947408 ]

Dave Smith commented on PDFBOX-1996:
------------------------------------

I would check with John, who wrote it, but since we are sampling and only taking one float and dividing by 255 to get a value of 0..1, I would think we can only have 255 different values.
[jira] [Commented] (PDFBOX-1996) PDSeparation optimization
[ https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959039#comment-13959039 ]

John Hewson commented on PDFBOX-1996:
-------------------------------------

There are only 256 values (including zero) and separations are always single-color, so this can be safely cached.

Storing a 1-element int[] in a HashMap doesn't seem like the right choice though, as an array is an object with at least a pointer and a length to store, so this is going to have more memory overhead than storing just a boxed Integer.

As a rough estimate, a HashMap is going to need at least 4 bytes for the Integer object pointer and 4 bytes for its int value. For the int[] there will be at least 4 bytes for the object pointer (arrays are objects) plus 4 bytes for the single int value, plus 4 bytes for the array's length. So we're looking at maybe 20 bytes per entry, around 5KB if the cache is full (not bad). It's also going to spend time doing memory allocations and computing hashes.

A 256-element byte array would have a fixed overhead of just 256 + 8 bytes and would benefit from not having to do hash computations to perform a lookup.
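A minimal sketch of the array-based lookup suggested above, as an alternative to a HashMap. This is not the attached patch and not PDFBox code: the class name SampleLookupSketch and the slowConversion callback are hypothetical stand-ins for whatever evaluates the tint transform and colour conversion. The sketch stores an int[] per entry, so it does not depend on how many output components the conversion produces, and it fills entries lazily the first time a sample value is seen.

```java
import java.util.function.IntFunction;

// Hypothetical illustration: a fixed 256-slot table indexed by the 8-bit
// sample value, so a lookup needs no hashing and no boxing.
final class SampleLookupSketch {
    private final int[][] table = new int[256][]; // null means "not computed yet"
    private final IntFunction<int[]> slowConversion; // stand-in for the expensive per-pixel work

    SampleLookupSketch(IntFunction<int[]> slowConversion) {
        this.slowConversion = slowConversion;
    }

    // Returns the converted colour for an 8-bit sample, computing it at most once.
    int[] lookup(int sample) {
        int index = sample & 0xFF; // also maps a signed byte to 0..255
        int[] cached = table[index];
        if (cached == null) {
            cached = slowConversion.apply(index);
            table[index] = cached;
        }
        return cached;
    }

    public static void main(String[] args) {
        // Dummy conversion: invert the sample into three equal components.
        SampleLookupSketch lut = new SampleLookupSketch(s -> new int[] {255 - s, 255 - s, 255 - s});
        System.out.println(lut.lookup(0)[0]);
        System.out.println(lut.lookup(0) == lut.lookup(0)); // same cached array instance
    }
}
```

The trade-off is the one described in the comment: the table has a fixed size whether or not every sample value occurs, while a lookup is a plain array index.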
[jira] [Commented] (PDFBOX-1996) PDSeparation optimization
[ https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961448#comment-13961448 ]

Tilman Hausherr commented on PDFBOX-1996:
-----------------------------------------

I don't get it - for the sample file, the result is not a 1-element int[] but a 4-element int[] that would be the value of the map.

Btw the speed improvement is amazing: the first three pages are rendered in 20 secs with the existing method, and in slightly under 2 secs with the hash map. Using Float as the key instead of that weird "Float.floatToIntBits" trick makes it slower; the time is then about 3 secs.
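To make the "Float.floatToIntBits" key concrete, here is a small sketch of a map-based cache keyed on the bit pattern of the tint value. It is not the attached patch: the class name TintCacheSketch and the slowConversion callback are hypothetical, and the dummy conversion in main only mimics a 4-element result. It makes no claim about the timings reported above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of caching conversion results per tint value,
// using the raw bit pattern of the float as the map key.
final class TintCacheSketch {
    private final Map<Integer, int[]> cache = new HashMap<>();
    private final Function<Float, int[]> slowConversion; // stand-in for the expensive work

    TintCacheSketch(Function<Float, int[]> slowConversion) {
        this.slowConversion = slowConversion;
    }

    // Converts a tint, running the slow conversion only on a cache miss.
    int[] convert(float tint) {
        return cache.computeIfAbsent(Float.floatToIntBits(tint),
                bits -> slowConversion.apply(tint));
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        TintCacheSketch cache = new TintCacheSketch(
                t -> new int[] {0, 0, 0, Math.round(t * 255)});
        cache.convert(128 / 255f);
        cache.convert(128 / 255f); // same bits, served from the cache
        cache.convert(0.0f);
        cache.convert(-0.0f);      // equal under ==, but a different bit pattern
        System.out.println(cache.size());
    }
}
```

One reason floats are tricky as keys shows up in main: 0.0f and -0.0f compare equal with == but have different bit patterns, so they occupy separate entries. With tints derived from 8-bit samples this does not come up, since the same sample always yields the same bits.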
[jira] [Commented] (PDFBOX-1996) PDSeparation optimization
[ https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961465#comment-13961465 ]

John Hewson commented on PDFBOX-1996:
-------------------------------------

You're quite right, I was looking at the input value instead of the output value. The output is indeed an int[], so ignore the 1-element part of my comment above. The rest still applies, except that instead of a 1-dimensional byte array, an n-dimensional array could be used. However, as 5KB is still tiny, I see no problem with just using a HashMap.
[jira] [Commented] (PDFBOX-1996) PDSeparation optimization
[ https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961532#comment-13961532 ]

John Hewson commented on PDFBOX-1996:
-------------------------------------

+1