[jira] [Commented] (PDFBOX-1996) PDSeparation optimization

2014-03-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945463#comment-13945463
 ] 

Tilman Hausherr commented on PDFBOX-1996:
-

I'm not the one who will commit your patch (I don't know enough about that 
topic), but do you have a non-confidential PDF that exercises it, so that we 
can check that the result is the same before and after?

> PDSeparation optimization
> -
>
> Key: PDFBOX-1996
> URL: https://issues.apache.org/jira/browse/PDFBOX-1996
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: Dave Smith
> Attachments: pdfbox.patch
>
>
> I have a 4-page black-and-white PDF that takes 32 seconds (8 seconds a page) 
> to render. It uses a Separation color space and has to run numerous 
> functions per pixel, which is what causes the slowdown. I have a patch where I 
> precalculate the black and white pixels and cache them instead of calculating 
> them every time. This optimization gets page rendering down to less than a 
> second a page. I will attach my patch. Going forward, I could see caching all 
> calculated colours, but floats in hash maps are tricky.
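The idea in the report can be sketched roughly as follows. This is a minimal Java illustration, not the attached patch, and it assumes the image only ever produces the tints 0.0 and 1.0 (white and black for this Separation). The class `BlackWhiteShortcut` and the `Function<Float, int[]>` standing in for the tint transform plus color conversion are hypothetical, not PDFBox API.

```java
import java.util.function.Function;

// Hypothetical sketch: precompute the converted colors for the two tints a
// black-and-white image is assumed to use (0 and 1) and fall back to the
// expensive conversion for any other tint. Not a PDFBox API.
final class BlackWhiteShortcut {
    private final Function<Float, int[]> convert; // stand-in for tint transform + conversion
    private final int[] atZero;
    private final int[] atOne;

    BlackWhiteShortcut(Function<Float, int[]> convert) {
        this.convert = convert;
        this.atZero = convert.apply(0f); // computed once, up front
        this.atOne = convert.apply(1f);
    }

    int[] get(float tint) {
        if (tint == 0f) {
            return atZero;
        }
        if (tint == 1f) {
            return atOne;
        }
        return convert.apply(tint); // any other tint is evaluated normally
    }
}
```

Every pixel with tint 0 or 1 then costs one comparison instead of a full function evaluation, which is where the reported speedup would come from for a purely black-and-white image.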



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-1996) PDSeparation optimization

2014-03-24 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945852#comment-13945852
 ] 

John Hewson commented on PDFBOX-1996:
-

Which type of function does your PDF use for the tint transform? (i.e. which 
subclass of PDFunction is used?). It might be possible to speed up the 
underlying function instead so that RGB images will be faster too.



[jira] [Commented] (PDFBOX-1996) PDSeparation optimization

2014-03-24 Thread Dave Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946020#comment-13946020
 ] 

Dave Smith commented on PDFBOX-1996:


The PDF is not public. I can send it to you off-list.

My first thought was to optimize the function, but there is more than one.

dup, 0, mul, exch, dup, 0, mul, exch, dup, 0, mul, exch, 1, mul
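Traced by hand, the operator sequence above takes the single tint t and leaves four values on the stack, (0, 0, 0, t); if the alternate color space is DeviceCMYK (the thread doesn't say), that is black ink only. It appears to be a Type 4 (PostScript calculator) function. The Java sketch below, with a hypothetical class name, evaluates only the operators that occur here (`dup`, `exch`, `mul` and numeric literals); it is not PDFBox's Type 4 implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: evaluates the quoted operator sequence on a float stack.
// Only dup, exch, mul and numeric literals are supported.
final class TintProgramDemo {
    static final String[] PROGRAM = {
        "dup", "0", "mul", "exch", "dup", "0", "mul", "exch",
        "dup", "0", "mul", "exch", "1", "mul"
    };

    static float[] eval(String[] program, float tint) {
        Deque<Float> stack = new ArrayDeque<>();
        stack.push(tint); // the single Separation tint is the only input
        for (String op : program) {
            switch (op) {
                case "dup":
                    stack.push(stack.peek());
                    break;
                case "exch": {
                    float top = stack.pop();
                    float below = stack.pop();
                    stack.push(top);
                    stack.push(below); // the former second element is now on top
                    break;
                }
                case "mul":
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    stack.push(Float.parseFloat(op)); // numeric literal
            }
        }
        // ArrayDeque iterates from top to bottom; fill backwards so index 0 is the bottom.
        float[] out = new float[stack.size()];
        int i = out.length - 1;
        for (float v : stack) {
            out[i--] = v;
        }
        return out;
    }
}
```

Because the output depends only on the tint, the result for a given tint never changes, which is what makes caching it (as proposed in this issue) safe.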



> PDSeparation optimization
> -
>
> Key: PDFBOX-1996
> URL: https://issues.apache.org/jira/browse/PDFBOX-1996
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: Dave Smith
>Priority: Minor
> Attachments: pdfbox.patch


[jira] [Commented] (PDFBOX-1996) PDSeparation optimization

2014-03-25 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947162#comment-13947162
 ] 

Tilman Hausherr commented on PDFBOX-1996:
-

What would happen if an image has e.g. 300dpi (about 3500 x 2500 pixels) and 
many different colors? Wouldn't this result in a huge memory footprint? Or is 
the Separation color space not used for such images?

> PDSeparation optimization
> -
>
> Key: PDFBOX-1996
> URL: https://issues.apache.org/jira/browse/PDFBOX-1996
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: Dave Smith
>Priority: Minor
> Attachments: pdfbox.patch, pdfbox.patch


[jira] [Commented] (PDFBOX-1996) PDSeparation optimization

2014-03-25 Thread Dave Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947408#comment-13947408
 ] 

Dave Smith commented on PDFBOX-1996:


I would check with John, who wrote it, but since we are sampling, taking only 
one float and dividing by 255 to get a value in 0..1, I would think we can only 
have 255 different values.
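For reference, 8-bit samples run from 0 to 255 inclusive, which is 256 distinct inputs rather than 255. The small Java sketch below counts the distinct tints produced by sample / maxSample, assuming the default Decode range [0 1]; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: counts how many distinct tint values an image with the
// given bits per component can produce, assuming tint = sample / maxSample.
final class TintValueCount {
    static int distinctTints(int bitsPerComponent) {
        int maxSample = (1 << bitsPerComponent) - 1; // 255 for 8-bit samples
        Set<Float> tints = new HashSet<>();
        for (int s = 0; s <= maxSample; s++) {
            tints.add(s / (float) maxSample);
        }
        return tints.size();
    }
}
```

The count is also the upper bound on the number of cache entries, which bounds the memory footprint raised in the previous comment regardless of image resolution.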



[jira] [Commented] (PDFBOX-1996) PDSeparation optimization

2014-04-03 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959039#comment-13959039
 ] 

John Hewson commented on PDFBOX-1996:
-

There are only 256 values (including zero), and separations are always 
single-color, so this can be safely cached. Storing a 1-element int[] in a 
HashMap doesn't seem like the right choice though, as an array is an object 
with at least a pointer and a length to store, so this is going to have more 
memory overhead than storing just a boxed Integer.

As a rough estimate, a HashMap is going to need at least 4 
bytes for the Integer object pointer and 4 bytes for its int value. For the 
int[] there will be at least 4 bytes for the object pointer (arrays are 
objects) plus 4 bytes for the single int value, plus 4 bytes for the array's 
length. So we're looking at maybe 20 bytes per entry, around 5KB if the cache 
is full (not bad). It's also going to spend time doing memory allocations and 
computing hashes.

A 256-element byte array would have a fixed overhead of just 256 + 8 bytes and 
would benefit from not having to do hash computations to perform a lookup.
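A minimal Java sketch of the fixed-size lookup table described above, assuming a single-channel output in 0..1 and 8-bit samples. The class name and the `DoubleUnaryOperator` standing in for the tint transform are hypothetical, not PDFBox API.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: evaluate the tint transform once for each of the 256
// possible 8-bit samples and store the single-channel result as a byte, so a
// lookup is plain array indexing with no hashing or boxing.
final class ByteTintLut {
    private final byte[] table = new byte[256];

    ByteTintLut(DoubleUnaryOperator tintTransform) {
        for (int s = 0; s < 256; s++) {
            double out = tintTransform.applyAsDouble(s / 255.0);            // output assumed in 0..1
            long scaled = Math.round(Math.max(0.0, Math.min(1.0, out)) * 255.0);
            table[s] = (byte) scaled;                                       // 0..255 stored as a byte
        }
    }

    int lookup(int sample) {
        return table[sample & 0xFF] & 0xFF; // mask so the byte reads back as 0..255
    }
}
```

The trade-off is that all 256 evaluations happen up front, even for samples that never occur; for a black-and-white image only two of them would be needed.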




[jira] [Commented] (PDFBOX-1996) PDSeparation optimization

2014-04-06 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961448#comment-13961448
 ] 

Tilman Hausherr commented on PDFBOX-1996:
-

I don't get it: for the sample file, the value in the map is not a 1-element 
int[] but a 4-element int[].

Btw, the speed improvement is amazing: the first three pages render in 20 
seconds with the existing method and in slightly under 2 seconds with the hash 
map. Using Float as the key instead of that weird "Float.floatToIntBits" trick 
makes it slower; the time is then about 3 seconds.
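A hash-map cache keyed by the float's raw bits might look roughly like this Java sketch. It is in the spirit of the "Float.floatToIntBits" approach mentioned above but is not the patch itself; `TintCache` and the `Function<Float, int[]>` standing in for the tint transform plus color conversion are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: memoizes tint -> color results in a HashMap keyed by
// Float.floatToIntBits(tint). Not a PDFBox API.
final class TintCache {
    private final Map<Integer, int[]> cache = new HashMap<>();
    private final Function<Float, int[]> toColor; // stand-in for the expensive conversion
    int misses; // how many times the expensive conversion actually ran

    TintCache(Function<Float, int[]> toColor) {
        this.toColor = toColor;
    }

    int[] get(float tint) {
        // Equal floats give equal bits (all NaNs collapse to one canonical value);
        // note that +0.0f and -0.0f get different keys.
        int key = Float.floatToIntBits(tint);
        int[] color = cache.get(key);
        if (color == null) {
            color = toColor.apply(tint);
            misses++;
            cache.put(key, color);
        }
        return color;
    }
}
```

Since the tints come from at most 256 sample values, the map stays small no matter how many pixels are rendered.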



[jira] [Commented] (PDFBOX-1996) PDSeparation optimization

2014-04-06 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961465#comment-13961465
 ] 

John Hewson commented on PDFBOX-1996:
-

You're quite right, I was looking at the input value instead of the output 
value. The output is indeed a multi-element int[], so ignore the 1-element part 
of my comment above. The rest still applies, except that instead of one 
1-dimensional byte array, an n-dimensional array could be used. However, as 5KB 
is still tiny, I see no problem with just using a HashMap.
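The array alternative for multi-component outputs could be sketched in Java as one slot per possible 8-bit sample, each holding that sample's converted color, filled lazily on first use. `SampleColorTable` and the `IntFunction<int[]>` standing in for the tint transform plus conversion are hypothetical.

```java
import java.util.function.IntFunction;

// Hypothetical sketch: a 256-slot table indexed directly by the 8-bit sample.
// Each slot holds that sample's converted color (any number of components);
// null means "not computed yet".
final class SampleColorTable {
    private final int[][] slots = new int[256][];
    private final IntFunction<int[]> convert; // stand-in for tint transform + conversion

    SampleColorTable(IntFunction<int[]> convert) {
        this.convert = convert;
    }

    int[] get(int sample) {
        int index = sample & 0xFF; // the sample value itself is the index, no hashing
        int[] color = slots[index];
        if (color == null) {
            color = convert.apply(index);
            slots[index] = color;
        }
        return color;
    }
}
```

Compared with the HashMap, this avoids hashing and boxing on every lookup, at the cost of assuming 8-bit samples; as noted above, either uses only a few kilobytes.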



[jira] [Commented] (PDFBOX-1996) PDSeparation optimization

2014-04-06 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961532#comment-13961532
 ] 

John Hewson commented on PDFBOX-1996:
-

+1
