[jira] [Comment Edited] (PDFBOX-2007) Performance regression since PDFRenderer

2014-04-01 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957397#comment-13957397
 ] 

Tilman Hausherr edited comment on PDFBOX-2007 at 4/2/14 6:34 AM:
-

I assume the relevant code is
https://github.com/fbernier/taz-clj/blob/master/src/taz_clj/converter.clj
Could you profile _within_ PDFBox, i.e. measure how much time is spent in the 
private PDFRenderer.renderPage method compared to the old 
PDPage.convertToImage()?

This way we could find out whether the difference is in the rendering, or in 
the PDFBox code before the rendering. I ask this because both were changed.

And does this happen with any PDF, or only with one particular PDF?
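
As a rough first step before attaching a profiler, the two public entry points named in this thread can be timed from the calling side. The sketch below is only an illustration: `RenderTimer` and `meanMillis` are hypothetical names, not part of PDFBox, and the helper measures wall-clock time of whatever call is passed in. It cannot isolate the private renderPage method; that still needs a real profiler.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of PDFBox): wall-clock timing of a rendering call.
final class RenderTimer {

    /**
     * Calls the task `warmups` times untimed, then `runs` times timed,
     * and returns the mean wall-clock time per timed call in milliseconds.
     */
    static double meanMillis(Callable<?> task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        try {
            for (int i = 0; i < warmups; i++) {
                task.call(); // untimed: gives the JIT a chance to compile hot paths
            }
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                task.call();
            }
            return (System.nanoTime() - start) / 1e6 / runs;
        } catch (Exception e) {
            // Rendering calls typically throw IOException; rethrow unchecked.
            throw new RuntimeException(e);
        }
    }
}
```

Assuming a PDFRenderer held in a variable `renderer` (2.0.0) or a PDPage held in `page` (1.8.x), both placeholder names, the task would be something like `() -> renderer.renderImage(0)` or `() -> page.convertToImage()`, run against the same PDF in both cases.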


was (Author: tilman):
I assume the relevant code is
https://github.com/fbernier/taz-clj/blob/master/src/taz_clj/converter.clj
Could you profile _within_ PDFBox, i.e. measure how much time is spent in the 
private PDFRenderer.renderPage method compared to the old 
PDPage.convertToImage()?

This way we could find out whether the difference is in the rendering, or in 
the code before the rendering. I ask this because both were changed.

And does this happen with any PDF, or only with one particular PDF?

> Performance regression since PDFRenderer
> 
>
>                 Key: PDFBOX-2007
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2007
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.0
>            Reporter: François Bernier
>              Labels: perfomance, regression
>
> Hi,
> I have the following toy project where I use PDFBox: 
> https://github.com/fbernier/taz-clj
> I've been using the snapshot versions of PDFBox for quite a while. Since the 
> recent move from RenderUtil#convertToImage to PDFRenderer#renderImage (this 
> commit: 
> https://github.com/fbernier/taz-clj/commit/47917d494f2a9a0999da7f36827c45145d4bb42c),
> there has been quite a big performance regression. If I change the PDFBox 
> dependency to 1.8.x, everything is good. Here are my benchmarks:
> PDFBox 1.8.x:
> Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
>   4 threads and 4 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency   208.98ms   58.27ms 391.43ms   52.08%
>     Req/Sec     4.63      1.73     8.00     62.88%
>   1224 requests in 1.00m, 72.34MB read
> Requests/sec:     20.40
> Transfer/sec:      1.21MB
> PDFBox 2.0.0:
> Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
>   4 threads and 4 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency   920.25ms  378.94ms   2.76s    91.38%
>     Req/Sec     0.80      0.40     1.00     80.17%
>   275 requests in 1.00m, 15.85MB read
> Requests/sec:      4.58
> Transfer/sec:    270.41KB
> I have not looked any further than this and have no more data to give you 
> (yet).
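
For scale, the figures quoted above can be turned into ratios. The snippet below is a small illustrative calculation using only the numbers from the description; the class and method names are made up for this note.

```java
import java.util.Locale;

// Ratios computed from the wrk figures quoted in the issue description.
final class RegressionRatio {

    /** Returns how many times larger `larger` is than `smaller`. */
    static double ratio(double larger, double smaller) {
        return larger / smaller;
    }

    public static void main(String[] args) {
        // Mean latency in ms: 2.0.0 (920.25) versus 1.8.x (208.98)
        System.out.printf(Locale.ROOT, "latency:    %.2fx%n", ratio(920.25, 208.98));
        // Requests per second: 1.8.x (20.40) versus 2.0.0 (4.58)
        System.out.printf(Locale.ROOT, "throughput: %.2fx%n", ratio(20.40, 4.58));
        // prints:
        // latency:    4.40x
        // throughput: 4.45x
    }
}
```

Both measures point to roughly the same factor, a slowdown of about 4.4x between 1.8.x and 2.0.0 on this test.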



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-2007) Performance regression since PDFRenderer

2014-04-01 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957397#comment-13957397
 ] 

Tilman Hausherr commented on PDFBOX-2007:
-

I assume the relevant code is
https://github.com/fbernier/taz-clj/blob/master/src/taz_clj/converter.clj
Could you profile _within_ PDFBox, i.e. measure how much time is spent in the 
private PDFRenderer.renderPage method compared to the old 
PDPage.convertToImage()?

This way we could find out whether the difference is in the rendering, or in 
the code before the rendering. I ask this because both were changed.

And does this happen with any PDF, or only with one particular PDF?



[jira] [Updated] (PDFBOX-2007) Performance regression since PDFRenderer

2014-04-01 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

François Bernier updated PDFBOX-2007:
-

Description: 
Hi,

I have the following toy project where I use PDFBox: 
https://github.com/fbernier/taz-clj

I've been using the snapshot versions of PDFBox for quite a while. Since the 
recent move from RenderUtil#convertToImage to PDFRenderer#renderImage (this 
commit: 
https://github.com/fbernier/taz-clj/commit/47917d494f2a9a0999da7f36827c45145d4bb42c),
there has been quite a big performance regression. If I change the PDFBox 
dependency to 1.8.x, everything is good. Here are my benchmarks:

PDFBox 1.8.x:
Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   208.98ms   58.27ms 391.43ms   52.08%
    Req/Sec     4.63      1.73     8.00     62.88%
  1224 requests in 1.00m, 72.34MB read
Requests/sec:     20.40
Transfer/sec:      1.21MB

PDFBox 2.0.0:
Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   920.25ms  378.94ms   2.76s    91.38%
    Req/Sec     0.80      0.40     1.00     80.17%
  275 requests in 1.00m, 15.85MB read
Requests/sec:      4.58
Transfer/sec:    270.41KB


I have not looked any further than this and have no more data to give you (yet).

  was:
Hi,

I have the following toy project where I use PDFBox: 
https://github.com/fbernier/taz-clj

I've been using the snapshot versions of PDFBox for quite a while. Since the 
recent move from RenderUtil#convertToImage to PDFRenderer#renderImage (this 
commit: 
https://github.com/fbernier/taz-clj/commit/47917d494f2a9a0999da7f36827c45145d4bb42c),
there has been quite a big performance regression. If I change the PDFBox 
dependency to 1.8.x, everything is good. Here are my benchmarks:

PDFBox 1.8.x:
Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   920.25ms  378.94ms   2.76s    91.38%
    Req/Sec     0.80      0.40     1.00     80.17%
  275 requests in 1.00m, 15.85MB read
Requests/sec:      4.58
Transfer/sec:    270.41KB

PDFBox 2.0.0:
Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   208.98ms   58.27ms 391.43ms   52.08%
    Req/Sec     4.63      1.73     8.00     62.88%
  1224 requests in 1.00m, 72.34MB read
Requests/sec:     20.40
Transfer/sec:      1.21MB

I have not looked any further than this and have no more data to give you (yet).




[jira] [Updated] (PDFBOX-2007) Performance regression since PDFRenderer

2014-04-01 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

François Bernier updated PDFBOX-2007:
-

Description: 
Hi,

I have the following toy project where I use PDFBox: 
https://github.com/fbernier/taz-clj

I've been using the snapshot versions of PDFBox for quite a while. Since the 
recent move from RenderUtil#convertToImage to PDFRenderer#renderImage (this 
commit: 
https://github.com/fbernier/taz-clj/commit/47917d494f2a9a0999da7f36827c45145d4bb42c),
there has been quite a big performance regression. If I change the PDFBox 
dependency to 1.8.x, everything is good. Here are my benchmarks:

PDFBox 1.8.x:
Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   920.25ms  378.94ms   2.76s    91.38%
    Req/Sec     0.80      0.40     1.00     80.17%
  275 requests in 1.00m, 15.85MB read
Requests/sec:      4.58
Transfer/sec:    270.41KB

PDFBox 2.0.0:
Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   208.98ms   58.27ms 391.43ms   52.08%
    Req/Sec     4.63      1.73     8.00     62.88%
  1224 requests in 1.00m, 72.34MB read
Requests/sec:     20.40
Transfer/sec:      1.21MB

I have not looked any further than this and have no more data to give you (yet).

  was:
Hi,

I have the following toy project where I use PDFBox: 
https://github.com/fbernier/taz-clj

I've been using the snapshot versions of PDFBox for quite a while. Since the 
recent move from RenderUtil#convertToImage to PDFRenderer#renderImage (this 
commit: 
https://github.com/fbernier/taz-clj/commit/47917d494f2a9a0999da7f36827c45145d4bb42c),
there has been quite a big performance regression. If I change the PDFBox 
dependency to 1.8.x, everything is good. Here are my benchmarks:

PDFBox 1.8.x:
Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   208.98ms   58.27ms 391.43ms   52.08%
    Req/Sec     4.63      1.73     8.00     62.88%
  1224 requests in 1.00m, 72.34MB read
Requests/sec:     20.40
Transfer/sec:      1.21MB

PDFBox 2.0.0:
Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   208.98ms   58.27ms 391.43ms   52.08%
    Req/Sec     4.63      1.73     8.00     62.88%
  1224 requests in 1.00m, 72.34MB read
Requests/sec:     20.40
Transfer/sec:      1.21MB

I have not looked any further than this and have no more data to give you (yet).




[jira] [Created] (PDFBOX-2007) Performance regression since PDFRenderer

2014-04-01 Thread JIRA
François Bernier created PDFBOX-2007:


             Summary: Performance regression since PDFRenderer
                 Key: PDFBOX-2007
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2007
             Project: PDFBox
          Issue Type: Bug
          Components: Rendering
    Affects Versions: 2.0.0
            Reporter: François Bernier


Hi,

I have the following toy project where I use PDFBox: 
https://github.com/fbernier/taz-clj

I've been using the snapshot versions of PDFBox for quite a while. Since the 
recent move from RenderUtil#convertToImage to PDFRenderer#renderImage (this 
commit: 
https://github.com/fbernier/taz-clj/commit/47917d494f2a9a0999da7f36827c45145d4bb42c),
there has been quite a big performance regression. If I change the PDFBox 
dependency to 1.8.x, everything is good. Here are my benchmarks:

PDFBox 1.8.x:
Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   208.98ms   58.27ms 391.43ms   52.08%
    Req/Sec     4.63      1.73     8.00     62.88%
  1224 requests in 1.00m, 72.34MB read
Requests/sec:     20.40
Transfer/sec:      1.21MB

PDFBox 2.0.0:
Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   208.98ms   58.27ms 391.43ms   52.08%
    Req/Sec     4.63      1.73     8.00     62.88%
  1224 requests in 1.00m, 72.34MB read
Requests/sec:     20.40
Transfer/sec:      1.21MB

I have not looked any further than this and have no more data to give you (yet).





[jira] [Commented] (PDFBOX-2002) Show deprecation in the build and fix deprecated calls

2014-04-01 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956878#comment-13956878
 ] 

Tilman Hausherr commented on PDFBOX-2002:
-

I deprecated only one method in 2.0. 

So, in rev 1583748 for the 1.8 branch, I deprecated that same method in 
ImageIOUtils as in 2.0, renamed the parameters of all methods to match the 2.0 
version, and replaced the calls to that method.

> Show deprecation in the build and fix deprecated calls
> --
>
>                 Key: PDFBOX-2002
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2002
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> According to 
> https://pdfbox.apache.org/ideas.html
> one of the tasks is "Remove all deprecated methods". Therefore, I will modify 
> the parent POM to show the deprecated calls. This will show such calls, but 
> not fail the build. It is a gentle hint to fix these calls. Let's leave this 
> issue open until all is done.
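
One common way to get this behaviour in a Maven build is the `showDeprecation` option of maven-compiler-plugin, which prints deprecation warnings without failing compilation. The fragment below is a sketch under that assumption and is not necessarily the exact change made to the PDFBox parent POM; the plugin version and any other settings are left out on purpose.

```xml
<!-- Illustrative fragment for the <build><plugins> section of a parent POM. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- Report uses of deprecated APIs as compiler warnings; the build still succeeds. -->
    <showDeprecation>true</showDeprecation>
  </configuration>
</plugin>
```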





Re: Apache PDFBox April 2014 board report due

2014-04-01 Thread John Hewson
+1

-- John

On 30 Mar 2014, at 07:36, Tilman Hausherr wrote:

> On 30.03.2014 at 16:29, Andreas Lehmkuehler wrote:
>> 
>> @John, @Tilman
>> Please add something about the GSoC status. 
> 
> John Hewson and Tilman Hausherr prepared projects to be mentored at GSoC 2014 
> (PDFBOX-1912 and PDFBOX-1915) and received 4 proposals. The rating is 
> ongoing, see timeline at 
> https://www.google-melange.com/gsoc/events/google/gsoc2014 .
> 
> Tilman



Re: Apache PDFBox April 2014 board report due

2014-04-01 Thread Maruan Sahyoun
Hi Andreas,

+1 with the additions from John and Tilman

BR
Maruan

On 30.03.2014 at 16:29, Andreas Lehmkuehler wrote:

> Hi,
> 
> find attached a quick draft of the board report we're expected to submit this
> month.
> 
> @John, @Tilman
> Please add something about the GSoC status.
> 
> 
> Any further comments, objections or additions?
> 
> 
> 
> 
> The Apache PDFBox library is an open source Java tool for working with PDF
> documents.
> 
> 
> General Comments
> 
> 
> There are no issues that require Board attention.
> 
> 
> Community
> -
> 
> There is a steady stream of contributions and bug reports from the community.
> 
> John Hewson and Tilman Hausherr were added as committers and PMC members to 
> our ranks in February 2014.
> 
> Eric Leleu stepped back and went emeritus per his own request in March 2014.
> 
> 452 (429 last report) subscribers on the user@ list
> 157 (164 last report) subscribers on the dev@ list
> 
> Releases
> 
> 
> Version 1.8.4 was released on the 31st of January 2014
> 
> 1.8.4 is an incremental bugfix release based on PDFBox 1.8.x.
> 
> GSoC
> 
> 
> TODO
> 
> Development:
> 
> 
> Most likely the next bugfix version 1.8.5 will be released in the second 
> quarter.
> 
> The work on our next major release is an ongoing effort. The main topics are:
> 
> - switch to Java 1.6
> - modularization
> - replace/enhance the parser
> - refactor the underlying COS model
> - code cleanup
> - enhance rendering
> 
> 
> 
> BR
> Andreas Lehmkühler