[GitHub] poi pull request #54: Add Image Optimisations

2017-05-15 Thread asfgit
GitHub user asfgit closed the pull request at:

https://github.com/apache/poi/pull/54






2017-05-14 Thread thelmstedt
GitHub user thelmstedt opened a pull request:

https://github.com/apache/poi/pull/54

Add Image Optimisations

I need to be able to generate spreadsheets with 2000 images fast enough for 
a synchronous HTTP request. On `3.16` this use case takes ~25 seconds for me; 
these changes bring it down to ~1 second. I've added a test for my case, and I 
don't get any more test failures than `trunk` does. I don't think I've broken 
any invariants, but it's definitely worth a second look!

The slowdown was caused by the cost of creating and sorting 
`PackagePartName`s. I assume that's part of the OOXML spec, so there's no 
avoiding that overhead. But `addPicture` happened to use them redundantly:
* adding a new relationship enumerated all current relationships, building a 
`PackagePartName` for each
* PackageParts were stored in a `TreeMap`, which keeps them sorted on every 
insertion
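To put the first point in numbers: if every new relationship triggers a scan of all existing ones and builds a name for each, then adding n relationships with distinct targets builds 0 + 1 + ... + (n - 1) = n(n - 1)/2 names, i.e. 1,999,000 for 2000 pictures. The Java sketch below models only that counting. `QuadraticScan` and `ExpensiveName` are hypothetical (the latter stands in for `PackagePartName`), the target paths are illustrative, and it assumes every target is distinct so each scan runs to the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the pre-change cost: each new relationship triggers a
// scan of all existing ones, building a (costly) name object for every entry.
public class QuadraticScan {
    // Number of name objects constructed during the last addAll() call.
    static long namesBuilt = 0;

    // Hypothetical stand-in for PackagePartName; construction is the expensive step.
    static final class ExpensiveName {
        final String normalised;

        ExpensiveName(String raw) {
            namesBuilt++;
            this.normalised = raw.toLowerCase(Locale.ROOT);
        }
    }

    // Adds n distinct targets, checking each against every existing one first.
    static long addAll(int n) {
        namesBuilt = 0;
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String target = "/media/image" + (i + 1) + ".png"; // illustrative path
            boolean exists = false;
            for (String existing : targets) {
                if (new ExpensiveName(existing).normalised.equals(target.toLowerCase(Locale.ROOT))) {
                    exists = true;
                    break;
                }
            }
            if (!exists) {
                targets.add(target);
            }
        }
        return namesBuilt;
    }

    public static void main(String[] args) {
        // 0 + 1 + ... + 1999 = 2000 * 1999 / 2
        System.out.println(addAll(2000)); // prints 1999000
    }
}
```

Caching the lookup turns each "does this target already exist?" check into a single map lookup, so the total drops from quadratic to linear in the number of pictures.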

Instead, we:
* cache relationship lookups by name (similar to what is already done for 
ID and type)
* store PackageParts in a `HashMap` for quick lookups, and explicitly sort 
its `.values()`
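A minimal sketch of the two changes, using simplified stand-ins rather than POI's real types: `PartRegistry`, `Part`, `addRelationship` and `sortedParts` are hypothetical names, and relationships are keyed by a plain target string rather than a `PackagePartName`. Neither inserting a part nor checking for an existing relationship needs name comparisons or a scan; sorting happens only when an ordered view is requested.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two changes; names are illustrative, not POI's API.
public class PartRegistry {
    // Stand-in for PackagePart, ordered by part name.
    static final class Part implements Comparable<Part> {
        final String name;

        Part(String name) {
            this.name = name;
        }

        @Override
        public int compareTo(Part other) {
            return name.compareTo(other.name);
        }
    }

    // Parts in a HashMap: inserts and lookups need no name comparisons.
    private final Map<String, Part> parts = new HashMap<>();

    // Cache of relationship id by target name, so checking for an existing
    // relationship is one lookup instead of a scan over all relationships.
    private final Map<String, String> relIdByTarget = new HashMap<>();
    private int nextRelId = 1;

    Part addPart(String name) {
        return parts.computeIfAbsent(name, Part::new);
    }

    Part getPart(String name) {
        return parts.get(name);
    }

    // Reuses an existing relationship to the same target, otherwise creates one.
    String addRelationship(String target) {
        return relIdByTarget.computeIfAbsent(target, t -> "rId" + nextRelId++);
    }

    // Ordering is paid for only when a caller asks for it: sort a copy of values().
    List<Part> sortedParts() {
        List<Part> sorted = new ArrayList<>(parts.values());
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        PartRegistry reg = new PartRegistry();
        reg.addPart("/xl/media/image2.png");
        reg.addPart("/xl/media/image1.png");
        System.out.println(reg.addRelationship("/xl/media/image1.png")); // rId1
        System.out.println(reg.addRelationship("/xl/media/image1.png")); // rId1 (cached)
        System.out.println(reg.sortedParts().get(0).name);               // /xl/media/image1.png
    }
}
```

The trade-off is that `sortedParts()` costs O(n log n) per call, so this presumably pays off when many parts are added and looked up between the (rarer) moments an ordered view is needed, such as saving.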

The first commit adds a benchmark using JMH 
(http://openjdk.java.net/projects/code-tools/jmh/).

Before my changes, the `addPicture` benchmark reports:

```
# Run complete. Total time: 00:00:31

Benchmark                                                          Mode  Cnt         Score         Error   Units
AddImageBench.benchCreatePicture                                   avgt   10      2831.586 ±      38.824   us/op
AddImageBench.benchCreatePicture:·gc.alloc.rate                    avgt   10       810.418 ±      22.303  MB/sec
AddImageBench.benchCreatePicture:·gc.alloc.rate.norm               avgt   10   2407955.352 ±   33327.581    B/op
AddImageBench.benchCreatePicture:·gc.churn.PS_Eden_Space           avgt   10       847.676 ±     361.511  MB/sec
AddImageBench.benchCreatePicture:·gc.churn.PS_Eden_Space.norm      avgt   10   2520570.616 ± 1084187.937    B/op
AddImageBench.benchCreatePicture:·gc.churn.PS_Survivor_Space       avgt   10         0.561 ±       0.645  MB/sec
AddImageBench.benchCreatePicture:·gc.churn.PS_Survivor_Space.norm  avgt   10      1667.673 ±    1912.256    B/op
AddImageBench.benchCreatePicture:·gc.count                         avgt   10        16.000                counts
AddImageBench.benchCreatePicture:·gc.time                          avgt   10        69.000                    ms
AddImageBench.benchCreatePicture:·stack                            avgt                NaN                   ---
```

Afterwards, execution time improves by roughly 10x and bytes allocated per 
operation by roughly 100x:

```
# Run complete. Total time: 00:00:31

Benchmark                                                          Mode  Cnt         Score         Error   Units
AddImageBench.benchCreatePicture                                   avgt   10       227.339 ±      49.226   us/op
AddImageBench.benchCreatePicture:·gc.alloc.rate                    avgt   10       119.667 ±      25.859  MB/sec
AddImageBench.benchCreatePicture:·gc.alloc.rate.norm               avgt   10     28021.776 ±      54.539    B/op
AddImageBench.benchCreatePicture:·gc.churn.PS_Eden_Space           avgt   10        98.653 ±     314.433  MB/sec
AddImageBench.benchCreatePicture:·gc.churn.PS_Eden_Space.norm      avgt   10     19826.075 ±   63192.153    B/op
AddImageBench.benchCreatePicture:·gc.churn.PS_Survivor_Space       avgt   10         0.228 ±       1.090  MB/sec
AddImageBench.benchCreatePicture:·gc.churn.PS_Survivor_Space.norm  avgt   10        45.594 ±     217.979    B/op
AddImageBench.benchCreatePicture:·gc.count                         avgt   10         2.000                counts
AddImageBench.benchCreatePicture:·gc.time                          avgt   10        88.000                    ms
AddImageBench.benchCreatePicture:·stack                            avgt                NaN                   ---
```
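For reference, the headline factors can be recomputed from the mean scores in the two tables (ignoring the error bars, which are wide for some of the GC rows). This is a small Java sketch; the class and method names are illustrative only.

```java
import java.util.Locale;

// Speed-up factors computed from the mean scores reported in the two JMH tables.
public class SpeedupRatios {
    static double ratio(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        double time = ratio(2831.586, 227.339);        // benchCreatePicture, us/op
        double alloc = ratio(2407955.352, 28021.776);  // gc.alloc.rate.norm, B/op
        // Locale.ROOT keeps '.' as the decimal separator regardless of platform.
        System.out.println(String.format(Locale.ROOT, "time: %.1fx, allocation: %.1fx", time, alloc));
        // prints "time: 12.5x, allocation: 85.9x"
    }
}
```

So "10x" and "100x" are order-of-magnitude summaries of roughly 12.5x and 86x respectively.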

Happy to back out the benchmark if you'd rather not add another test 
dependency.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thelmstedt/poi feature/redo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/poi/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #54


commit c26b958ac32c20226db4cb41fb7dda8bc3e9a34f
Author: Tim Helmstedt 
Date:   2016-10-23T20:59:16Z

Benchmark adding images

commit 1d7cf3574016e64e0631556bb50cb466a930c18f
Author: Tim Helmstedt 
Date:   2016-10-22T11:06:53Z

PackageRelationshipCollection caches lookup by targetPart

Building partnames for all relationships is expensive. Here we avoid
this in findExistingRelation,