[GitHub] ant pull request #76: bz-43144 - Improve the performance of the tar task whe...

2018-11-01 Thread jaikiran
Github user jaikiran closed the pull request at:

https://github.com/apache/ant/pull/76


---

-
To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org
For additional commands, e-mail: dev-h...@ant.apache.org



[GitHub] ant pull request #76: bz-43144 - Improve the performance of the tar task whe...

2018-10-31 Thread jaikiran
GitHub user jaikiran opened a pull request:

https://github.com/apache/ant/pull/76

bz-43144 - Improve the performance of the tar task when it uses a zipfileset

https://bz.apache.org/bugzilla/show_bug.cgi?id=43144 is an issue where 
users have reported that the tar task is extremely slow when used with a 
`zipfileset`. 

The comment in that bug, from @bodewig, rightly notes that the reason for 
this slowness has to do with the code where we iterate over the zip entries as 
a `Resource` and then open an `InputStream` for each entry during the `tar` 
task. The implementation of the `ZipResource#getInputStream` opens a new 
`ZipFile` every time and as Stefan notes in that bug, the reason it's done this 
way is because we don't know what would be the right time to close a `ZipFIle` 
if the implementation of the `ZipResource` would want to hold on to a open 
`ZipFile` instance and reuse it.

However, there are occasions, like the one here, where the callers of the 
`ArchiveFileSet` are aware and know when and how long they expect the 
underlying archive to be open. The commit in this PR, introduces a new method 
(`openArchive()`) on the `ArchiveFileSet` to allow such callers to have more 
control over the opening and closing of the underlying resource(s) like the 
`ZipFile`. The whole goal of this new method is to allow iterating over the 
entries in the archive (just like the existing `iterator()` method) and yet the 
same time exposing a way for users to explicitly `close` the returned 
`ArchiveEntries`. When to use the `iterator()` method and when to use the 
`openArchive` method will be left to the users of ArchiveSet.

The commit in this PR intentionally just exposes only one method 
`openArchive` as a `public` method and keeps the rest of the new methods either 
private or package private. Right now, only `ZipFileSet` overrides the new 
package private method to provide a scanner which holds on to open resource(s), 
if it's asked to do so.

The changes in this commit maintain backward compatibility of the existing 
methods and does _not_ introduce any change in behaviour of their usages or 
semantics.

The javadocs of the new methods explain more about what each one does and 
how the usage typically looks like.

I have run a few of the existing Tar related tests locally and haven't 
found any regressions. I have also run a manual test to see how the new 
performance with this change looks like. I used a random zip file which is 
around 5MB in size and has 927 entries (as reported by the unzip command):

```
unzip -l source.zip
 
- ---
 2385 927 files
```
I then used the following build file:

```




  

   


```
Ran the following command:
```
time ant
```
With Ant 1.10.5 (the latest released), it takes a while to complete and 
reports:

```
Total time: 23 seconds

real0m23.485s
user0m16.767s
sys 0m9.368s
```
So around 23 seconds for the tar task.

With this patch and the freshly built Ant distribution, this same build 
file (I did the necessary cleanup of the  build generated tar file, before 
running it again) reports:

```
Total time: 0 seconds

real0m1.068s
user0m1.994s
sys 0m0.258s
```

Around 1 second for the exact same task. So as expected there's a big 
improvement. I have done basic comparison of the generated tar files, with Ant 
1.10.5 and this patched version to check it contains all the expected content 
and it looked fine. I will see if I can add some kind of automated testing 
around this if possible. Until then I would like inputs on whether this looks 
fine and any suggestions/feedback on this change. Right now, this is targetted 
for master branch. I'll take a look later if it's easy (I guess, should be) to 
have this in 1.9.x too.

P.S: The linked bug also has one comment which states that the `copy` task 
when it uses a `zipfileset` is impacted by this performance issue too. I 
haven't checked or verified that. At a later date, I'll take a look if it needs 
this same/similar fix.



 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaikiran/ant bz-43144-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ant/pull/76.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #76


commit 81b8c80c550ba560770a1f82de60c4d0b11ace91
Author: Jaikiran Pai 
Date:   2018-10-31T13:59:29Z

bz-43144 - (performance fix) Explicitly control when an archive is opened 
and closed when a zipfileset is used