[ https://issues.apache.org/jira/browse/WAGON-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olaf Otto updated WAGON-537:
----------------------------
    Description: 
We are using Maven for build process automation with Docker. This sometimes 
involves uploading and downloading artifacts a few gigabytes in size. Here, 
Maven's transfer speed is consistently and reproducibly slow. For instance, an 
artifact 7.5 GB in size took almost two hours to transfer, in spite of a 
100 MB/s connection and a correspondingly fast, reproducible download speed 
from the remote Nexus artifact repository when downloading via a browser. The 
same is true when uploading such an artifact.

I have investigated the issue using JProfiler. The result shows an issue in 
AbstractWagon's transfer( Resource resource, InputStream input, OutputStream 
output, int requestType, long maxSize ) method used for remote artifacts, and 
the same issue in AbstractHttpClientWagon#writeTo(OutputStream).

Here, the input stream is read in a loop using a 4 KB buffer. Whenever data is 
received, it is pushed to downstream listeners via fireTransferProgress. These 
listeners (or rather consumers) perform expensive tasks such as checksumming 
or console output.
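
For reference, the hot path follows this pattern (a simplified sketch of the 
loop in AbstractWagon#transfer, paraphrased from the 3.2.0 sources rather than 
quoted verbatim):

{code:java}
byte[] buffer = new byte[ 4096 ]; // fixed 4 KB buffer
int n;
while ( ( n = input.read( buffer, 0, buffer.length ) ) != -1 )
{
    // Every read chunk triggers a full listener round-trip,
    // no matter how few bytes the read actually returned.
    fireTransferProgress( transferEvent, buffer, n );
    output.write( buffer, 0, n );
}
{code}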

Now, the underlying InputStream implementation used in transfer will return 
from calls to read(buffer, offset, length) as soon as *some* data is 
available. That is, fireTransferProgress may well be invoked with an average 
number of bytes less than half the buffer capacity (this varies with the 
underlying network and hardware architecture). Consequently, 
fireTransferProgress is invoked *millions of times* for large files. As this 
is a blocking operation, the time spent in fireTransferProgress dominates and 
drastically slows down transfers by at least one order of magnitude. 
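
For scale, assuming an average of ~2 KB per read (half the 4 KB buffer): 
7.5 GB / 2 KB ≈ 3.9 million fireTransferProgress invocations for a single 
artifact.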

!wagon-issue.png! 

In our case, we found the download time increased from a theoretical optimum 
of roughly 80 seconds (7.5 GB at ~100 MB/s) to more than 3,200 seconds.

From an architectural perspective, I would not want to make the consumers / 
listeners invoked via fireTransferProgress aware of their potential impact on 
download speed, but rather refactor the transfer method such that it uses a 
buffer strategy that reduces the number of fireTransferProgress invocations. 
This should be done with regard to the expected file size of the transfer, 
such that fireTransferProgress is invoked often enough but not too frequently.
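
To illustrate the direction, here is a minimal sketch (buffer sizes and names 
such as REPORT_BUFFER_SIZE are illustrative assumptions, not the actual 
patch): read chunks are accumulated into a larger report buffer, so listeners 
still see every byte (important for checksumming), but are notified once per 
accumulated block instead of once per read.

{code:java}
// Hypothetical sketch, not the actual patch. Listeners receive the
// complete byte stream, but in far fewer, larger notifications.
private static final int REPORT_BUFFER_SIZE = 512 * 1024; // illustrative

private void copyWithBufferedEvents( InputStream input, OutputStream output,
                                     TransferEvent transferEvent )
    throws IOException
{
    byte[] readBuffer = new byte[ 4096 ];
    byte[] reportBuffer = new byte[ REPORT_BUFFER_SIZE ];
    int filled = 0;
    int n;
    while ( ( n = input.read( readBuffer, 0, readBuffer.length ) ) != -1 )
    {
        output.write( readBuffer, 0, n );
        if ( filled + n > reportBuffer.length )
        {
            // One listener round-trip per ~512 KB instead of per read.
            fireTransferProgress( transferEvent, reportBuffer, filled );
            filled = 0;
        }
        System.arraycopy( readBuffer, 0, reportBuffer, filled, n );
        filled += n;
    }
    if ( filled > 0 )
    {
        fireTransferProgress( transferEvent, reportBuffer, filled ); // tail flush
    }
}
{code}

The report buffer size could in turn be derived from the resource's expected 
content length, so that the number of progress events stays roughly constant 
regardless of artifact size.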

  was:
We are using Maven for build process automation with Docker. This sometimes 
involves downloading images a few gigabytes in size. Here, Maven's download 
speed is consistently and reproducibly slow. For instance, an artifact 7.5 GB 
in size took almost two hours to transfer, in spite of a 100 MB/s connection 
and a correspondingly fast, reproducible download speed from the remote Nexus 
artifact repository when downloading via a browser.

I have investigated the issue using JProfiler. The result clearly shows a 
significant issue in AbstractWagon's transfer( Resource resource, InputStream 
input, OutputStream output, int requestType, long maxSize ) method used for 
remote artifacts.

Here, the input stream is read in a loop using a 4 KB buffer. Whenever data is 
received, it is pushed to downstream listeners via fireTransferProgress. These 
listeners (or rather consumers) perform expensive tasks such as checksumming 
or printing to the console.

Now, the underlying InputStream implementation used in transfer will return 
from calls to read(buffer, offset, length) as soon as *some* data is 
available. That is, fireTransferProgress is invoked with an average number of 
bytes less than half the buffer capacity (this varies with the underlying 
network and hardware architecture). Consequently, fireTransferProgress is 
invoked *millions of times* for large files. As this is a blocking operation, 
the time spent in fireTransferProgress dominates and drastically slows down 
the transfer by at least one order of magnitude. 

!wagon-issue.png! 

In our case, we found the download time increased from a theoretical optimum 
of roughly 80 seconds to more than 3,200 seconds.

From an architectural perspective, I would not want to make the consumers / 
listeners invoked via fireTransferProgress aware of their potential impact on 
download speed, but rather refactor the transfer method such that it uses a 
buffer strategy that reduces the number of fireTransferProgress invocations. 
This should be done with regard to the expected file size of the transfer, 
such that fireTransferProgress is invoked often enough but not too frequently.

I have implemented a solution, and transfer speed went up by more than one 
order of magnitude. I will provide a pull request ASAP.



        Summary: Maven transfer speed of large artifacts is slow due to 
unsuitable buffer strategy  (was: Maven download speed of large artifacts is 
slow due to unsuitable buffer strategy for remote Artifacts in AbstractWagon)

> Maven transfer speed of large artifacts is slow due to unsuitable buffer 
> strategy
> ---------------------------------------------------------------------------------
>
>                 Key: WAGON-537
>                 URL: https://issues.apache.org/jira/browse/WAGON-537
>             Project: Maven Wagon
>          Issue Type: Improvement
>          Components: wagon-provider-api
>    Affects Versions: 3.2.0
>         Environment: Windows 10, JDK 1.8, Nexus artifact store, >100 MB/s 
> network connection.
>            Reporter: Olaf Otto
>            Assignee: Michael Osipov
>            Priority: Major
>              Labels: performance
>         Attachments: wagon-issue.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
