[ 
https://issues.apache.org/jira/browse/HDDS-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3223:
-----------------------------
    Description: 
*What's the problem?*
Reading a 300M file takes about 25 seconds, i.e. 12M/s, which is too slow. I 
then captured the packets. As the image shows, reading the 300M file takes 10 
GET requests, each of which reads about 30M. 
The first GET request takes about 1 second, but the 10th GET request takes 
about 23 seconds.


 !screenshot-1.png! 



*What's the reason?*
When a GET is served, the call stack is: 
[IOUtils::copyLarge|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java#L262]
 -> 
[IOUtils::skipFully|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1190]
 -> 
[IOUtils::skip|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L2064]
 -> 
[InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957].

This means that the 10th GET request, which should read only 270M-300M, must 
first skip 0-270M, and that skip is itself performed by 
[InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957]
 over 0-270M. So each successive GET request is slower than the one before.

See [IO-355|https://issues.apache.org/jira/browse/IO-355] for why IOUtils 
implements skip by reading rather than by a real skip, e.g. seek.
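
To make the cost concrete, here is a minimal, self-contained sketch of the 
effect described above. It is not the S3 gateway code: SkipCostSketch, 
CountingStream, seek, skipByRead and serveRange are hypothetical names used 
only for illustration, and an in-memory array of 300 KiB stands in for the 
300M object. It counts how many bytes each ranged GET pulls through read() 
when the offset is reached by reading and discarding versus by seeking.

```java
import java.io.InputStream;

class SkipCostSketch {

    /** In-memory stream that counts bytes pulled through read(); a stand-in for real I/O. */
    static class CountingStream extends InputStream {
        private final byte[] data;
        private int pos = 0;
        long bytesRead = 0;

        CountingStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            if (pos >= data.length) {
                return -1;
            }
            bytesRead++;
            return data[pos++] & 0xff;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) {
                return 0;
            }
            if (pos >= data.length) {
                return -1;
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            bytesRead += n;
            return n;
        }

        /** Hypothetical seek: moves the position without reading any bytes. */
        void seek(long newPos) {
            if (newPos < 0 || newPos > data.length) {
                throw new IllegalArgumentException("position out of range: " + newPos);
            }
            pos = (int) newPos;
        }
    }

    /** Skips by reading into a scratch buffer and discarding, as in the stack above. */
    static void skipByRead(CountingStream in, long toSkip) {
        byte[] scratch = new byte[4096];
        long remaining = toSkip;
        while (remaining > 0) {
            int n = in.read(scratch, 0, (int) Math.min(scratch.length, remaining));
            if (n < 0) {
                throw new IllegalStateException("reached end of stream while skipping");
            }
            remaining -= n;
        }
    }

    /** Serves one ranged GET on a fresh stream; returns the bytes that had to be read. */
    static long serveRange(byte[] object, long start, long length, boolean useSeek) {
        CountingStream in = new CountingStream(object);
        if (useSeek) {
            in.seek(start);
        } else {
            skipByRead(in, start);
        }
        byte[] out = new byte[(int) length];
        int off = 0;
        while (off < out.length) {
            int n = in.read(out, off, out.length - off);
            if (n < 0) {
                break;
            }
            off += n;
        }
        return in.bytesRead;
    }

    public static void main(String[] args) {
        byte[] object = new byte[300 * 1024]; // scaled-down stand-in for the 300M file
        long chunk = 30 * 1024;               // one GET range
        long viaRead = 0;
        long viaSeek = 0;
        for (int i = 0; i < 10; i++) {
            viaRead += serveRange(object, i * chunk, chunk, false);
            viaSeek += serveRange(object, i * chunk, chunk, true);
        }
        System.out.println("skip-by-read, total bytes read: " + viaRead);
        System.out.println("seek, total bytes read: " + viaSeek);
    }
}
```

In this model the k-th request reads k chunks when it skips by reading, so the 
10 requests together read 55 chunks instead of 10. Scaled back to 30M ranges, 
that is roughly 1650M read to deliver a 300M object.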



> S3g become slower when read bigger object for error use of skip
> ---------------------------------------------------------------
>
>                 Key: HDDS-3223
>                 URL: https://issues.apache.org/jira/browse/HDDS-3223
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Critical
>         Attachments: screenshot-1.png
>


