[ 
https://issues.apache.org/jira/browse/HDDS-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3223:
---------------------------------
    Labels: pull-request-available  (was: )

> Improve s3g read 1GB object efficiency by 100 times 
> ----------------------------------------------------
>
>                 Key: HDDS-3223
>                 URL: https://issues.apache.org/jira/browse/HDDS-3223
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: screenshot-1.png
>
>
> *What's the problem?*
> Reading a 1000 MB object takes about 470 seconds, i.e. about 2.2 MB/s, which is too slow. 
> *What's the reason?*
> Reading a 1000 MB file takes 50 GET requests, each reading 20 MB. 
> For each GET, the call stack is: 
> [IOUtils::copyLarge|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java#L262]
>  -> 
> [IOUtils::skipFully|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1190]
>  -> 
> [IOUtils::skip|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L2064]
>  -> 
> [InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957].
> This means the 50th GET request, which should read only 980 MB-1000 MB, first has to skip 
> 0-980 MB, and it does so by calling 
> [InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957]
>  on 0-980 MB. So the 1st GET request reads 0-20 MB, the 2nd reads 0-40 MB, 
> the 3rd reads 0-60 MB, ..., and the 50th reads 0-1000 MB. As a result, 
> GET requests get slower and slower from the 1st to the 50th.
> See [IO-203|https://issues.apache.org/jira/browse/IO-203] for why 
> IOUtils implements skip by reading rather than by a real skip, e.g. seek.
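
The read amplification described above can be reproduced with a small, self-contained sketch. It is not the s3g code and not the fix proposed for this issue. The class SkipByReadDemo and its helpers are hypothetical names, a ByteArrayInputStream stands in for the Ozone key stream, and sizes are scaled down so that 1 KiB plays the role of 1 MB. The sketch counts how many bytes are pulled through read() when each of 50 ranged requests reaches its offset by reading, versus by calling InputStream.skip on a stream that can genuinely skip.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class SkipByReadDemo {

    /** Counts bytes pulled through read(); skip() is delegated and not counted. */
    static final class CountingStream extends InputStream {
        private final InputStream in;
        long bytesRead;

        CountingStream(InputStream in) { this.in = in; }

        @Override public int read() throws IOException {
            int b = in.read();
            if (b >= 0) bytesRead++;
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) bytesRead += n;
            return n;
        }

        @Override public long skip(long n) throws IOException { return in.skip(n); }
    }

    /** Consumes exactly n bytes via read(), which is what a skip-by-read does. */
    static void readFully(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[8192];
        while (n > 0) {
            int got = in.read(scratch, 0, (int) Math.min(scratch.length, n));
            if (got < 0) throw new EOFException();
            n -= got;
        }
    }

    /** Advances n bytes with InputStream.skip, looping because skip may skip fewer. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            // Simplification for this demo: treat "no progress" as end of stream.
            if (skipped <= 0) throw new EOFException();
            n -= skipped;
        }
    }

    /** Total bytes read across all ranged requests, each opening a fresh stream. */
    static long simulate(int requests, int chunkBytes, boolean realSkip) {
        byte[] object = new byte[requests * chunkBytes];
        long total = 0;
        try {
            for (int k = 0; k < requests; k++) {
                CountingStream in = new CountingStream(new ByteArrayInputStream(object));
                long offset = (long) k * chunkBytes;
                if (realSkip) skipFully(in, offset); else readFully(in, offset);
                readFully(in, chunkBytes); // the range the request actually wants
                total += in.bytesRead;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        int chunk = 20 * 1024; // 20 KiB stands in for the 20 MB range of each GET
        System.out.println("skip-by-read: " + simulate(50, chunk, false) / 1024 + " KiB");
        System.out.println("real skip:    " + simulate(50, chunk, true) / 1024 + " KiB");
    }
}
```

With 50 requests of 20 units each, skip-by-read pulls 20 * (1 + 2 + ... + 50) = 25500 units through read(), against 1000 units when the offset is reached with a real skip, i.e. about 25.5 times the data. How much of the observed slowdown this accounts for depends on the real stream and is not measured here.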



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
