[ https://issues.apache.org/jira/browse/HDDS-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
runzhiwang updated HDDS-3223:
-----------------------------
    Description:
*What's the problem?*
Reading a 300M file takes about 25 seconds, i.e. 12M/s, which is too slow. I then captured the packets. As the image shows, reading a 300M file takes 10 GET requests, each reading about 30M. The first GET request takes about 1 second, but the 10th GET request takes about 23 seconds.
!screenshot-1.png!

*What's the reason?*
When handling a GET, the call stack is:
[IOUtils::copyLarge|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java#L262]
-> [IOUtils::skipFully|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1190]
-> [IOUtils::skip|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L2064]
-> [InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957].

This means that the 10th GET request, which should read only 270M-300M, also calls [InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957] on 0-270M in order to skip it. Every request reads everything before its range, so the GET requests become slower and slower as the offset grows.

See [here|https://issues.apache.org/jira/browse/IO-355] for why IOUtils implements skip by reading rather than by a real skip, e.g. seek.

> S3g become slower when read bigger object for error use of skip
> ---------------------------------------------------------------
>
>                 Key: HDDS-3223
>                 URL: https://issues.apache.org/jira/browse/HDDS-3223
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Critical
>         Attachments: screenshot-1.png
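
The effect described in the issue can be reproduced in miniature. The sketch below is not the Ozone or commons-io code: the class SkipByReadDemo and its helpers are hypothetical, a ByteArrayInputStream stands in for the object stream, and 30 KB parts stand in for the ~30M ranges. It counts how many bytes are actually pulled through read() when a range is reached by reading and discarding, compared with reaching it through InputStream.skip, which for ByteArrayInputStream only moves a position (a stand-in for a real seek).

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class SkipByReadDemo {

    /** Counts every byte that is actually pulled through read(). */
    static final class CountingInputStream extends FilterInputStream {
        long bytesRead = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                bytesRead++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                bytesRead += n;
            }
            return n;
        }
    }

    /** Discards exactly n bytes by reading them into a scratch buffer. */
    static void discardByRead(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[8192];
        long remaining = n;
        while (remaining > 0) {
            int got = in.read(scratch, 0, (int) Math.min(scratch.length, remaining));
            if (got < 0) {
                throw new EOFException("stream ended with " + remaining + " bytes left");
            }
            remaining -= got;
        }
    }

    /** Advances n bytes with InputStream.skip; no data passes through read(). */
    static void discardBySkip(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                throw new EOFException("could not skip " + remaining + " more bytes");
            }
            remaining -= skipped;
        }
    }

    /** Returns how many bytes were read to serve the range [offset, offset + length). */
    static long bytesPulled(byte[] object, long offset, int length, boolean skipByRead) {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(object));
        try {
            if (skipByRead) {
                discardByRead(in, offset);
            } else {
                discardBySkip(in, offset);
            }
            // Reading the range itself stands in for copying it to the response.
            discardByRead(in, length);
            return in.bytesRead;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int part = 30 * 1024;                 // scaled-down stand-in for a ~30M range
        byte[] object = new byte[10 * part];  // scaled-down stand-in for the 300M file
        long viaRead = 0;
        long viaSkip = 0;
        for (int k = 0; k < 10; k++) {
            viaRead += bytesPulled(object, (long) k * part, part, true);
            viaSkip += bytesPulled(object, (long) k * part, part, false);
        }
        System.out.println("read-based skip pulled " + viaRead + " bytes");
        System.out.println("position-based skip pulled " + viaSkip + " bytes");
    }
}
```

With ten equal ranges, the read-based variant pulls 1 + 2 + ... + 10 = 55 parts in total, while the position-based variant pulls only 10. Scaled back up, and assuming ten equal 30M ranges, that is roughly 1650M read to serve a 300M file, which matches the pattern of later GET requests taking longer than earlier ones.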