[ https://issues.apache.org/jira/browse/HADOOP-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733274#comment-15733274 ]
Steve Loughran commented on HADOOP-13871:
-----------------------------------------

Video appears to show that everything is coming in out of order: https://youtu.be/4lJnknNtZNI

Having a tight timeout and an expanded Rx buffer isn't enough: there's enough capacity for out-of-order (OOO) packets to be buffered, so fewer are discarded, hence the pauses waiting for the other bits of data to be resent. This almost argues for a smaller Rx buffer, so that it blocks faster, triggering timeouts. But that's a part of TCP I'm not knowledgeable about. After all, OOO packets do appear to be arriving, so the channel is live, just not delivering data to the caller.

We could collect effective read stats (as the test does) within the input stream: just do nanotime counts before and after each read, and build up long-term stats of the current stream, which can then be queried. Although they could be aggregated, I tried that with the output and it doesn't work: if a network is full due to too many parallel connections, the effective bandwidth of each one is low; aggregating via just total bytes/total elapsed time doesn't work, as it generates false statistics implying very low bandwidth, rather than a saturated, shared network link.

> ITestS3AInputStreamPerformance.testTimeToOpenAndReadWholeFileBlocks performance on branch-2.8 awful
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13871
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13871
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0, 2.9.0, 3.0.0-alpha2
>         Environment: landsat bucket from the UK
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13871-001.patch,
> org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance-output.txt
>
>
> The ITestS3AInputStreamPerformance.testTimeToOpenAndReadWholeFileBlocks takes
> 15s on branch-2, but is now taking minutes.
> This is a regression, and it's surfacing on some internal branches too. Even
> where the code hasn't changed. -It does not happen on branch-2, which has a
> later SDK.-



--
This message was sent by Atlassian JIRA (v6.3.4#6332)
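
A minimal sketch of the per-stream read statistics suggested in the comment above: take a nanotime count before and after each read and keep running totals that can be queried later. The class name `TimedInputStream` and its accessors are hypothetical and not part of S3A or the attached patch; the wrapper is not thread-safe and does not time `skip()`.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical wrapper (not an S3A class): times every read() call with
 * System.nanoTime() and keeps running totals for this one stream only.
 */
class TimedInputStream extends FilterInputStream {
  private long bytesRead;        // total bytes returned to the caller
  private long nanosInRead;      // total time spent inside read() calls
  private long slowestReadNanos; // longest single read() call seen so far

  TimedInputStream(InputStream in) {
    super(in);
  }

  @Override
  public int read() throws IOException {
    long start = System.nanoTime();
    int b = super.read();
    record(b >= 0 ? 1 : 0, System.nanoTime() - start);
    return b;
  }

  // read(byte[]) in FilterInputStream delegates here, so it is timed too.
  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    long start = System.nanoTime();
    int n = super.read(buf, off, len);
    record(Math.max(n, 0), System.nanoTime() - start); // n is -1 at EOF
    return n;
  }

  private void record(int bytes, long nanos) {
    bytesRead += bytes;
    nanosInRead += nanos;
    slowestReadNanos = Math.max(slowestReadNanos, nanos);
  }

  long getBytesRead() {
    return bytesRead;
  }

  long getNanosInRead() {
    return nanosInRead;
  }

  long getSlowestReadNanos() {
    return slowestReadNanos;
  }

  /** Effective bandwidth of this stream in bytes/second; 0 if no time was measured. */
  double getEffectiveBytesPerSecond() {
    return nanosInRead == 0 ? 0.0 : bytesRead * 1e9 / nanosInRead;
  }
}
```

The figures are deliberately kept per stream. As the comment notes, dividing total bytes by total elapsed time across many parallel streams would report a very low bandwidth when the shared link is in fact saturated.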