[ 
https://issues.apache.org/jira/browse/HADOOP-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733274#comment-15733274
 ] 

Steve Loughran commented on HADOOP-13871:
-----------------------------------------

The video appears to show everything arriving out of order:
https://youtu.be/4lJnknNtZNI

A tight timeout and an expanded Rx buffer aren't enough: the buffer has enough 
capacity to hold the out-of-order (OOO) packets, so fewer are discarded, and 
the read then pauses waiting for the missing data to be resent.

This almost argues for a smaller Rx buffer so that it blocks faster, triggering 
timeouts. But that involves some parts of TCP I'm not knowledgeable about. 
After all, OOO packets do appear to be arriving, so the channel is live, just 
not delivering data to the caller.
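
For reference, the two knobs discussed here correspond to standard 
java.net.Socket options. The sketch below only shows those JDK calls; the class 
name, method name and values are hypothetical, and it says nothing about how 
S3A or the AWS SDK actually configure their connections.

```java
import java.net.Socket;
import java.net.SocketException;

/** Hypothetical helper, for illustration only. */
class SocketTuningSketch {

  /** Applies a receive-buffer size hint and a read timeout to a socket. */
  static Socket tune(Socket socket, int rxBufferBytes, int readTimeoutMillis)
      throws SocketException {
    // Only a hint: the OS may round or clamp the requested size.
    socket.setReceiveBufferSize(rxBufferBytes);
    // A blocking read() throws SocketTimeoutException if no data reaches
    // the caller within this many milliseconds.
    socket.setSoTimeout(readTimeoutMillis);
    return socket;
  }
}
```

Per the JDK documentation, a client socket's receive buffer should be set 
before connecting if a window larger than 64K is wanted.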

We could collect effective read stats (as the test does) within the input 
stream: just take nanotime counts before and after each read, and build up 
long-term stats for the current stream, which can then be queried. Although 
they could be aggregated, I tried that with the output and it doesn't work: if 
a network is full due to too many parallel connections, the effective bandwidth 
of each one is low. Aggregating via just total bytes/total elapsed time 
generates false statistics that imply very low bandwidth rather than a 
saturated, shared network link.
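
A minimal sketch of the per-stream measurement described above, assuming a 
hypothetical wrapper class TimedInputStream (not part of S3A). It times each 
read with System.nanoTime() and keeps running totals for that one stream only. 
It is not thread-safe and ignores skip() and seek.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: per-stream read statistics, for illustration only. */
class TimedInputStream extends FilterInputStream {
  private long bytesRead;    // bytes actually returned to the caller
  private long nanosInRead;  // time spent inside read calls
  private long readCalls;    // number of read calls, including the EOF call

  TimedInputStream(InputStream in) {
    super(in);
  }

  @Override
  public int read() throws IOException {
    long start = System.nanoTime();
    int b = super.read();
    record(b >= 0 ? 1 : 0, System.nanoTime() - start);
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    long start = System.nanoTime();
    int n = super.read(buf, off, len);
    record(Math.max(n, 0), System.nanoTime() - start);  // -1 (EOF) counts as 0 bytes
    return n;
  }

  private void record(long bytes, long nanos) {
    bytesRead += bytes;
    nanosInRead += nanos;
    readCalls++;
  }

  long getBytesRead() { return bytesRead; }
  long getNanosInRead() { return nanosInRead; }
  long getReadCalls() { return readCalls; }

  /** Effective bandwidth of this stream alone; 0 if no time was recorded. */
  double bytesPerSecond() {
    return nanosInRead == 0 ? 0.0 : bytesRead * 1e9 / nanosInRead;
  }
}
```

To see why naive aggregation misleads, take some made-up numbers: four 
parallel streams each read 10 MB in 10 s on a shared link. Total bytes/total 
elapsed time is 40 MB / 40 s = 1 MB/s, which looks like a slow network, while 
the link is actually delivering 4 MB/s across the four connections.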

> ITestS3AInputStreamPerformance.testTimeToOpenAndReadWholeFileBlocks 
> performance on branch-2.8 awful
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13871
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13871
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0, 2.9.0, 3.0.0-alpha2
>         Environment: landsat bucket from the UK
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13871-001.patch, 
> org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance-output.txt
>
>
> The ITestS3AInputStreamPerformance.testTimeToOpenAndReadWholeFileBlocks takes 
> 15s on branch-2, but is now taking minutes.
> This is a regression, and it's surfacing on some internal branches too, even 
> where the code hasn't changed. -It does not happen on branch-2, which has a 
> later SDK.-



