[ https://issues.apache.org/jira/browse/HADOOP-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336378#comment-15336378 ]

Steve Loughran commented on HADOOP-13203:
-----------------------------------------

Performance of the HADOOP-13286 patch
{code}
testDecompression128K: Decompress with a 128K readahead

2016-06-17 17:14:57,072 [Thread-0] INFO  compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.gz]
2016-06-17 17:15:32,986 [Thread-0] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1262)) - Duration of Time to read 514690 lines [99896260 bytes expanded, 22633778 raw] with readahead = 131072: 36,078,064,490 nS
2016-06-17 17:15:32,986 [Thread-0] INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logTimePerIOP(144)) - Time per IOP: 70,096 nS
2016-06-17 17:15:32,987 [Thread-0] INFO  scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logStreamStatistics(306)) - Stream Statistics
StreamStatistics{OpenOperations=175, CloseOperations=175, Closed=175, 
Aborted=0, SeekOperations=0, ReadExceptions=0, ForwardSeekOperations=0, 
BackwardSeekOperations=0, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=0, 
BytesRead=22633778, BytesRead excluding skipped=22633778, ReadOperations=6680, 
ReadFullyOperations=0, ReadsIncomplete=1583}
{code}
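
For reading the numbers above, here is a small sketch that derives a few per-operation figures from the logged values. The class and method names are hypothetical and are not part of the test suite; only the constants are taken from the log. The integer quotient of duration over lines reproduces the logged "Time per IOP", which suggests (an inference, not something the log states) that each line read is counted as one operation. The raw bytes per open come out just under the 131072-byte readahead.

```java
final class StreamStatsSketch {
    // Figures copied from the log output above.
    static final long DURATION_NS = 36_078_064_490L;
    static final long LINES = 514_690L;
    static final long RAW_BYTES = 22_633_778L;
    static final long EXPANDED_BYTES = 99_896_260L;
    static final long OPEN_OPERATIONS = 175L;

    /** Whole nanoseconds spent per line read. */
    static long nanosPerLine() {
        return DURATION_NS / LINES;
    }

    /** Average compressed bytes fetched per stream open. */
    static long rawBytesPerOpen() {
        return RAW_BYTES / OPEN_OPERATIONS;
    }

    /** Compressed bytes read per second of wall-clock time. */
    static double rawBytesPerSecond() {
        return RAW_BYTES / (DURATION_NS / 1e9);
    }

    /** Ratio of decompressed bytes to compressed bytes. */
    static double expansionRatio() {
        return (double) EXPANDED_BYTES / RAW_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(nanosPerLine());      // 70096, the logged "Time per IOP"
        System.out.println(rawBytesPerOpen());   // 129335, just under readahead = 131072
        System.out.println(rawBytesPerSecond());
        System.out.println(expansionRatio());
    }
}
```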

> S3a: Consider reducing the number of connection aborts by setting correct 
> length in s3 request
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13203
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13203
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HADOOP-13203-branch-2-001.patch, 
> HADOOP-13203-branch-2-002.patch, HADOOP-13203-branch-2-003.patch, 
> HADOOP-13203-branch-2-004.patch, stream_stats.tar.gz
>
>
> Currently the file's "contentLength" is used as the "requestedStreamLen" when 
> invoking S3AInputStream::reopen(). As part of lazySeek(), the stream sometimes 
> has to be closed and reopened, and it is often closed with abort(), which 
> leaves the underlying HTTP connection unusable. In some jobs this incurs a 
> significant connection-establishment cost. It would be good to set the correct 
> value for the stream length to avoid connection aborts. 
> I will post the patch once the AWS tests pass on my machine.
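
The idea in the description can be sketched as follows. This is a minimal illustration and not the attached patch: the class, method names, and the abort threshold are all assumptions made for the example. It bounds each ranged request to what the reader is likely to consume, so that closing the stream leaves little unread data and the connection can be drained and reused instead of aborted.

```java
final class RequestRangeSketch {

    /**
     * Exclusive end offset for a ranged GET starting at pos.
     * Hypothetical policy: fetch at least one readahead window,
     * but never past the end of the object.
     */
    static long requestEnd(long pos, long wanted, long readahead, long contentLength) {
        long span = Math.max(wanted, readahead);
        return Math.min(contentLength, pos + span);
    }

    /**
     * Hypothetical close policy: abort only when more than one readahead
     * window of the request is still unread; otherwise drain the remainder
     * so the HTTP connection can go back to the pool.
     */
    static boolean shouldAbort(long remainingInRequest, long readahead) {
        return remainingInRequest > readahead;
    }
}
```

With the request length set to the full contentLength, a seek shortly after opening leaves almost the whole object unread, so `shouldAbort` returns true. With the bounded range, at most one readahead window is outstanding and the stream can be closed normally.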


