[ https://issues.apache.org/jira/browse/HADOOP-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197663#comment-16197663 ]
Steve Loughran commented on HADOOP-14937:
-----------------------------------------

Steven: assigned to you.

Bear in mind that HTTP/1.1 connections are pooled, so there is an initial cost of setting up an HTTP/1.1 connection for the first request (plus DNS lookup and TCP routing for long-haul links); smaller blocks may then be uploaded over the reused pool of HTTP connections, eliminating that setup cost.

Also: the FS statistics will track the number of active uploads from a stream, though that is only tracked in the stream-level stats, not in the published FS statistics.

> initial part uploads seem to block unnecessarily in S3ABlockOutputStream
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-14937
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14937
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steven Rand
>            Assignee: Steven Rand
>        Attachments: yjp_threads.png
>
>
> From looking at a YourKit snapshot of an FsShell process running {{hadoop fs -put file:///... s3a://...}}, it seems that the first part in the multipart upload doesn't begin to upload until n of the {{s3a-transfer-shared-pool}} threads are able to start uploading, where n is the value of {{fs.s3a.fast.upload.active.blocks}}.
> To clarify, the series of events I expected to see with {{fs.s3a.fast.upload.active.blocks}} set to 4 is:
> 1. An amount of data equal to {{fs.s3a.multipart.size}} is buffered into off-heap memory (I have {{fs.s3a.fast.upload.buffer = bytebuffer}}).
> 2. As soon as that happens, a thread begins to upload that part. Meanwhile, the main thread continues to buffer data into off-heap memory.
> 3. Once another part has been buffered into off-heap memory, a separate thread uploads that part, and so on.
> Whereas what I think the YourKit snapshot shows happening is:
> 1. An amount of data equal to {{fs.s3a.multipart.size}} * 4 is buffered into off-heap memory.
> 2. Four threads start to upload one part each at the same time.
> I've attached a picture of the "Threads" tab to show what I mean. Basically, the times at which the first four {{s3a-transfer-shared-pool}} threads start to upload are roughly the same, whereas I would have expected them to be more staggered. A sketch of the expected staggered behavior is included after the description.
> I'm actually not sure whether this is the expected behavior or not, so feel free to close this if it doesn't come as a surprise to anyone.
> For some context, I've been trying to get a sense of roughly which values of {{fs.s3a.multipart.size}} perform best at different file sizes. One thing I found confusing is that a part size of 5 MB seems to outperform a part size of 64 MB up until files that are upwards of about 500 MB in size. This seems odd, since each {{uploadPart}} call is its own HTTP request, and I would have expected the overhead of those requests to become costly at small part sizes.
> My suspicion is that with 4 concurrent part uploads and 64 MB blocks, we have to wait until 256 MB are buffered before we can start uploading, while with 5 MB blocks we can start uploading as soon as we buffer 20 MB, and that's what gives the smaller parts the advantage for smaller files.
> I'm happy to submit a patch if this is in fact a problem, but wanted to check to make sure I'm not just misunderstanding something.
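For illustration only, here is a minimal Java sketch of the pipelining the reporter expected: each block is handed to the upload pool as soon as it fills, with a semaphore capping the number of in-flight uploads per stream, analogous to {{fs.s3a.fast.upload.active.blocks}} = 4. This is not the actual S3ABlockOutputStream code; the class and method names ({{StaggeredUploadSketch}}, {{blockFull}}, {{uploadPart}}) are hypothetical.

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

/**
 * Illustrative sketch only -- not the real S3ABlockOutputStream.
 * Each buffered block is submitted for upload as soon as it fills;
 * the semaphore bounds the number of in-flight uploads per stream,
 * analogous to fs.s3a.fast.upload.active.blocks.
 */
public class StaggeredUploadSketch {
  private static final int ACTIVE_BLOCKS = 4;              // fs.s3a.fast.upload.active.blocks
  private static final int BLOCK_SIZE = 64 * 1024 * 1024;  // fs.s3a.multipart.size

  private final ExecutorService uploadPool = Executors.newFixedThreadPool(ACTIVE_BLOCKS);
  private final Semaphore activeUploads = new Semaphore(ACTIVE_BLOCKS);

  /** Called whenever a block's worth of data has been buffered. */
  void blockFull(ByteBuffer block, int partNumber) throws InterruptedException {
    // Only blocks the writer once ACTIVE_BLOCKS uploads are already in flight;
    // otherwise the upload starts immediately while buffering continues.
    activeUploads.acquire();
    uploadPool.submit(() -> {
      try {
        uploadPart(block, partNumber);   // one HTTP PUT per part
      } finally {
        activeUploads.release();
      }
    });
  }

  private void uploadPart(ByteBuffer block, int partNumber) {
    // placeholder for the actual multipart part upload
  }
}
{code}

Under this model the first part starts uploading after {{fs.s3a.multipart.size}} bytes are buffered (5 MB or 64 MB), rather than after active.blocks * multipart.size bytes (20 MB or 256 MB), which is the difference the YourKit snapshot appears to show.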