[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390932#comment-16390932 ]
genericqa commented on HADOOP-14999:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 3s{color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HADOOP-14999 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-14999 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913562/diff-between-patch7-and-patch8.txt |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14282/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-14999
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14999
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0-beta1
>            Reporter: Genmao Yu
>            Assignee: Genmao Yu
>            Priority: Major
>         Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch,
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch,
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch,
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
> - improve the performance of uploading files to the OSS server.
> First, this mechanism splits the result into multiple small blocks and
> uploads them in parallel. Second, producing the result and uploading the
> blocks happen asynchronously.
> - avoid buffering too much of the result on local disk. To cite an extreme
> example, a task may output 100GB or even more; we may need to write this
> 100GB to local disk and then upload it. Sometimes this is inefficient and
> limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service
> and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference
> between the previous {{AliyunOSSOutputStream}} and
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local
> disk before we can upload it to OSS. This poses two problems:
> - if the output file is too large, it will exhaust the local disk.
> - if the output file is too large, the task will wait a long time for the
> upload to OSS before it can finish, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks,
> i.e. small local files, and each block is packaged into an upload task.
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads
> the blocks in parallel; this greatly improves performance.
> 3. Each task will retry 3 times to upload its block to Aliyun OSS. If one of
> those tasks fails, the whole file upload fails, and we abort the current
> upload.
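
The block-based flow described in points 2 and 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: {{BlockUploadSketch}}, {{PartUploader}}, {{split}} and {{upload}} are hypothetical names, a plain {{java.util.concurrent.Semaphore}} plus a fixed thread pool stands in for {{SemaphoredDelegatingExecutor}}, blocks are in-memory byte arrays rather than local block files, and "retry 3 times" is read here as at most three attempts per block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

class BlockUploadSketch {
  /** Hypothetical stand-in for the OSS client; NOT the real Aliyun SDK API. */
  interface PartUploader {
    void uploadPart(int partNumber, byte[] block) throws Exception;
  }

  /** Assumption: "retry 3 times" is read as at most three attempts per block. */
  static final int MAX_ATTEMPTS = 3;

  /** Cuts the output into blocks of at most blockSize bytes. */
  public static List<byte[]> split(byte[] data, int blockSize) {
    List<byte[]> blocks = new ArrayList<>();
    for (int off = 0; off < data.length; off += blockSize) {
      int end = Math.min(data.length, off + blockSize);
      blocks.add(Arrays.copyOfRange(data, off, end));
    }
    return blocks;
  }

  /** Tries one block up to MAX_ATTEMPTS times, rethrowing the last failure. */
  static void uploadWithRetry(PartUploader uploader, int partNumber, byte[] block)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        uploader.uploadPart(partNumber, block);
        return;
      } catch (Exception e) {
        last = e;
      }
    }
    throw last;
  }

  /**
   * Uploads all blocks in parallel. Returns true if every block succeeded,
   * false if any block failed (the caller would then abort the upload).
   */
  public static boolean upload(List<byte[]> blocks, PartUploader uploader,
                               int threads, int maxInFlight) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    Semaphore permits = new Semaphore(maxInFlight);
    List<Future<?>> pending = new ArrayList<>();
    try {
      for (int i = 0; i < blocks.size(); i++) {
        final int partNumber = i + 1;
        final byte[] block = blocks.get(i);
        // Blocks the producer when too many uploads are queued or running.
        permits.acquire();
        pending.add(pool.submit(() -> {
          try {
            uploadWithRetry(uploader, partNumber, block);
          } finally {
            permits.release();
          }
          return null;
        }));
      }
      for (Future<?> f : pending) {
        f.get(); // surfaces the failure of any block
      }
      return true;
    } catch (ExecutionException e) {
      return false; // one block failed after all attempts
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    List<byte[]> blocks = split(new byte[10], 4);
    boolean ok = upload(blocks,
        (n, b) -> System.out.println("uploaded part " + n + " (" + b.length + " bytes)"),
        2, 2);
    System.out.println("all blocks uploaded: " + ok);
  }
}
```

The semaphore bounds how many blocks are queued or in flight at once, so the producer stalls instead of buffering an unbounded amount of output, which is the disk-space concern raised in the description.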