[jira] [Commented] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-30 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563125#comment-16563125
 ] 

SammiChen commented on HADOOP-15607:


[~wujinhu], have you tried building branch-2 locally with the patch 
applied? It seems the build system has some issues on branch-2. 

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0, 2.10.0, 2.9.1, 3.2.0, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: HADOOP-15607-branch-2.001.patch, HADOOP-15607.001.patch, 
> HADOOP-15607.002.patch, HADOOP-15607.003.patch, HADOOP-15607.004.patch
>
>
> When I generated data with the hive-tpcds tool, I got the exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
>  Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
>  [ErrorCode]: InvalidPartOrder
>  [RequestId]: 5B4C40425FCC208D79D1EAF5
>  [HostId]: 100.103.0.137
>  [ResponseError]:
>  <Error>
>    <Code>InvalidPartOrder</Code>
>    <Message>The list of parts was not in ascending order. Parts list must 
> specified in order by part number.</Message>
>    <RequestId>5B4C40425FCC208D79D1EAF5</RequestId>
>    <HostId>xx.xx.xx.xx</HostId>
>    current PartNumber 3, you given part number 3is not in ascending order
>  </Error>
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed the code below; 
> {code:java}
> blockId {code}
> has a thread synchronization problem:
> {code:java}
> // uploadCurrentPart() from AliyunOSSBlockOutputStream: the lambda handed to
> // executorService reads the mutable field blockId when it runs, racing with
> // the blockId++ further down.
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
>     uploadId = store.getUploadId(key);
>   }
>   ListenableFuture<PartETag> partETagFuture =
>       executorService.submit(() -> {
>         PartETag partETag = store.uploadPart(blockFile, key, uploadId,
>             blockId + 1);
>         return partETag;
>       });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}
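
For context on the race above: the lambda handed to executorService reads the 
blockId field at execution time, after the caller's blockId++ may already have 
run, so two parts can be submitted with the same part number. Below is a 
minimal, self-contained sketch of the buggy shape and a snapshot-style fix; 
the class and method names are invented for illustration, and this is not the 
committed patch.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical reproduction of the blockId race; names are invented and do
// not come from the Hadoop source tree.
public class PartNumberRaceSketch {
  private int blockId = 0;
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  // Buggy shape, mirroring uploadCurrentPart(): the lambda reads the mutable
  // field blockId when it runs, racing with the increment below.
  Future<Integer> submitBuggy() {
    Future<Integer> f = pool.submit(() -> blockId + 1); // may see a newer blockId
    blockId++;
    return f;
  }

  // Fixed shape: snapshot the part number on the submitting thread, so the
  // task no longer depends on later mutations of blockId.
  Future<Integer> submitFixed() {
    final int partNumber = blockId + 1;                 // taken synchronously
    Future<Integer> f = pool.submit(() -> partNumber);
    blockId++;
    return f;
  }
}
{code}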



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-29 Thread SammiChen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15607:
---
Fix Version/s: 3.0.4
   3.2.0
   3.1.2

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0, 2.10.0, 2.9.1, 3.2.0, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: HADOOP-15607.001.patch, HADOOP-15607.002.patch, 
> HADOOP-15607.003.patch, HADOOP-15607.004.patch
>
>
> When I generated data with the hive-tpcds tool, I got the exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
>  Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
>  [ErrorCode]: InvalidPartOrder
>  [RequestId]: 5B4C40425FCC208D79D1EAF5
>  [HostId]: 100.103.0.137
>  [ResponseError]:
>  <Error>
>    <Code>InvalidPartOrder</Code>
>    <Message>The list of parts was not in ascending order. Parts list must 
> specified in order by part number.</Message>
>    <RequestId>5B4C40425FCC208D79D1EAF5</RequestId>
>    <HostId>xx.xx.xx.xx</HostId>
>    current PartNumber 3, you given part number 3is not in ascending order
>  </Error>
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed the code below; 
> {code:java}
> blockId {code}
> has a thread synchronization problem:
> {code:java}
> // uploadCurrentPart() from AliyunOSSBlockOutputStream: the lambda handed to
> // executorService reads the mutable field blockId when it runs, racing with
> // the blockId++ further down.
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
>     uploadId = store.getUploadId(key);
>   }
>   ListenableFuture<PartETag> partETagFuture =
>       executorService.submit(() -> {
>         PartETag partETag = store.uploadPart(blockFile, key, uploadId,
>             blockId + 1);
>         return partETag;
>       });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}






[jira] [Commented] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-29 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561380#comment-16561380
 ] 

SammiChen commented on HADOOP-15607:


Hi [~wujinhu], would you provide a patch for branch-2? The current patch 
reports conflicts when I apply it to branch-2. 

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0, 2.10.0, 2.9.1, 3.2.0, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch, HADOOP-15607.002.patch, 
> HADOOP-15607.003.patch, HADOOP-15607.004.patch
>
>
> When I generated data with the hive-tpcds tool, I got the exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
>  Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
>  [ErrorCode]: InvalidPartOrder
>  [RequestId]: 5B4C40425FCC208D79D1EAF5
>  [HostId]: 100.103.0.137
>  [ResponseError]:
>  <Error>
>    <Code>InvalidPartOrder</Code>
>    <Message>The list of parts was not in ascending order. Parts list must 
> specified in order by part number.</Message>
>    <RequestId>5B4C40425FCC208D79D1EAF5</RequestId>
>    <HostId>xx.xx.xx.xx</HostId>
>    current PartNumber 3, you given part number 3is not in ascending order
>  </Error>
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed the code below; 
> {code:java}
> blockId {code}
> has a thread synchronization problem:
> {code:java}
> // uploadCurrentPart() from AliyunOSSBlockOutputStream: the lambda handed to
> // executorService reads the mutable field blockId when it runs, racing with
> // the blockId++ further down.
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
>     uploadId = store.getUploadId(key);
>   }
>   ListenableFuture<PartETag> partETagFuture =
>       executorService.submit(() -> {
>         PartETag partETag = store.uploadPart(blockFile, key, uploadId,
>             blockId + 1);
>         return partETag;
>       });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}






[jira] [Commented] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-27 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559456#comment-16559456
 ] 

SammiChen commented on HADOOP-15607:


+1. Thanks [~wujinhu] for the contribution. I will commit after the build 
results come out. 

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0, 2.10.0, 2.9.1, 3.2.0, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch, HADOOP-15607.002.patch, 
> HADOOP-15607.003.patch, HADOOP-15607.004.patch
>
>
> When I generated data with the hive-tpcds tool, I got the exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
>  Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
>  [ErrorCode]: InvalidPartOrder
>  [RequestId]: 5B4C40425FCC208D79D1EAF5
>  [HostId]: 100.103.0.137
>  [ResponseError]:
>  <Error>
>    <Code>InvalidPartOrder</Code>
>    <Message>The list of parts was not in ascending order. Parts list must 
> specified in order by part number.</Message>
>    <RequestId>5B4C40425FCC208D79D1EAF5</RequestId>
>    <HostId>xx.xx.xx.xx</HostId>
>    current PartNumber 3, you given part number 3is not in ascending order
>  </Error>
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed the code below; 
> {code:java}
> blockId {code}
> has a thread synchronization problem:
> {code:java}
> // uploadCurrentPart() from AliyunOSSBlockOutputStream: the lambda handed to
> // executorService reads the mutable field blockId when it runs, racing with
> // the blockId++ further down.
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
>     uploadId = store.getUploadId(key);
>   }
>   ListenableFuture<PartETag> partETagFuture =
>       executorService.submit(() -> {
>         PartETag partETag = store.uploadPart(blockFile, key, uploadId,
>             blockId + 1);
>         return partETag;
>       });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}






[jira] [Comment Edited] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-27 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559315#comment-16559315
 ] 

SammiChen edited comment on HADOOP-15607 at 7/27/18 6:32 AM:
-

Hi [~wujinhu], the 003 patch looks good overall. A few minor issues:
 # testMultiPartUploadConcurrent: the continuation-line indent should be 4 
spaces, not 8.
 # conf.setInt("fs.oss.upload.active.blocks", 20);  use the Constants field 
instead of the string literal.
 # conf.setInt(IO_CHUNK_BUFFER_SIZE, 
conf.getInt(Constants.MULTIPART_UPLOAD_PART_SIZE_KEY, 0));  this line exceeds 
80 characters.

The newly added UT really helps verify that the issue is fixed. 


was (Author: sammi):
Hi [~wujinhu], the 003 patch looks good overall. A few minor issues:
 # testMultiPartUploadConcurrent: the continuation-line indent should be 4 
spaces, not 8.
 # conf.setInt("fs.oss.upload.active.blocks", 20);  use the Constants field 
instead of the string literal.
 # conf.setInt(IO_CHUNK_BUFFER_SIZE, 
conf.getInt(Constants.MULTIPART_UPLOAD_PART_SIZE_KEY, 0));  this line exceeds 
80 characters.
 # removePartFiles is also called in uploadCurrentPart, which helps remove 
the temp files as soon as possible. But when a temp file is deleted, its 
corresponding partETagFuture is not removed from partETagsFutures, so you will 
find many misleading "Failed to delete temporary file {}" messages later in 
the log file.

The newly added UT really helps verify that the issue is fixed. 

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.1, 3.2.0, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch, HADOOP-15607.002.patch, 
> HADOOP-15607.003.patch
>
>
> When I generated data with the hive-tpcds tool, I got the exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
>  Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
>  [ErrorCode]: InvalidPartOrder
>  [RequestId]: 5B4C40425FCC208D79D1EAF5
>  [HostId]: 100.103.0.137
>  [ResponseError]:
>  <Error>
>    <Code>InvalidPartOrder</Code>
>    <Message>The list of parts was not in ascending order. Parts list must 
> specified in order by part number.</Message>
>    <RequestId>5B4C40425FCC208D79D1EAF5</RequestId>
>    <HostId>xx.xx.xx.xx</HostId>
>    current PartNumber 3, you given part number 3is not in ascending order
>  </Error>
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed the code below; 
> {code:java}
> blockId {code}
> has a thread synchronization problem:
> {code:java}
> // from AliyunOSSBlockOutputStream#uploadCurrentPart()
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   

[jira] [Commented] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-27 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559315#comment-16559315
 ] 

SammiChen commented on HADOOP-15607:


Hi [~wujinhu], the 003 patch looks good overall. A few minor issues:
 # testMultiPartUploadConcurrent: the continuation-line indent should be 4 
spaces, not 8.
 # conf.setInt("fs.oss.upload.active.blocks", 20);  use the Constants field 
instead of the string literal.
 # conf.setInt(IO_CHUNK_BUFFER_SIZE, 
conf.getInt(Constants.MULTIPART_UPLOAD_PART_SIZE_KEY, 0));  this line exceeds 
80 characters.
 # removePartFiles is also called in uploadCurrentPart, which helps remove 
the temp files as soon as possible. But when a temp file is deleted, its 
corresponding partETagFuture is not removed from partETagsFutures, so you will 
find many misleading "Failed to delete temporary file {}" messages later in 
the log file.

The newly added UT really helps verify that the issue is fixed. 
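
To make the fourth point concrete, here is a hypothetical sketch (illustrative 
only, not the actual patch) of keeping each part's temp file and its upload 
future under the same part number, so both are dropped together once the 
upload completes and no stale future points at an already-deleted file:

{code:java}
import java.io.File;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.Future;

// Hypothetical pairing of part files with their upload futures; the generic
// parameter T stands in for the SDK's PartETag result type.
public class PartTrackingSketch<T> {
  private final Map<Integer, File> partFiles = new HashMap<>();
  private final Map<Integer, Future<T>> partFutures = new HashMap<>();

  void track(int partNumber, File file, Future<T> future) {
    partFiles.put(partNumber, file);
    partFutures.put(partNumber, future);
  }

  // Remove finished uploads and their temp files in one pass, so a later
  // cleanup pass never re-attempts a delete and logs a misleading failure.
  void removeCompletedParts() {
    Iterator<Map.Entry<Integer, Future<T>>> it =
        partFutures.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Integer, Future<T>> entry = it.next();
      if (entry.getValue().isDone()) {
        File f = partFiles.remove(entry.getKey());
        if (f != null) {
          f.delete();          // deleted exactly once, together with its future
        }
        it.remove();
      }
    }
  }
}
{code}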

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.1, 3.2.0, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch, HADOOP-15607.002.patch, 
> HADOOP-15607.003.patch
>
>
> When I generated data with the hive-tpcds tool, I got the exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
>  Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
>  [ErrorCode]: InvalidPartOrder
>  [RequestId]: 5B4C40425FCC208D79D1EAF5
>  [HostId]: 100.103.0.137
>  [ResponseError]:
>  <Error>
>    <Code>InvalidPartOrder</Code>
>    <Message>The list of parts was not in ascending order. Parts list must 
> specified in order by part number.</Message>
>    <RequestId>5B4C40425FCC208D79D1EAF5</RequestId>
>    <HostId>xx.xx.xx.xx</HostId>
>    current PartNumber 3, you given part number 3is not in ascending order
>  </Error>
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed the code below; 
> {code:java}
> blockId {code}
> has a thread synchronization problem:
> {code:java}
> // from AliyunOSSBlockOutputStream#uploadCurrentPart()
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
> uploadId = store.getUploadId(key);
>   }
>   ListenableFuture<PartETag> partETagFuture =
>   executorService.submit(() -> {
> PartETag partETag = store.uploadPart(blockFile, key, uploadId,
> blockId + 1);
> return partETag;
>   });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new 

[jira] [Updated] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-26 Thread SammiChen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15607:
---
Affects Version/s: (was: 3.1.1)

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.1, 3.2.0, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch, HADOOP-15607.002.patch, 
> HADOOP-15607.003.patch
>
>
> When I generated data with the hive-tpcds tool, I got the exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
>  Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
>  [ErrorCode]: InvalidPartOrder
>  [RequestId]: 5B4C40425FCC208D79D1EAF5
>  [HostId]: 100.103.0.137
>  [ResponseError]:
>  <Error>
>    <Code>InvalidPartOrder</Code>
>    <Message>The list of parts was not in ascending order. Parts list must 
> specified in order by part number.</Message>
>    <RequestId>5B4C40425FCC208D79D1EAF5</RequestId>
>    <HostId>xx.xx.xx.xx</HostId>
>    current PartNumber 3, you given part number 3is not in ascending order
>  </Error>
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed the code below; 
> {code:java}
> blockId {code}
> has a thread synchronization problem:
> {code:java}
> // uploadCurrentPart() from AliyunOSSBlockOutputStream: the lambda handed to
> // executorService reads the mutable field blockId when it runs, racing with
> // the blockId++ further down.
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
>     uploadId = store.getUploadId(key);
>   }
>   ListenableFuture<PartETag> partETagFuture =
>       executorService.submit(() -> {
>         PartETag partETag = store.uploadPart(blockFile, key, uploadId,
>             blockId + 1);
>         return partETag;
>       });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}






[jira] [Commented] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-19 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550151#comment-16550151
 ] 

SammiChen commented on HADOOP-15607:


Thanks [~wujinhu] for reporting and working on it.  Can you explain in more 
detail how the duplicate part numbers are generated?

Is the blockFiles type changed from List<File> to Map<Integer, File> to 
guarantee no duplicate part numbers?  A unit test case is preferred to verify 
the issue.
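
As a hypothetical illustration of the List-to-Map idea (not the actual patch): 
keying block files by part number makes a duplicate part number replace the 
existing entry rather than append a second one, and an ordered map keeps the 
parts in ascending order for completeMultipartUpload.

{code:java}
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; field and method names are invented for illustration.
public class PartMapSketch {
  // TreeMap iterates keys in ascending order, so parts come out sorted.
  private final Map<Integer, File> blockFiles = new TreeMap<>();

  void addPart(int partNumber, File file) {
    blockFiles.put(partNumber, file);  // idempotent per part number
  }
}
{code}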

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.1, 3.2.0, 3.1.1, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch, HADOOP-15607.002.patch
>
>
> When I generated data with the hive-tpcds tool, I got the exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
> Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
> [ErrorCode]: InvalidPartOrder
> [RequestId]: 5B4C40425FCC208D79D1EAF5
> [HostId]: 100.103.0.137
> [ResponseError]:
> <Error>
>   <Code>InvalidPartOrder</Code>
>   <Message>The list of parts was not in ascending order. Parts list must 
> specified in order by part number.</Message>
>   <RequestId>5B4C40425FCC208D79D1EAF5</RequestId>
>   <HostId>100.103.0.137</HostId>
>   current PartNumber 3, you given part number 3is not in ascending order
> </Error>
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed the code below; 
> {code:java}
> blockId {code}
> has a thread synchronization problem:
> {code:java}
> // uploadCurrentPart() from AliyunOSSBlockOutputStream: the lambda handed to
> // executorService reads the mutable field blockId when it runs, racing with
> // the blockId++ further down.
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
>     uploadId = store.getUploadId(key);
>   }
>   ListenableFuture<PartETag> partETagFuture =
>       executorService.submit(() -> {
>         PartETag partETag = store.uploadPart(blockFile, key, uploadId,
>             blockId + 1);
>         return partETag;
>       });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}






[jira] [Updated] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-06-11 Thread SammiChen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15499:
---
Fix Version/s: 3.0.4
   3.1.1
   3.2.0

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.1
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HADOOP-15499.001.patch, HADOOP-15499.002.patch
>
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has less throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder between all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to use the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches:
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being modified in release() while it is being checked in 
> doEncode/doDecode. We can use a ReentrantReadWriteLock to increase 
> concurrency, since doEncode/doDecode can be called multiple times without 
> changing the nativeCoder state.
>  I prefer approach 2 and will upload a patch later. 
>  
>  
>  
>  
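
A minimal sketch of approach 2 as described above (a hypothetical shape, not 
the committed patch): doEncode/doDecode take the read lock so they can run 
concurrently, while release() takes the write lock because it alone 
invalidates the native coder state.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical skeleton of the ReentrantReadWriteLock idea; the field and
// method names are simplified stand-ins for the NativeRSRawEncoder internals.
public class RwLockCoderSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long nativeCoder = 1L;   // stand-in for the native handle

  public void encode(byte[] input, byte[] output) {
    lock.readLock().lock();        // many encode calls may hold this at once
    try {
      if (nativeCoder == 0) {
        throw new IllegalStateException("coder has been released");
      }
      // ... call into the native encode routine here ...
    } finally {
      lock.readLock().unlock();
    }
  }

  public void release() {
    lock.writeLock().lock();       // exclusive: no encode call is in flight
    try {
      nativeCoder = 0;             // invalidate the handle exactly once
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}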






[jira] [Commented] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-06-11 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507710#comment-16507710
 ] 

SammiChen commented on HADOOP-15499:


Thanks [~xiaochen] for the review.  Committed to trunk, branch-3.0 & 
branch-3.1.  

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.1
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
> Attachments: HADOOP-15499.001.patch, HADOOP-15499.002.patch
>
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has less throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder between all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to use the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches:
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being modified in release() while it is being checked in 
> doEncode/doDecode. We can use a ReentrantReadWriteLock to increase 
> concurrency, since doEncode/doDecode can be called multiple times without 
> changing the nativeCoder state.
>  I prefer approach 2 and will upload a patch later. 
>  
>  
>  
>  






[jira] [Updated] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-06-11 Thread SammiChen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15499:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.1
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
> Attachments: HADOOP-15499.001.patch, HADOOP-15499.002.patch
>
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has less throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder between all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to use the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches:
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being modified in release() while it is being checked in 
> doEncode/doDecode. We can use a ReentrantReadWriteLock to increase 
> concurrency, since doEncode/doDecode can be called multiple times without 
> changing the nativeCoder state.
>  I prefer approach 2 and will upload a patch later. 
>  
>  
>  
>  






[jira] [Commented] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-06-05 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502796#comment-16502796
 ] 

SammiChen commented on HADOOP-15499:


Thanks [~xiaochen] for the review and comments. A new patch is uploaded that 
addresses all the issues. 

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.1
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
> Attachments: HADOOP-15499.001.patch, HADOOP-15499.002.patch
>
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has less throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder between all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to use the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches:
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being modified in release() while it is being checked in 
> doEncode/doDecode. We can use a ReentrantReadWriteLock to increase 
> concurrency, since doEncode/doDecode can be called multiple times without 
> changing the nativeCoder state.
>  I prefer approach 2 and will upload a patch later. 
>  
>  
>  
>  






[jira] [Updated] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-06-05 Thread SammiChen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15499:
---
Attachment: HADOOP-15499.002.patch

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.1
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
> Attachments: HADOOP-15499.001.patch, HADOOP-15499.002.patch
>
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has less throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder between all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to use the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches:
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being modified in release() while it is being checked in 
> doEncode/doDecode. We can use a ReentrantReadWriteLock to increase 
> concurrency, since doEncode/doDecode can be called multiple times without 
> changing the nativeCoder state.
>  I prefer approach 2 and will upload a patch later. 
>  
>  
>  
>  






[jira] [Comment Edited] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-06-05 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493287#comment-16493287
 ] 

SammiChen edited comment on HADOOP-15499 at 6/6/18 3:07 AM:


Performance data before the patch,

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar  
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 50 
1024 64
 Using 126MB buffer.
 ISA-L coder encode 50400MB data, with chunk size 64KB
Total time: 9.24 s.
Total throughput: 5455.73 MB/s
Threads statistics:
50 threads in total.
Min: 1.79 s, Max: 9.19 s, Avg: 6.58 s, 90th Percentile: 8.94 s.

 

Performance data after the patch,

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 72 
10240 4096
 Using 120MB buffer.
 ISA-L coder encode 734400MB data, with chunk size 4096KB
 Total time: 8.11 s.
 Total throughput: 90521.39 MB/s
 Threads statistics:
 72 threads in total.
 Min: 6.78 s, Max: 7.93 s, Avg: 7.36 s, 90th Percentile: 7.66 s.

 

I also compared the performance data of two scenarios: one removes all the 
synchronized keywords, the other keeps the current ReentrantReadWriteLock 
solution. 

The ReentrantReadWriteLock solution is less than 5% slower than the case with 
the synchronized keywords removed, which is acceptable to me. 

 


was (Author: sammi):
Performance data before the patch,

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar  
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 50 
1024 64
 Using 126MB buffer.
 ISA-L coder encode 50400MB data, with chunk size 64KB
 Total time: 0.98 s.
 Total throughput: 51639.34 MB/s
 Threads statistics:
 50 threads in total.

 

Performance data after the patch,

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 72 
10240 4096
 Using 120MB buffer.
 ISA-L coder encode 734400MB data, with chunk size 4096KB
 Total time: 8.11 s.
 Total throughput: 90521.39 MB/s
 Threads statistics:
 72 threads in total.
 Min: 6.78 s, Max: 7.93 s, Avg: 7.36 s, 90th Percentile: 7.66 s.

 

I also compared the performance data of two scenarios: one removes all the 
synchronized keywords, the other keeps the current ReentrantReadWriteLock 
solution. 

The ReentrantReadWriteLock solution is less than 5% slower than the case with 
the synchronized keywords removed, which is acceptable to me. 

 

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.1
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
> Attachments: HADOOP-15499.001.patch
>
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has less throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder between all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to use the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches:
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being modified in release() while it is being checked in 
> doEncode/doDecode. We 

[jira] [Comment Edited] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-05-29 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493287#comment-16493287
 ] 

SammiChen edited comment on HADOOP-15499 at 5/29/18 9:19 AM:
-

Performance data before the patch,

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar  
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 50 
1024 64
 Using 126MB buffer.
 ISA-L coder encode 50400MB data, with chunk size 64KB
 Total time: 0.98 s.
 Total throughput: 51639.34 MB/s
 Threads statistics:
 50 threads in total.

 

Performance data after the patch,

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 72 
10240 4096
 Using 120MB buffer.
 ISA-L coder encode 734400MB data, with chunk size 4096KB
 Total time: 8.11 s.
 Total throughput: 90521.39 MB/s
 Threads statistics:
 72 threads in total.
 Min: 6.78 s, Max: 7.93 s, Avg: 7.36 s, 90th Percentile: 7.66 s.

 

I also compared the performance of two scenarios: one removes all the 
synchronized keywords, the other is the current ReentrantReadWriteLock 
solution. 

The ReentrantReadWriteLock solution shows less than a 5% degradation compared 
with the synchronized-keywords-removed case. That is acceptable to me. 

 


was (Author: sammi):
Performance data before the patch,

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.0.0-alpha2.jar 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 50 
1024 64
Using 126MB buffer.
ISA-L coder encode 50400MB data, with chunk size 64KB
Total time: 0.98 s.
Total throughput: 51639.34 MB/s
Threads statistics:
50 threads in total.

 

Performance data after the patch,

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.0.0-alpha2.jar 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 72 
10240 4096
Using 120MB buffer.
ISA-L coder encode 734400MB data, with chunk size 4096KB
Total time: 8.11 s.
Total throughput: 90521.39 MB/s
Threads statistics:
72 threads in total.
Min: 6.78 s, Max: 7.93 s, Avg: 7.36 s, 90th Percentile: 7.66 s.

 

I also compared the performance of two scenarios: one removes all the 
synchronized keywords, the other is the current ReentrantReadWriteLock 
solution. 

The ReentrantReadWriteLock solution shows less than a 5% degradation compared 
with the synchronized-keywords-removed case. That is acceptable to me. 

 

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.0.1, 3.0.2
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
> Attachments: HADOOP-15499.001.patch
>
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has lower throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder among all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to call the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches. 
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being read in doEncode/doDecode while it is being 
> modified in release(). We can use a ReentrantReadWriteLock to increase 
> concurrency, since doEncode/doDecode can be called multiple times without 
> changing the nativeCoder state.
>  I prefer approach 2 and will upload a patch later. 

[jira] [Commented] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-05-29 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493287#comment-16493287
 ] 

SammiChen commented on HADOOP-15499:


Performance data before the patch,

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.0.0-alpha2.jar 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 50 
1024 64
Using 126MB buffer.
ISA-L coder encode 50400MB data, with chunk size 64KB
Total time: 0.98 s.
Total throughput: 51639.34 MB/s
Threads statistics:
50 threads in total.

 

Performance data after the patch,

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.0.0-alpha2.jar 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 72 
10240 4096
Using 120MB buffer.
ISA-L coder encode 734400MB data, with chunk size 4096KB
Total time: 8.11 s.
Total throughput: 90521.39 MB/s
Threads statistics:
72 threads in total.
Min: 6.78 s, Max: 7.93 s, Avg: 7.36 s, 90th Percentile: 7.66 s.

 

I also compared the performance of two scenarios: one removes all the 
synchronized keywords, the other is the current ReentrantReadWriteLock 
solution. 

The ReentrantReadWriteLock solution shows less than a 5% degradation compared 
with the synchronized-keywords-removed case. That is acceptable to me. 

 

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.0.1, 3.0.2
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
> Attachments: HADOOP-15499.001.patch
>
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has lower throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder among all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to call the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches. 
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being read in doEncode/doDecode while it is being 
> modified in release(). We can use a ReentrantReadWriteLock to increase 
> concurrency, since doEncode/doDecode can be called multiple times without 
> changing the nativeCoder state.
>  I prefer approach 2 and will upload a patch later. 
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-05-29 Thread SammiChen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15499:
---
Attachment: HADOOP-15499.001.patch

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.0.1, 3.0.2
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
> Attachments: HADOOP-15499.001.patch
>
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has lower throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder among all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to call the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches. 
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being read in doEncode/doDecode while it is being 
> modified in release(). We can use a ReentrantReadWriteLock to increase 
> concurrency, since doEncode/doDecode can be called multiple times without 
> changing the nativeCoder state.
>  I prefer approach 2 and will upload a patch later. 
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-05-29 Thread SammiChen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15499:
---
Status: Patch Available  (was: Open)

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.2, 3.0.1, 3.0.0
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
> Attachments: HADOOP-15499.001.patch
>
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has lower throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder among all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to call the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches. 
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being read in doEncode/doDecode while it is being 
> modified in release(). We can use a ReentrantReadWriteLock to increase 
> concurrency, since doEncode/doDecode can be called multiple times without 
> changing the nativeCoder state.
>  I prefer approach 2 and will upload a patch later. 
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15499) Performance several drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-05-29 Thread SammiChen (JIRA)
SammiChen created HADOOP-15499:
--

 Summary: Performance several drop when running 
RawErasureCoderBenchmark with NativeRSRawErasureCoder
 Key: HADOOP-15499
 URL: https://issues.apache.org/jira/browse/HADOOP-15499
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.2, 3.0.1, 3.0.0
Reporter: SammiChen
Assignee: SammiChen


Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
encoding/decoding performance. 

The 50-concurrency native ISA-L coder has lower throughput than the 
1-concurrency native ISA-L case, which is abnormal. 

 

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
1024 1024
Using 126MB buffer.
ISA-L coder encode 1008MB data, with chunk size 1024KB
Total time: 0.19 s.
Total throughput: 5390.37 MB/s
Threads statistics:
1 threads in total.
Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.

 

bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 50 
1024 10240
Using 120MB buffer.
ISA-L coder encode 54000MB data, with chunk size 10240KB
Total time: 11.58 s.
Total throughput: 4662 MB/s
Threads statistics:
50 threads in total.
Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
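
For readers unfamiliar with the benchmark CLI, the argument layout appears to 
be as follows; this is inferred from the outputs above, not from the 
benchmark's documentation:

{noformat}
# Inferred argument layout (not authoritative):
#   RawErasureCoderBenchmark <encode|decode> <coder> <numThreads> \
#       <dataSizeMB> <chunkSizeKB>
# e.g. "encode 3 50 1024 10240" above: coder 3 prints as the native
# ISA-L coder, 50 threads, roughly 1 GB of data per thread, 10 MB chunks.
{noformat}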

 

RawErasureCoderBenchmark shares a single coder among all concurrent threads, 
while NativeRSRawEncoder and NativeRSRawDecoder have the synchronized keyword 
on the doDecode and doEncode functions. So 50 concurrent threads are forced to 
call the shared coder's encode/decode functions one by one. 

To resolve the issue, there are two approaches. 
 # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
concurrent thread.
 # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
concurrency. The synchronized keyword protects the private variable 
nativeCoder from being read in doEncode/doDecode while it is being modified in 
release(). We can use a ReentrantReadWriteLock to increase concurrency, since 
doEncode/doDecode can be called multiple times without changing the 
nativeCoder state.

 I prefer approach 2 and will upload a patch later. 

 

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15499) Performance severe drop when running RawErasureCoderBenchmark with NativeRSRawErasureCoder

2018-05-29 Thread SammiChen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15499:
---
Summary: Performance severe drop when running RawErasureCoderBenchmark with 
NativeRSRawErasureCoder  (was: Performance several drop when running 
RawErasureCoderBenchmark with NativeRSRawErasureCoder)

> Performance severe drop when running RawErasureCoderBenchmark with 
> NativeRSRawErasureCoder
> --
>
> Key: HADOOP-15499
> URL: https://issues.apache.org/jira/browse/HADOOP-15499
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.0.1, 3.0.2
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Major
>
> Run RawErasureCoderBenchmark, a micro-benchmark that tests EC codec 
> encoding/decoding performance. 
> The 50-concurrency native ISA-L coder has lower throughput than the 
> 1-concurrency native ISA-L case, which is abnormal. 
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 1 
> 1024 1024
> Using 126MB buffer.
> ISA-L coder encode 1008MB data, with chunk size 1024KB
> Total time: 0.19 s.
> Total throughput: 5390.37 MB/s
> Threads statistics:
> 1 threads in total.
> Min: 0.18 s, Max: 0.18 s, Avg: 0.18 s, 90th Percentile: 0.18 s.
>  
> bin/hadoop jar ./share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT-tests.jar 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderBenchmark encode 3 
> 50 1024 10240
> Using 120MB buffer.
> ISA-L coder encode 54000MB data, with chunk size 10240KB
> Total time: 11.58 s.
> Total throughput: 4662 MB/s
> Threads statistics:
> 50 threads in total.
> Min: 0.55 s, Max: 11.5 s, Avg: 6.32 s, 90th Percentile: 10.45 s.
>  
> RawErasureCoderBenchmark shares a single coder among all concurrent 
> threads, while NativeRSRawEncoder and NativeRSRawDecoder have the 
> synchronized keyword on the doDecode and doEncode functions. So 50 
> concurrent threads are forced to call the shared coder's encode/decode 
> functions one by one. 
>  
> To resolve the issue, there are two approaches. 
>  # Refactor RawErasureCoderBenchmark to use a dedicated coder for each 
> concurrent thread.
>  # Refactor NativeRSRawEncoder and NativeRSRawDecoder to get better 
> concurrency. The synchronized keyword protects the private variable 
> nativeCoder from being read in doEncode/doDecode while it is being 
> modified in release(). We can use a ReentrantReadWriteLock to increase 
> concurrency, since doEncode/doDecode can be called multiple times without 
> changing the nativeCoder state.
>  I prefer approach 2 and will upload a patch later. 
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-12896) kdiag to add a --DEFAULTREALM option

2018-05-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465877#comment-16465877
 ] 

SammiChen edited comment on HADOOP-12896 at 5/7/18 12:56 PM:
-

Remove the fix version field since it's not actually fixed.



was (Author: sammi):
Remov the fix version field since it's not fixed actually.


> kdiag to add a --DEFAULTREALM option 
> -
>
> Key: HADOOP-12896
> URL: https://issues.apache.org/jira/browse/HADOOP-12896
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> * kdiag to add a --DEFAULTREALM option to say that not having a default 
> realm is an error.
> * if this flag is unset, when dumping the credential cache, if there is any 
> entry without a realm, *and there is no default realm*, diagnostics should 
> fail with an error. Hadoop will fail in this situation; kdiag should detect 
> and report it
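
For context, the proposed option would presumably be invoked along these 
lines; the flag is only proposed in this JIRA, so the exact spelling is 
hypothetical:

{noformat}
# hypothetical invocation once the proposed flag exists:
# fail the diagnostics when no default realm is configured
hadoop kdiag --DEFAULTREALM
{noformat}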



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-12896) kdiag to add a --DEFAULTREALM option

2018-05-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465877#comment-16465877
 ] 

SammiChen commented on HADOOP-12896:


Remov the fix version field since it's not fixed actually.


> kdiag to add a --DEFAULTREALM option 
> -
>
> Key: HADOOP-12896
> URL: https://issues.apache.org/jira/browse/HADOOP-12896
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> * kdiag to add a --DEFAULTREALM option to say that not having a default 
> realm is an error.
> * if this flag is unset, when dumping the credential cache, if there is any 
> entry without a realm, *and there is no default realm*, diagnostics should 
> fail with an error. Hadoop will fail in this situation; kdiag should detect 
> and report it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-12896) kdiag to add a --DEFAULTREALM option

2018-05-07 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-12896:
---
Fix Version/s: (was: 2.9.1)

> kdiag to add a --DEFAULTREALM option 
> -
>
> Key: HADOOP-12896
> URL: https://issues.apache.org/jira/browse/HADOOP-12896
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> * kdiag to add a --DEFAULTREALM option to say that not having a default 
> realm is an error.
> * if this flag is unset, when dumping the credential cache, if there is any 
> entry without a realm, *and there is no default realm*, diagnostics should 
> fail with an error. Hadoop will fail in this situation; kdiag should detect 
> and report it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15385) Many tests are failing in hadoop-distcp project in branch-2

2018-04-24 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451569#comment-16451569
 ] 

SammiChen commented on HADOOP-15385:


Thanks [~jlowe] for the quick fix and [~djp] for the commit. I'm glad the issue 
is limited to the test cases and doesn't impact the production code.

> Many tests are failing in hadoop-distcp project in branch-2
> ---
>
> Key: HADOOP-15385
> URL: https://issues.apache.org/jira/browse/HADOOP-15385
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.8.2
>Reporter: Rushabh S Shah
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 2.10.0, 2.8.4, 2.9.2
>
> Attachments: HADOOP-15385-branch-2.001.patch
>
>
> Many tests are failing in hadoop-distcp project in branch-2.8
> Below are the failing tests.
> {noformat}
> Failed tests: 
>   
> TestDistCpViewFs.testUpdateGlobTargetMissingSingleLevel:326->checkResult:428 
> expected:<4> but was:<5>
>   TestDistCpViewFs.testGlobTargetMissingMultiLevel:346->checkResult:428 
> expected:<4> but was:<5>
>   TestDistCpViewFs.testGlobTargetMissingSingleLevel:306->checkResult:428 
> expected:<2> but was:<3>
>   TestDistCpViewFs.testUpdateGlobTargetMissingMultiLevel:367->checkResult:428 
> expected:<6> but was:<8>
>   TestIntegration.testUpdateGlobTargetMissingSingleLevel:431->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingMultiLevel:454->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingSingleLevel:408->checkResult:577 
> expected:<2> but was:<3>
>   TestIntegration.testUpdateGlobTargetMissingMultiLevel:478->checkResult:577 
> expected:<6> but was:<8>
>   TestIntegration.testUpdateGlobTargetMissingSingleLevel:431->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingMultiLevel:454->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingSingleLevel:408->checkResult:577 
> expected:<2> but was:<3>
>   TestIntegration.testUpdateGlobTargetMissingMultiLevel:478->checkResult:577 
> expected:<6> but was:<8>
>   TestIntegration.testUpdateGlobTargetMissingSingleLevel:431->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingMultiLevel:454->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingSingleLevel:408->checkResult:577 
> expected:<2> but was:<3>
>   TestIntegration.testUpdateGlobTargetMissingMultiLevel:478->checkResult:577 
> expected:<6> but was:<8>
> Tests run: 258, Failures: 16, Errors: 0, Skipped: 0
> {noformat}
> {noformat}
> rushabhs$ pwd
> /Users/rushabhs/hadoop/apacheHadoop/hadoop/hadoop-tools/hadoop-distcp
> rushabhs$ git branch
>  branch-2
>   branch-2.7
> * branch-2.8
>   branch-2.9
>   branch-3.0
>  rushabhs$ git log --oneline | head -n3
> c4ea1c8bb73 HADOOP-14970. MiniHadoopClusterManager doesn't respect lack of 
> format option. Contributed by Erik Krogen
> 1548205a845 YARN-8147. TestClientRMService#testGetApplications sporadically 
> fails. Contributed by Jason Lowe
> c01b425ba31 YARN-8120. JVM can crash with SIGSEGV when exiting due to custom 
> leveldb logger. Contributed by Jason Lowe.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15385) Many tests are failing in hadoop-distcp project in branch-2.8

2018-04-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445109#comment-16445109
 ] 

SammiChen commented on HADOOP-15385:


Hi [~shahrs87], thanks for pinging me. Are you going to work on this JIRA? I 
fully agree that we should resolve the issue before the release. On the other 
hand, some customers are waiting eagerly to try the enhanced features in 2.9, 
so if we can resolve the issue in a short time window, that would be great. 
Otherwise, I might consider leaving it to the next release. Your thoughts?

> Many tests are failing in hadoop-distcp project in branch-2.8
> -
>
> Key: HADOOP-15385
> URL: https://issues.apache.org/jira/browse/HADOOP-15385
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.8.3
>Reporter: Rushabh S Shah
>Priority: Blocker
>
> Many tests are failing in hadoop-distcp project in branch-2.8
> Below are the failing tests.
> {noformat}
> Failed tests: 
>   
> TestDistCpViewFs.testUpdateGlobTargetMissingSingleLevel:326->checkResult:428 
> expected:<4> but was:<5>
>   TestDistCpViewFs.testGlobTargetMissingMultiLevel:346->checkResult:428 
> expected:<4> but was:<5>
>   TestDistCpViewFs.testGlobTargetMissingSingleLevel:306->checkResult:428 
> expected:<2> but was:<3>
>   TestDistCpViewFs.testUpdateGlobTargetMissingMultiLevel:367->checkResult:428 
> expected:<6> but was:<8>
>   TestIntegration.testUpdateGlobTargetMissingSingleLevel:431->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingMultiLevel:454->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingSingleLevel:408->checkResult:577 
> expected:<2> but was:<3>
>   TestIntegration.testUpdateGlobTargetMissingMultiLevel:478->checkResult:577 
> expected:<6> but was:<8>
>   TestIntegration.testUpdateGlobTargetMissingSingleLevel:431->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingMultiLevel:454->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingSingleLevel:408->checkResult:577 
> expected:<2> but was:<3>
>   TestIntegration.testUpdateGlobTargetMissingMultiLevel:478->checkResult:577 
> expected:<6> but was:<8>
>   TestIntegration.testUpdateGlobTargetMissingSingleLevel:431->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingMultiLevel:454->checkResult:577 
> expected:<4> but was:<5>
>   TestIntegration.testGlobTargetMissingSingleLevel:408->checkResult:577 
> expected:<2> but was:<3>
>   TestIntegration.testUpdateGlobTargetMissingMultiLevel:478->checkResult:577 
> expected:<6> but was:<8>
> Tests run: 258, Failures: 16, Errors: 0, Skipped: 0
> {noformat}
> {noformat}
> rushabhs$ pwd
> /Users/rushabhs/hadoop/apacheHadoop/hadoop/hadoop-tools/hadoop-distcp
> rushabhs$ git branch
>  branch-2
>   branch-2.7
> * branch-2.8
>   branch-2.9
>   branch-3.0
>  rushabhs$ git log --oneline | head -n3
> c4ea1c8bb73 HADOOP-14970. MiniHadoopClusterManager doesn't respect lack of 
> format option. Contributed by Erik Krogen
> 1548205a845 YARN-8147. TestClientRMService#testGetApplications sporadically 
> fails. Contributed by Jason Lowe
> c01b425ba31 YARN-8120. JVM can crash with SIGSEGV when exiting due to custom 
> leveldb logger. Contributed by Jason Lowe.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15205) maven release: missing source attachments for hadoop-mapreduce-client-core

2018-04-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443763#comment-16443763
 ] 

SammiChen commented on HADOOP-15205:


I tried "mvn deploy -Psign -DskipTests -Dgpg.executable=gpg2 -Pdist,src,yarn-ui 
-Dtar" when uploading 2.9.1 RC0.  It works.  

Thanks [~eddyxu] for providing the solution. 

> maven release: missing source attachments for hadoop-mapreduce-client-core
> --
>
> Key: HADOOP-15205
> URL: https://issues.apache.org/jira/browse/HADOOP-15205
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.5, 3.0.0
>Reporter: Zoltan Haindrich
>Priority: Major
>
> I wanted to use the source attachment; however, it looks like that artifact 
> has not been present in Maven Central since 2.7.5; the last release that had 
> source attachments / javadocs was 2.7.4
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.4/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.5/
> this does not seem to be limited to mapreduce, as the same change is present 
> for yarn-common as well
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-common/2.7.4/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-common/2.7.5/
> and also hadoop-common
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.4/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.5/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/3.0.0/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-04-12 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-14999:
---
   Resolution: Fixed
Fix Version/s: 3.0.3
   2.9.2
   3.1.1
   3.2.0
   2.9.1
   2.10.0
   Status: Resolved  (was: Patch Available)

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-14999-branch-2.001.patch, 
> HADOOP-14999-branch-2.002.patch, HADOOP-14999.001.patch, 
> HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, 
> HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, 
> HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. First, the 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks happen 
> asynchronously.
>  - avoid buffering a very large result on the local disk. To cite an extreme 
> example, a task may output 100GB or even more; we would need to write this 
> 100GB to the local disk and then upload it. That is inefficient and limited 
> by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service 
> and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to the local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an uploading task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads 
> the blocks in parallel; this improves performance greatly.
> 3. Each task retries up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the current 
> upload.
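
To make the flow concrete, here is a minimal, self-contained sketch of the 
block-upload pattern described above. It uses a plain fixed thread pool where 
the real code uses {{SemaphoredDelegatingExecutor}}; all names, sizes, and the 
retry helper are invented for illustration, not taken from the patch.

{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class BlockUploadSketch {
  private static final int RETRIES = 3;

  // Upload one block, retrying up to RETRIES times
  // (stand-in for an OSS multipart part upload).
  static boolean uploadWithRetry(byte[] block, int partNumber) {
    for (int attempt = 1; attempt <= RETRIES; attempt++) {
      try {
        // ... putPart(block, partNumber) against the object store ...
        return true;
      } catch (RuntimeException e) {
        if (attempt == RETRIES) {
          return false;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<Boolean>> parts = new ArrayList<>();
    byte[][] blocks = new byte[8][128 * 1024];   // pretend task output, cut into blocks

    for (int i = 0; i < blocks.length; i++) {
      final byte[] block = blocks[i];
      final int partNumber = i + 1;              // part numbers are 1-based
      parts.add(pool.submit(() -> uploadWithRetry(block, partNumber)));
    }

    boolean ok = true;
    for (Future<Boolean> f : parts) {
      ok &= f.get();                             // wait for all uploads
    }
    // complete the multipart upload on success, abort it otherwise
    System.out.println(ok ? "completeMultipartUpload" : "abortMultipartUpload");
    pool.shutdown();
  }
}
{noformat}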



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-04-12 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436699#comment-16436699
 ] 

SammiChen commented on HADOOP-14999:


Committed to trunk, branch-3.0, branch-3.1, branch-2, branch-2.9 & branch-2.9.1.

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999-branch-2.001.patch, 
> HADOOP-14999-branch-2.002.patch, HADOOP-14999.001.patch, 
> HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, 
> HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, 
> HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. First, the 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks happen 
> asynchronously.
>  - avoid buffering a very large result on the local disk. To cite an extreme 
> example, a task may output 100GB or even more; we would need to write this 
> 100GB to the local disk and then upload it. That is inefficient and limited 
> by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service 
> and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to the local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an uploading task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads 
> the blocks in parallel; this improves performance greatly.
> 3. Each task retries up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the current 
> upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15082) add AbstractContractRootDirectoryTest test for mkdir / ; wasb to implement the test

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15082:
---
Target Version/s: 2.9.2  (was: 2.9.1)

> add AbstractContractRootDirectoryTest test for mkdir / ; wasb to implement 
> the test
> ---
>
> Key: HADOOP-15082
> URL: https://issues.apache.org/jira/browse/HADOOP-15082
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, test
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15082-001.patch, HADOOP-15082-002.patch
>
>
> I managed to get a stack trace on an older version of WASB with some code 
> doing a mkdir(new Path("/")); some of the Ranger parentage checks didn't 
> handle that specific case.
> # Add a new root FS contract test for this operation.
> # Have WASB implement the test suite as an integration test.
> # If the test shows a problem, fix it.
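
A rough sketch of what such a root-directory contract test might look like; 
the test method name and assertions are assumptions for illustration, not the 
actual patch:

{noformat}
// Sketch only: inside a subclass of AbstractContractRootDirectoryTest,
// where getFileSystem() is provided by the contract-test base class.
@Test
public void testMkdirsRoot() throws Throwable {
  FileSystem fs = getFileSystem();
  Path root = new Path("/");
  assertTrue("mkdirs(/) should succeed", fs.mkdirs(root));
  assertTrue(fs.getFileStatus(root).isDirectory());
}
{noformat}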



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-10584) ActiveStandbyElector goes down if ZK quorum become unavailable

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-10584:
---
Target Version/s: 3.2.0, 2.9.2  (was: 2.9.1, 3.2.0)

> ActiveStandbyElector goes down if ZK quorum become unavailable
> --
>
> Key: HADOOP-10584
> URL: https://issues.apache.org/jira/browse/HADOOP-10584
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HADOOP-10584.prelim.patch, hadoop-10584-prelim.patch, 
> rm.log
>
>
> ActiveStandbyElector retries operations a few times. If the ZK quorum 
> itself is down, the elector goes down too, and the daemons have to be 
> brought up again. 
> Instead, it should log the fact that it is unable to talk to ZK, call 
> becomeStandby on its client, and continue to attempt connecting to ZK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15069) support git-secrets commit hook to keep AWS secrets out of git

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15069:
---
Target Version/s: 2.8.3, 3.2.0, 3.0.2, 2.9.2  (was: 2.8.3, 2.9.1, 3.2.0, 
3.0.2)

> support git-secrets commit hook to keep AWS secrets out of git
> --
>
> Key: HADOOP-15069
> URL: https://issues.apache.org/jira/browse/HADOOP-15069
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15069-001.patch, HADOOP-15069-002.patch
>
>
> The latest Uber breach looks like it involved AWS keys in git repos.
> Nobody wants that, which is why Amazon provides 
> [git-secrets|https://github.com/awslabs/git-secrets]: a script you can use 
> to scan a repo and its history, *and* add as an automated check.
> Anyone can set this up, but there are a few false positives in the scan, 
> mostly from longs and a few all-upper-case constants. These can all be added 
> to a .gitignore file.
> Also: mention git-secrets in the AWS testing docs; say "use it".
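
For reference, basic usage of git-secrets looks roughly like this; 
double-check the upstream README for the exact flags:

{noformat}
git secrets --install        # add the commit hooks to the current repo
git secrets --register-aws   # register the AWS credential patterns
git secrets --scan           # scan the working tree
git secrets --scan-history   # scan the full repository history
{noformat}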



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15082) add AbstractContractRootDirectoryTest test for mkdir / ; wasb to implement the test

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15082:
---
Target Version/s: 2.9.1,   (was: 2.9.1)

> add AbstractContractRootDirectoryTest test for mkdir / ; wasb to implement 
> the test
> ---
>
> Key: HADOOP-15082
> URL: https://issues.apache.org/jira/browse/HADOOP-15082
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, test
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15082-001.patch, HADOOP-15082-002.patch
>
>
> I managed to get a stack trace on an older version of WASB with some code 
> doing a mkdir(new Path("/")); some of the Ranger parentage checks didn't 
> handle that specific case.
> # Add a new root FS contract test for this operation.
> # Have WASB implement the test suite as an integration test.
> # If the test shows a problem, fix it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-30 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420468#comment-16420468
 ] 

SammiChen commented on HADOOP-14999:


Hi, [~uncleGen], please upload a patch for branch-2. Lambda expressions in 
AliyunOSSBlockOutputStream.java are not supported on branch-2. 

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, HADOOP-14999.010.patch, HADOOP-14999.011.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. First, the 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks happen 
> asynchronously.
>  - avoid buffering a very large result on the local disk. To cite an extreme 
> example, a task may output 100GB or even more; we would need to write this 
> 100GB to the local disk and then upload it. That is inefficient and limited 
> by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service 
> and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to the local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an uploading task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads 
> the blocks in parallel; this improves performance greatly.
> 3. Each task retries up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the current 
> upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-30 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420453#comment-16420453
 ] 

SammiChen commented on HADOOP-14999:


Thanks [~uncleGen] for the contribution. My +1. Will commit to trunk, 
branch-3.0, branch-3.1, branch-2 & branch-2.9.

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, HADOOP-14999.010.patch, HADOOP-14999.011.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. First, the 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks happen 
> asynchronously.
>  - avoid buffering a very large result on the local disk. To cite an extreme 
> example, a task may output 100GB or even more; we would need to write this 
> 100GB to the local disk and then upload it. That is inefficient and limited 
> by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service 
> and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to the local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an uploading task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads 
> the blocks in parallel; this improves performance greatly.
> 3. Each task retries up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the current 
> upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-29 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420102#comment-16420102
 ] 

SammiChen commented on HADOOP-14999:


Hi, [~uncleGen], the last patch looks overall good, with some indent issues.

The general Hadoop indent rules are:

1. A 4-space indent should be used when a statement exceeds 80 characters and 
needs to expand to multiple lines.

2. A 2-space indent should be used for a new statement line.

3. This format is not recommended. Parameters should stay on the same line as 
long as they do not reach the 80-character limit.

{quote}
static long longOption(Configuration conf,
                       String key,
                       long defVal,
                       long min) {
{quote}

Here the indent is 8 spaces:

{quote}
if (partSize < MULTIPART_MIN_SIZE) {
  LOG.warn("{} must be at least 100 KB; configured value is {}",
          property, partSize);
  partSize = MULTIPART_MIN_SIZE;
} else if (partSize > Integer.MAX_VALUE) {
  LOG.warn("oss: {} capped to ~2.14GB(maximum allowed size with " +
          "current output mechanism)", MULTIPART_UPLOAD_PART_SIZE_KEY);
  partSize = Integer.MAX_VALUE;
{quote}

Here the indent is 2 spaces:

{quote}
CompleteMultipartUploadRequest completeMultipartUploadRequest =
  new CompleteMultipartUploadRequest(bucketName, key, uploadId,
  partETags);
{quote}

Please double-check in case I missed other places.

I would suggest changing the default indent settings in the IDE (Eclipse or 
IntelliJ) so that every newly created piece of code follows the Hadoop indent 
rules.

 

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, HADOOP-14999.010.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. First, the 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, producing the result and uploading blocks happen 
> asynchronously.
>  - avoid buffering a very large result on the local disk. To cite an extreme 
> example, a task may output 100GB or even more; we would need to write this 
> 100GB to the local disk and then upload it. That is inefficient and limited 
> by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service 
> and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to the local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an uploading task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads 
> the blocks in parallel; this improves performance greatly.
> 3. Each task retries up to 3 times to upload its block to Aliyun OSS. If any 
> of those tasks fails, the whole file upload fails and we abort the current 
> upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15262) AliyunOSS: move files under a directory in parallel when rename a directory

2018-03-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404451#comment-16404451
 ] 

SammiChen edited comment on HADOOP-15262 at 3/19/18 7:46 AM:
-

+1. Committed to trunk, branch-3.0, branch-2 and branch-2.9. Thanks for 
[~wujinhu]'s contribution. 


was (Author: sammi):
+1. Committed to trunk, branch-2 and branch-2.9. Thanks for [~wujinhu]'s 
contribution. 

> AliyunOSS: move files under a directory in parallel when rename a directory
> ---
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 3.2.0, 3.0.2
>
> Attachments: HADOOP-15262-branch-2.001.patch, HADOOP-15262.001.patch, 
> HADOOP-15262.002.patch, HADOOP-15262.003.patch, HADOOP-15262.004.patch, 
> HADOOP-15262.005.patch, HADOOP-15262.006.patch, HADOOP-15262.007.patch
>
>
> Currently, the rename() operation renames files in series. This will be 
> slow if a directory contains many files, so we can improve it by renaming 
> files in parallel.
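
For illustration, here is a minimal sketch of the parallel move described 
above, assuming a flat object store; the helper copyAndDeleteObject and all 
names are invented stand-ins, not the actual patch:

{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelRenameSketch {
  // Hypothetical per-object move: copy to the new key, then delete the old one.
  static void copyAndDeleteObject(String srcKey, String dstKey) {
    // ... store.copyObject(srcKey, dstKey); store.deleteObject(srcKey); ...
  }

  // Rename every object under srcDir to dstDir using a thread pool,
  // instead of moving them one by one in series.
  static void renameDirectory(List<String> srcKeys, String srcDir, String dstDir)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<?>> moves = new ArrayList<>();
    for (String srcKey : srcKeys) {
      String dstKey = dstDir + srcKey.substring(srcDir.length());
      moves.add(pool.submit(() -> copyAndDeleteObject(srcKey, dstKey)));
    }
    for (Future<?> f : moves) {
      f.get();   // propagate the first failure, if any
    }
    pool.shutdown();
  }
}
{noformat}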



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15262) AliyunOSS: move files under a directory in parallel when rename a directory

2018-03-19 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15262:
---
Fix Version/s: 3.0.2

> AliyunOSS: move files under a directory in parallel when rename a directory
> ---
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 3.2.0, 3.0.2
>
> Attachments: HADOOP-15262-branch-2.001.patch, HADOOP-15262.001.patch, 
> HADOOP-15262.002.patch, HADOOP-15262.003.patch, HADOOP-15262.004.patch, 
> HADOOP-15262.005.patch, HADOOP-15262.006.patch, HADOOP-15262.007.patch
>
>
> Currently, the rename() operation renames files in series. This will be slow if a 
> directory contains many files. So we can improve this by renaming files in 
> parallel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15262) AliyunOSS: move files under a directory in parallel when rename a directory

2018-03-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404451#comment-16404451
 ] 

SammiChen commented on HADOOP-15262:


+1. Committed to trunk, branch-2 and branch-2.9. Thanks for [~wujinhu]'s 
contribution. 

> AliyunOSS: move files under a directory in parallel when rename a directory
> ---
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 3.2.0
>
> Attachments: HADOOP-15262-branch-2.001.patch, HADOOP-15262.001.patch, 
> HADOOP-15262.002.patch, HADOOP-15262.003.patch, HADOOP-15262.004.patch, 
> HADOOP-15262.005.patch, HADOOP-15262.006.patch, HADOOP-15262.007.patch
>
>
> Currently, the rename() operation renames files in series. This will be slow if a 
> directory contains many files. So we can improve this by renaming files in 
> parallel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15262) AliyunOSS: move files under a directory in parallel when rename a directory

2018-03-19 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15262:
---
   Resolution: Fixed
Fix Version/s: 3.2.0
   2.9.1
   2.10.0
   Status: Resolved  (was: Patch Available)

> AliyunOSS: move files under a directory in parallel when rename a directory
> ---
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 3.2.0
>
> Attachments: HADOOP-15262-branch-2.001.patch, HADOOP-15262.001.patch, 
> HADOOP-15262.002.patch, HADOOP-15262.003.patch, HADOOP-15262.004.patch, 
> HADOOP-15262.005.patch, HADOOP-15262.006.patch, HADOOP-15262.007.patch
>
>
> Currently, the rename() operation renames files in series. This will be slow if a 
> directory contains many files. So we can improve this by renaming files in 
> parallel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15262) AliyunOSS: move files under a directory in parallel when rename a directory

2018-03-19 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15262:
---
Summary: AliyunOSS: move files under a directory in parallel when rename a 
directory  (was: AliyunOSS: rename() to move files in a directory in parallel)

> AliyunOSS: move files under a directory in parallel when rename a directory
> ---
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15262-branch-2.001.patch, HADOOP-15262.001.patch, 
> HADOOP-15262.002.patch, HADOOP-15262.003.patch, HADOOP-15262.004.patch, 
> HADOOP-15262.005.patch, HADOOP-15262.006.patch, HADOOP-15262.007.patch
>
>
> Currently, the rename() operation renames files in series. This will be slow if a 
> directory contains many files. So we can improve this by renaming files in 
> parallel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-15 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401400#comment-16401400
 ] 

SammiChen commented on HADOOP-15262:


Hi [~wujinhu], the 006 patch looks good overall. 

One minor issue: the indent in testRenameDirectoryCopyTaskPartialFailed is 
still 8. It should be 4. 

Please also upload a patch for branch-2 besides the current patch for trunk. 

And file a new JIRA to update the documentation for this improvement.

> AliyunOSS: rename() to move files in a directory in parallel
> 
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, 
> HADOOP-15262.003.patch, HADOOP-15262.004.patch, HADOOP-15262.005.patch, 
> HADOOP-15262.006.patch
>
>
> Currently, the rename() operation renames files in series. This will be slow if a 
> directory contains many files. So we can improve this by renaming files in 
> parallel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-13 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396605#comment-16396605
 ] 

SammiChen commented on HADOOP-14999:


Hi [~uncleGen], some comments:

1. 
{quote}
Preconditions.checkArgument(v >= min,
String.format("Value of %s: %d is below the minimum value %d",
key, v, min));
{quote}

Ignore the comment. String.format with %d is OK. 
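
For the record: Guava's own message template in Preconditions.checkArgument
only substitutes %s, but a message pre-built with String.format, as the patch
does, may use %d freely. A small sketch:

{code:java}
import com.google.common.base.Preconditions;

public class PreconditionsFormatDemo {
  static void checkMin(String key, long v, long min) {
    // Wrong: checkArgument's template parameter only understands %s,
    // so passing %d directly would not be substituted:
    // Preconditions.checkArgument(v >= min,
    //     "Value of %s: %d is below the minimum value %d", key, v, min);

    // Fine: the message is fully formatted before checkArgument sees it.
    Preconditions.checkArgument(v >= min,
        String.format("Value of %s: %d is below the minimum value %d",
            key, v, min));
  }

  public static void main(String[] args) {
    checkMin("fs.oss.multipart.upload.size", 10, 1);
  }
}
{code}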

2. 
bq.  Asynchronous multi-part based uploading mechanism to support huge file* 
which is larger than 5GB.

Please explain in detail where the 5GB threshold comes from.

3. 
{quote}
if (partSize < MULTIPART_MIN_SIZE) {
  LOG.warn("{} must be at least 5 MB; configured value is {}",
      property, partSize);
  partSize = MULTIPART_MIN_SIZE;
}
{quote}

MULTIPART_MIN_SIZE is 100K, but the threshold in the warning message is 5 MB.

4. 
bq.  long partSize = AliyunOSSUtils.getMultipartSizeProperty(getConf(), 
MULTIPART_UPLOAD_PART_SIZE_DEFAULT);

 Can we use uploadPartSize instead here?

5.  
   bq.  I also add the resource clean logic in try-finally
 {quote}
try {
  blockStream.write(b, off, len);
  blockWritten += len;
  if (blockWritten >= blockSize) {
    uploadCurrentPart();
    blockWritten = 0L;
  }
} finally {
  for (File tFile: blockFiles) {
    if (tFile.exists() && !tFile.delete()) {
      LOG.warn("Failed to delete temporary file {}", tFile);
    }
  }
}
{quote}
I see you added the temp-file deletion in finally regardless of whether the 
above operation succeeds. When store.uploadPart() returns, is the upload 
already finished? If it is an async operation, deleting the temp file in the 
normal case may cause trouble.
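
A sketch of one way to address this concern: if uploadPart() is asynchronous,
move the temp-file deletion into the upload task itself so it runs only after
the part has actually finished (names are illustrative):

{code:java}
import java.io.File;
import java.util.concurrent.ExecutorService;

public class AsyncCleanupSketch {
  // Delete a block's temp file only after its part upload completes,
  // instead of on the writer's code path.
  static void submitPart(ExecutorService executor, File blockFile,
      int partNumber) {
    executor.submit(() -> {
      try {
        uploadPart(partNumber, blockFile);  // placeholder for the OSS call
      } finally {
        if (blockFile.exists() && !blockFile.delete()) {
          System.err.println("Failed to delete temporary file " + blockFile);
        }
      }
      return null;
    });
  }

  private static void uploadPart(int partNumber, File data) {
    // stand-in for the real part upload
  }
}
{code}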

6. The performance data looks good. 
  




> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. Firstly, this 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, getting the result and uploading blocks are asynchronous.
>  - avoid buffering an overly large result on local disk. To cite an extreme 
> example, if a task outputs 100GB or even more, we may need to write this 
> 100GB to local disk and then upload it. Sometimes, this is inefficient and 
> limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which 
> uploads the blocks in parallel; this improves performance greatly.
> 3. Each task will retry 3 times to upload its block to Aliyun OSS. If one of 
> those tasks fails, the whole file upload fails, and we abort the current 
> upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-04 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385603#comment-16385603
 ] 

SammiChen commented on HADOOP-15262:


Some comments:

1.  Indent is 4 spaces instead of 8 spaces in Hadoop code style. 

2.  The comment style is not consistent in the Constants class. Leave a space 
between "//" and "the", and "the" should begin with the upper-case "T". 

//maximum number of threads allowed in the pool for copies

3. Importing with the wildcard "*" is strongly discouraged by Hadoop code style. 

import static org.junit.Assert.*;

4.  Function parameter list coding style: please refer to AliyunOSSInputStream 
to improve the AliyunOSSCopyFileTask parameter list style. 
5.  unboundedCopyThreadPool: suggest setting an upper limit on the waiting-list 
size; using unbounded resources is not recommended (see the sketch after this 
list). 
6.  Lock before checking the status:
if (copyFileContext.isCopyFailure()) {
  //some error occurs, break
  break;
}
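
A sketch of the bounded pool suggested in point 5, assuming a plain
ThreadPoolExecutor is acceptable here:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedCopyPoolSketch {
  // A fixed number of threads plus a bounded waiting queue; with
  // CallerRunsPolicy a full queue throttles the submitter instead of
  // letting the waiting list grow without limit.
  public static ExecutorService newBoundedPool(int threads, int queueSize) {
    return new ThreadPoolExecutor(
        threads, threads,
        60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(queueSize),
        new ThreadPoolExecutor.CallerRunsPolicy());
  }
}
{code}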


> AliyunOSS: rename() to move files in a directory in parallel
> 
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.1.0, 2.9.1, 3.0.1
>
> Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, 
> HADOOP-15262.003.patch
>
>
> Currently, the rename() operation renames files in series. This will be slow if a 
> directory contains many files. So we can improve this by renaming files in 
> parallel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15099) YARN Federation Link not working

2018-03-01 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383077#comment-16383077
 ] 

SammiChen commented on HADOOP-15099:


Hi [~animenon], thanks for reporting this. Would you create a patch using Git 
and upload it to this JIRA through More -> Attach Files? I can help you commit 
it upstream. 

> YARN Federation Link not working
> 
>
> Key: HADOOP-15099
> URL: https://issues.apache.org/jira/browse/HADOOP-15099
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.9.0
>Reporter: Anirudh
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> The YARN Federation link (in the last paragraph on the page) isn't working on 
> [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15099) YARN Federation Link not working

2018-03-01 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15099:
---
Target Version/s: 3.0.0  (was: 2.9.1)

> YARN Federation Link not working
> 
>
> Key: HADOOP-15099
> URL: https://issues.apache.org/jira/browse/HADOOP-15099
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.9.0
>Reporter: Anirudh
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> The YARN Federation link (in the last paragraph on the page) isn't working on 
> [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-10584) ActiveStandbyElector goes down if ZK quorum become unavailable

2018-03-01 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383045#comment-16383045
 ] 

SammiChen commented on HADOOP-10584:


Hi [~templedf], is this still targeted for 2.9.1? If not, can we push it 
out to the next 2.9.2 release? 

> ActiveStandbyElector goes down if ZK quorum become unavailable
> --
>
> Key: HADOOP-10584
> URL: https://issues.apache.org/jira/browse/HADOOP-10584
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HADOOP-10584.prelim.patch, hadoop-10584-prelim.patch, 
> rm.log
>
>
> ActiveStandbyElector retries operations a few times. If the ZK quorum 
> itself is down, it goes down and the daemons will have to be brought up 
> again. 
> Instead, it should log the fact that it is unable to talk to ZK, call 
> becomeStandby on its client, and continue to attempt connecting to ZK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-02-08 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356633#comment-16356633
 ] 

SammiChen commented on HADOOP-14999:


Add two more:
 # TestAliyunOSSBlockOutputStream: need tests to cover big-file uploads, at 
least bigger than MULTIPART_UPLOAD_SIZE (see the sketch below).
 # Any performance comparison data, using the original code and the patched code? 
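
A sketch of such a big-file test, assuming a generic FileSystem fixture (the
local filesystem keeps the sketch runnable; the real test would use the OSS
test FileSystem and the fixtures in TestAliyunOSSBlockOutputStream, and the
part-size constant is an assumption):

{code:java}
import static org.junit.Assert.assertEquals;

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class BigFileUploadTestSketch {
  private static final long PART_SIZE = 10 * 1024 * 1024; // assumed part size

  @Test
  public void testUploadLargerThanOnePart() throws Exception {
    FileSystem fs = getFileSystem();
    Path path = new Path(System.getProperty("java.io.tmpdir"),
        "oss-bigfile-sketch");
    long size = 3 * PART_SIZE + 517;    // several parts plus an odd tail
    byte[] chunk = new byte[64 * 1024];
    new Random(0).nextBytes(chunk);
    long written = 0;
    try (FSDataOutputStream out = fs.create(path, true)) {
      while (written < size) {
        int n = (int) Math.min(chunk.length, size - written);
        out.write(chunk, 0, n);
        written += n;
      }
    }
    assertEquals(size, fs.getFileStatus(path).getLen());
  }

  private FileSystem getFileSystem() throws Exception {
    return FileSystem.getLocal(new Configuration()); // fixture stand-in
  }
}
{code}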

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. Firstly, this 
> mechanism splits the result into multiple small blocks and uploads them in 
> parallel. Then, getting the result and uploading blocks are asynchronous.
>  - avoid buffering an overly large result on local disk. To cite an extreme 
> example, if a task outputs 100GB or even more, we may need to write this 
> 100GB to local disk and then upload it. Sometimes, this is inefficient and 
> limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it will run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload task. 
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which 
> uploads the blocks in parallel; this improves performance greatly.
> 3. Each task will retry 3 times to upload its block to Aliyun OSS. If one of 
> those tasks fails, the whole file upload fails, and we abort the current 
> upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-02-08 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356629#comment-16356629
 ] 

SammiChen edited comment on HADOOP-14999 at 2/8/18 8:15 AM:


Hi [~uncleGen], thanks for refining the patch. Here are a few comments.

1.  AliyunOSSFileSystemStore.

uploadPartSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
multipartThreshold = conf.getLong(MIN_MULTIPART_UPLOAD_THRESHOLD_KEY,
    MIN_MULTIPART_UPLOAD_THRESHOLD_DEFAULT);
partSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
if (partSize < MIN_MULTIPART_UPLOAD_PART_SIZE) {
  partSize = MIN_MULTIPART_UPLOAD_PART_SIZE;
}

What is the difference in usage between "uploadPartSize" and "partSize", which 
have the same initial value? It seems "partSize" is not used in other places.

Also please refine the multipart-upload related constant properties and put 
related properties in adjacent places. It seems "MULTIPART_UPLOAD_SIZE_DEFAULT" 
should be called "MULTIPART_UPLOAD_PART_SIZE_DEFAULT", and 
"MULTIPART_UPLOAD_SIZE = 104857600" is the temp file size. Try to make each 
property name carry its accurate meaning.

// Size of each of or multipart pieces in bytes
public static final String MULTIPART_UPLOAD_SIZE_KEY =
    "fs.oss.multipart.upload.size";
public static final long MULTIPART_UPLOAD_SIZE = 104857600; // 100 MB

public static final long MULTIPART_UPLOAD_SIZE_DEFAULT = 10 * 1024 * 1024;
public static final int MULTIPART_UPLOAD_PART_NUM_LIMIT = 10000;

// Minimum size in bytes before we start a multipart uploads or copy
public static final String MIN_MULTIPART_UPLOAD_THRESHOLD_KEY =
    "fs.oss.multipart.upload.threshold";
public static final long MIN_MULTIPART_UPLOAD_THRESHOLD_DEFAULT =
    20 * 1024 * 1024;
public static final long MIN_MULTIPART_UPLOAD_PART_SIZE = 100 * 1024L;

 

2. AliyunOSSUtils#createTmpFileForWrite

    Change the order of the following statements:

if (directoryAllocator == null) {
  directoryAllocator = new LocalDirAllocator(BUFFER_DIR_KEY);
}
if (conf.get(BUFFER_DIR_KEY) == null) {
  conf.set(BUFFER_DIR_KEY, conf.get("hadoop.tmp.dir") + "/oss");
}

Also, is "directoryAllocator" final?

3. AliyunOSSUtils#intOption, longOption

   Precondition doesn't support "%d". Add a test case to cover the logic. 
Suggest changing the names to more meaningful names like getXOption. Pay 
attention to the code style, especially the indent.

4. TestAliyunOSSBlockOutputStream. Add random-length file tests here. Only 
1024-aligned file lengths are not enough.

5. AliyunOSSBlockOutputStream

   "Asynchronous multi-part based uploading mechanism to support huge file 
which is larger than 5GB."

Where is this 5GB threshold checked in the code?

The resources are well cleaned after close() is called, but they are not 
cleaned when an exception happens during the write() process.

 


was (Author: sammi):
Hi [~uncleGen], thanks for refining the patch. Here are a few comments.

1.  AliyunOSSFileSystemStore.

uploadPartSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
multipartThreshold = conf.getLong(MIN_MULTIPART_UPLOAD_THRESHOLD_KEY,
    MIN_MULTIPART_UPLOAD_THRESHOLD_DEFAULT);
partSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
if (partSize < MIN_MULTIPART_UPLOAD_PART_SIZE) {
  partSize = 

[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-02-08 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356629#comment-16356629
 ] 

SammiChen commented on HADOOP-14999:


Hi [~uncleGen], thanks for refining the patch. Here are a few comments.

1.  AliyunOSSFileSystemStore.

uploadPartSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
multipartThreshold = conf.getLong(MIN_MULTIPART_UPLOAD_THRESHOLD_KEY,
    MIN_MULTIPART_UPLOAD_THRESHOLD_DEFAULT);
partSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
if (partSize < MIN_MULTIPART_UPLOAD_PART_SIZE) {
  partSize = MIN_MULTIPART_UPLOAD_PART_SIZE;
}

What is the difference in usage between "uploadPartSize" and "partSize", which 
have the same initial value? It seems "partSize" is not used in other places.

Also please refine the multipart-upload related constant properties and put 
related properties in adjacent places. It seems "MULTIPART_UPLOAD_SIZE_DEFAULT" 
should be called "MULTIPART_UPLOAD_PART_SIZE_DEFAULT", and 
"MULTIPART_UPLOAD_SIZE = 104857600" is the temp file size. Try to make each 
property name carry its accurate meaning.

// Size of each of or multipart pieces in bytes
public static final String MULTIPART_UPLOAD_SIZE_KEY =
    "fs.oss.multipart.upload.size";
public static final long MULTIPART_UPLOAD_SIZE = 104857600; // 100 MB

public static final long MULTIPART_UPLOAD_SIZE_DEFAULT = 10 * 1024 * 1024;
public static final int MULTIPART_UPLOAD_PART_NUM_LIMIT = 10000;

// Minimum size in bytes before we start a multipart uploads or copy
public static final String MIN_MULTIPART_UPLOAD_THRESHOLD_KEY =
    "fs.oss.multipart.upload.threshold";
public static final long MIN_MULTIPART_UPLOAD_THRESHOLD_DEFAULT =
    20 * 1024 * 1024;
public static final long MIN_MULTIPART_UPLOAD_PART_SIZE = 100 * 1024L;

 

2. AliyunOSSUtils#createTmpFileForWrite

    Change the order of the following statements:

if (directoryAllocator == null) {
  directoryAllocator = new LocalDirAllocator(BUFFER_DIR_KEY);
}
if (conf.get(BUFFER_DIR_KEY) == null) {
  conf.set(BUFFER_DIR_KEY, conf.get("hadoop.tmp.dir") + "/oss");
}

Also, is "directoryAllocator" final?

3. AliyunOSSUtils#intOption, longOption

   Precondition doesn't support "%d". Add a test case to cover the logic. 
Suggest changing the names to more meaningful names like getXOption. Pay 
attention to the code style, especially the indent.

4. TestAliyunOSSBlockOutputStream. Add random-length file tests here. Only 
1024-aligned file lengths are not enough.

5. AliyunOSSBlockOutputStream

   "Asynchronous multi-part based uploading mechanism to support huge file 
which is larger than 5GB."

Where is this 5GB threshold checked in the code?

The resources are well cleaned after close() is called, but they are not 
cleaned when an exception happens during the write() process.

 

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading files in parallel and 

[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance

2018-01-30 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
Fix Version/s: 3.0.1
   2.9.1
   2.10.0

> AliyunOSS: Support multi-thread pre-read to improve sequential read from 
> Hadoop to Aliyun OSS performance
> -
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.
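
A minimal sketch of the pre-read idea: while the caller consumes one range, a
small pool fetches the next ranges ahead of time. Illustrative only; the real
AliyunOSSInputStream also manages reusable buffers, seeks and EOF handling:

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class PreReadSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final BlockingQueue<Future<byte[]>> inflight =
      new LinkedBlockingQueue<>();
  private final long rangeSize;
  private long nextOffset = 0;

  public PreReadSketch(long rangeSize, int depth) {
    this.rangeSize = rangeSize;
    for (int i = 0; i < depth; i++) {
      scheduleNextRange();              // prime the read-ahead window
    }
  }

  private void scheduleNextRange() {
    final long offset = nextOffset;
    nextOffset += rangeSize;
    inflight.add(pool.submit(() -> fetchRange(offset, rangeSize)));
  }

  // Return the next pre-fetched range and top the window back up.
  public byte[] nextRange() throws Exception {
    Future<byte[]> head = inflight.take();
    scheduleNextRange();
    return head.get();
  }

  private byte[] fetchRange(long offset, long len) {
    return new byte[(int) len];         // placeholder for the ranged OSS GET
  }
}
{code}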



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-29 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15189:
---
Fix Version/s: 2.10.0

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Fix For: 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch, HADOOP-15189-branch-3.0.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15039) Move SemaphoredDelegatingExecutor to hadoop-common

2018-01-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331974#comment-16331974
 ] 

SammiChen commented on HADOOP-15039:


Hi [~uncleGen], HADOOP-15027 depends on this JIRA. I would like to commit this 
JIRA's content into "branch-3", "branch-3.0", "branch-2" and "branch-2.9". I 
thought about directly cherry-picking the commit to the other branches, but 
then I saw there is a code change in {{S3AFileSystem}}. So would you please 
rebase the patch against these 4 branches, rerun the involved S3 test cases, 
and then upload 4 new patches, following the patch name pattern 
"-..patch"?

> Move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch, HADOOP-15039.004.patch, HADOOP-15039.005.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> Share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance

2018-01-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331960#comment-16331960
 ] 

SammiChen commented on HADOOP-15027:


Thanks [~jlowe] for the notification. I will help commit HADOOP-15039 to the 
other branches first.

> AliyunOSS: Support multi-thread pre-read to improve sequential read from 
> Hadoop to Aliyun OSS performance
> -
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance

2018-01-17 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
Target Version/s: 3.1.0, 2.10.0, 2.9.1, 3.0.1  (was: 3.1.0, 2.9.1, 3.0.1)

> AliyunOSS: Support multi-thread pre-read to improve sequential read from 
> Hadoop to Aliyun OSS performance
> -
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance

2018-01-17 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
  Resolution: Fixed
Release Note: Support multi-thread pre-read in AliyunOSSInputStream to 
improve the sequential read performance from Hadoop to Aliyun OSS. 
  Status: Resolved  (was: Patch Available)

> AliyunOSS: Support multi-thread pre-read to improve sequential read from 
> Hadoop to Aliyun OSS performance
> -
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance

2018-01-17 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
Fix Version/s: 3.0.1
   2.9.1
   2.10.0
   3.1.0

> AliyunOSS: Support multi-thread pre-read to improve sequential read from 
> Hadoop to Aliyun OSS performance
> -
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance

2018-01-16 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
Status: Patch Available  (was: Open)

> AliyunOSS: Support multi-thread pre-read to improve sequential read from 
> Hadoop to Aliyun OSS performance
> -
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance

2018-01-16 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
Status: Open  (was: Patch Available)

Toggling the status to re-trigger the build, which should happen in place.

> AliyunOSS: Support multi-thread pre-read to improve sequential read from 
> Hadoop to Aliyun OSS performance
> -
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance

2018-01-16 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
Summary: AliyunOSS: Support multi-thread pre-read to improve sequential 
read from Hadoop to Aliyun OSS performance  (was: AliyunOSS: Support 
multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance)

> AliyunOSS: Support multi-thread pre-read to improve sequential read from 
> Hadoop to Aliyun OSS performance
> -
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-16 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328262#comment-16328262
 ] 

SammiChen commented on HADOOP-15027:


My +1 to the patch. Will commit after the [~genericqa] comment comes out. Thanks 
[~wujinhu] for the contribution. 

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-15 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326764#comment-16326764
 ] 

SammiChen commented on HADOOP-15027:


Hi [~wujinhu], the performance data looks very good.  We are very close now.  

The findbugs filter is too general.  It should be as specific as possible. 
Refer to other modules like AWS to see how to specify the filter.  Also please 
make sure the filter is necessary.




> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-09 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
Target Version/s: 3.1.0, 2.9.1, 3.0.1

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-05 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312897#comment-16312897
 ] 

SammiChen commented on HADOOP-15027:


Hi [~wujinhu], thanks for refining the patch. Can you add some performance 
comparison data here, comparing the current multi-thread pre-read with the 
previous single-thread implementation? 

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential

2018-01-05 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15158:
---
Fix Version/s: (was: 3.0.1)
   (was: 2.9.1)

> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15158.001.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials via 
> configuration (core-site.xml). Sometimes, an admin wants to create different 
> temporary credentials (key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support passing in the URI when creating an 
> XXXCredentialsProvider so that we can get the user info (role) from the URI.
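
A sketch of the provider shape the description asks for: the URI is passed at
construction so a role embedded in it can select per-role temporary
credentials. The constructor signature and the lookup are assumptions, not the
committed API:

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;

public class RoleBasedCredentialsProviderSketch {
  private final String role;
  private final Configuration conf;

  // e.g. oss://role@bucket/path yields "role" as the user info
  public RoleBasedCredentialsProviderSketch(URI uri, Configuration conf) {
    this.role = uri.getUserInfo();
    this.conf = conf;
  }

  public String[] getCredentials() {
    // placeholder: look up the temporary key/secret/token issued for this
    // role, e.g. from an STS service or per-role configuration in conf
    return new String[] {"key-for-" + role, "secret", "token"};
  }
}
{code}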



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-04 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311155#comment-16311155
 ] 

SammiChen commented on HADOOP-15027:


Thanks [~wujinhu] for working on it. Some comments:

1.  DEFAULT_MAX_TOTAL_TASKS = 128; the naming pattern is not consistent 
with the others. Put the default as a suffix. 
2.  Take care of all the checkstyle issues.
3. {quote} 
  store.close();
  boundedThreadPool.shutdown();
{quote}
Will store.close() throw any exception so that boundedThreadPool.shutdown() 
would be skipped? (See the sketch after this list.)

4.  "fs.oss.max.total.tasks" is the maximum of waiting queue length, right? 
5.  It seems  fsDataInputStream.seek() is missed between the two asserts. 
{quote}
assertTrue("expected position at:" + 0 + ", but got:"
+ fsDataInputStream.getPos(), fsDataInputStream.getPos() == 0);

assertTrue("expected position at:" + 1048576 + ", but got:"
+ in.getExpectNextPos(),
in.getExpectNextPos() == 1048576);
{quote}
6.  Can we add more test cases to cover the failure cases, to verify that the 
error handling functions correctly? 
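
A sketch of the guarantee point 3 is after, assuming a try/finally is
acceptable here:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CloseOrderSketch {
  // Shut the pool down even when store.close() throws.
  public static void close(AutoCloseable store,
      ExecutorService boundedThreadPool) throws Exception {
    try {
      store.close();
    } finally {
      boundedThreadPool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    close(() -> { }, Executors.newFixedThreadPool(2));
  }
}
{code}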

 

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-04 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
Summary: AliyunOSS: Support multi-thread pre-read to improve read from 
Hadoop to Aliyun OSS performance  (was: AliyunOSS: Support multi-thread 
pre-read to improve read from Hadoop to Aliyun OSS performa)

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performa

2018-01-04 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
Summary: AliyunOSS: Support multi-thread pre-read to improve read from 
Hadoop to Aliyun OSS performa  (was: AliyunOSS: Support multi-thread pre-read 
to improve read from Hadoop to Aliyun OSS performance)

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performa
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.






[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-04 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15027:
---
Summary: AliyunOSS: Support multi-thread pre-read to improve read from 
Hadoop to Aliyun OSS performance  (was: AliyunOSS: Improvements for Hadoop read 
from AliyunOSS)

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can refactor it to use multi-thread pre-read to
> improve read performance.






[jira] [Commented] (HADOOP-15080) Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on Cat-x "json-lib"

2017-12-14 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292132#comment-16292132
 ] 

SammiChen commented on HADOOP-15080:


I see. Thanks for the explanation, [~andrew.wang].

> Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on 
> Cat-x "json-lib"
> ---
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Assignee: SammiChen
>Priority: Blocker
> Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1
>
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> Cat-X dependency on org.json via derived json-lib. OSS SDK has a dependency 
> on json-lib. In LEGAL-245, the org.json library (from which json-lib may be 
> derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Updated] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15111:
---
Resolution: Fixed
  Assignee: Genmao Yu
Status: Resolved  (was: Patch Available)

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Fix For: 2.10.0, 2.9.1
>
> Attachments: HADOOP-15111-branch-2.001.patch
>
>
> Do a bulk listing of all entries under a path in one single operation; there
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing (see the usage sketch below)
> - some minor updates in hadoop-aliyun index.md
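
A usage sketch of the overridden listing path through the public FileSystem API (the bucket name and path below are placeholders; OSS credentials are assumed to be configured):

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListFilesSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "oss://example-bucket/" is a placeholder; OSS credentials must be configured.
    FileSystem fs = FileSystem.get(URI.create("oss://example-bucket/"), conf);
    // With the override, the recursive listing below can be served by bulk
    // listings from the store rather than a walk of the directory tree.
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/data"), true);
    while (it.hasNext()) {
      System.out.println(it.next().getPath());
    }
  }
}
{code}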






[jira] [Updated] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15111:
---
Fix Version/s: 2.9.1
   2.10.0

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
> Fix For: 2.10.0, 2.9.1
>
> Attachments: HADOOP-15111-branch-2.001.patch
>
>
> Do a bulk listing of all entries under a path in one single operation; there
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md






[jira] [Commented] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-14 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292120#comment-16292120
 ] 

SammiChen commented on HADOOP-15111:


Thanks [~uncleGen] for the work. +1 from me. Committed to branch-2 and branch-2.9.

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
> Attachments: HADOOP-15111-branch-2.001.patch
>
>
> Do a bulk listing of all entries under a path in one single operation; there
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md






[jira] [Commented] (HADOOP-15104) AliyunOSS: change the default value of max error retry

2017-12-14 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292072#comment-16292072
 ] 

SammiChen commented on HADOOP-15104:


Committed to branch-2 and branch-2.9. Thanks Jinhu for the work. 

> AliyunOSS: change the default value of max error retry
> --
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1
>
> Attachments: HADOOP-15104.001.patch
>
>
> Currently, the default number of times we retry errors is 20; however, the
> OSS SDK's retry delay when an error occurs is
> {code:java}
> long delay = (long) (Math.pow(2, retries) * 0.3);
> {code}
> So, if we retry 20 times, the cumulative sleep time will be about
> 3.64 days, which is unacceptable. We should change the default behavior.
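
To make the arithmetic concrete, a small sketch that sums the per-attempt delays, assuming the delay is in seconds as the 3.64-day figure implies:

{code:java}
public class RetryDelaySum {
  public static void main(String[] args) {
    double totalSeconds = 0;
    for (int retries = 0; retries < 20; retries++) {
      totalSeconds += Math.pow(2, retries) * 0.3;  // per-attempt delay
    }
    // 0.3 * (2^20 - 1) seconds is roughly 314,572 s, i.e. about 3.64 days.
    System.out.printf("cumulative sleep: %.2f days%n", totalSeconds / 86400.0);
  }
}
{code}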






[jira] [Updated] (HADOOP-15104) AliyunOSS: change the default value of max error retry

2017-12-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15104:
---
Fix Version/s: 2.9.1
   2.10.0

> AliyunOSS: change the default value of max error retry
> --
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1
>
> Attachments: HADOOP-15104.001.patch
>
>
> Currently, the default number of times we retry errors is 20; however, the
> OSS SDK's retry delay when an error occurs is
> {code:java}
> long delay = (long) (Math.pow(2, retries) * 0.3);
> {code}
> So, if we retry 20 times, the cumulative sleep time will be about
> 3.64 days, which is unacceptable. We should change the default behavior.






[jira] [Commented] (HADOOP-15080) Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on Cat-x "json-lib"

2017-12-11 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287095#comment-16287095
 ] 

SammiChen commented on HADOOP-15080:


Hi Andrew,


One simple question. I see there are branches trunk, branch-3.0 and
branch-3.0.0.

So I assume 3.1.0 is for trunk, 3.0.0 is for branch-3.0.0, and 3.0.1 is for
branch-3.0.

Is the assumption that 3.0.1 is for branch-3.0 not correct? And if so, why?


Bests,
Sammi



> Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on 
> Cat-x "json-lib"
> ---
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Assignee: SammiChen
>Priority: Blocker
> Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1
>
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> Cat-X dependency on org.json via derived json-lib. OSS SDK has a dependency 
> on json-lib. In LEGAL-245, the org.json library (from which json-lib may be 
> derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Updated] (HADOOP-15024) AliyunOSS: support user agent configuration and include that & Hadoop version information to oss server

2017-12-08 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15024:
---
Fix Version/s: 3.0.1
   2.9.1
   2.10.0
   3.0.0

> AliyunOSS: support user agent configuration and include that & Hadoop version 
> information to oss server
> ---
>
> Key: HADOOP-15024
> URL: https://issues.apache.org/jira/browse/HADOOP-15024
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/oss
>Affects Versions: 3.0.0
>Reporter: SammiChen
>Assignee: SammiChen
> Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HADOOP-15024.000.patch, HADOOP-15024.001.patch, 
> HADOOP-15024.002.patch
>
>
> Provide the OSS client-side Hadoop version to the OSS server, to help build
> access statistics metrics.






[jira] [Commented] (HADOOP-14993) AliyunOSS: Override listFiles and listLocatedStatus

2017-12-08 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283500#comment-16283500
 ] 

SammiChen commented on HADOOP-14993:


Hi [~uncleGen], the patch cannot be applied to branch-2. Would you please take a
look and provide a new patch for branch-2?

> AliyunOSS: Override listFiles and listLocatedStatus 
> 
>
> Key: HADOOP-14993
> URL: https://issues.apache.org/jira/browse/HADOOP-14993
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Fix For: 3.0.0, 3.1.0, 3.0.1
>
> Attachments: HADOOP-14993.001.patch, HADOOP-14993.002.patch, 
> HADOOP-14993.003.patch
>
>
> Do a bulk listing of all entries under a path in one single operation; there
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md






[jira] [Updated] (HADOOP-15080) Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on Cat-x "json-lib"

2017-12-08 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15080:
---
Fix Version/s: 3.0.1

> Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on 
> Cat-x "json-lib"
> ---
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Assignee: SammiChen
>Priority: Blocker
> Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> Cat-X dependency on org.json via derived json-lib. OSS SDK has a dependency 
> on json-lib. In LEGAL-245, the org.json library (from which json-lib may be 
> derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Updated] (HADOOP-14993) AliyunOSS: Override listFiles and listLocatedStatus

2017-12-08 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-14993:
---
Fix Version/s: 3.0.1
   3.0.0

> AliyunOSS: Override listFiles and listLocatedStatus 
> 
>
> Key: HADOOP-14993
> URL: https://issues.apache.org/jira/browse/HADOOP-14993
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Fix For: 3.0.0, 3.1.0, 3.0.1
>
> Attachments: HADOOP-14993.001.patch, HADOOP-14993.002.patch, 
> HADOOP-14993.003.patch
>
>
> Do a bulk listing of all entries under a path in one single operation; there
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md






[jira] [Updated] (HADOOP-14997) Add hadoop-aliyun as dependency of hadoop-cloud-storage

2017-12-08 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-14997:
---
Fix Version/s: 2.9.1
   2.10.0

>  Add hadoop-aliyun as dependency of hadoop-cloud-storage
> 
>
> Key: HADOOP-14997
> URL: https://issues.apache.org/jira/browse/HADOOP-14997
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1
>
> Attachments: HADOOP-14997.001.patch
>
>
> add the {{hadoop-aliyun}} dependency to the cloud storage modules






[jira] [Updated] (HADOOP-14997) Add hadoop-aliyun as dependency of hadoop-cloud-storage

2017-12-08 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-14997:
---
Fix Version/s: 3.1.0

>  Add hadoop-aliyun as dependency of hadoop-cloud-storage
> 
>
> Key: HADOOP-14997
> URL: https://issues.apache.org/jira/browse/HADOOP-14997
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HADOOP-14997.001.patch
>
>
> add the {{hadoop-aliyun}} dependency to the cloud storage modules






[jira] [Commented] (HADOOP-15080) Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on Cat-x "json-lib"

2017-12-08 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283298#comment-16283298
 ] 

SammiChen commented on HADOOP-15080:


Thanks [~mackrorysd] for backport to branch-2 & branch-2.9. 

> Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on 
> Cat-x "json-lib"
> ---
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Assignee: SammiChen
>Priority: Blocker
> Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1
>
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> Cat-X dependency on org.json via derived json-lib. OSS SDK has a dependency 
> on json-lib. In LEGAL-245, the org.json library (from which json-lib may be 
> derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Updated] (HADOOP-15080) Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on Cat-x "json-lib"

2017-12-07 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15080:
---
Summary: Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its 
dependency on Cat-x "json-lib"  (was: Cat-X dependency on org.json via derived 
json-lib)

> Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on 
> Cat-x "json-lib"
> ---
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Priority: Blocker
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> Cat-X dependency on org.json via derived json-lib. OSS SDK has a dependency 
> on json-lib. In LEGAL-245, the org.json library (from which json-lib may be 
> derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Updated] (HADOOP-15080) Cat-X dependency on org.json via derived json-lib

2017-12-07 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15080:
---
Description: Cat-X dependency on org.json via derived json-lib. OSS SDK has 
a dependency on json-lib. In LEGAL-245, the org.json library (from which 
json-lib may be derived) is released under a 
[category-x|https://www.apache.org/legal/resolved.html#json] license.  (was: 
The OSS SDK has a dependency on json-lib. In LEGAL-245, the org.json library 
(from which json-lib may be derived) is released under a 
[category-x|https://www.apache.org/legal/resolved.html#json] license.)

> Cat-X dependency on org.json via derived json-lib
> -
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Priority: Blocker
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> Cat-X dependency on org.json via derived json-lib. OSS SDK has a dependency 
> on json-lib. In LEGAL-245, the org.json library (from which json-lib may be 
> derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Commented] (HADOOP-15080) Cat-X dependency on org.json via derived json-lib

2017-12-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281777#comment-16281777
 ] 

SammiChen commented on HADOOP-15080:


Thanks [~drankye] for the review. I will commit it later. Thanks
[~chris.douglas], [~ste...@apache.org], [~mackrorysd] and [~andrew.wang] for
all your support.

> Cat-X dependency on org.json via derived json-lib
> -
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Priority: Blocker
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> The OSS SDK has a dependency on json-lib. In LEGAL-245, the org.json library 
> (from which json-lib may be derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Comment Edited] (HADOOP-15080) Cat-X dependency on org.json via derived json-lib

2017-12-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281496#comment-16281496
 ] 

SammiChen edited comment on HADOOP-15080 at 12/7/17 8:32 AM:
-

The Aliyun OSS team provides OSS SDK 2.8.3 to replace 2.8.1. json-lib is replaced
by Jersey-json 1.9 as a "test"-scope dependency of OSS SDK 2.8.3. Here are my
verification steps:

1. delete json-lib in local maven repository
2. clean build Hadoop
3. all Hadoop OSS module UT passed
4. check local maven repository, json-lib is not downloaded



was (Author: sammi):
The Aliyun OSS team provides OSS SDK 2.8.3 to replace 2.8.1. json-lib is replaced
by Jersey-json 1.9 as a "test"-scope dependency of OSS SDK 2.8.3. Here are my
verification steps:

1. delete json-lib in local maven repository
2. clean compiled Hadoop
3. all Hadoop OSS module UT passed
4. check local maven repository, json-lib is not downloaded


> Cat-X dependency on org.json via derived json-lib
> -
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Priority: Blocker
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> The OSS SDK has a dependency on json-lib. In LEGAL-245, the org.json library 
> (from which json-lib may be derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Commented] (HADOOP-15080) Cat-X dependency on org.json via derived json-lib

2017-12-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281496#comment-16281496
 ] 

SammiChen commented on HADOOP-15080:


The Aliyun OSS team provides OSS SDK 2.8.3 to replace 2.8.1. json-lib is replaced
by Jersey-json 1.9 as a "test"-scope dependency of OSS SDK 2.8.3. Here are my
verification steps:

1. delete json-lib in local maven repository
1. clean compiled Hadoop
2. all Hadoop OSS module UT passed
3. check local maven repository, json-lib is not downloaded


> Cat-X dependency on org.json via derived json-lib
> -
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Priority: Blocker
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> The OSS SDK has a dependency on json-lib. In LEGAL-245, the org.json library 
> (from which json-lib may be derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Comment Edited] (HADOOP-15080) Cat-X dependency on org.json via derived json-lib

2017-12-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281496#comment-16281496
 ] 

SammiChen edited comment on HADOOP-15080 at 12/7/17 8:31 AM:
-

The Aliyun OSS team provides OSS SDK 2.8.3 to replace 2.8.1. json-lib is replaced
by Jersey-json 1.9 as a "test"-scope dependency of OSS SDK 2.8.3. Here are my
verification steps:

1. delete json-lib in local maven repository
2. clean compiled Hadoop
3. all Hadoop OSS module UT passed
4. check local maven repository, json-lib is not downloaded



was (Author: sammi):
The Aliyun OSS team provides OSS SDK 2.8.3 to replace 2.8.1. json-lib is replaced
by Jersey-json 1.9 as a "test"-scope dependency of OSS SDK 2.8.3. Here are my
verification steps:

1. delete json-lib in local maven repository
1. clean compiled Hadoop
2. all Hadoop OSS module UT passed
3. check local maven repository, json-lib is not downloaded


> Cat-X dependency on org.json via derived json-lib
> -
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Priority: Blocker
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> The OSS SDK has a dependency on json-lib. In LEGAL-245, the org.json library 
> (from which json-lib may be derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Updated] (HADOOP-15080) Cat-X dependency on org.json via derived json-lib

2017-12-07 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-15080:
---
Attachment: HADOOP-15080-branch-3.0.0.002.patch

> Cat-X dependency on org.json via derived json-lib
> -
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Priority: Blocker
> Attachments: HADOOP-15080-branch-3.0.0.001.patch, 
> HADOOP-15080-branch-3.0.0.002.patch
>
>
> The OSS SDK has a dependency on json-lib. In LEGAL-245, the org.json library 
> (from which json-lib may be derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Commented] (HADOOP-15080) Cat-X dependency on org.json via derived json-lib

2017-12-05 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279604#comment-16279604
 ] 

SammiChen commented on HADOOP-15080:


Hi [~chris.douglas], [~ste...@apache.org] and [~mackrorysd], thanks all for
the information here. I have also gone through the discussion in LEGAL-349.
Here I want to further explain the dependency chain. The decision we make this
time might be the guide for future issues.

Currently the Aliyun OSS SDK declares json-lib as a compile dependency in the
Maven repository. Assume we change the dependency scope to "test", as it should be.
Here is the dependency chain after that:

{noformat}
Hadoop oss storage support module --(compile dependency)--> aliyun oss sdk --(test dependency)--> json-lib
{noformat}

json-lib is used in the OSS SDK's test functions; it is not used in any Hadoop
oss storage support module code, including test code. The Hadoop project doesn't
import any class from json-lib and doesn't include it in any pom.xml file. So
basically the only impact is that when you build Hadoop, Maven will download the
json-lib library to your local Maven repository. That's all.

From the discussion in LEGAL-349, it's clearly prohibited if json-lib is
directly used in Hadoop oss storage support module test code. I'm not sure if
the above case is allowed or not.

In the meanwhile, jersey-json is picked to replace json-lib. It is currently
the most used json library in Hadoop, so we assume it's safe to use. If
anyone does have another opinion, please let me know ASAP. I will go on with
solution 2 and try to deliver the new OSS SDK in time to catch the RC. Thanks
[~andrew.wang]!

 



> Cat-X dependency on org.json via derived json-lib
> -
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Priority: Blocker
> Attachments: HADOOP-15080-branch-3.0.0.001.patch
>
>
> The OSS SDK has a dependency on json-lib. In LEGAL-245, the org.json library 
> (from which json-lib may be derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Comment Edited] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-12-05 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275644#comment-16275644
 ] 

SammiChen edited comment on HADOOP-14964 at 12/5/17 8:08 AM:
-

Hi [~chris.douglas], I see the problem now. Thanks for such detailed
information. Let's wait for LEGAL's opinion. In the meanwhile, we will look
into alternative libraries to replace json-lib. Also thanks [~djp] for
considering landing this feature into 2.8.3. I agree we should go ahead and cut
2.8.3 CR0.



was (Author: sammi):
Hi [~chris.douglas], I see the problem now. Thanks for such detailed
information. Let's wait for LEGAL's opinion. In the meanwhile, we will look
into alternative libraries to replace json-lib. Also thanks @Junping Du for
considering landing this feature into 2.8.3. I agree we should go ahead and cut
2.8.3 CR0.


> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Fix For: 2.9.1
>
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>







[jira] [Commented] (HADOOP-15080) Cat-X dependency on org.json via derived json-lib

2017-12-04 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277964#comment-16277964
 ] 

SammiChen commented on HADOOP-15080:


Hi [~andrew.wang], I'm working with the Aliyun OSS team to find a solution.
The good news is that the json-lib library is only used in OSS SDK test
functions. Currently it's marked as a "compile" dependency while it actually
only needs to be a "test" dependency. So basically there are two solutions:
1. A new OSS SDK which marks json-lib as a "test" dependency. As test
dependencies are intransitive, Hadoop will not be impacted.
2. A new OSS SDK that uses jersey-json 1.9 instead; jersey-json would also be a
"test" dependency.
Solution 2 is preferred because it solves the problem once and for all. It will
take 2~3 days to prepare the new OSS SDK. Do you have any concerns about solution
2?

In the meantime, I agree we should prepare a patch to revert the Aliyun OSS
support so as not to block the release. But if we still have time, may I ask a
favor to hold the revert for several days? Also thanks [~mackrorysd] for
preparing the patch.

> Cat-X dependency on org.json via derived json-lib
> -
>
> Key: HADOOP-15080
> URL: https://issues.apache.org/jira/browse/HADOOP-15080
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Priority: Blocker
> Attachments: HADOOP-15080-branch-3.0.0.001.patch
>
>
> The OSS SDK has a dependency on json-lib. In LEGAL-245, the org.json library 
> (from which json-lib may be derived) is released under a 
> [category-x|https://www.apache.org/legal/resolved.html#json] license.






[jira] [Commented] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-12-02 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275644#comment-16275644
 ] 

SammiChen commented on HADOOP-14964:


Hi [~chris.douglas], I see the problem now. Thanks for such detailed
information. Let's wait for LEGAL's opinion. In the meanwhile, we will look
into alternative libraries to replace json-lib. Also thanks @Junping Du for
considering landing this feature into 2.8.3. I agree we should go ahead and cut
2.8.3 CR0.


> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Fix For: 2.9.1
>
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>







[jira] [Commented] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-11-30 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274013#comment-16274013
 ] 

SammiChen commented on HADOOP-14964:


bq. It looks like json-lib is ALv2, but it includes the (Cat-X) json.org
dependency.

Hi @Chris Douglas, "json.org" is only referred to in the {{<developers>}}
section of the net.sf.json-lib/json-lib pom; it's not used in any of its
{{<dependency>}} entries. Is it still a problem?

{quote}
<developers>
  <developer>
    <name>Douglas Crockford</name>
    <email>json at JSON.org</email>
    <organization>JSON.org</organization>
    <roles>
      <role>Original source code developer</role>
    </roles>
  </developer>
</developers>
{quote}

> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Fix For: 2.9.1
>
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>







[jira] [Commented] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-11-29 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270412#comment-16270412
 ] 

SammiChen commented on HADOOP-14964:


Hi [~chris.douglas], thanks for summarizing the discussion here.

bq. Please cherry-pick from branch-2.9 to branch-2.8, then from branch-2.8 to 
branch-2.8.3 so the lineage is clear.

Got it. 

bq. For (1), what are all the transitive dependencies for this module? 

aliyun-sdk-oss 2.8.1 is used in trunk, branch-2 and branch-2.9. It has three
external dependencies: net.sf.json-lib/json-lib (not used in Hadoop),
org.apache.httpcomponents/httpclient, and org.jdom/jdom (the same 1.1 version as
Hadoop). For httpclient, Hadoop uses 4.5.2 and the OSS SDK uses 4.4.1; 4.5.2
fully supports the functions the OSS SDK uses from 4.4.1. So it's OK for the OSS
SDK to use the current 4.5.2 httpclient in Hadoop, and an httpclient dependency
exclusion is declared for the OSS SDK in trunk and branch-2.9 to avoid any
conflict. I would say the OSS SDK's transitive dependencies are basically not a
problem.


bq.  For (2), if we take the current branch-2.9 and cut a 2.9.1 release, that 
would tranquilize anxiety about the upgrade path. SammiChen, would you be able 
to RM this? Arun Suresh and Subru Krishnan may be able to help by providing 
pointers to release docs.

Thanks for asking. I would like to take the RM role if possible. Guidance is
also strongly needed. :)

bq. As policy, (4) is stickier. Even releasing this with 2.9.1 doesn't strictly 
adhere to our rules, but it's better than adding another, active release 
branch. The case for 2.8.3 is more problematic. The cadence for 2.8.x will 
decelerate more rapidly than 2.9.x, so fixes to Aliyun OSS will be released 
less often. We may not do its users a favor by including an outdated client 
with their 2.8 clusters. Frankly, maintenance is also simpler when we disallow 
feature backports into patch releases, rather than discussing the merits of 
each one. This can't become a precedent; it takes too much time.

I do understand the concerns. Previously we didn't know the release process very
well; we should have proposed the request earlier. We will convey the
community's feedback to the Aliyun OSS team and discuss possible solutions.

 








> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Fix For: 2.9.1
>
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>







[jira] [Commented] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-11-26 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266372#comment-16266372
 ] 

SammiChen commented on HADOOP-14964:


Hi [~djp], thanks for your feedback and approval. I'm glad that the current
design addressed your concern in time. The patch for 2.8.3 will be cherry-picked
from branch-2. I would like to wait 2~3 days to see if there is other
feedback from the community. If there is no further feedback, I will commit the
patch to 2.8.3 and update the release note accordingly.

> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Fix For: 2.9.1
>
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>







[jira] [Commented] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-11-26 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266371#comment-16266371
 ] 

SammiChen commented on HADOOP-14964:


Hi [~chris.douglas], thanks for the suggestion. I recommitted the code to
branch-2.9 with the "cherry-pick" note. The release note is added. Currently,
it's just for 2.9.1. Once the patch goes into 2.8.3, I will update the release
note.

> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Fix For: 2.9.1
>
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>







[jira] [Updated] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-11-26 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-14964:
---
Release Note: OSS is widely used among China’s cloud users, and this work
implemented a new Hadoop-compatible filesystem, AliyunOSSFileSystem, with the
oss scheme, similar to the s3a and azure support. Currently, the feature is
supported in 2.9.1.

> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Fix For: 2.9.1
>
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>







[jira] [Updated] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-11-26 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-14964:
---
Fix Version/s: 2.9.1

> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Fix For: 2.9.1
>
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>







[jira] [Updated] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-11-26 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-14964:
---
Summary: AliyunOSS: backport Aliyun OSS module to branch-2  (was: 
AliyunOSS: backport Aliyun OSS module to branch-2 and 2.8+ branches)

> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>






