[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-12-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642456#comment-17642456
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

steveloughran commented on PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1335114105

   > I'd like to make oss also implement openFile() in this PR, as s3a does, 
which could also meet our needs.
   
   This is exactly what the API was designed for: to let people provide extra 
hints/options to the object stores.
   
   Do read the filesystem.md and related docs on the topic, and look at the 
abfs implementation as well as the s3a one.
   
   If your implementation takes a FileStatus or length option, it can then skip 
the HEAD request on opening and save time and money. All the hadoop internal 
uses of openFile() do this. My own copies of the parquet and avro readers also 
do it for better cloud reading performance.
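
   A minimal sketch of that pattern follows. It assumes the standard 
`openFile()` builder from HADOOP-16202; the option key shown is the standard 
one in recent Hadoop versions and the method/class names around it are 
illustrative, not the actual OSS patch.

   ```java
   import java.io.IOException;
   import java.util.concurrent.ExecutionException;

   import org.apache.hadoop.fs.FSDataInputStream;
   import org.apache.hadoop.fs.FileStatus;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   /** Sketch of the openFile() pattern discussed above; not the actual patch. */
   final class OpenFileSketch {
     static FSDataInputStream openForRandomIO(FileSystem fs, Path path,
         FileStatus status)
         throws IOException, InterruptedException, ExecutionException {
       return fs.openFile(path)
           .withFileStatus(status)  // length already known: connector can skip the HEAD
           .opt("fs.option.openfile.read.policy", "random")  // hint for ORC/Parquet-style reads
           .build()
           .get();  // build() returns a CompletableFuture<FSDataInputStream>
     }
   }
   ```

   The FileStatus is usually already in hand from a directory listing, which is 
why this saves a round trip per file opened.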




> AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use 
> buffer size as its downloadPartSize
> -
>
> Key: HADOOP-18543
> URL: https://issues.apache.org/jira/browse/HADOOP-18543
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Reporter: Hangxiang Yu
>Priority: Major
>  Labels: pull-request-available
>
> In our application, different components have their own suitable buffer size 
> to download.
> But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just gets 
> downloadPartSize from configuration.
> We cannot use different values for different components in our programs.
> I think the method should use the buffer size from the parameter.
> AliyunOSSFileSystem#open(Path path) could keep the current default 
> downloadPartSize as its default value.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-12-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642272#comment-17642272
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

masteryhx commented on PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1334751421

   
   > sorry, but I'm going to say -1 to using the normal IO buffer size as the 
GET range. The default value of 4k is way too small even for parquet/orc reads, 
it will break all existing apps in performance terms: distcp, parquet library, 
avro, ORC, everything, as they all use the default value.
   > 
   > 1. there is a configuration option for multipart download size, which is 
filesystem-wide. Not as flexible, but something everyone will expect to work.
   > 2. If you want better control of read policy, buffer sizes etc, then this 
connector needs to implement openFile(), as s3a and abfs do. that will let you 
add a new option to specify the range for GET calls.
   
   Thanks a lot for your detailed explanation and suggestion.
   I agree that it's better not to change the implementation of the current 
interface.
   I'd like to make oss also implement openFile() in this PR, as s3a does, 
which could also meet our needs.
   WDYT?







[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-12-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642065#comment-17642065
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

steveloughran commented on PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1334109192

   (note also that this includes letting you declare a read policy (whole-file, 
sequential, random, vectored) that can be used to change the default block size 
too)







[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-12-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642064#comment-17642064
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

steveloughran commented on PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1334108437

   sorry, but I'm going to say -1 to using the normal IO buffer size as the GET 
range. The default value of 4k is way too small even for parquet/orc reads, it 
will break all existing apps in performance terms: distcp, parquet library, 
avro, ORC, everything, as they all use the default value.
   
   1. there is a configuration option for multipart download size, which is 
filesystem-wide. Not as flexible, but something everyone will expect to work.
   2. If you want better control of read policy, buffer sizes etc, then this 
connector needs to implement openFile(), as s3a and abfs do. that will let you 
add a new option to specify the range for GET calls.
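
   For option 1, the filesystem-wide knob is a core-site.xml property. The key 
below is the one I believe the OSS connector defines as 
MULTIPART_DOWNLOAD_SIZE_KEY; the 512 KB value is purely illustrative.

   ```xml
   <!-- Hypothetical example: filesystem-wide download part size for the
        hadoop-aliyun connector. Verify the key against the connector's
        Constants class before relying on it; the value is illustrative. -->
   <property>
     <name>fs.oss.multipart.download.size</name>
     <value>524288</value>
   </property>
   ```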







[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-12-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642061#comment-17642061
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

steveloughran commented on code in PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#discussion_r1037378725


##
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java:
##
@@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream {
   private ExecutorService readAheadExecutorService;
   private Queue readBufferQueue = new ArrayDeque<>();
 
-  public AliyunOSSInputStream(Configuration conf,
-  ExecutorService readAheadExecutorService, int maxReadAheadPartNumber,
-  AliyunOSSFileSystemStore store, String key, Long contentLength,
-  Statistics statistics) throws IOException {
+  public AliyunOSSInputStream(
+  long downloadPartSize,
+  ExecutorService readAheadExecutorService,
+  int maxReadAheadPartNumber,
+  AliyunOSSFileSystemStore store,
+  String key,
+  Long contentLength,
+  Statistics statistics) throws IOException {
 this.readAheadExecutorService =
-MoreExecutors.listeningDecorator(readAheadExecutorService);
+MoreExecutors.listeningDecorator(readAheadExecutorService);
 this.store = store;
 this.key = key;
 this.statistics = statistics;
 this.contentLength = contentLength;
-downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY,
-MULTIPART_DOWNLOAD_SIZE_DEFAULT);
+this.downloadPartSize = downloadPartSize;

Review Comment:
   its about the efficiencies of a GET call, overhead of creating HTTPS 
connections etc. which comes down to
   
   * reading a whole file is the wrong strategy for random IO formats (ORC, 
parquet)
   * random IO/small ranged GETs wrong for whole files.
   * even with random io, KB is way too small.
   
   This is why there's a tendency for the stores to do adaptive "first 
backwards/big forward seek means random IO", and since HADOOP-16202 let caller 
declared read policy.
   
   If the existing code was changed to say "we set GET range to be the buffer 
size you passed in on open()", then everyone's existing code is going to suffer 
really badly on performance.
   








[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-11-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641302#comment-17641302
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

masteryhx commented on code in PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#discussion_r1035962332


##
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java:
##
@@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream {
   private ExecutorService readAheadExecutorService;
   private Queue readBufferQueue = new ArrayDeque<>();
 
-  public AliyunOSSInputStream(Configuration conf,
-  ExecutorService readAheadExecutorService, int maxReadAheadPartNumber,
-  AliyunOSSFileSystemStore store, String key, Long contentLength,
-  Statistics statistics) throws IOException {
+  public AliyunOSSInputStream(
+  long downloadPartSize,
+  ExecutorService readAheadExecutorService,
+  int maxReadAheadPartNumber,
+  AliyunOSSFileSystemStore store,
+  String key,
+  Long contentLength,
+  Statistics statistics) throws IOException {
 this.readAheadExecutorService =
-MoreExecutors.listeningDecorator(readAheadExecutorService);
+MoreExecutors.listeningDecorator(readAheadExecutorService);
 this.store = store;
 this.key = key;
 this.statistics = statistics;
 this.contentLength = contentLength;
-downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY,
-MULTIPART_DOWNLOAD_SIZE_DEFAULT);
+this.downloadPartSize = downloadPartSize;

Review Comment:
   I think we could see different performance between requesting 4KB and 4MB?
   In some of my cases, data is organized in units of ~16KB, and I read them 
randomly.
   In this case, I am sure what I need is just those few KB; fetching more data 
costs more time and bandwidth.








[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-11-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641278#comment-17641278
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

steveloughran commented on code in PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#discussion_r1035921694


##
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java:
##
@@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream {
   private ExecutorService readAheadExecutorService;
   private Queue readBufferQueue = new ArrayDeque<>();
 
-  public AliyunOSSInputStream(Configuration conf,
-  ExecutorService readAheadExecutorService, int maxReadAheadPartNumber,
-  AliyunOSSFileSystemStore store, String key, Long contentLength,
-  Statistics statistics) throws IOException {
+  public AliyunOSSInputStream(
+  long downloadPartSize,
+  ExecutorService readAheadExecutorService,
+  int maxReadAheadPartNumber,
+  AliyunOSSFileSystemStore store,
+  String key,
+  Long contentLength,
+  Statistics statistics) throws IOException {
 this.readAheadExecutorService =
-MoreExecutors.listeningDecorator(readAheadExecutorService);
+MoreExecutors.listeningDecorator(readAheadExecutorService);
 this.store = store;
 this.key = key;
 this.statistics = statistics;
 this.contentLength = contentLength;
-downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY,
-MULTIPART_DOWNLOAD_SIZE_DEFAULT);
+this.downloadPartSize = downloadPartSize;

Review Comment:
   i worry that a download part size of a few KB is way too small for efficient 
HTTP GET requests; hundreds of kilobytes to megabytes are better








[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-11-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641150#comment-17641150
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

hadoop-yetus commented on PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1331799936

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m 59s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 36s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m  0s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javac  |   0m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  the patch passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 25s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aliyun.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/2/artifact/out/results-checkstyle-hadoop-tools_hadoop-aliyun.txt)
 |  hadoop-tools/hadoop-aliyun: The patch generated 2 new + 0 unchanged - 0 
fixed = 2 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  |  the patch passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 11s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 31s |  |  hadoop-aliyun in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 41s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  93m 23s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5172 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 2276ff2601e8 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 76ce3cb5dcd36949f916098aa8a58e29c6f7664a |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/2/testReport/ |
   | Max. process+thread count | 679 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/2/console |
   | ver

[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-11-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641087#comment-17641087
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

masteryhx commented on code in PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#discussion_r1035582679


##
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java:
##
@@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream {
   private ExecutorService readAheadExecutorService;
   private Queue readBufferQueue = new ArrayDeque<>();
 
-  public AliyunOSSInputStream(Configuration conf,
-  ExecutorService readAheadExecutorService, int maxReadAheadPartNumber,
-  AliyunOSSFileSystemStore store, String key, Long contentLength,
-  Statistics statistics) throws IOException {
+  public AliyunOSSInputStream(
+  long downloadPartSize,
+  ExecutorService readAheadExecutorService,
+  int maxReadAheadPartNumber,
+  AliyunOSSFileSystemStore store,
+  String key,
+  Long contentLength,
+  Statistics statistics) throws IOException {
 this.readAheadExecutorService =
-MoreExecutors.listeningDecorator(readAheadExecutorService);
+MoreExecutors.listeningDecorator(readAheadExecutorService);
 this.store = store;
 this.key = key;
 this.statistics = statistics;
 this.contentLength = contentLength;
-downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY,
-MULTIPART_DOWNLOAD_SIZE_DEFAULT);
+this.downloadPartSize = downloadPartSize;

Review Comment:
   Good point. I'd like to use IO_FILE_BUFFER_SIZE_DEFAULT (4KB) as its min 
size, WDYT?








[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-11-29 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640599#comment-17640599
 ] 

Steve Loughran commented on HADOOP-18543:
-

often the buffer size passed in is small, say 16-32 KB. would that be too small?

this is a great time to implement openFile() as s3a and abfs do, which lets the 
caller pass in a list of options, a file length/status (saves on the HEAD) and 
a read policy (random, whole file, sequential). 

all the open() calls in the hadoop codebase now use this and pass in the read 
policy and the length if known, and we do this internally in our own avro jars 
for avro file reads so as to guarantee sequential reads of iceberg manifests 
even in clusters with the s3a read policy == random




[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-11-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640597#comment-17640597
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

steveloughran commented on code in PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#discussion_r1034580639


##
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java:
##
@@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream {
   private ExecutorService readAheadExecutorService;
   private Queue readBufferQueue = new ArrayDeque<>();
 
-  public AliyunOSSInputStream(Configuration conf,
-  ExecutorService readAheadExecutorService, int maxReadAheadPartNumber,
-  AliyunOSSFileSystemStore store, String key, Long contentLength,
-  Statistics statistics) throws IOException {
+  public AliyunOSSInputStream(
+  long downloadPartSize,
+  ExecutorService readAheadExecutorService,
+  int maxReadAheadPartNumber,
+  AliyunOSSFileSystemStore store,
+  String key,
+  Long contentLength,
+  Statistics statistics) throws IOException {
 this.readAheadExecutorService =
-MoreExecutors.listeningDecorator(readAheadExecutorService);
+MoreExecutors.listeningDecorator(readAheadExecutorService);
 this.store = store;
 this.key = key;
 this.statistics = statistics;
 this.contentLength = contentLength;
-downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY,
-MULTIPART_DOWNLOAD_SIZE_DEFAULT);
+this.downloadPartSize = downloadPartSize;

Review Comment:
   you might want to have a minimum size for this
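
   The suggestion above could look something like the following standalone 
sketch; the class/method names and the 64 KiB floor are assumptions for 
illustration, not the actual patch (which would likely read the floor from 
configuration).

   ```java
   /** Standalone sketch of clamping a caller-supplied buffer size to a minimum
    *  GET range. The 64 KiB floor and all names are illustrative assumptions. */
   public final class PartSizeClamp {

       // Assumed floor; the real connector would likely make this configurable.
       static final long MIN_DOWNLOAD_PART_SIZE = 64L * 1024;

       /** Returns the requested size, raised to the floor if it is too small. */
       static long effectivePartSize(long requested) {
           return Math.max(requested, MIN_DOWNLOAD_PART_SIZE);
       }

       public static void main(String[] args) {
           // A 4 KB request is raised to the 64 KiB floor; a 4 MB request is kept.
           System.out.println(effectivePartSize(4L * 1024));        // 65536
           System.out.println(effectivePartSize(4L * 1024 * 1024)); // 4194304
       }
   }
   ```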








[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-11-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640161#comment-17640161
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

hadoop-yetus commented on PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1329502962

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  4s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  41m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 46s |  |  trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 45s |  |  trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m  6s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 47s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 24s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | [/results-checkstyle-hadoop-tools_hadoop-aliyun.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/1/artifact/out/results-checkstyle-hadoop-tools_hadoop-aliyun.txt) |  hadoop-tools/hadoop-aliyun: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 21s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 31s |  |  hadoop-aliyun in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 45s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  96m 17s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5172 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 5bc2bfc76972 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7580a7cb63c066cc56d2277a5f1829e207b072a3 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/1/testReport/ |
   | Max. process+thread count | 731 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun |
   | Console output | https://ci-hadoop.apache.org/job/ha

[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-11-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640117#comment-17640117
 ] 

ASF GitHub Bot commented on HADOOP-18543:
-

masteryhx opened a new pull request, #5172:
URL: https://github.com/apache/hadoop/pull/5172

   
   
   ### Description of PR
   
   In our application, different components have their own suitable buffer size to download.

   But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just gets downloadPartSize from the configuration.

   We cannot use different values for different components in our programs.

   I think the method should use the buffer size from the parameter.

   AliyunOSSFileSystem#open(Path path) could keep the current default downloadPartSize as its default value.
   
   ### How was this patch tested?
   Added method TestAliyunOSSInputStream#testConfiguration to test.
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   










[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

2022-11-28 Thread Hangxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640053#comment-17640053
 ] 

Hangxiang Yu commented on HADOOP-18543:
---

[~wujinhu] WDYT?
If so, I'd like to create a PR to fix it.



