[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642456#comment-17642456 ] ASF GitHub Bot commented on HADOOP-18543: - steveloughran commented on PR #5172: URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1335114105 > I'd like to make oss also implement openFile() in this pr as s3a does which could also meet our needs. This is exactly what the API was designed for -to let people provide extra hints/options. to the object stores. Do read the filesystem.md and related docs on the topic, and look at the abfs implementation as well as the s3a one. If your implementation takes a FileStatus or length option, it can then skip the HEAD request on opening and save time and money. All the hadoop internal uses of openFile() do this. My own copies of the parquet and avro readers also do it for better cloud reading performance > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > Labels: pull-request-available > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642272#comment-17642272 ] ASF GitHub Bot commented on HADOOP-18543: - masteryhx commented on PR #5172: URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1334751421 > sorry, but I'm going to say -1 to using the normal IO buffer size as the GET range. The default value of 4k is way too small even for parquet/orc reads, it will break all existing apps in performance terms: distcp, parquet library, avro, ORC, everything, as they all use the default value. > > 1. there is a configuration option for multipart download size, which is filesystem-wide. Not as flexible, but something everyone will expect to work. > 2. If you want better control of read policy, buffer sizes etc, then this connector needs to implement openFile(), as s3a and abfs do. that will let you add a new option to specify the range for GET calls. Thanks a lot for your detailed explaination and suggestion. I agree that it's better not to change the implementaion of current interface. I'd like to make oss also implement openFile() in this pr as s3a does which could also meet our needs. WDYT? > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > Labels: pull-request-available > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642065#comment-17642065 ] ASF GitHub Bot commented on HADOOP-18543: - steveloughran commented on PR #5172: URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1334109192 (note also that includes letting you declare read policy (whole-file, sequential, random, vectoredthat can be used to change default block size too) > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > Labels: pull-request-available > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642064#comment-17642064 ] ASF GitHub Bot commented on HADOOP-18543: - steveloughran commented on PR #5172: URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1334108437 sorry, but I'm going to say -1 to using the normal IO buffer size as the GET range. The default value of 4k is way too small even for parquet/orc reads, it will break all existing apps in performance terms: distcp, parquet library, avro, ORC, everything, as they all use the default value. 1. there is a configuration option for multipart download size, which is filesystem-wide. Not as flexible, but something everyone will expect to work. 2. If you want better control of read policy, buffer sizes etc, then this connector needs to implement openFile(), as s3a and abfs do. that will let you add a new option to specify the range for GET calls. > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > Labels: pull-request-available > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642061#comment-17642061 ] ASF GitHub Bot commented on HADOOP-18543: - steveloughran commented on code in PR #5172: URL: https://github.com/apache/hadoop/pull/5172#discussion_r1037378725 ## hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java: ## @@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream { private ExecutorService readAheadExecutorService; private Queue readBufferQueue = new ArrayDeque<>(); - public AliyunOSSInputStream(Configuration conf, - ExecutorService readAheadExecutorService, int maxReadAheadPartNumber, - AliyunOSSFileSystemStore store, String key, Long contentLength, - Statistics statistics) throws IOException { + public AliyunOSSInputStream( + long downloadPartSize, + ExecutorService readAheadExecutorService, + int maxReadAheadPartNumber, + AliyunOSSFileSystemStore store, + String key, + Long contentLength, + Statistics statistics) throws IOException { this.readAheadExecutorService = -MoreExecutors.listeningDecorator(readAheadExecutorService); +MoreExecutors.listeningDecorator(readAheadExecutorService); this.store = store; this.key = key; this.statistics = statistics; this.contentLength = contentLength; -downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY, -MULTIPART_DOWNLOAD_SIZE_DEFAULT); +this.downloadPartSize = downloadPartSize; Review Comment: its about the efficiencies of a GET call, overhead of creating HTTPS connections etc. which comes down to * reading a whole file is the wrong strategy for random IO formats (ORC, parquet) * random IO/small ranged GETs wrong for whole files. * even with random io, KB is way too small. This is why there's a tendency for the stores to do adaptive "first backwards/big forward seek means random IO", and since HADOOP-16202 let caller declared read policy. If the existing code was changed to say "we set GET range to be the buffer size you passed in on open()", then everyone's existing code is going to suffer really badly on performance. > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > Labels: pull-request-available > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641302#comment-17641302 ] ASF GitHub Bot commented on HADOOP-18543: - masteryhx commented on code in PR #5172: URL: https://github.com/apache/hadoop/pull/5172#discussion_r1035962332 ## hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java: ## @@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream { private ExecutorService readAheadExecutorService; private Queue readBufferQueue = new ArrayDeque<>(); - public AliyunOSSInputStream(Configuration conf, - ExecutorService readAheadExecutorService, int maxReadAheadPartNumber, - AliyunOSSFileSystemStore store, String key, Long contentLength, - Statistics statistics) throws IOException { + public AliyunOSSInputStream( + long downloadPartSize, + ExecutorService readAheadExecutorService, + int maxReadAheadPartNumber, + AliyunOSSFileSystemStore store, + String key, + Long contentLength, + Statistics statistics) throws IOException { this.readAheadExecutorService = -MoreExecutors.listeningDecorator(readAheadExecutorService); +MoreExecutors.listeningDecorator(readAheadExecutorService); this.store = store; this.key = key; this.statistics = statistics; this.contentLength = contentLength; -downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY, -MULTIPART_DOWNLOAD_SIZE_DEFAULT); +this.downloadPartSize = downloadPartSize; Review Comment: I think we could see different performance between uploading/requesting 4KB and 4MB ? In my some cases, some data are orgnazied with unit of ~16KB, and I will read them randomly. In this case, I am sure what I need is just these KB, more data will cost more time and bandwidth. > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > Labels: pull-request-available > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641278#comment-17641278 ] ASF GitHub Bot commented on HADOOP-18543: - steveloughran commented on code in PR #5172: URL: https://github.com/apache/hadoop/pull/5172#discussion_r1035921694 ## hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java: ## @@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream { private ExecutorService readAheadExecutorService; private Queue readBufferQueue = new ArrayDeque<>(); - public AliyunOSSInputStream(Configuration conf, - ExecutorService readAheadExecutorService, int maxReadAheadPartNumber, - AliyunOSSFileSystemStore store, String key, Long contentLength, - Statistics statistics) throws IOException { + public AliyunOSSInputStream( + long downloadPartSize, + ExecutorService readAheadExecutorService, + int maxReadAheadPartNumber, + AliyunOSSFileSystemStore store, + String key, + Long contentLength, + Statistics statistics) throws IOException { this.readAheadExecutorService = -MoreExecutors.listeningDecorator(readAheadExecutorService); +MoreExecutors.listeningDecorator(readAheadExecutorService); this.store = store; this.key = key; this.statistics = statistics; this.contentLength = contentLength; -downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY, -MULTIPART_DOWNLOAD_SIZE_DEFAULT); +this.downloadPartSize = downloadPartSize; Review Comment: i owrry that the downloadpart size of kb is way too small for efficient http GET requests; kilobytes to megabytes are better > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > Labels: pull-request-available > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641150#comment-17641150 ] ASF GitHub Bot commented on HADOOP-18543: - hadoop-yetus commented on PR #5172: URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1331799936 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 38s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 39m 59s | | trunk passed | | +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 36s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 37s | | trunk passed | | +1 :green_heart: | javadoc | 0m 49s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 0s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 32s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 23s | | the patch passed | | +1 :green_heart: | compile | 0m 23s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 0m 23s | | the patch passed | | +1 :green_heart: | compile | 0m 25s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 25s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 22s | [/results-checkstyle-hadoop-tools_hadoop-aliyun.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/2/artifact/out/results-checkstyle-hadoop-tools_hadoop-aliyun.txt) | hadoop-tools/hadoop-aliyun: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) | | +1 :green_heart: | mvnsite | 0m 28s | | the patch passed | | +1 :green_heart: | javadoc | 0m 22s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 0m 26s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 0m 50s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 11s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 0m 31s | | hadoop-aliyun in the patch passed. | | +1 :green_heart: | asflicense | 0m 41s | | The patch does not generate ASF License warnings. | | | | 93m 23s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5172 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 2276ff2601e8 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 76ce3cb5dcd36949f916098aa8a58e29c6f7664a | | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/2/testReport/ | | Max. process+thread count | 679 (vs. ulimit of 5500) | | modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/2/console | | ver
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641087#comment-17641087 ] ASF GitHub Bot commented on HADOOP-18543: - masteryhx commented on code in PR #5172: URL: https://github.com/apache/hadoop/pull/5172#discussion_r1035582679 ## hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java: ## @@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream { private ExecutorService readAheadExecutorService; private Queue readBufferQueue = new ArrayDeque<>(); - public AliyunOSSInputStream(Configuration conf, - ExecutorService readAheadExecutorService, int maxReadAheadPartNumber, - AliyunOSSFileSystemStore store, String key, Long contentLength, - Statistics statistics) throws IOException { + public AliyunOSSInputStream( + long downloadPartSize, + ExecutorService readAheadExecutorService, + int maxReadAheadPartNumber, + AliyunOSSFileSystemStore store, + String key, + Long contentLength, + Statistics statistics) throws IOException { this.readAheadExecutorService = -MoreExecutors.listeningDecorator(readAheadExecutorService); +MoreExecutors.listeningDecorator(readAheadExecutorService); this.store = store; this.key = key; this.statistics = statistics; this.contentLength = contentLength; -downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY, -MULTIPART_DOWNLOAD_SIZE_DEFAULT); +this.downloadPartSize = downloadPartSize; Review Comment: Good point. I'd like use IO_FILE_BUFFER_SIZE_DEFAULT(4KB) as its min size, WDYT? > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > Labels: pull-request-available > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640599#comment-17640599 ] Steve Loughran commented on HADOOP-18543: - often the buffer size passed in is small, say 16-32 kb. would that be too small this is a great time to implement openFile() as s3a and abfs does, which lets caller pass in a list of options, a file length/status (saves on the HEAD) and a read policy (random, whole file, sequential). all the open() calls in hadoop codebase now use this and pass in the read policy, length if known, and we do this internally in our own avro jars for for avro file reads so as to guarantee sequential reads of iceberg manifests even in clusters with the s3a read policy == random > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > Labels: pull-request-available > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640597#comment-17640597 ] ASF GitHub Bot commented on HADOOP-18543: - steveloughran commented on code in PR #5172: URL: https://github.com/apache/hadoop/pull/5172#discussion_r1034580639 ## hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java: ## @@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream { private ExecutorService readAheadExecutorService; private Queue readBufferQueue = new ArrayDeque<>(); - public AliyunOSSInputStream(Configuration conf, - ExecutorService readAheadExecutorService, int maxReadAheadPartNumber, - AliyunOSSFileSystemStore store, String key, Long contentLength, - Statistics statistics) throws IOException { + public AliyunOSSInputStream( + long downloadPartSize, + ExecutorService readAheadExecutorService, + int maxReadAheadPartNumber, + AliyunOSSFileSystemStore store, + String key, + Long contentLength, + Statistics statistics) throws IOException { this.readAheadExecutorService = -MoreExecutors.listeningDecorator(readAheadExecutorService); +MoreExecutors.listeningDecorator(readAheadExecutorService); this.store = store; this.key = key; this.statistics = statistics; this.contentLength = contentLength; -downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY, -MULTIPART_DOWNLOAD_SIZE_DEFAULT); +this.downloadPartSize = downloadPartSize; Review Comment: you might want to have a minimum size for this > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > Labels: pull-request-available > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640161#comment-17640161 ] ASF GitHub Bot commented on HADOOP-18543: - hadoop-yetus commented on PR #5172: URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1329502962 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 4s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 41m 19s | | trunk passed | | +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 0m 44s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 0m 39s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 45s | | trunk passed | | +1 :green_heart: | javadoc | 0m 46s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 45s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 1m 6s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 47s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 24s | | the patch passed | | +1 :green_heart: | compile | 0m 25s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 0m 25s | | the patch passed | | +1 :green_heart: | compile | 0m 24s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 0m 24s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 22s | [/results-checkstyle-hadoop-tools_hadoop-aliyun.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/1/artifact/out/results-checkstyle-hadoop-tools_hadoop-aliyun.txt) | hadoop-tools/hadoop-aliyun: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) | | +1 :green_heart: | mvnsite | 0m 28s | | the patch passed | | +1 :green_heart: | javadoc | 0m 27s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 26s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 0m 50s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 21s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 0m 31s | | hadoop-aliyun in the patch passed. | | +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. | | | | 96m 17s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5172 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 5bc2bfc76972 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 7580a7cb63c066cc56d2277a5f1829e207b072a3 | | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5172/1/testReport/ | | Max. process+thread count | 731 (vs. ulimit of 5500) | | modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun | | Console output | https://ci-hadoop.apache.org/job/ha
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640117#comment-17640117 ] ASF GitHub Bot commented on HADOOP-18543: - masteryhx opened a new pull request, #5172: URL: https://github.com/apache/hadoop/pull/5172 ### Description of PR In our application, different components have their own suitable buffer size to download. But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get downloadPartSize from configuration. We cannnot use different value for different components in our programs. I think we should the method should use the buffer size from the paramater. AliyunOSSFileSystem#open(Path path) could have default value as current default downloadPartSize. ### How was this patch tested? Added method TestAliyunOSSInputStream#testConfiguration to test. ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize
[ https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640053#comment-17640053 ] Hangxiang Yu commented on HADOOP-18543: --- [~wujinhu] WDYT? If right, I'd like to create a pr to fix it. > AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use > buffer size as its downloadPartSize > - > > Key: HADOOP-18543 > URL: https://issues.apache.org/jira/browse/HADOOP-18543 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Reporter: Hangxiang Yu >Priority: Major > > In our application, different components have their own suitable buffer size > to download. > But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get > downloadPartSize from configuration. > We cannnot use different value for different components in our programs. > I think we should the method should use the buffer size from the paramater. > AliyunOSSFileSystem#open(Path path) could have default value as current > default downloadPartSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org