[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

ASF GitHub Bot (Jira) Thu, 01 Dec 2022 09:20:52 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-18543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642061#comment-17642061
 ]


ASF GitHub Bot commented on HADOOP-18543:
-----------------------------------------

steveloughran commented on code in PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#discussion_r1037378725


##########
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSInputStream.java:
##########
@@ -57,18 +57,21 @@ public class AliyunOSSInputStream extends FSInputStream {
   private ExecutorService readAheadExecutorService;
   private Queue<ReadBuffer> readBufferQueue = new ArrayDeque<>();
 
-  public AliyunOSSInputStream(Configuration conf,
-      ExecutorService readAheadExecutorService, int maxReadAheadPartNumber,
-      AliyunOSSFileSystemStore store, String key, Long contentLength,
-      Statistics statistics) throws IOException {
+  public AliyunOSSInputStream(
+          long downloadPartSize,
+          ExecutorService readAheadExecutorService,
+          int maxReadAheadPartNumber,
+          AliyunOSSFileSystemStore store,
+          String key,
+          Long contentLength,
+          Statistics statistics) throws IOException {
     this.readAheadExecutorService =
-        MoreExecutors.listeningDecorator(readAheadExecutorService);
+            MoreExecutors.listeningDecorator(readAheadExecutorService);
     this.store = store;
     this.key = key;
     this.statistics = statistics;
     this.contentLength = contentLength;
-    downloadPartSize = conf.getLong(MULTIPART_DOWNLOAD_SIZE_KEY,
-        MULTIPART_DOWNLOAD_SIZE_DEFAULT);
+    this.downloadPartSize = downloadPartSize;

Review Comment:
   its about the efficiencies of a GET call, overhead of creating HTTPS 
connections etc. which comes down to
   
   * reading a whole file is the wrong strategy for random IO formats (ORC, 
parquet)
   * random IO/small ranged GETs wrong for whole files.
   * even with random io, KB is way too small.
   
   This is why there's a tendency for the stores to do adaptive "first 
backwards/big forward seek means random IO", and since HADOOP-16202 let caller 
declared read policy.
   
   If the existing code was changed to say "we set GET range to be the buffer 
size you passed in on open()", then everyone's existing code is going to suffer 
really badly on performance.
   





> AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use 
> buffer size as its downloadPartSize
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18543
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18543
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/oss
>            Reporter: Hangxiang Yu
>            Priority: Major
>              Labels: pull-request-available
>
> In our application, different components have their own suitable buffer size 
> to download.
> But currently, AliyunOSSFileSystem#open(Path path, int bufferSize) just get 
> downloadPartSize from configuration.
> We cannnot use different value for different components in our programs.
> I think we should the method should use the buffer size from the paramater.
> AliyunOSSFileSystem#open(Path path) could have default value as current 
> default downloadPartSize.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Commented] (HADOOP-18543) AliyunOSS: AliyunOSSFileSystem#open(Path path, int bufferSize) should use buffer size as its downloadPartSize

Reply via email to