[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=747977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747977
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 25/Mar/22 20:24
Start Date: 25/Mar/22 20:24
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-1079407424


   I've created a feature branch for this and a PR which takes this change 
(with test file renames & some other small changes) as an initial PR to apply 
to it: #4109
   
   can everyone who has outstanding issues/comments look at that patch and 
create some subtasks on the JIRA itself...we can fix the issues (test failures 
etc) in individual subtasks.
   
   note: this feature branch can be rebased as we go along, though I'd like to 
get it in asap. I do want to get #2584 in first, *which needs reviews from 
other people*. That change should allow for the new input stream to decide 
policy from the openFile() parameters


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 747977)
Time Spent: 12.5h  (was: 12h 20m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=747971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747971
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 25/Mar/22 20:17
Start Date: 25/Mar/22 20:17
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-985081348


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  2s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 26s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/10/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 3 new + 3 unchanged - 4 fixed 
= 6 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 19s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 50s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/10/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 31s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  88m 34s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/10/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux b1da0e6ffd6a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 29a482ce35fb191e9414aaea58a41482c86bed24 |
   | Default Java 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=747972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747972
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 25/Mar/22 20:17
Start Date: 25/Mar/22 20:17
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-1036981660


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 13s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 30s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  20m 49s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 44s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 20s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/22/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 2 new + 5 unchanged - 2 fixed 
= 7 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m 58s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 30s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  87m  5s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/22/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux f85d1cda261a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7cc1453c847ac41590daa04a21038820729669bf |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-1

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=747970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747970
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 25/Mar/22 20:16
Start Date: 25/Mar/22 20:16
Worklog Time Spent: 10m 
  Work Description: steveloughran opened a new pull request #4109:
URL: https://github.com/apache/hadoop/pull/4109


   ### Description of PR
   
   This is the PR of #3736 applied to a dedicated feature branch, with some 
minor pre-merge tuning with all subsequent changes to be their own PR
   
   * rename test classes, have AbstractHadoopTestBase as the base
   * package info files for new packages
   * import ordering
   * move to intercept() for assertions; ExceptionAsserts is invoking it and 
can be removed in future.
   
   this adds a dependency on a twitter lib which looks like scala code. that 
MUST be cut before we can merge.
   
   ### How was this patch tested?
   
   s3 london with `-Dparallel-tests -DtestsThreadCount=8  -Dmarkers=keep 
-Dscale`
   
   there are some failing integration tests which will need to be fixed before 
the feature branch is merged to trunk.
   
   ### For code changes:
   
   - [X] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [X] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 747970)
Time Spent: 12h  (was: 11h 50m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=747920&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747920
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 25/Mar/22 18:22
Start Date: 25/Mar/22 18:22
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r835515328



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3EInputStream.java
##
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.CanSetReadahead;
+import org.apache.hadoop.fs.FSExceptionMessages;
+import org.apache.hadoop.fs.FSInputStream;
+import org.apache.hadoop.fs.StreamCapabilities;
+import org.apache.hadoop.fs.common.Validate;
+import org.apache.hadoop.fs.s3a.S3AInputStream;
+import org.apache.hadoop.fs.s3a.S3AReadOpContext;
+import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
+import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+
+/**
+ * Enhanced {@code InputStream} for reading from S3.
+ *
+ * This implementation provides improved read throughput by asynchronously 
prefetching
+ * blocks of configurable size from the underlying S3 file.
+ */
+public class S3EInputStream

Review comment:
   i agree that saving time is generally the best action to prioritise, as 
if you are running in ec2 your cpu rental is a bigger cost. it does increase 
the risk of overloading the IO capacity of a shard in a bucket, but we have 
that with the current random IO implementation anyway




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 747920)
Time Spent: 11h 50m  (was: 11h 40m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=747918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747918
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 25/Mar/22 18:11
Start Date: 25/Mar/22 18:11
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-1026648916


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 45s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 17s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  23m 38s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 51s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/21/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 2 new + 5 unchanged - 2 fixed 
= 7 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 28s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 36s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  98m 38s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/21/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 5ae2d1e7c1db 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7f80c5fca27e050a0e854a885e78320aef7ad224 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=747915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747915
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 25/Mar/22 18:10
Start Date: 25/Mar/22 18:10
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-1005283237






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 747915)
Time Spent: 11.5h  (was: 11h 20m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=747914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747914
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 25/Mar/22 18:10
Start Date: 25/Mar/22 18:10
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-990537861






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 747914)
Time Spent: 11h 20m  (was: 11h 10m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=747913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747913
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 25/Mar/22 18:09
Start Date: 25/Mar/22 18:09
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-989436895


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 47s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 23 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 28s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 12s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 23s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/13/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 6 new + 5 unchanged - 3 fixed 
= 11 total (was 8)  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 26s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/13/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 3 new + 
62 unchanged - 0 fixed = 65 total (was 62)  |
   | +1 :green_heart: |  spotbugs  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 14s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 37s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  87m 48s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/13/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 5b3c26fd9634 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personali

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=729381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-729381
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 18/Feb/22 01:35
Start Date: 18/Feb/22 01:35
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-1043710783


   Hi Steve,
   Thanks for your earlier reviews. Take your time, I do not want to make your
   workload worse. I am happy to work with you using any process that you
   recommend. Let me know what your recommendations are and I will adopt them.
   
   On Thu, Feb 17, 2022 at 12:40 PM Steve Loughran ***@***.***>
   wrote:
   
   > I'm not deliberately ignoring you, just so overloaded with my own work
   > that I've not been able to review anything.
   >
   > we've put mukund's vectored IO patch into a feature branch with the idea
   > being we will still do the normal review process before every patch, but we
   > can get that short chain of patches lined up before the big merge into
   > trunk. we can also play rebase and interactive rebase too.
   >
   > would that work for you too? so patch 1 is "take what there is", patch 2+
   > being the changes we want in before shipping
   >
   > the risk is always it gets forgotten, so we still need to push hard to get
   > it into a state where it can be used as an option, the goal being "no side
   > effects if not used", including nothing extra on the classpath.
   >
   > meanwhile, have a look at this #2584
   > 
   >
   > its a big patch, but a key feature is you can declare your read policy,
   > with whole-file being an option as well as random, sequential and vectored.
   >
   > distcp and the CLI tools all declare their read plans this way
   >
   > i'd like this in before both the vectored io and your stream, so you can
   > use it to help decide whether to cache etc, and to support custom options
   > as well as parse the newly defined standard ones
   >
   > —
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   > Triage notifications on the go with GitHub Mobile for iOS
   > 

   > or Android
   > 
.
   >
   > You are receiving this because you authored the thread.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 729381)
Time Spent: 11h  (was: 10h 50m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=729240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-729240
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 17/Feb/22 20:40
Start Date: 17/Feb/22 20:40
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-1043409450


   I'm not deliberately ignoring you, just so overloaded with my own work that 
I've not been able to review anything.
   
   we've put mukund's vectored IO patch into a feature branch with the idea 
being we will still do the normal review process before every patch, but we can 
get that short chain of patches lined up before the big merge into trunk. we 
can also play rebase and interactive rebase too.

   would that work for you too? so patch 1 is "take what there is", patch 2+ 
being the changes we want in before shipping
   
   the risk is always it gets forgotten, so we still need to push hard to get 
it into a state where it can be used as an option, the goal being "no side 
effects if not used", including nothing extra on the classpath.
   
   
   meanwhile, have a look at this https://github.com/apache/hadoop/pull/2584
   
   its a big patch, but a key feature is you can declare your read policy, with 
whole-file being an option as well as random, sequential and vectored.
   
   distcp and the CLI tools all declare their read plans this way
   
   i'd like this in before both the vectored io and your stream, so you can use 
it to help decide whether to cache etc, and to support custom options as well 
as parse the newly defined standard ones


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 729240)
Time Spent: 10h 50m  (was: 10h 40m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=725569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725569
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 12/Feb/22 04:04
Start Date: 12/Feb/22 04:04
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-1036981660


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 13s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 30s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  20m 49s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 44s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 20s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/22/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 2 new + 5 unchanged - 2 fixed 
= 7 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m 58s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 30s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  87m  5s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/22/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux f85d1cda261a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7cc1453c847ac41590daa04a21038820729669bf |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjd

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=725560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725560
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 12/Feb/22 02:26
Start Date: 12/Feb/22 02:26
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r805105020



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BlockOperations.java
##
@@ -0,0 +1,417 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.DoubleSummaryStatistics;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Block level operations performed on a file.
+ * This class is meant to be used by {@code BlockManager}.
+ * It is separated out in its own file due to its size.
+ */
+public class BlockOperations {
+  private static final Logger LOG = 
LoggerFactory.getLogger(BlockOperations.class);
+
+  public enum Kind {

Review comment:
   This class is for debugging/logging purposes only. It is not a part of 
the public interface. I added explanation in the class header.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 725560)
Time Spent: 10.5h  (was: 10h 20m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=725559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725559
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 12/Feb/22 02:22
Start Date: 12/Feb/22 02:22
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r805104460



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BlockData.java
##
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+/**
+ * Holds information about blocks of data in a file.
+ */
+public class BlockData {

Review comment:
   added more details




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 725559)
Time Spent: 10h 20m  (was: 10h 10m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=725557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725557
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 12/Feb/22 02:20
Start Date: 12/Feb/22 02:20
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r805104237



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/SingleFilePerBlockCache.java
##
@@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.channels.WritableByteChannel;
+import java.nio.file.Files;
+import java.nio.file.OpenOption;
+import java.nio.file.Path;
+import java.nio.file.StandardOpenOption;
+import java.nio.file.attribute.FileAttribute;
+import java.nio.file.attribute.PosixFilePermission;
+import java.nio.file.attribute.PosixFilePermissions;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * Provides functionality necessary for caching blocks of data read from 
FileSystem.
+ * Each cache block is stored on the local disk as a separate file.
+ */
+public class SingleFilePerBlockCache implements BlockCache {
+  private static final Logger LOG = 
LoggerFactory.getLogger(SingleFilePerBlockCache.class);
+
+  // Blocks stored in this cache.
+  private Map blocks = new ConcurrentHashMap<>();
+
+  // Number of times a block was read from this cache.
+  // Used for determining cache utilization factor.
+  private int numGets = 0;
+
+  private boolean closed;
+
+  // Cache entry.
+  // Each block is stored as a separate file.
+  private static class Entry {
+private final int blockNumber;
+private final Path path;
+private final int size;
+private final long checksum;
+
+Entry(int blockNumber, Path path, int size, long checksum) {
+  this.blockNumber = blockNumber;
+  this.path = path;
+  this.size = size;
+  this.checksum = checksum;
+}
+
+@Override
+public String toString() {
+  return String.format(
+  "([%03d] %s: size = %d, checksum = %d)",
+  this.blockNumber, this.path, this.size, this.checksum);
+}
+  }
+
+  public SingleFilePerBlockCache() {
+  }
+
+  /**
+   * Indicates whether the given block is in this cache.
+   */
+  @Override
+  public boolean containsBlock(int blockNumber) {
+return this.blocks.containsKey(blockNumber);
+  }
+
+  /**
+   * Gets the blocks in this cache.
+   */
+  @Override
+  public Iterable blocks() {
+return Collections.unmodifiableList(new 
ArrayList(this.blocks.keySet()));
+  }
+
+  /**
+   * Gets the number of blocks in this cache.
+   */
+  @Override
+  public int size() {
+return this.blocks.size();
+  }
+
+  /**
+   * Gets the block having the given {@code blockNumber}.
+   *
+   * @throws IllegalArgumentException if buffer is null.
+   */
+  @Override
+  public void get(int blockNumber, ByteBuffer buffer) throws IOException {
+if (this.closed) {
+  return;
+}
+
+Validate.checkNotNull(buffer, "buffer");
+
+Entry entry = this.getEntry(blockNumber);
+buffer.clear();
+this.readFile(entry.path, buffer);
+buffer.rewind();
+
+validateEntry(entry, buffer);
+  }
+
+  protected int readFile(Path path, ByteBuffer buffer) throws IOException {
+int numBytesRead = 0;
+int numBytes;
+FileChannel channel = FileChannel.open(path, StandardOpenOption.READ);
+while ((numBytes = channel.read(buffer)) > 0) {
+  numBytesRead += numBytes;
+}
+buffer.limit(buffer.position());
+channel.close();
+return numBytesRead;
+  }
+
+  private Entry getEntry(int blockNumber) {
+   

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=725556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725556
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 12/Feb/22 02:19
Start Date: 12/Feb/22 02:19
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r805104105



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/SingleFilePerBlockCache.java
##
@@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.channels.WritableByteChannel;
+import java.nio.file.Files;
+import java.nio.file.OpenOption;
+import java.nio.file.Path;
+import java.nio.file.StandardOpenOption;
+import java.nio.file.attribute.FileAttribute;
+import java.nio.file.attribute.PosixFilePermission;
+import java.nio.file.attribute.PosixFilePermissions;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * Provides functionality necessary for caching blocks of data read from 
FileSystem.
+ * Each cache block is stored on the local disk as a separate file.
+ */
+public class SingleFilePerBlockCache implements BlockCache {
+  private static final Logger LOG = 
LoggerFactory.getLogger(SingleFilePerBlockCache.class);
+
+  // Blocks stored in this cache.
+  private Map blocks = new ConcurrentHashMap<>();
+
+  // Number of times a block was read from this cache.
+  // Used for determining cache utilization factor.
+  private int numGets = 0;
+
+  private boolean closed;
+
+  // Cache entry.
+  // Each block is stored as a separate file.
+  private static class Entry {
+private final int blockNumber;
+private final Path path;
+private final int size;
+private final long checksum;
+
+Entry(int blockNumber, Path path, int size, long checksum) {
+  this.blockNumber = blockNumber;
+  this.path = path;
+  this.size = size;
+  this.checksum = checksum;
+}
+
+@Override
+public String toString() {
+  return String.format(
+  "([%03d] %s: size = %d, checksum = %d)",
+  this.blockNumber, this.path, this.size, this.checksum);
+}
+  }
+
+  public SingleFilePerBlockCache() {
+  }
+
+  /**
+   * Indicates whether the given block is in this cache.
+   */
+  @Override
+  public boolean containsBlock(int blockNumber) {
+return this.blocks.containsKey(blockNumber);
+  }
+
+  /**
+   * Gets the blocks in this cache.
+   */
+  @Override
+  public Iterable blocks() {
+return Collections.unmodifiableList(new 
ArrayList(this.blocks.keySet()));
+  }
+
+  /**
+   * Gets the number of blocks in this cache.
+   */
+  @Override
+  public int size() {
+return this.blocks.size();
+  }
+
+  /**
+   * Gets the block having the given {@code blockNumber}.
+   *
+   * @throws IllegalArgumentException if buffer is null.
+   */
+  @Override
+  public void get(int blockNumber, ByteBuffer buffer) throws IOException {
+if (this.closed) {
+  return;
+}
+
+Validate.checkNotNull(buffer, "buffer");
+
+Entry entry = this.getEntry(blockNumber);
+buffer.clear();
+this.readFile(entry.path, buffer);
+buffer.rewind();
+
+validateEntry(entry, buffer);
+  }
+
+  protected int readFile(Path path, ByteBuffer buffer) throws IOException {
+int numBytesRead = 0;
+int numBytes;
+FileChannel channel = FileChannel.open(path, StandardOpenOption.READ);
+while ((numBytes = channel.read(buffer)) > 0) {
+  numBytesRead += numBytes;
+}
+buffer.limit(buffer.position());
+channel.close();
+return numBytesRead;
+  }
+
+  private Entry getEntry(int blockNumber) {
+   

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=72&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-72
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 12/Feb/22 02:16
Start Date: 12/Feb/22 02:16
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r805103846



##
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AUnbuffer.java
##
@@ -67,10 +69,10 @@ public void testUnbuffer() throws IOException {
 
 // Call read and then unbuffer
 FSDataInputStream stream = fs.open(path);
-assertEquals(0, stream.read(new byte[8])); // mocks read 0 bytes
+assertEquals(-1, stream.read(new byte[8])); // mocks read 0 bytes

Review comment:
   done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 72)
Time Spent: 9h 50m  (was: 9h 40m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=725554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725554
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 12/Feb/22 02:13
Start Date: 12/Feb/22 02:13
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r805103066



##
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AUnbuffer.java
##
@@ -67,10 +69,10 @@ public void testUnbuffer() throws IOException {
 
 // Call read and then unbuffer
 FSDataInputStream stream = fs.open(path);
-assertEquals(0, stream.read(new byte[8])); // mocks read 0 bytes
+assertEquals(-1, stream.read(new byte[8])); // mocks read 0 bytes

Review comment:
   that is the correct behavior. see line 62 & 63. read of underlying 
objectstream returns -1 which should be reflected in input stream read.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 725554)
Time Spent: 9h 40m  (was: 9.5h)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=725553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725553
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 12/Feb/22 02:08
Start Date: 12/Feb/22 02:08
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r805102478



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
##
@@ -276,6 +279,19 @@
   private TransferManager transfers;
   private ExecutorService boundedThreadPool;
   private ThreadPoolExecutor unboundedThreadPool;
+
+  // S3 reads are prefetched asynchronously using this future pool.
+  private FuturePool futurePool;
+
+  // If true, the prefetching input stream is used for reads.
+  private boolean prefetchEnabled;
+
+  // Size in bytes of a single prefetch block.
+  private int prefetchBlockSize;
+
+  // Size of prefetch queue (in number of blocks).
+  private int prefetchBlockCount;

Review comment:
   I will keep name as - is because the number of prefetch threads is 
independent of number of prefetch blocks. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 725553)
Time Spent: 9.5h  (was: 9h 20m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=725552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725552
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 12/Feb/22 02:07
Start Date: 12/Feb/22 02:07
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r805102332



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3EInputStream.java
##
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.CanSetReadahead;
+import org.apache.hadoop.fs.FSExceptionMessages;
+import org.apache.hadoop.fs.FSInputStream;
+import org.apache.hadoop.fs.StreamCapabilities;
+import org.apache.hadoop.fs.common.Validate;
+import org.apache.hadoop.fs.s3a.S3AInputStream;
+import org.apache.hadoop.fs.s3a.S3AReadOpContext;
+import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
+import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+
+/**
+ * Enhanced {@code InputStream} for reading from S3.
+ *
+ * This implementation provides improved read throughput by asynchronously 
prefetching
+ * blocks of configurable size from the underlying S3 file.
+ */
+public class S3EInputStream

Review comment:
   agreed for #1. I will rename to a suitable name to avoid E in the name. 
   
   as for 2, the impact is negligible. here is a back of the envelope 
calculation:
   for a 1TB dataset and 8MB block size (default), we will make 125000 
accesses. That translates to about $0.05 increase in job cost (assuming list 
prices from AWS : $0.0004 per 1000 requests). The small increase is more than 
offset by the reduction in runtime by orders of magnitude.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 725552)
Time Spent: 9h 20m  (was: 9h 10m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=723909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-723909
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 09/Feb/22 18:57
Start Date: 09/Feb/22 18:57
Worklog Time Spent: 10m 
  Work Description: yzhangal commented on a change in pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r802985115



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BlockOperations.java
##
@@ -0,0 +1,417 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.DoubleSummaryStatistics;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Block level operations performed on a file.
+ * This class is meant to be used by {@code BlockManager}.
+ * It is separated out in its own file due to its size.
+ */
+public class BlockOperations {
+  private static final Logger LOG = 
LoggerFactory.getLogger(BlockOperations.class);
+
+  public enum Kind {

Review comment:
   Please add a clear description what each of these operations does.  
Though some seems to be intuitive, others not. 
   
   And it would be good to describe the lifecycle of a block data somewhere.
   
   Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 723909)
Time Spent: 9h 10m  (was: 9h)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=723884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-723884
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 09/Feb/22 18:39
Start Date: 09/Feb/22 18:39
Worklog Time Spent: 10m 
  Work Description: yzhangal commented on a change in pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r802970939



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/SingleFilePerBlockCache.java
##
@@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.channels.WritableByteChannel;
+import java.nio.file.Files;
+import java.nio.file.OpenOption;
+import java.nio.file.Path;
+import java.nio.file.StandardOpenOption;
+import java.nio.file.attribute.FileAttribute;
+import java.nio.file.attribute.PosixFilePermission;
+import java.nio.file.attribute.PosixFilePermissions;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * Provides functionality necessary for caching blocks of data read from 
FileSystem.
+ * Each cache block is stored on the local disk as a separate file.
+ */
+public class SingleFilePerBlockCache implements BlockCache {
+  private static final Logger LOG = 
LoggerFactory.getLogger(SingleFilePerBlockCache.class);
+
+  // Blocks stored in this cache.
+  private Map blocks = new ConcurrentHashMap<>();
+
+  // Number of times a block was read from this cache.
+  // Used for determining cache utilization factor.
+  private int numGets = 0;
+
+  private boolean closed;
+
+  // Cache entry.
+  // Each block is stored as a separate file.
+  private static class Entry {
+private final int blockNumber;
+private final Path path;
+private final int size;
+private final long checksum;
+
+Entry(int blockNumber, Path path, int size, long checksum) {
+  this.blockNumber = blockNumber;
+  this.path = path;
+  this.size = size;
+  this.checksum = checksum;
+}
+
+@Override
+public String toString() {
+  return String.format(
+  "([%03d] %s: size = %d, checksum = %d)",
+  this.blockNumber, this.path, this.size, this.checksum);
+}
+  }
+
+  public SingleFilePerBlockCache() {
+  }
+
+  /**
+   * Indicates whether the given block is in this cache.
+   */
+  @Override
+  public boolean containsBlock(int blockNumber) {
+return this.blocks.containsKey(blockNumber);
+  }
+
+  /**
+   * Gets the blocks in this cache.
+   */
+  @Override
+  public Iterable blocks() {
+return Collections.unmodifiableList(new 
ArrayList(this.blocks.keySet()));
+  }
+
+  /**
+   * Gets the number of blocks in this cache.
+   */
+  @Override
+  public int size() {
+return this.blocks.size();
+  }
+
+  /**
+   * Gets the block having the given {@code blockNumber}.
+   *
+   * @throws IllegalArgumentException if buffer is null.
+   */
+  @Override
+  public void get(int blockNumber, ByteBuffer buffer) throws IOException {
+if (this.closed) {
+  return;
+}
+
+Validate.checkNotNull(buffer, "buffer");
+
+Entry entry = this.getEntry(blockNumber);
+buffer.clear();
+this.readFile(entry.path, buffer);
+buffer.rewind();
+
+validateEntry(entry, buffer);
+  }
+
+  protected int readFile(Path path, ByteBuffer buffer) throws IOException {
+int numBytesRead = 0;
+int numBytes;
+FileChannel channel = FileChannel.open(path, StandardOpenOption.READ);
+while ((numBytes = channel.read(buffer)) > 0) {
+  numBytesRead += numBytes;
+}
+buffer.limit(buffer.position());
+channel.close();
+return numBytesRead;
+  }
+
+  private Entry getEntry(int blockNumber) {
+Vali

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=718524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-718524
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 01/Feb/22 09:42
Start Date: 01/Feb/22 09:42
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-1026648916


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 45s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 17s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  23m 38s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 51s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/21/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 2 new + 5 unchanged - 2 fixed 
= 7 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 28s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 36s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  98m 38s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/21/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 5ae2d1e7c1db 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7f80c5fca27e050a0e854a885e78320aef7ad224 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703713
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 05/Jan/22 02:57
Start Date: 05/Jan/22 02:57
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-1005339212


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 24 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 42s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 28s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/20/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 2 new + 7 unchanged - 3 fixed 
= 9 total (was 10)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  8s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 36s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  88m 23s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/20/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 74bcb490c0f4 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / bdd40fd8a33d45a690800f20f42a2c56e25a01f9 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache.

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703677
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 05/Jan/22 01:20
Start Date: 05/Jan/22 01:20
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r778488178



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3Reader.java
##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import org.apache.hadoop.fs.common.Validate;
+import org.apache.hadoop.fs.s3a.Invoker;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.EOFException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.SocketTimeoutException;
+import java.nio.ByteBuffer;
+
+/**
+ * Provides functionality to read S3 file one block at a time.
+ */
+public class S3Reader implements Closeable {
+  private static final Logger LOG = LoggerFactory.getLogger(S3Reader.class);
+
+  // The S3 file to read.
+  private final S3File s3File;
+
+  // Set to true by close().
+  private volatile boolean closed;
+
+  /**
+   * Constructs an instance of {@link S3Reader}.
+   *
+   * @param s3File The S3 file to read.
+   *
+   * @throws IllegalArgumentException if s3File is null.
+   */
+  public S3Reader(S3File s3File) {
+Validate.checkNotNull(s3File, "s3File");
+
+this.s3File = s3File;
+  }
+
+  /**
+   * Stars reading at {@code offset} and reads upto {@code size} bytes into 
{@code buffer}.
+   *
+   * @param buffer the buffer into which data is returned
+   * @param offset the absolute offset into the underlying file where reading 
starts.
+   * @param size the number of bytes to be read.
+   *
+   * @return number of bytes actually read.
+   * @throws IOException if there is an error reading from the file.
+   *
+   * @throws IllegalArgumentException if buffer is null.
+   * @throws IllegalArgumentException if offset is outside of the range [0, 
file size].
+   * @throws IllegalArgumentException if size is zero or negative.
+   */
+  public int read(ByteBuffer buffer, long offset, int size) throws IOException 
{
+Validate.checkNotNull(buffer, "buffer");
+Validate.checkWithinRange(offset, "offset", 0, this.s3File.size());
+Validate.checkPositiveInteger(size, "size");
+
+if (this.closed) {
+  return -1;
+}
+
+int reqSize = (int) Math.min(size, this.s3File.size() - offset);
+return readOneBlockWithRetries(buffer, offset, reqSize);
+  }
+
+  @Override
+  public void close() {
+this.closed = true;
+  }
+
+  private int readOneBlockWithRetries(ByteBuffer buffer, long offset, int size)
+  throws IOException {
+
+this.s3File.getStatistics().readOperationStarted(offset, size);
+Invoker invoker = this.s3File.getReadInvoker();
+
+invoker.retry(
+"read", this.s3File.getPath(), true,
+() -> {
+  try {
+this.readOneBlock(buffer, offset, size);
+  } catch (EOFException e) {
+// the base implementation swallows EOFs.
+return -1;
+  } catch (SocketTimeoutException e) {
+this.s3File.getStatistics().readException();
+throw e;
+  } catch (IOException e) {
+this.s3File.getStatistics().readException();
+throw e;
+  }
+  return 0;
+});
+
+int numBytesRead = buffer.position();
+buffer.limit(numBytesRead);
+this.s3File.getStatistics().readOperationCompleted(size, numBytesRead);
+return numBytesRead;
+  }
+
+  private static final int READ_BUFFER_SIZE = 64 * 1024;

Review comment:
   done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apach

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703676
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 05/Jan/22 01:17
Start Date: 05/Jan/22 01:17
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r778487448



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3InputStream.java
##
@@ -0,0 +1,459 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.CanSetReadahead;
+import org.apache.hadoop.fs.FSExceptionMessages;
+import org.apache.hadoop.fs.StreamCapabilities;
+import org.apache.hadoop.fs.common.BlockData;
+import org.apache.hadoop.fs.common.FilePosition;
+import org.apache.hadoop.fs.common.Validate;
+import org.apache.hadoop.fs.s3a.S3AInputPolicy;
+import org.apache.hadoop.fs.s3a.S3AInputStream;
+import org.apache.hadoop.fs.s3a.S3AReadOpContext;
+import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
+import org.apache.hadoop.fs.s3a.impl.ChangeTracker;
+import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * Provides an {@link InputStream} that allows reading from an S3 file.
+ */
+public abstract class S3InputStream
+extends InputStream
+implements CanSetReadahead, StreamCapabilities, IOStatisticsSource {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(S3InputStream.class);
+
+  // The S3 file read by this instance.
+  private S3File s3File;
+
+  // Reading of S3 file takes place through this reader.
+  private S3Reader reader;
+
+  // Name of this stream. Used only for logging.
+  private final String name;
+
+  // Indicates whether the stream has been closed.
+  private volatile boolean closed;
+
+  // Current position within the file.
+  private FilePosition fpos;
+
+  // The target of the most recent seek operation.
+  private long seekTargetPos;
+
+  // Information about each block of the mapped S3 file.
+  private BlockData blockData;
+
+  // Read-specific operation context.
+  private S3AReadOpContext context;
+
+  // Attributes of the S3 object being read.
+  private S3ObjectAttributes s3Attributes;
+
+  // Callbacks used for interacting with the underlying S3 client.
+  private S3AInputStream.InputStreamCallbacks client;
+
+  // Used for reporting input stream access statistics.
+  private final S3AInputStreamStatistics streamStatistics;
+
+  private S3AInputPolicy inputPolicy;
+  private final ChangeTracker changeTracker;
+  private final IOStatistics ioStatistics;
+
+  /**
+   * Initializes a new instance of the {@code S3InputStream} class.
+   *
+   * @param context read-specific operation context.
+   * @param s3Attributes attributes of the S3 object being read.
+   * @param client callbacks used for interacting with the underlying S3 
client.
+   *
+   * @throws IllegalArgumentException if context is null.
+   * @throws IllegalArgumentException if s3Attributes is null.
+   * @throws IllegalArgumentException if client is null.
+   */
+  public S3InputStream(
+  S3AReadOpContext context,
+  S3ObjectAttributes s3Attributes,
+  S3AInputStream.InputStreamCallbacks client) {
+
+Validate.checkNotNull(context, "context");
+Validate.checkNotNull(s3Attributes, "s3Attributes");
+Validate.checkNotNull(client, "client");
+
+this.context = context;
+this.s3Attributes = s3Attributes;
+this.client = client;
+this.streamStatistics = 
context.getS3AStatisticsContext().newInputStreamStatistics

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703675
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 05/Jan/22 01:15
Start Date: 05/Jan/22 01:15
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r778486837



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3InputStream.java
##
@@ -0,0 +1,459 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.CanSetReadahead;
+import org.apache.hadoop.fs.FSExceptionMessages;
+import org.apache.hadoop.fs.StreamCapabilities;
+import org.apache.hadoop.fs.common.BlockData;
+import org.apache.hadoop.fs.common.FilePosition;
+import org.apache.hadoop.fs.common.Validate;
+import org.apache.hadoop.fs.s3a.S3AInputPolicy;
+import org.apache.hadoop.fs.s3a.S3AInputStream;
+import org.apache.hadoop.fs.s3a.S3AReadOpContext;
+import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
+import org.apache.hadoop.fs.s3a.impl.ChangeTracker;
+import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * Provides an {@link InputStream} that allows reading from an S3 file.
+ */
+public abstract class S3InputStream
+extends InputStream
+implements CanSetReadahead, StreamCapabilities, IOStatisticsSource {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(S3InputStream.class);
+
+  // The S3 file read by this instance.
+  private S3File s3File;
+
+  // Reading of S3 file takes place through this reader.
+  private S3Reader reader;
+
+  // Name of this stream. Used only for logging.
+  private final String name;
+
+  // Indicates whether the stream has been closed.
+  private volatile boolean closed;
+
+  // Current position within the file.
+  private FilePosition fpos;
+
+  // The target of the most recent seek operation.
+  private long seekTargetPos;
+
+  // Information about each block of the mapped S3 file.
+  private BlockData blockData;
+
+  // Read-specific operation context.
+  private S3AReadOpContext context;
+
+  // Attributes of the S3 object being read.
+  private S3ObjectAttributes s3Attributes;
+
+  // Callbacks used for interacting with the underlying S3 client.
+  private S3AInputStream.InputStreamCallbacks client;
+
+  // Used for reporting input stream access statistics.
+  private final S3AInputStreamStatistics streamStatistics;
+
+  private S3AInputPolicy inputPolicy;
+  private final ChangeTracker changeTracker;
+  private final IOStatistics ioStatistics;
+
+  /**
+   * Initializes a new instance of the {@code S3InputStream} class.
+   *
+   * @param context read-specific operation context.
+   * @param s3Attributes attributes of the S3 object being read.
+   * @param client callbacks used for interacting with the underlying S3 
client.
+   *
+   * @throws IllegalArgumentException if context is null.
+   * @throws IllegalArgumentException if s3Attributes is null.
+   * @throws IllegalArgumentException if client is null.
+   */
+  public S3InputStream(
+  S3AReadOpContext context,
+  S3ObjectAttributes s3Attributes,
+  S3AInputStream.InputStreamCallbacks client) {
+
+Validate.checkNotNull(context, "context");
+Validate.checkNotNull(s3Attributes, "s3Attributes");
+Validate.checkNotNull(client, "client");
+
+this.context = context;
+this.s3Attributes = s3Attributes;
+this.client = client;
+this.streamStatistics = 
context.getS3AStatisticsContext().newInputStreamStatistics

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703673
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 05/Jan/22 01:13
Start Date: 05/Jan/22 01:13
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r778486303



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3InputStream.java
##
@@ -0,0 +1,459 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.CanSetReadahead;
+import org.apache.hadoop.fs.FSExceptionMessages;
+import org.apache.hadoop.fs.StreamCapabilities;
+import org.apache.hadoop.fs.common.BlockData;
+import org.apache.hadoop.fs.common.FilePosition;
+import org.apache.hadoop.fs.common.Validate;
+import org.apache.hadoop.fs.s3a.S3AInputPolicy;
+import org.apache.hadoop.fs.s3a.S3AInputStream;
+import org.apache.hadoop.fs.s3a.S3AReadOpContext;
+import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
+import org.apache.hadoop.fs.s3a.impl.ChangeTracker;
+import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * Provides an {@link InputStream} that allows reading from an S3 file.
+ */
+public abstract class S3InputStream
+extends InputStream
+implements CanSetReadahead, StreamCapabilities, IOStatisticsSource {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(S3InputStream.class);
+
+  // The S3 file read by this instance.
+  private S3File s3File;
+
+  // Reading of S3 file takes place through this reader.
+  private S3Reader reader;
+
+  // Name of this stream. Used only for logging.
+  private final String name;
+
+  // Indicates whether the stream has been closed.
+  private volatile boolean closed;
+
+  // Current position within the file.
+  private FilePosition fpos;
+
+  // The target of the most recent seek operation.
+  private long seekTargetPos;
+
+  // Information about each block of the mapped S3 file.
+  private BlockData blockData;
+
+  // Read-specific operation context.
+  private S3AReadOpContext context;
+
+  // Attributes of the S3 object being read.
+  private S3ObjectAttributes s3Attributes;
+
+  // Callbacks used for interacting with the underlying S3 client.
+  private S3AInputStream.InputStreamCallbacks client;
+
+  // Used for reporting input stream access statistics.
+  private final S3AInputStreamStatistics streamStatistics;
+
+  private S3AInputPolicy inputPolicy;
+  private final ChangeTracker changeTracker;
+  private final IOStatistics ioStatistics;
+
+  /**
+   * Initializes a new instance of the {@code S3InputStream} class.
+   *
+   * @param context read-specific operation context.
+   * @param s3Attributes attributes of the S3 object being read.
+   * @param client callbacks used for interacting with the underlying S3 
client.
+   *
+   * @throws IllegalArgumentException if context is null.
+   * @throws IllegalArgumentException if s3Attributes is null.
+   * @throws IllegalArgumentException if client is null.
+   */
+  public S3InputStream(
+  S3AReadOpContext context,
+  S3ObjectAttributes s3Attributes,
+  S3AInputStream.InputStreamCallbacks client) {
+
+Validate.checkNotNull(context, "context");
+Validate.checkNotNull(s3Attributes, "s3Attributes");
+Validate.checkNotNull(client, "client");
+
+this.context = context;
+this.s3Attributes = s3Attributes;
+this.client = client;
+this.streamStatistics = 
context.getS3AStatisticsContext().newInputStreamStatistics

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703671
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 05/Jan/22 01:12
Start Date: 05/Jan/22 01:12
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r778485852



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3CachingInputStream.java
##
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import com.twitter.util.FuturePool;
+import org.apache.hadoop.fs.common.BlockData;
+import org.apache.hadoop.fs.common.BlockManager;
+import org.apache.hadoop.fs.common.BufferData;
+import org.apache.hadoop.fs.s3a.S3AInputStream;
+import org.apache.hadoop.fs.s3a.S3AReadOpContext;
+import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+
+/**
+ * Provides an {@code InputStream} that allows reading from an S3 file.
+ * Prefetched blocks are cached to local disk if a seek away from the
+ * current block is issued.
+ */
+public class S3CachingInputStream extends S3InputStream {
+  private static final Logger LOG = 
LoggerFactory.getLogger(S3CachingInputStream.class);
+
+  // Number of blocks queued for prefching.
+  private int numBlocksToPrefetch;
+
+  private BlockManager blockManager;
+
+  /**
+   * Initializes a new instance of the {@code S3CachingInputStream} class.
+   *
+   * @param context read-specific operation context.
+   * @param s3Attributes attributes of the S3 object being read.
+   * @param client callbacks used for interacting with the underlying S3 
client.
+   *
+   * @throws IllegalArgumentException if context is null.
+   * @throws IllegalArgumentException if s3Attributes is null.
+   * @throws IllegalArgumentException if client is null.
+   */
+  public S3CachingInputStream(
+  S3AReadOpContext context,
+  S3ObjectAttributes s3Attributes,
+  S3AInputStream.InputStreamCallbacks client) {
+super(context, s3Attributes, client);
+
+this.numBlocksToPrefetch = this.getContext().getPrefetchBlockCount();
+int bufferPoolSize = this.numBlocksToPrefetch + 1;
+this.blockManager = this.createBlockManager(
+this.getContext().getFuturePool(),
+this.getReader(),
+this.getBlockData(),
+bufferPoolSize);
+int fileSize = (int) s3Attributes.getLen();
+LOG.debug("Created caching input stream for {} (size = {})", 
this.getName(), fileSize);
+  }
+
+  /**
+   * Moves the current read position so that the next read will occur at 
{@code pos}.
+   *
+   * @param pos the next read will take place at this position.
+   *
+   * @throws IllegalArgumentException if pos is outside of the range [0, file 
size].
+   */
+  @Override
+  public void seek(long pos) throws IOException {
+this.throwIfClosed();
+this.throwIfInvalidSeek(pos);
+
+if (!this.getFilePosition().setAbsolute(pos)) {

Review comment:
   done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 703671)
Time Spent: 7h 50m  (was: 7h 40m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 50m
> 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703656
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 05/Jan/22 00:41
Start Date: 05/Jan/22 00:41
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-1005283237


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 24 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 25s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 23s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/19/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 6 new + 7 unchanged - 3 fixed 
= 13 total (was 10)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  6s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 34s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  88m 21s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/19/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux fa6ea20f14fe 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4392e97bbb5df2dfaff500fc1b44edefd62dc421 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703650
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 05/Jan/22 00:36
Start Date: 05/Jan/22 00:36
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r778475249



##
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/read/S3FileTest.java
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import static org.junit.Assert.*;
+
+import com.twitter.util.ExecutorServiceFuturePool;
+import com.twitter.util.FuturePool;
+import org.apache.hadoop.fs.common.ExceptionAsserts;
+import org.apache.hadoop.fs.s3a.S3AInputStream;
+import org.apache.hadoop.fs.s3a.S3AReadOpContext;
+import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
+import org.apache.hadoop.fs.s3a.impl.ChangeTracker;
+import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+public class S3FileTest {
+  private final ExecutorService threadPool = Executors.newFixedThreadPool(1);
+  private final FuturePool futurePool = new 
ExecutorServiceFuturePool(threadPool);
+  private final S3AInputStream.InputStreamCallbacks client = 
TestS3File.createClient("bucket");
+
+  @Test
+  public void testArgChecks() {
+S3AReadOpContext readContext = Fakes.createReadContext(futurePool, "key", 
10, 10, 1);
+S3ObjectAttributes attrs = Fakes.createObjectAttributes("bucket", "key", 
10);
+S3AInputStreamStatistics stats =
+readContext.getS3AStatisticsContext().newInputStreamStatistics();
+ChangeTracker changeTracker = Fakes.createChangeTracker("bucket", "key", 
10);
+
+// Should not throw.
+new S3File(readContext, attrs, client, stats, changeTracker);
+
+ExceptionAsserts.assertThrows(
+IllegalArgumentException.class,
+"'context' must not be null",
+() -> new S3File(null, attrs, client, stats, changeTracker));
+
+ExceptionAsserts.assertThrows(
+IllegalArgumentException.class,
+"'s3Attributes' must not be null",
+() -> new S3File(readContext, null, client, stats, changeTracker));
+
+ExceptionAsserts.assertThrows(
+IllegalArgumentException.class,
+"'client' must not be null",
+() -> new S3File(readContext, attrs, null, stats, changeTracker));
+
+ExceptionAsserts.assertThrows(
+IllegalArgumentException.class,
+"'streamStatistics' must not be null",
+() -> new S3File(readContext, attrs, client, null, changeTracker));
+
+ExceptionAsserts.assertThrows(
+IllegalArgumentException.class,
+"'changeTracker' must not be null",
+() -> new S3File(readContext, attrs, client, stats, null));
+  }
+
+  @Test
+  public void testProperties() throws IOException {

Review comment:
   yes. removed.

##
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/read/S3FileTest.java
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the L

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703651
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 05/Jan/22 00:36
Start Date: 05/Jan/22 00:36
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r778475353



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3File.java
##
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import com.amazonaws.services.s3.model.GetObjectRequest;
+import com.amazonaws.services.s3.model.S3Object;
+import org.apache.hadoop.fs.common.Io;
+import org.apache.hadoop.fs.common.Validate;
+import org.apache.hadoop.fs.s3a.Invoker;
+import org.apache.hadoop.fs.s3a.S3AInputStream;
+import org.apache.hadoop.fs.s3a.S3AReadOpContext;
+import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
+import org.apache.hadoop.fs.s3a.impl.ChangeTracker;
+import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
+import org.apache.hadoop.fs.statistics.DurationTracker;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.IdentityHashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Ecapsulates low level interactions with S3 object on AWS.

Review comment:
   good catch. fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 703651)
Time Spent: 7.5h  (was: 7h 20m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703633
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 04/Jan/22 23:37
Start Date: 04/Jan/22 23:37
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r778456947



##
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestObjectChangeDetectionAttributes.java
##
@@ -234,6 +234,7 @@ private void assertContent(String file, Path path, byte[] 
content,
 s3Object.setObjectMetadata(metadata);
 s3Object.setObjectContent(new ByteArrayInputStream(content));
 when(s3.getObject(argThat(correctGetObjectRequest(file, eTag, versionId
+// when(s3.getObject(any()))

Review comment:
   oops. sorry all changes in this file were for local testing and left 
there accidentally. reverted all of them.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 703633)
Time Spent: 7h 10m  (was: 7h)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2022-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=703205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703205
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 04/Jan/22 06:23
Start Date: 04/Jan/22 06:23
Worklog Time Spent: 10m 
  Work Description: mehakmeet commented on a change in pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r772337743



##
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/read/S3FileTest.java
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import static org.junit.Assert.*;

Review comment:
   not used. can be remove

##
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestObjectChangeDetectionAttributes.java
##
@@ -234,6 +234,7 @@ private void assertContent(String file, Path path, byte[] 
content,
 s3Object.setObjectMetadata(metadata);
 s3Object.setObjectContent(new ByteArrayInputStream(content));
 when(s3.getObject(argThat(correctGetObjectRequest(file, eTag, versionId
+// when(s3.getObject(any()))

Review comment:
   nit: seems accidental?

##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3InputStream.java
##
@@ -0,0 +1,459 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.read;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.CanSetReadahead;
+import org.apache.hadoop.fs.FSExceptionMessages;
+import org.apache.hadoop.fs.StreamCapabilities;
+import org.apache.hadoop.fs.common.BlockData;
+import org.apache.hadoop.fs.common.FilePosition;
+import org.apache.hadoop.fs.common.Validate;
+import org.apache.hadoop.fs.s3a.S3AInputPolicy;
+import org.apache.hadoop.fs.s3a.S3AInputStream;
+import org.apache.hadoop.fs.s3a.S3AReadOpContext;
+import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
+import org.apache.hadoop.fs.s3a.impl.ChangeTracker;
+import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * Provides an {@link InputStream} that allows reading from an S3 file.
+ */
+public abstract class S3InputStream
+extends InputStream
+implements CanSetReadahead, StreamCapabilities, IOStatisticsSource {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(S3InputStream.class);
+
+  // The S3 file read by this instance.
+  private S3File s3File;
+
+  // Reading of S3 file takes place through this reader.
+  private S3Reader reader;
+
+  // Name of this stream. Used only for logging.
+  private final String name;
+
+  // Indicates whether the stream has been closed.
+  private volatile boolean closed;
+
+  // Current position within the file.
+  private FilePosition fpos;
+
+  // The target of the most recent seek operation.
+  private long seekTargetPos;
+
+  // Information 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=697936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-697936
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 17/Dec/21 14:54
Start Date: 17/Dec/21 14:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-990284225


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 34s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 24 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 40s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 50s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 33s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 51s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 50s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/14/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 6 new + 7 unchanged - 3 fixed 
= 13 total (was 10)  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 28s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/14/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 4 new + 
62 unchanged - 0 fixed = 66 total (was 62)  |
   | +1 :green_heart: |  spotbugs  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 42s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 51s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  98m 43s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/14/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux c0d4b2c92ca8 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Person

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=697935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-697935
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 17/Dec/21 14:54
Start Date: 17/Dec/21 14:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-989428320


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 15s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/12/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 3 new + 4 unchanged - 3 fixed 
= 7 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 11s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 42s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/12/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  87m 48s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/12/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 0f094e9598ec 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ea7e19c627e282d84aa2f8bd6ff1d24a83cc8330 |
   | Default Java 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=697934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-697934
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 17/Dec/21 14:53
Start Date: 17/Dec/21 14:53
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-985957944


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 15s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/11/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 3 new + 4 unchanged - 3 fixed 
= 7 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  1s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 45s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/11/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  90m 17s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/11/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux bc50b40143fe 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 3ce0f6b760eaa586bb5e7ddaadeb89b294ccf212 |
   | Default Java 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=695306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695306
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 13/Dec/21 19:42
Start Date: 13/Dec/21 19:42
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-992810929


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 58s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 24 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 35s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 36s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/18/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 6 new + 7 unchanged - 3 fixed 
= 13 total (was 10)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 43s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  89m 30s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/18/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 9c73a636ffa1 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0c361c5361f04ea60224971b370499b3195ee3a9 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache.

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=695221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695221
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 13/Dec/21 17:21
Start Date: 13/Dec/21 17:21
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-992699045


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 42s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 24 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  34m 14s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 47s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 26s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 29s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/17/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 8 new + 7 unchanged - 3 fixed 
= 15 total (was 10)  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 28s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/17/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 4 new + 
62 unchanged - 0 fixed = 66 total (was 62)  |
   | +1 :green_heart: |  spotbugs  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 17s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 41s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 30s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  90m 57s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/17/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 927911b2db67 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=693678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-693678
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 10/Dec/21 02:22
Start Date: 10/Dec/21 02:22
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-990543613


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 52s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 24 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  36m 22s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 41s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 44s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 19s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/15/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 8 new + 7 unchanged - 3 fixed 
= 15 total (was 10)  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 24s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/15/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 4 new + 
62 unchanged - 0 fixed = 66 total (was 62)  |
   | +1 :green_heart: |  spotbugs  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 45s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 49s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 32s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  98m 19s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/15/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux f0b99c6f772a 4.15.0-162-generic #170-Ubuntu SMP Mon Oct 18 
11:38:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality |

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=693676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-693676
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 10/Dec/21 02:13
Start Date: 10/Dec/21 02:13
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-990537861


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 24 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 16s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 47s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 23s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/16/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 8 new + 7 unchanged - 3 fixed 
= 15 total (was 10)  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 27s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/16/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 4 new + 
62 unchanged - 0 fixed = 66 total (was 62)  |
   | +1 :green_heart: |  spotbugs  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  1s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 34s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  87m 54s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/16/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux bba996d83221 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | de

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=693565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-693565
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 09/Dec/21 21:31
Start Date: 09/Dec/21 21:31
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-990284225


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 34s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 24 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 40s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 50s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 33s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 51s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 50s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/14/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 6 new + 7 unchanged - 3 fixed 
= 13 total (was 10)  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 28s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/14/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 4 new + 
62 unchanged - 0 fixed = 66 total (was 62)  |
   | +1 :green_heart: |  spotbugs  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 42s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 51s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  98m 43s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/14/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux c0d4b2c92ca8 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=692945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-692945
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 09/Dec/21 02:09
Start Date: 09/Dec/21 02:09
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-989436895


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 47s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 23 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 28s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 12s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 23s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/13/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 6 new + 5 unchanged - 3 fixed 
= 11 total (was 8)  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 26s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/13/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 3 new + 
62 unchanged - 0 fixed = 65 total (was 62)  |
   | +1 :green_heart: |  spotbugs  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 14s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 37s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  87m 48s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/13/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 5b3c26fd9634 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=692941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-692941
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 09/Dec/21 01:54
Start Date: 09/Dec/21 01:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-989428320


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 15s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/12/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 3 new + 4 unchanged - 3 fixed 
= 7 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 11s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 42s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/12/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  87m 48s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/12/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 0f094e9598ec 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ea7e19c627e282d84aa2f8bd6ff1d24a83cc8330 |
   | Default Java | Privat

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=690435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690435
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 04/Dec/21 03:25
Start Date: 04/Dec/21 03:25
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-985957944


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 15s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/11/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 3 new + 4 unchanged - 3 fixed 
= 7 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  1s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 45s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/11/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  90m 17s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/11/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux bc50b40143fe 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 3ce0f6b760eaa586bb5e7ddaadeb89b294ccf212 |
   | Default Java | Privat

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=690405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690405
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 04/Dec/21 00:53
Start Date: 04/Dec/21 00:53
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r762361371



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
##
@@ -540,7 +540,7 @@ private Constants() {
 
   public static final String USER_AGENT_PREFIX = "fs.s3a.user.agent.prefix";
 
-  /** Whether or not to allow MetadataStore to be source of truth for a path 
prefix */
+  /** Whether or not to allow MetadataStore to be source of truth for a path 
prefix. */

Review comment:
   done. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 690405)
Time Spent: 5h  (was: 4h 50m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=690404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690404
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 04/Dec/21 00:51
Start Date: 04/Dec/21 00:51
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r762361014



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
##
@@ -3926,12 +3985,12 @@ public void copyFromLocalFile(boolean delSrc, boolean 
overwrite, Path src,
   }
 
   protected CopyFromLocalOperation.CopyFromLocalOperationCallbacks
-  createCopyFromLocalCallbacks() throws IOException {
+  createCopyFromLocalCallbacks() throws IOException {

Review comment:
   sorry, I did not understand the suggestion. can you please clarify?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 690404)
Time Spent: 4h 50m  (was: 4h 40m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=690403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690403
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 04/Dec/21 00:50
Start Date: 04/Dec/21 00:50
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r762360921



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BlockData.java
##
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+/**
+ * Holds information about blocks of data in an S3 file.
+ */
+public class BlockData {
+  // State of each block of data.
+  enum State {
+// Data is not yet ready to be read.
+NOT_READY,
+
+// A read of this block has been queued.
+QUEUED,
+
+// This block is ready to be read.
+READY,
+
+// This block has been cached.
+CACHED
+  }
+
+  // State of all blocks in an S3 file.
+  private State[] state;
+
+  // The size of an S3 file.
+  private final long fileSize;
+
+  // The S3 file is divided into blocks of this size.
+  private final int blockSize;
+
+  // The S3 file has these many blocks.
+  private final int numBlocks;
+
+  /**
+   * Constructs an instance of {@link BlockData}.
+   *
+   * @param fileSize the size of an S3 file.
+   * @param blockSize the S3 file is divided into blocks of this size.
+   */
+  public BlockData(long fileSize, int blockSize) {
+Validate.checkNotNegative(fileSize, "fileSize");
+if (fileSize == 0) {
+  Validate.checkNotNegative(blockSize, "blockSize");
+} else {
+  Validate.checkPositiveInteger(blockSize, "blockSize");
+}
+
+this.fileSize = fileSize;
+this.blockSize = blockSize;
+this.numBlocks =
+(fileSize == 0) ? 0 : ((int) (fileSize / blockSize)) + (fileSize % 
blockSize > 0 ? 1 : 0);
+this.state = new State[this.numBlocks];
+for (int b = 0; b < this.numBlocks; b++) {
+  this.setState(b, State.NOT_READY);
+}
+  }
+
+  public int getBlockSize() {
+return this.blockSize;
+  }
+
+  public long getFileSize() {
+return this.fileSize;
+  }
+
+  public int getNumBlocks() {
+return this.numBlocks;
+  }
+
+  public boolean isLastBlock(int blockNumber) {
+if (this.fileSize == 0) {
+  return false;
+}
+
+throwIfInvalidBlockNumber(blockNumber);
+
+return blockNumber == (this.numBlocks - 1);
+  }
+
+  public int getBlockNumber(long offset) {
+throwIfInvalidOffset(offset);
+
+return (int) (offset / this.blockSize);
+  }
+
+  public int getSize(int blockNumber) {
+if (this.fileSize == 0) {
+  return 0;
+}
+
+if (this.isLastBlock(blockNumber)) {
+  return (int) (this.fileSize - (((long) this.blockSize) * (this.numBlocks 
- 1)));
+} else {
+  return this.blockSize;
+}
+  }
+
+  public boolean isValidOffset(long offset) {
+return (offset >= 0) && (offset < this.fileSize);
+  }
+
+  public long getStartOffset(int blockNumber) {
+throwIfInvalidBlockNumber(blockNumber);
+
+return blockNumber * (long) this.blockSize;
+  }
+
+  public int getRelativeOffset(int blockNumber, long offset) {
+throwIfInvalidOffset(offset);
+
+return (int) (offset - this.getStartOffset(blockNumber));
+  }
+
+  public State getState(int blockNumber) {
+throwIfInvalidBlockNumber(blockNumber);
+
+return this.state[blockNumber];
+  }
+
+  public State setState(int blockNumber, State blockState) {
+throwIfInvalidBlockNumber(blockNumber);
+
+this.state[blockNumber] = blockState;
+return blockState;
+  }
+
+  // Debug helper.
+  public String getStateString() {
+StringBuilder sb = new StringBuilder();
+int blockNumber = 0;
+while (blockNumber < this.numBlocks) {
+  State tstate = this.getState(blockNumber);
+  int endBlockNumber = blockNumber;
+  while ((endBlockNumber < this.numBlocks) && 
(this.getState(endBlockNumber) == tstate)) {
+endBlockNumber+

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=690398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690398
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 04/Dec/21 00:20
Start Date: 04/Dec/21 00:20
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r762354915



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/Validate.java
##
@@ -0,0 +1,405 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Collection;
+
+/**
+ * A superset of Validate class in Apache commons lang3.
+ *
+ * It provides consistent message strings for frequently encountered checks.
+ * That simplifies callers because they have to supply only the name of the 
argument
+ * that failed a check instead of having to supply the entire message.
+ */
+public final class Validate {
+  private Validate() {}
+
+  /**
+   * Validates that the given reference argument is not null.
+   *
+   * @param obj the argument reference to validate.
+   * @param argName the name of the argument being validated.
+   */
+  public static void checkNotNull(Object obj, String argName) {
+checkArgument(obj != null, "'%s' must not be null.", argName);
+  }
+
+  /**
+   * Validates that the given integer argument is not zero or negative.
+   *
+   * @param value the argument value to validate
+   * @param argName the name of the argument being validated.
+   */
+  public static void checkPositiveInteger(long value, String argName) {
+checkArgument(value > 0, "'%s' must be a positive integer.", argName);
+  }
+
+  /**
+   * Validates that the given integer argument is not negative.
+   *
+   * @param value the argument value to validate
+   * @param argName the name of the argument being validated.
+   */
+  public static void checkNotNegative(long value, String argName) {
+checkArgument(value >= 0, "'%s' must not be negative.", argName);
+  }
+
+  /*
+   * Validates that the expression (that checks a required field is present) 
is true.
+   *
+   * @param isPresent indicates whether the given argument is present.
+   * @param argName the name of the argument being validated.
+   */
+  public static void checkRequired(boolean isPresent, String argName) {
+checkArgument(isPresent, "'%s' is required.", argName);
+  }
+
+  /**
+   * Validates that the expression (that checks a field is valid) is true.
+   *
+   * @param isValid indicates whether the given argument is valid.
+   * @param argName the name of the argument being validated.
+   */
+  public static void checkValid(boolean isValid, String argName) {
+checkArgument(isValid, "'%s' is invalid.", argName);
+  }
+
+  /**
+   * Validates that the expression (that checks a field is valid) is true.
+   *
+   * @param isValid indicates whether the given argument is valid.
+   * @param argName the name of the argument being validated.
+   * @param validValues the list of values that are allowed.
+   */
+  public static void checkValid(boolean isValid, String argName, String 
validValues) {
+checkArgument(isValid, "'%s' is invalid. Valid values are: %s.", argName, 
validValues);
+  }
+
+  /**
+   * Validates that the given string is not null and has non-zero length.
+   *
+   * @param arg the argument reference to validate.
+   * @param argName the name of the argument being validated.
+   */
+  public static void checkNotNullAndNotEmpty(String arg, String argName) {
+Validate.checkNotNull(arg, argName);
+Validate.checkArgument(
+arg.length() > 0,
+"'%s' must not be empty.",
+argName);
+  }
+
+  /**
+   * Validates that the given array is not null and has at least one element.
+   *
+   * @param  the type of array's elements.
+   * @param array the argument reference to validate.
+   * @param argName the name of the argument being validated.
+   */
+  p

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=690391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690391
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 04/Dec/21 00:00
Start Date: 04/Dec/21 00:00
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r762350797



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BlockData.java
##
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+/**
+ * Holds information about blocks of data in an S3 file.
+ */
+public class BlockData {
+  // State of each block of data.
+  enum State {
+// Data is not yet ready to be read.
+NOT_READY,
+
+// A read of this block has been queued.
+QUEUED,
+
+// This block is ready to be read.
+READY,
+
+// This block has been cached.
+CACHED
+  }
+
+  // State of all blocks in an S3 file.
+  private State[] state;
+
+  // The size of an S3 file.
+  private final long fileSize;
+
+  // The S3 file is divided into blocks of this size.
+  private final int blockSize;
+
+  // The S3 file has these many blocks.
+  private final int numBlocks;
+
+  /**
+   * Constructs an instance of {@link BlockData}.
+   *
+   * @param fileSize the size of an S3 file.
+   * @param blockSize the S3 file is divided into blocks of this size.
+   */
+  public BlockData(long fileSize, int blockSize) {
+Validate.checkNotNegative(fileSize, "fileSize");
+if (fileSize == 0) {
+  Validate.checkNotNegative(blockSize, "blockSize");
+} else {
+  Validate.checkPositiveInteger(blockSize, "blockSize");
+}
+
+this.fileSize = fileSize;
+this.blockSize = blockSize;
+this.numBlocks =
+(fileSize == 0) ? 0 : ((int) (fileSize / blockSize)) + (fileSize % 
blockSize > 0 ? 1 : 0);
+this.state = new State[this.numBlocks];
+for (int b = 0; b < this.numBlocks; b++) {
+  this.setState(b, State.NOT_READY);
+}
+  }
+
+  public int getBlockSize() {
+return this.blockSize;
+  }
+
+  public long getFileSize() {
+return this.fileSize;
+  }
+
+  public int getNumBlocks() {
+return this.numBlocks;
+  }
+
+  public boolean isLastBlock(int blockNumber) {
+if (this.fileSize == 0) {
+  return false;
+}
+
+throwIfInvalidBlockNumber(blockNumber);
+
+return blockNumber == (this.numBlocks - 1);
+  }
+
+  public int getBlockNumber(long offset) {
+throwIfInvalidOffset(offset);
+
+return (int) (offset / this.blockSize);
+  }
+
+  public int getSize(int blockNumber) {

Review comment:
   done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 690391)
Time Spent: 4h 20m  (was: 4h 10m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at P

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=690284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690284
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 03/Dec/21 19:31
Start Date: 03/Dec/21 19:31
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r762192697



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BlockData.java
##
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+/**
+ * Holds information about blocks of data in an S3 file.

Review comment:
   thanks. removed this and all other refs to S3 under common/




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 690284)
Time Spent: 4h 10m  (was: 4h)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=690283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690283
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 03/Dec/21 19:27
Start Date: 03/Dec/21 19:27
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r762190855



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BlockCache.java
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+/**
+ * Provides functionality necessary for caching blocks of data read from 
FileSystem.
+ */
+public interface BlockCache extends Closeable {
+
+  /**
+   * Indicates whether the given block is in this cache.
+   *
+   * @param blockNumber the id of the given block.
+   * @return true if the given block is in this cache, false otherwise.
+   */
+  boolean containsBlock(Integer blockNumber);

Review comment:
   Good catch. It was an artifact of an earlier implementation choice. 
Changed to int.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 690283)
Time Spent: 4h  (was: 3h 50m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=690170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690170
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 03/Dec/21 16:41
Start Date: 03/Dec/21 16:41
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r761936264



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BlockCache.java
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+/**
+ * Provides functionality necessary for caching blocks of data read from 
FileSystem.
+ */
+public interface BlockCache extends Closeable {
+
+  /**
+   * Indicates whether the given block is in this cache.
+   *
+   * @param blockNumber the id of the given block.
+   * @return true if the given block is in this cache, false otherwise.
+   */
+  boolean containsBlock(Integer blockNumber);

Review comment:
   why not int?

##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BlockData.java
##
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+/**
+ * Holds information about blocks of data in an S3 file.
+ */
+public class BlockData {
+  // State of each block of data.
+  enum State {
+// Data is not yet ready to be read.
+NOT_READY,
+
+// A read of this block has been queued.
+QUEUED,
+
+// This block is ready to be read.
+READY,
+
+// This block has been cached.
+CACHED
+  }
+
+  // State of all blocks in an S3 file.
+  private State[] state;
+
+  // The size of an S3 file.
+  private final long fileSize;
+
+  // The S3 file is divided into blocks of this size.
+  private final int blockSize;
+
+  // The S3 file has these many blocks.
+  private final int numBlocks;
+
+  /**
+   * Constructs an instance of {@link BlockData}.
+   *
+   * @param fileSize the size of an S3 file.
+   * @param blockSize the S3 file is divided into blocks of this size.
+   */
+  public BlockData(long fileSize, int blockSize) {
+Validate.checkNotNegative(fileSize, "fileSize");
+if (fileSize == 0) {
+  Validate.checkNotNegative(blockSize, "blockSize");
+} else {
+  Validate.checkPositiveInteger(blockSize, "blockSize");
+}
+
+this.fileSize = fileSize;
+this.blockSize = blockSize;
+this.numBlocks =
+(fileSize == 0) ? 0 : ((int) (fileSize / blockSize)) + (fileSize % 
blockSize > 0 ? 1 : 0);
+this.state = new State[this.numBlocks];
+for (int b = 0; b < this.numBlocks; b++) {
+  this.setState(b, State.NOT_READY);
+}
+  }
+
+  public int getBlockSize() {
+return this.blockSize;
+  }
+
+  public long getFileSize() {
+return this.fileSize;
+  }
+
+  public int getNumBlocks() {
+return this.numBlocks;
+  }
+
+  public boolean isLastBlock(int blockNumber) {
+if (this.fileSize == 0) {
+  return false;
+}
+
+throwIfInvalidBlockNumber(blockNumber);
+
+return blockNumber == (this.numBlocks - 1);
+  }
+
+  public int getBlockNumber(lon

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=689968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689968
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 03/Dec/21 12:22
Start Date: 03/Dec/21 12:22
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-984029933






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 689968)
Time Spent: 3h 40m  (was: 3.5h)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=689965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689965
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 03/Dec/21 12:21
Start Date: 03/Dec/21 12:21
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-982982812






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 689965)
Time Spent: 3h 20m  (was: 3h 10m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=689964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689964
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 03/Dec/21 12:21
Start Date: 03/Dec/21 12:21
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-982313913


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 52s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 55s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 56s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 34s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 52s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 39s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javac  |   0m 40s | 
[/results-compile-javac-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/3/artifact/out/results-compile-javac-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 64 unchanged - 1 fixed 
= 65 total (was 65)  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  javac  |   0m 33s | 
[/results-compile-javac-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/3/artifact/out/results-compile-javac-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 1 new + 
64 unchanged - 1 fixed = 65 total (was 65)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 23s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/3/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 106 new + 5 unchanged - 0 
fixed = 111 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 26s | 
[/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/3/artifact/out/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | -1 :x: |  spotbugs  |   1m 16s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/3/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 2 new + 0 unchanged - 0 fixed = 2 tota

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=689967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689967
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 03/Dec/21 12:21
Start Date: 03/Dec/21 12:21
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-983265939


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 23s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 30s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/6/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 18 new + 3 unchanged - 4 fixed 
= 21 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  spotbugs  |   1m 17s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/6/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 3 new + 0 unchanged - 0 fixed = 3 total 
(was 0)  |
   | +1 :green_heart: |  shadedclient  |  20m 13s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 53s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/6/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  99m 23s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-tools/hadoop-aws |
   |  |  Result of integer multiplication cast to long in 
org.apache.hadoop.fs.common.BlockData.getSize(int)  At BlockData.java:to long 
in org.apache.hadoop.fs.common.BlockData.getSize(int)  At BlockData.java:[line 
111] |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.fs.common.BufferData.buffer; locked 71% of time  
Unsynchronized access at BufferData.java:71% of time  Unsynchronized access at 
BufferData.java:[line 260] |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AFileSystem.futurePoo

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=689963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689963
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 03/Dec/21 12:20
Start Date: 03/Dec/21 12:20
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-982302964


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  50m  3s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 51s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 21s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/2/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 106 new + 5 unchanged - 0 
fixed = 111 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 27s | 
[/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/2/artifact/out/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | -1 :x: |  spotbugs  |   1m 18s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/2/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 9 new + 0 unchanged - 0 fixed = 9 total 
(was 0)  |
   | +1 :green_heart: |  shadedclient  |  20m 27s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 45s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/2/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | -1 :x: |  asflicense  |   0m 33s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/2/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 106m 50s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-tools/hadoop-aws |
   |  |  Format string should use %n rather than n in 
org.apache.hadoop.fs.common.BlockData.getStateString()  At 
BlockData.java:rather

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=689961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689961
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 03/Dec/21 12:18
Start Date: 03/Dec/21 12:18
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-981921238


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  7s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 22s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 50s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 30s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javac  |   0m 40s | 
[/results-compile-javac-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/1/artifact/out/results-compile-javac-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 64 unchanged - 1 fixed 
= 65 total (was 65)  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  javac  |   0m 33s | 
[/results-compile-javac-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/1/artifact/out/results-compile-javac-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 1 new + 
64 unchanged - 1 fixed = 65 total (was 65)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 23s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/1/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 183 new + 5 unchanged - 0 
fixed = 188 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 27s | 
[/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/1/artifact/out/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | -1 :x: |  spotbugs  |   1m 15s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/1/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 8 new + 0 unchanged - 0 fixed = 8 tota

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=689658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689658
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 02/Dec/21 23:21
Start Date: 02/Dec/21 23:21
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-985081348


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  2s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 26s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/10/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 3 new + 3 unchanged - 4 fixed 
= 6 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 19s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 50s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/10/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 31s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  88m 34s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/10/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux b1da0e6ffd6a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 29a482ce35fb191e9414aaea58a41482c86bed24 |
   | Default Java | Privat

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=689599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689599
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 02/Dec/21 20:41
Start Date: 02/Dec/21 20:41
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-984985003


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 52s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 22s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/9/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 3 new + 3 unchanged - 4 fixed 
= 6 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  2s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 44s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/9/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  87m 56s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/9/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 89b1043f53c2 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / da5b7128087ad1df244a224bbbf495f98f4b46fd |
   | Default Java | Private B

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=689553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689553
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 02/Dec/21 19:08
Start Date: 02/Dec/21 19:08
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r761389560



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BoundedResourcePool.java
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import java.util.Collections;
+import java.util.IdentityHashMap;
+import java.util.Set;
+import java.util.concurrent.ArrayBlockingQueue;
+
+/**
+ * Manages a fixed pool of resources.
+ *
+ * Avoids creating a new resource if a previously created instance is already 
available.
+ */
+public abstract class BoundedResourcePool extends ResourcePool {
+  // The size of this pool. Fixed at creation time.
+  private final int size;
+
+  // Items currently available in the pool.
+  private ArrayBlockingQueue items;
+
+  // Items that have been created so far (regardless of whether they are 
currently available).
+  private Set createdItems;
+
+  /**
+   * Constructs a resource pool of the given size.
+   *
+   * @param size the size of this pool. Cannot be changed post creation.
+   */
+  public BoundedResourcePool(int size) {
+Validate.checkPositiveInteger(size, "size");
+
+this.size = size;
+this.items = new ArrayBlockingQueue(size);
+
+// The created items are identified based on their object reference.
+this.createdItems = Collections.newSetFromMap(new IdentityHashMap());
+  }
+
+  /**
+   * Acquires a resource blocking if necessary until one becomes available.
+   */
+  @Override
+  public T acquire() {
+return this.acquireHelper(true);
+  }
+
+  /**
+   * Acquires a resource blocking if one is immediately available. Otherwise 
returns null.
+   */
+  @Override
+  public T tryAcquire() {
+return this.acquireHelper(false);
+  }
+
+  /**
+   * Releases a previously acquired resource.
+   */
+  @Override
+  public void release(T item) {
+Validate.checkNotNull(item, "item");
+
+synchronized (this.createdItems) {
+  if (!this.createdItems.contains(item)) {
+throw new IllegalArgumentException("This item is not a part of this 
pool");
+  }
+}
+
+// Return if this item was released earlier.
+// We cannot use this.items.contains() because that check is not based on 
reference equality.
+for (T entry : this.items) {
+  if (entry == item) {
+return;
+  }
+}
+
+while (true) {
+  try {
+this.items.put(item);
+return;
+  } catch (InterruptedException e) {
+throw new IllegalStateException("release() should never block");
+  }
+}
+  }
+
+  @Override
+  public synchronized void close() {
+for (T item : this.createdItems) {
+  this.close(item);
+}
+
+this.items.clear();
+this.items = null;
+
+this.createdItems.clear();
+this.createdItems = null;
+  }
+
+  /**
+   * Derived classes may implement a way to cleanup each item.
+   */
+  @Override
+  protected synchronized void close(T item) {
+// Do nothing in this class. Allow overriding classes to take any cleanup 
action.
+  }
+
+  // Number of items created so far. Mostly for testing purposes.
+  public int numCreated() {
+synchronized (this.createdItems) {
+  return this.createdItems.size();
+}
+  }
+
+  // Number of items available to be acquired. Mostly for testing purposes.
+  public synchronized int numAvailable() {
+return (this.size - this.numCreated()) + this.items.size();
+  }
+
+  // For debugging purposes.
+  @Override
+  public synchronized String toString() {
+return String.format(
+"size = %d, #created = %d, #in-queue = %d, #available = %d",
+this.size, this.numCreated(), this.items.size(), this.numAvailable());
+  }
+
+  /

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=689538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689538
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 02/Dec/21 18:37
Start Date: 02/Dec/21 18:37
Worklog Time Spent: 10m 
  Work Description: bhalchandrap commented on a change in pull request 
#3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r761368184



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BoundedResourcePool.java
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import java.util.Collections;
+import java.util.IdentityHashMap;
+import java.util.Set;
+import java.util.concurrent.ArrayBlockingQueue;
+
+/**
+ * Manages a fixed pool of resources.
+ *
+ * Avoids creating a new resource if a previously created instance is already 
available.
+ */
+public abstract class BoundedResourcePool extends ResourcePool {
+  // The size of this pool. Fixed at creation time.
+  private final int size;
+
+  // Items currently available in the pool.
+  private ArrayBlockingQueue items;
+
+  // Items that have been created so far (regardless of whether they are 
currently available).
+  private Set createdItems;
+
+  /**
+   * Constructs a resource pool of the given size.
+   *
+   * @param size the size of this pool. Cannot be changed post creation.
+   */
+  public BoundedResourcePool(int size) {
+Validate.checkPositiveInteger(size, "size");
+
+this.size = size;
+this.items = new ArrayBlockingQueue(size);
+
+// The created items are identified based on their object reference.
+this.createdItems = Collections.newSetFromMap(new IdentityHashMap());
+  }
+
+  /**
+   * Acquires a resource blocking if necessary until one becomes available.
+   */
+  @Override
+  public T acquire() {
+return this.acquireHelper(true);
+  }
+
+  /**
+   * Acquires a resource blocking if one is immediately available. Otherwise 
returns null.
+   */
+  @Override
+  public T tryAcquire() {
+return this.acquireHelper(false);
+  }
+
+  /**
+   * Releases a previously acquired resource.
+   */
+  @Override
+  public void release(T item) {
+Validate.checkNotNull(item, "item");
+
+synchronized (this.createdItems) {
+  if (!this.createdItems.contains(item)) {
+throw new IllegalArgumentException("This item is not a part of this 
pool");
+  }
+}
+
+// Return if this item was released earlier.
+// We cannot use this.items.contains() because that check is not based on 
reference equality.
+for (T entry : this.items) {
+  if (entry == item) {
+return;
+  }
+}
+
+while (true) {
+  try {
+this.items.put(item);

Review comment:
   Correct. Fixed. An earlier implementation ignored InterruptedException 
which required the while loop. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 689538)
Time Spent: 2h 10m  (was: 2h)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> imp

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=688949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-688949
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 01/Dec/21 22:31
Start Date: 01/Dec/21 22:31
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r760620449



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BoundedResourcePool.java
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import java.util.Collections;
+import java.util.IdentityHashMap;
+import java.util.Set;
+import java.util.concurrent.ArrayBlockingQueue;
+
+/**
+ * Manages a fixed pool of resources.
+ *
+ * Avoids creating a new resource if a previously created instance is already 
available.
+ */
+public abstract class BoundedResourcePool extends ResourcePool {
+  // The size of this pool. Fixed at creation time.
+  private final int size;
+
+  // Items currently available in the pool.
+  private ArrayBlockingQueue items;
+
+  // Items that have been created so far (regardless of whether they are 
currently available).
+  private Set createdItems;
+
+  /**
+   * Constructs a resource pool of the given size.
+   *
+   * @param size the size of this pool. Cannot be changed post creation.
+   */
+  public BoundedResourcePool(int size) {
+Validate.checkPositiveInteger(size, "size");
+
+this.size = size;
+this.items = new ArrayBlockingQueue(size);
+
+// The created items are identified based on their object reference.
+this.createdItems = Collections.newSetFromMap(new IdentityHashMap());
+  }
+
+  /**
+   * Acquires a resource blocking if necessary until one becomes available.
+   */
+  @Override
+  public T acquire() {
+return this.acquireHelper(true);
+  }
+
+  /**
+   * Acquires a resource blocking if one is immediately available. Otherwise 
returns null.
+   */
+  @Override
+  public T tryAcquire() {
+return this.acquireHelper(false);
+  }
+
+  /**
+   * Releases a previously acquired resource.
+   */
+  @Override
+  public void release(T item) {
+Validate.checkNotNull(item, "item");
+
+synchronized (this.createdItems) {
+  if (!this.createdItems.contains(item)) {
+throw new IllegalArgumentException("This item is not a part of this 
pool");
+  }
+}
+
+// Return if this item was released earlier.
+// We cannot use this.items.contains() because that check is not based on 
reference equality.
+for (T entry : this.items) {
+  if (entry == item) {
+return;
+  }
+}
+
+while (true) {
+  try {
+this.items.put(item);
+return;
+  } catch (InterruptedException e) {
+throw new IllegalStateException("release() should never block");
+  }
+}
+  }
+
+  @Override
+  public synchronized void close() {
+for (T item : this.createdItems) {
+  this.close(item);
+}
+
+this.items.clear();
+this.items = null;
+
+this.createdItems.clear();
+this.createdItems = null;
+  }
+
+  /**
+   * Derived classes may implement a way to cleanup each item.
+   */
+  @Override
+  protected synchronized void close(T item) {
+// Do nothing in this class. Allow overriding classes to take any cleanup 
action.
+  }
+
+  // Number of items created so far. Mostly for testing purposes.
+  public int numCreated() {
+synchronized (this.createdItems) {
+  return this.createdItems.size();
+}
+  }
+
+  // Number of items available to be acquired. Mostly for testing purposes.
+  public synchronized int numAvailable() {
+return (this.size - this.numCreated()) + this.items.size();
+  }
+
+  // For debugging purposes.
+  @Override
+  public synchronized String toString() {
+return String.format(
+"size = %d, #created = %d, #in-queue = %d, #available = %d",
+this.size, this.numCreated(), this.items.size(), this.numAvailable());
+  }
+
+  /**

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=688948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-688948
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 01/Dec/21 22:27
Start Date: 01/Dec/21 22:27
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r760618253



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BoundedResourcePool.java
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import java.util.Collections;
+import java.util.IdentityHashMap;
+import java.util.Set;
+import java.util.concurrent.ArrayBlockingQueue;
+
+/**
+ * Manages a fixed pool of resources.
+ *
+ * Avoids creating a new resource if a previously created instance is already 
available.
+ */
+public abstract class BoundedResourcePool extends ResourcePool {
+  // The size of this pool. Fixed at creation time.
+  private final int size;
+
+  // Items currently available in the pool.
+  private ArrayBlockingQueue items;
+
+  // Items that have been created so far (regardless of whether they are 
currently available).
+  private Set createdItems;
+
+  /**
+   * Constructs a resource pool of the given size.
+   *
+   * @param size the size of this pool. Cannot be changed post creation.
+   */
+  public BoundedResourcePool(int size) {
+Validate.checkPositiveInteger(size, "size");
+
+this.size = size;
+this.items = new ArrayBlockingQueue(size);
+
+// The created items are identified based on their object reference.
+this.createdItems = Collections.newSetFromMap(new IdentityHashMap());
+  }
+
+  /**
+   * Acquires a resource blocking if necessary until one becomes available.
+   */
+  @Override
+  public T acquire() {
+return this.acquireHelper(true);
+  }
+
+  /**
+   * Acquires a resource blocking if one is immediately available. Otherwise 
returns null.
+   */
+  @Override
+  public T tryAcquire() {
+return this.acquireHelper(false);
+  }
+
+  /**
+   * Releases a previously acquired resource.
+   */
+  @Override
+  public void release(T item) {
+Validate.checkNotNull(item, "item");
+
+synchronized (this.createdItems) {
+  if (!this.createdItems.contains(item)) {
+throw new IllegalArgumentException("This item is not a part of this 
pool");
+  }
+}
+
+// Return if this item was released earlier.
+// We cannot use this.items.contains() because that check is not based on 
reference equality.
+for (T entry : this.items) {
+  if (entry == item) {
+return;
+  }
+}
+
+while (true) {
+  try {
+this.items.put(item);

Review comment:
   While loop isn't needed?. ArrayBlockingQueue inherently waits for space 
to become available?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 688948)
Time Spent: 1h 50m  (was: 1h 40m)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves rea

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=688946&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-688946
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 01/Dec/21 22:27
Start Date: 01/Dec/21 22:27
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#discussion_r760618253



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BoundedResourcePool.java
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.common;
+
+import java.util.Collections;
+import java.util.IdentityHashMap;
+import java.util.Set;
+import java.util.concurrent.ArrayBlockingQueue;
+
+/**
+ * Manages a fixed pool of resources.
+ *
+ * Avoids creating a new resource if a previously created instance is already 
available.
+ */
+public abstract class BoundedResourcePool extends ResourcePool {
+  // The size of this pool. Fixed at creation time.
+  private final int size;
+
+  // Items currently available in the pool.
+  private ArrayBlockingQueue items;
+
+  // Items that have been created so far (regardless of whether they are 
currently available).
+  private Set createdItems;
+
+  /**
+   * Constructs a resource pool of the given size.
+   *
+   * @param size the size of this pool. Cannot be changed post creation.
+   */
+  public BoundedResourcePool(int size) {
+Validate.checkPositiveInteger(size, "size");
+
+this.size = size;
+this.items = new ArrayBlockingQueue(size);
+
+// The created items are identified based on their object reference.
+this.createdItems = Collections.newSetFromMap(new IdentityHashMap());
+  }
+
+  /**
+   * Acquires a resource blocking if necessary until one becomes available.
+   */
+  @Override
+  public T acquire() {
+return this.acquireHelper(true);
+  }
+
+  /**
+   * Acquires a resource blocking if one is immediately available. Otherwise 
returns null.
+   */
+  @Override
+  public T tryAcquire() {
+return this.acquireHelper(false);
+  }
+
+  /**
+   * Releases a previously acquired resource.
+   */
+  @Override
+  public void release(T item) {
+Validate.checkNotNull(item, "item");
+
+synchronized (this.createdItems) {
+  if (!this.createdItems.contains(item)) {
+throw new IllegalArgumentException("This item is not a part of this 
pool");
+  }
+}
+
+// Return if this item was released earlier.
+// We cannot use this.items.contains() because that check is not based on 
reference equality.
+for (T entry : this.items) {
+  if (entry == item) {
+return;
+  }
+}
+
+while (true) {
+  try {
+this.items.put(item);

Review comment:
   While loop isn't needed?. Doesn't ArrayBlockingQueue inherently wait for 
space to become available?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 688946)
Time Spent: 1h 40m  (was: 1.5h)

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improve

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=688906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-688906
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 01/Dec/21 20:30
Start Date: 01/Dec/21 20:30
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-984029933


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 45s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 41s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/8/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 3 new + 3 unchanged - 4 fixed 
= 6 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  0s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 41s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/8/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  88m 17s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3736 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell xml spotbugs checkstyle markdownlint |
   | uname | Linux 3a9e9e945b2d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 3ad67644bf8b24f44013b14bd0dd19560f654fd9 |
   | Default Java | Private Bui

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-12-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=688891&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-688891
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 01/Dec/21 20:03
Start Date: 01/Dec/21 20:03
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-984009917


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  8s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  37m 43s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 55s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 53s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  27m 25s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 52s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 24s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/7/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 3 new + 3 unchanged - 4 fixed 
= 6 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  spotbugs  |   1m 38s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/7/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 3 new + 0 unchanged - 0 fixed = 3 total 
(was 0)  |
   | +1 :green_heart: |  shadedclient  |  27m 46s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   3m 19s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/7/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 110m 16s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-tools/hadoop-aws |
   |  |  Result of integer multiplication cast to long in 
org.apache.hadoop.fs.common.BlockData.getSize(int)  At BlockData.java:to long 
in org.apache.hadoop.fs.common.BlockData.getSize(int)  At BlockData.java:[line 
111] |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.fs.common.BufferData.buffer; locked 71% of time  
Unsynchronized access at BufferData.java:71% of time  Unsynchronized access at 
BufferData.java:[line 260] |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AFileSystem.futurePool; locked 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=688408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-688408
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 01/Dec/21 04:04
Start Date: 01/Dec/21 04:04
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-983265939


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 23s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 22 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 30s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/6/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 18 new + 3 unchanged - 4 fixed 
= 21 total (was 7)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  spotbugs  |   1m 17s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/6/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 3 new + 0 unchanged - 0 fixed = 3 total 
(was 0)  |
   | +1 :green_heart: |  shadedclient  |  20m 13s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 53s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/6/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  99m 23s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-tools/hadoop-aws |
   |  |  Result of integer multiplication cast to long in 
org.apache.hadoop.fs.common.BlockData.getSize(int)  At BlockData.java:to long 
in org.apache.hadoop.fs.common.BlockData.getSize(int)  At BlockData.java:[line 
111] |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.fs.common.BufferData.buffer; locked 71% of time  
Unsynchronized access at BufferData.java:71% of time  Unsynchronized access at 
BufferData.java:[line 260] |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AFileSystem.futurePool; locke

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=688340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-688340
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 30/Nov/21 23:31
Start Date: 30/Nov/21 23:31
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-983122632


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 53s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 57s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 47s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m  6s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/5/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 106 new + 5 unchanged - 0 
fixed = 111 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  spotbugs  |   1m 18s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/5/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 2 new + 0 unchanged - 0 fixed = 2 total 
(was 0)  |
   | +1 :green_heart: |  shadedclient  |  20m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 45s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/5/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  87m  8s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-tools/hadoop-aws |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.fs.common.BufferData.buffer; locked 71% of time  
Unsynchronized access at BufferData.java:71% of time  Unsynchronized access at 
BufferData.java:[line 256] |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AFileSystem.futurePool; locked 50% of time  
Unsynchronized access at S3AFileSystem.java:50% of time  Unsynchronized access 
at S3AFileSystem.java:[line 762] |
   | Failed junit tests | 
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes |
   
 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=688259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-688259
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 30/Nov/21 20:13
Start Date: 30/Nov/21 20:13
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-982982812


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 55s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  36m 44s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 33s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 50s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 24s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javac  |   0m 40s | 
[/results-compile-javac-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/4/artifact/out/results-compile-javac-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 64 unchanged - 1 fixed 
= 65 total (was 65)  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  javac  |   0m 32s | 
[/results-compile-javac-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/4/artifact/out/results-compile-javac-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 1 new + 
64 unchanged - 1 fixed = 65 total (was 65)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/4/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 106 new + 5 unchanged - 0 
fixed = 111 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  spotbugs  |   1m 15s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/4/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 2 new + 0 unchanged - 0 fixed = 2 total 
(was 0)  |
   | +1 :green_heart: |  shadedclient  |  20m  1s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 43s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multib

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=68&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-68
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 30/Nov/21 06:02
Start Date: 30/Nov/21 06:02
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-982313913


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 52s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 55s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 56s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 34s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 52s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 39s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javac  |   0m 40s | 
[/results-compile-javac-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/3/artifact/out/results-compile-javac-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 64 unchanged - 1 fixed 
= 65 total (was 65)  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  javac  |   0m 33s | 
[/results-compile-javac-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/3/artifact/out/results-compile-javac-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 1 new + 
64 unchanged - 1 fixed = 65 total (was 65)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 23s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/3/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 106 new + 5 unchanged - 0 
fixed = 111 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 26s | 
[/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/3/artifact/out/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | -1 :x: |  spotbugs  |   1m 16s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/3/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 2 new + 0 unchanged - 0 fixed = 2 total 
(was 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=687773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687773
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 30/Nov/21 05:34
Start Date: 30/Nov/21 05:34
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-982302964


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  50m  3s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 51s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 21s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/2/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 106 new + 5 unchanged - 0 
fixed = 111 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 27s | 
[/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/2/artifact/out/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | -1 :x: |  spotbugs  |   1m 18s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/2/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 9 new + 0 unchanged - 0 fixed = 9 total 
(was 0)  |
   | +1 :green_heart: |  shadedclient  |  20m 27s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 45s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/2/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | -1 :x: |  asflicense  |   0m 33s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/2/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 106m 50s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-tools/hadoop-aws |
   |  |  Format string should use %n rather than n in 
org.apache.hadoop.fs.common.BlockData.getStateString()  At 
BlockData.java:rather than n 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=687526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687526
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 29/Nov/21 18:56
Start Date: 29/Nov/21 18:56
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736#issuecomment-981921238


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  7s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 21 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 22s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 50s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 30s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javac  |   0m 40s | 
[/results-compile-javac-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/1/artifact/out/results-compile-javac-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-tools_hadoop-aws-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 64 unchanged - 1 fixed 
= 65 total (was 65)  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  javac  |   0m 33s | 
[/results-compile-javac-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/1/artifact/out/results-compile-javac-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 1 new + 
64 unchanged - 1 fixed = 65 total (was 65)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 23s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/1/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 183 new + 5 unchanged - 0 
fixed = 188 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javadoc  |   0m 27s | 
[/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/1/artifact/out/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | -1 :x: |  spotbugs  |   1m 15s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3736/1/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 8 new + 0 unchanged - 0 fixed = 8 total 
(was 

[jira] [Work logged] (HADOOP-18028) improve S3 read speed using prefetching & caching

2021-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=687471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-687471
 ]

ASF GitHub Bot logged work on HADOOP-18028:
---

Author: ASF GitHub Bot
Created on: 29/Nov/21 17:26
Start Date: 29/Nov/21 17:26
Worklog Time Spent: 10m 
  Work Description: bhalchandrap opened a new pull request #3736:
URL: https://github.com/apache/hadoop/pull/3736


   ### Description of PR
   I work for Pinterest. I developed a technique for vastly improving read 
throughput when reading from the S3 file system. It not only helps the 
sequential read case (like reading a SequenceFile) but also significantly 
improves read throughput of a random access case (like reading Parquet). This 
technique has been very useful in significantly improving efficiency of the 
data processing jobs at Pinterest. 

   I would like to contribute that feature to Apache Hadoop. More details on 
this technique are available in this blog I wrote recently:
   
https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0
   
   ### How was this patch tested?
   Tested against S3 in us-east-1 region.
   
   There are a small number of test failures (most within s3guard). I can 
attach test output if it helps.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 687471)
Remaining Estimate: 0h
Time Spent: 10m

> improve S3 read speed using prefetching & caching
> -
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Bhalchandra Pandit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read 
> throughput when reading from the S3 file system. It not only helps the 
> sequential read case (like reading a SequenceFile) but also significantly 
> improves read throughput of a random access case (like reading Parquet). This 
> technique has been very useful in significantly improving efficiency of the 
> data processing jobs at Pinterest. 
>  
> I would like to contribute that feature to Apache Hadoop. More details on 
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org