[hadoop] branch branch-3.3.5 updated: HADOOP-18470. Update index md with section on ABFS prefetching

stevel Mon, 19 Dec 2022 05:03:39 -0800

This is an automated email from the ASF dual-hosted git repository.

stevel pushed a commit to branch branch-3.3.5
in repository https://gitbox.apache.org/repos/asf/hadoop.git



The following commit(s) were added to refs/heads/branch-3.3.5 by this push:
     new 3262495904d HADOOP-18470. Update index md with section on ABFS prefetching
3262495904d is described below

commit 3262495904d1af03f0f8c77aba69fea6e3122b64
Author: Steve Loughran <ste...@cloudera.com>
AuthorDate: Mon Dec 19 12:54:37 2022 +0000

    HADOOP-18470. Update index md with section on ABFS prefetching
---
 hadoop-project/src/site/markdown/index.md.vm | 77 ++++++++++++++++++----------
 1 file changed, 50 insertions(+), 27 deletions(-)

diff --git a/hadoop-project/src/site/markdown/index.md.vm b/hadoop-project/src/site/markdown/index.md.vm
index 05478ea50ac..5e0a46449fa 100644
--- a/hadoop-project/src/site/markdown/index.md.vm
+++ b/hadoop-project/src/site/markdown/index.md.vm
@@ -23,11 +23,29 @@ Overview of Changes
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.
 
+Azure ABFS: Critical Stream Prefetch Fix
+---------------------------------------------
+
+The ABFS connector has a critical bug fix
+[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
+*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+
+All users of the ABFS connector in Hadoop releases 3.3.2+ MUST either upgrade
+or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`.
+
+Consult the parent JIRA [HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
+*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
+for root cause analysis, details on what is affected, and mitigations.
+
+
 Vectored IO API
 ---------------
 
+[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
+*High performance vectored read API in Hadoop*
+
 The `PositionedReadable` interface has now added an operation for
-Vectored (also known as Scatter/Gather IO):
+Vectored IO (also known as Scatter/Gather IO):
 
 ```java
 void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate)
@@ -38,25 +56,25 @@ possibly in parallel, with results potentially coming in out-of-order.
 
 1. The default implementation uses a series of `readFully()` calls, so delivers
    equivalent performance.
-2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`
+2. The local filesystem uses Java native IO calls for higher performance reads than `readFully()`.
 3. The S3A filesystem issues parallel HTTP GET requests in different threads.
 
-Benchmarking of (modified) ORC and Parquet clients through `file://` and `s3a://`
-show tangible improvements in query times.
+Benchmarking of enhanced Apache ORC and Apache Parquet clients through `file://` and `s3a://`
+show significant improvements in query performance.
 
 Further Reading: [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html).
 
-Manifest Committer for Azure ABFS and google GCS performance
-------------------------------------------------------------
+MapReduce: Manifest Committer for Azure ABFS and Google GCS
+----------------------------------------------------------
 
-A new "intermediate manifest committer" uses a manifest file
+The new _Intermediate Manifest Committer_ uses a manifest file
 to commit the work of successful task attempts, rather than
 renaming directories.
 Job commit is matter of reading all the manifests, creating the
 destination directories (parallelized) and renaming the files,
 again in parallel.
 
-This is fast and correct on Azure Storage and Google GCS,
+This is both fast and correct on Azure Storage and Google GCS,
 and should be used there instead of the classic v1/v2 file
 output committers.
 
@@ -69,24 +87,6 @@ More details are available in the
 [manifest committer](./hadoop-mapreduce-client/hadoop-mapreduce-client-core/manifest_committer.html).
 documentation.
 
-Transitive CVE fixes
---------------------
-
-A lot of dependencies have been upgraded to address recent CVEs.
-Many of the CVEs were not actually exploitable through the Hadoop
-so much of this work is just due diligence.
-However applications which have all the library is on a class path may
-be vulnerable, and the ugprades should also reduce the number of false
-positives security scanners report.
-
-We have not been able to upgrade every single dependency to the latest
-version there is. Some of those changes are just going to be incompatible.
-If you have concerns about the state of a specific library, consult the apache JIRA
-issue tracker to see what discussions have taken place about the library in question.
-
-As an open source project, contributions in this area are always welcome,
-especially in testing the active branches, testing applications downstream of
-those branches and of whether updated dependencies trigger regressions.
 
 HDFS: Router Based Federation
 -----------------------------
@@ -96,7 +96,6 @@ A lot of effort has been invested into stabilizing/improving the HDFS Router Bas
 1. HDFS-13522, HDFS-16767 & Related Jiras: Allow Observer Reads in HDFS Router Based Federation.
 2. HDFS-13248: RBF supports Client Locality
 
-
 HDFS: Dynamic Datanode Reconfiguration
 --------------------------------------
 
@@ -109,6 +108,29 @@ cluster-wide Datanode Restarts.
 See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361)
 for the list of dynamically reconfigurable attributes.
 
+
+Transitive CVE fixes
+--------------------
+
+A lot of dependencies have been upgraded to address recent CVEs.
+Many of the CVEs were not actually exploitable through Hadoop,
+so much of this work is just due diligence.
+However, applications which have all the libraries on their classpath may
+be vulnerable, and the upgrades should also reduce the number of false
+positives which security scanners report.
+
+We have not been able to upgrade every single dependency to the latest
+version there is. Some of those changes are just going to be incompatible.
+If you have concerns about the state of a specific library, consult the Apache JIRA
+issue tracker to see whether a JIRA has been filed, whether discussions have taken place about
+the library in question, and whether or not there is already a fix in the pipeline.
+*Please don't file new JIRAs about dependency-X.Y.Z having a CVE without
+searching for any existing issue first.*
+
+As Hadoop is an open source project, contributions in this area are always welcome,
+especially in testing the active branches, testing applications downstream of
+those branches, and checking whether updated dependencies trigger regressions.
+
 Getting Started
 ===============
 
@@ -119,3 +141,4 @@ which shows you how to set up a single-node Hadoop installation.
 Then move on to the
 [Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html)
 to learn how to set up a multi-node Hadoop installation.
+
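
The ABFS section added by this commit tells users of Hadoop 3.3.2+ to either upgrade or set `fs.azure.readaheadqueue.depth` to `0`. A minimal Java sketch of that rule follows, for anyone scripting a pre-flight check. It is not part of Hadoop: `AbfsPrefetchCheck` and `needsMitigation()` are hypothetical names, a plain `Map` stands in for Hadoop's `Configuration`, and it assumes, as this 3.3.5 release page implies, that 3.3.5 is the first release carrying the fix.

```java
import java.util.Map;

/**
 * Hypothetical pre-flight check for the HADOOP-18546 ABFS prefetch bug.
 * ASSUMPTION: releases 3.3.2 up to (not including) 3.3.5 are affected;
 * 3.3.5 and later contain the fix. A Map stands in for Hadoop's Configuration.
 */
class AbfsPrefetchCheck {
  static final String READAHEAD_DEPTH_KEY = "fs.azure.readaheadqueue.depth";

  /** Parses "major.minor.patch", ignoring any "-SNAPSHOT"-style suffix. */
  static int[] parseVersion(String version) {
    String[] parts = version.split("-", 2)[0].split("\\.");
    int[] v = new int[3];
    for (int i = 0; i < v.length && i < parts.length; i++) {
      v[i] = Integer.parseInt(parts[i]);
    }
    return v;
  }

  /** Lexicographic comparison of two parsed versions. */
  static int compare(int[] a, int[] b) {
    for (int i = 0; i < a.length; i++) {
      if (a[i] != b[i]) {
        return Integer.compare(a[i], b[i]);
      }
    }
    return 0;
  }

  /** True if the version is in the affected range and prefetching is still enabled. */
  static boolean needsMitigation(String hadoopVersion, Map<String, String> conf) {
    int[] v = parseVersion(hadoopVersion);
    boolean affected = compare(v, new int[] {3, 3, 2}) >= 0
        && compare(v, new int[] {3, 3, 5}) < 0;
    // Per the release note, a queue depth of 0 disables prefetching.
    String depth = conf.get(READAHEAD_DEPTH_KEY);
    boolean prefetchDisabled = depth != null && Integer.parseInt(depth.trim()) == 0;
    return affected && !prefetchDisabled;
  }

  public static void main(String[] args) {
    System.out.println(needsMitigation("3.3.4", Map.of()));                        // true
    System.out.println(needsMitigation("3.3.4", Map.of(READAHEAD_DEPTH_KEY, "0"))); // false
    System.out.println(needsMitigation("3.3.5", Map.of()));                        // false
  }
}
```

With Hadoop on the classpath, the version string could come from `org.apache.hadoop.util.VersionInfo.getVersion()`, and the mitigation itself is normally set in `core-site.xml` or applied with `conf.setInt("fs.azure.readaheadqueue.depth", 0)` on an `org.apache.hadoop.conf.Configuration`.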

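The Vectored IO section in the diff states that the default `readVectored()` implementation falls back to a series of `readFully()` calls, with each range's result delivered separately. The sketch below models that fallback with the JDK only, so it runs without Hadoop. `SimpleRange` and `PositionedSource` are hypothetical stand-ins for `FileRange` and `PositionedReadable`, not the Hadoop classes, and failing each range's future individually is a choice made for this sketch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

class VectoredReadSketch {

  /** Hypothetical stand-in for FileRange: offset, length and a future for the data. */
  static final class SimpleRange {
    final long offset;
    final int length;
    private final CompletableFuture<ByteBuffer> data = new CompletableFuture<>();

    SimpleRange(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }

    CompletableFuture<ByteBuffer> getData() {
      return data;
    }
  }

  /** Hypothetical stand-in for a positioned readFully(position, buffer, off, len) call. */
  interface PositionedSource {
    void readFully(long position, byte[] buffer, int off, int len) throws IOException;
  }

  /** In-memory source, used only to make the example runnable. */
  static PositionedSource fromBytes(byte[] content) {
    return (position, buffer, off, len) -> {
      if (position < 0 || position + len > content.length) {
        throw new EOFException("Range at " + position + " of length " + len + " is past EOF");
      }
      System.arraycopy(content, (int) position, buffer, off, len);
    };
  }

  /**
   * Fallback strategy: one blocking readFully() per range, in list order.
   * Each range's future is completed with a flipped buffer holding its bytes,
   * or completed exceptionally if that range could not be read.
   */
  static void readVectored(PositionedSource source, List<SimpleRange> ranges,
      IntFunction<ByteBuffer> allocate) {
    for (SimpleRange range : ranges) {
      try {
        byte[] bytes = new byte[range.length];
        source.readFully(range.offset, bytes, 0, range.length);
        ByteBuffer buffer = allocate.apply(range.length);
        buffer.put(bytes);
        buffer.flip();
        range.getData().complete(buffer);
      } catch (IOException e) {
        range.getData().completeExceptionally(e);
      }
    }
  }

  public static void main(String[] args) {
    byte[] content = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
    List<SimpleRange> ranges = List.of(new SimpleRange(10, 4), new SimpleRange(2, 3));
    readVectored(fromBytes(content), ranges, ByteBuffer::allocate);
    for (SimpleRange r : ranges) {
      ByteBuffer b = r.getData().join();  // the caller only sees data via the future
      byte[] out = new byte[b.remaining()];
      b.get(out);
      System.out.println(r.offset + " -> " + new String(out, StandardCharsets.US_ASCII));
    }
  }
}
```

Running `main` prints `10 -> abcd` and then `2 -> 234`. Because callers only consume results through each range's future, an implementation such as the S3A one described in the page can replace this sequential loop with parallel reads without changing the calling code.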

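The manifest committer section in the diff describes job commit as three steps: read all the task manifests, create the destination directories in parallel, then rename the files, again in parallel. The following JDK-only sketch walks through those steps on a local filesystem (records need Java 16 or later). `TaskManifest`, `FileEntry` and `commitJob()` are hypothetical names; the real committer loads its manifests from files and handles failures and cleanup, none of which is modeled here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

class ManifestCommitSketch {

  /** Hypothetical manifest entry: a file a task attempt wrote, and its final path. */
  record FileEntry(Path source, Path dest) {}

  /** Hypothetical manifest: the files produced by one successful task attempt. */
  record TaskManifest(String taskAttemptId, List<FileEntry> files) {}

  /** Commits a job from its task manifests; returns the number of files renamed. */
  static int commitJob(List<TaskManifest> manifests) {
    // 1. Read all the manifests (here: flatten in-memory lists, no file parsing).
    List<FileEntry> entries = manifests.stream()
        .flatMap(m -> m.files().stream())
        .collect(Collectors.toList());

    // 2. Create the destination directories in parallel.
    //    Files.createDirectories() does not fail if a directory already exists.
    Set<Path> dirs = entries.stream()
        .map(e -> e.dest().getParent())
        .filter(Objects::nonNull)
        .collect(Collectors.toSet());
    dirs.parallelStream().forEach(dir -> {
      try {
        Files.createDirectories(dir);
      } catch (IOException ex) {
        throw new UncheckedIOException(ex);
      }
    });

    // 3. Rename every file into place, again in parallel.
    entries.parallelStream().forEach(e -> {
      try {
        Files.move(e.source(), e.dest());
      } catch (IOException ex) {
        throw new UncheckedIOException(ex);
      }
    });
    return entries.size();
  }
}
```

Since every task's output is listed in its manifest, this job commit never lists or renames task attempt directories; it touches only the files the manifests name.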