[jira] [Created] (HDFS-16939) Fix the thread safety bug in LowRedundancyBlocks

2023-03-02 Thread Shuyan Zhang (Jira)
Shuyan Zhang created HDFS-16939:
---

 Summary: Fix the thread safety bug in LowRedundancyBlocks
 Key: HDFS-16939
 URL: https://issues.apache.org/jira/browse/HDFS-16939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namanode
Reporter: Shuyan Zhang
Assignee: Shuyan Zhang


The remove method in LowRedundancyBlocks is not protected by synchronized. This 
method is private and is called by BlockManager. As a result, priorityQueues 
has the risk of being accessed concurrently by multiple threads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.3.5 (RC2)

2023-03-02 Thread Ayush Saxena
>
> I will highlight that I am completely fed up with doing this  release and
> really want to get it out the way -for which I depend on support from as
> many other developers as possible.


hmm, I can feel the pain. I tried to find if there is any config or any
workaround which can dodge this HDFS issue, but unfortunately couldn't find
any. If someone does a getListing with needLocation and the file doesn't
exist at Observer he is gonna get a NPE rather than a FNF, It isn't just
the exception, AFAIK Observer reads have some logic around handling FNF
specifically, that it attempts Active NN or something like that in such
cases, So, that will be broken as well for this use case.

Now, there is no denying the fact there is an issue on the HDFS side, and
it has already been too much work on your side, so you can argue that it
might not be a very frequent use case or so. It's your call.

Just sharing, no intentions of saying you should do that, But as an RM
"nobody" can force you for a new iteration of a RC, it is gonna be your
call and discretion. As far as I know a release can not be vetoed by
"anybody" as per the apache by laws.(
https://www.apache.org/legal/release-policy.html#release-approval). Even
our bylaws say that product release requires a Lazy Majority not a
Consensus Approval.

So, you have a way out. You guys are 2 already and 1 I will give you a
pass, in case you are really in a state: ''Get me out of this mess" state,
my basic validations on x86 & Aarch64 both are passing as of now, couldn't
reach the end for any of the RC's

-Ayush

On Fri, 3 Mar 2023 at 08:41, Viraj Jasani  wrote:

> While this RC is not going to be final, I just wanted to share the results
> of the testing I have done so far with RC1 and RC2.
>
> * Signature: ok
> * Checksum : ok
> * Rat check (1.8.0_341): ok
>  - mvn clean apache-rat:check
> * Built from source (1.8.0_341): ok
>  - mvn clean install  -DskipTests
> * Built tar from source (1.8.0_341): ok
>  - mvn clean package  -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true
>
> * Built images using the tarball, installed and started all of Hdfs, JHS
> and Yarn components
> * Ran Hbase (latest 2.5) tests against Hdfs, ran RowCounter Mapreduce job
> * Hdfs CRUD tests
> * MapReduce wordcount job
>
> * Ran S3A tests with scale profile against us-west-2:
> mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale
>
> ITestS3AConcurrentOps#testParallelRename is timing out after ~960s. This is
> consistently failing, looks like a recent regression.
> I was also able to repro on trunk, will create Jira.
>
>
> On Mon, Feb 27, 2023 at 9:59 AM Steve Loughran  >
> wrote:
>
> > Mukund and I have put together a release candidate (RC2) for Hadoop
> 3.3.5.
> >
> > We need anyone who can to verify the source and binary artifacts,
> > including those JARs staged on maven, the site documentation and the
> arm64
> > tar file.
> >
> > The RC is available at:
> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/
> >
> > The git tag is release-3.3.5-RC2, commit 72f8c2a4888
> >
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1369/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > Change log
> >
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/CHANGELOG.md
> >
> > Release notes
> >
> >
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/RELEASENOTES.md
> >
> > This is off branch-3.3 and is the first big release since 3.3.2.
> >
> > As to what changed since the RC1 attempt last week
> >
> >
> >1. Version fixup in JIRA (credit due to Takanobu Asanuma there)
> >2. HADOOP-18470. Remove HDFS RBF text in the 3.3.5 index.md file
> >3. Revert "HADOOP-18590. Publish SBOM artifacts (#5281)" (creating
> build
> >issues in maven 3.9.0)
> >4. HADOOP-18641. Cloud connector dependency and LICENSE fixup. (#5429)
> >
> >
> > Note, because the arm64 binaries are built separately on a different
> > platform and JVM, their jar files may not match those of the x86
> > release -and therefore the maven artifacts. I don't think this is
> > an issue (the ASF actually releases source tarballs, the binaries are
> > there for help only, though with the maven repo that's a bit blurred).
> >
> > The only way to be consistent would actually untar the x86.tar.gz,
> > overwrite its binaries with the arm stuff, retar, sign and push out
> > for the vote. Even automating that would be risky.
> >
> > Please try the release and vote. The vote will run for 5 days.
> >
> > Steve and Mukund
> >
>


Re: [VOTE] Release Apache Hadoop 3.3.5 (RC2)

2023-03-02 Thread Viraj Jasani
While this RC is not going to be final, I just wanted to share the results
of the testing I have done so far with RC1 and RC2.

* Signature: ok
* Checksum : ok
* Rat check (1.8.0_341): ok
 - mvn clean apache-rat:check
* Built from source (1.8.0_341): ok
 - mvn clean install  -DskipTests
* Built tar from source (1.8.0_341): ok
 - mvn clean package  -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true

* Built images using the tarball, installed and started all of Hdfs, JHS
and Yarn components
* Ran Hbase (latest 2.5) tests against Hdfs, ran RowCounter Mapreduce job
* Hdfs CRUD tests
* MapReduce wordcount job

* Ran S3A tests with scale profile against us-west-2:
mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale

ITestS3AConcurrentOps#testParallelRename is timing out after ~960s. This is
consistently failing, looks like a recent regression.
I was also able to repro on trunk, will create Jira.


On Mon, Feb 27, 2023 at 9:59 AM Steve Loughran 
wrote:

> Mukund and I have put together a release candidate (RC2) for Hadoop 3.3.5.
>
> We need anyone who can to verify the source and binary artifacts,
> including those JARs staged on maven, the site documentation and the arm64
> tar file.
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/
>
> The git tag is release-3.3.5-RC2, commit 72f8c2a4888
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1369/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/CHANGELOG.md
>
> Release notes
>
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/RELEASENOTES.md
>
> This is off branch-3.3 and is the first big release since 3.3.2.
>
> As to what changed since the RC1 attempt last week
>
>
>1. Version fixup in JIRA (credit due to Takanobu Asanuma there)
>2. HADOOP-18470. Remove HDFS RBF text in the 3.3.5 index.md file
>3. Revert "HADOOP-18590. Publish SBOM artifacts (#5281)" (creating build
>issues in maven 3.9.0)
>4. HADOOP-18641. Cloud connector dependency and LICENSE fixup. (#5429)
>
>
> Note, because the arm64 binaries are built separately on a different
> platform and JVM, their jar files may not match those of the x86
> release -and therefore the maven artifacts. I don't think this is
> an issue (the ASF actually releases source tarballs, the binaries are
> there for help only, though with the maven repo that's a bit blurred).
>
> The only way to be consistent would actually untar the x86.tar.gz,
> overwrite its binaries with the arm stuff, retar, sign and push out
> for the vote. Even automating that would be risky.
>
> Please try the release and vote. The vote will run for 5 days.
>
> Steve and Mukund
>


Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2023-03-02 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/954/

No changes


ERROR: File 'out/email-report.txt' does not exist

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2023-03-02 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1153/

[Mar 1, 2023, 3:10:05 PM] (Szilard Nemeth) MAPREDUCE-7434. Fix ShuffleHandler 
tests. Contributed by Tamas Domok
[Mar 1, 2023, 6:53:10 PM] (github) HDFS-16935. Fix 
TestFsDatasetImpl#testReportBadBlocks (#5432)
[Mar 1, 2023, 7:47:04 PM] (github) HDFS-16896 clear ignoredNodes list when we 
clear deadnode list on ref… (#5322)
[Mar 2, 2023, 12:02:07 AM] (github) HADOOP-18648. Avoid loading kms log4j 
properties dynamically by KMSWebServer (#5441)
[Mar 2, 2023, 12:18:38 AM] (github) HDFS-16923. [SBN read] getlisting RPC to 
observer will throw NPE if path does not exist (#5400)




-1 overall


The following subsystems voted -1:
blanks hadolint pathlen spotbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

spotbugs :

   module:hadoop-mapreduce-project/hadoop-mapreduce-client 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

spotbugs :

   
module:hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core
 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

spotbugs :

   module:hadoop-mapreduce-project 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

spotbugs :

   module:root 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

Failed junit tests :

   hadoop.hdfs.server.namenode.TestFsck 
   hadoop.hdfs.server.datanode.TestDirectoryScanner 
   
hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver
 
  

   cc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1153/artifact/out/results-compile-cc-root.txt
 [96K]

   javac:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1153/artifact/out/results-compile-javac-root.txt
 [528K]

   blanks:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-

Re: [VOTE] Release Apache Hadoop 3.3.5 (RC2)

2023-03-02 Thread Steve Loughran
well, lets see what others say.

we don't want to ship stuff with serious regression to hdfs.

I will highlight that I am completely fed up with doing this  release and
really want to get it out the way -for which I depend on support from as
many other developers as possible.

Erik, right now what you can help by doing is test all the rest of the
release knowing that this issue exists and seeing if you can identify
anything else. That way this update will be the sole blocker and we can get
through that next RC with nothing else surfacing.

I had noticed that the arm64 release somehow missed out the native binaries
and was going to investigate that but didn't consider that a blocker… I was
just going to cut that artefact and, post Darcy, create a new arm64 release
using all the jars of the x86 build but replacing the x86 native libs with
the arm versions. This helps ensure that the JAR files in the wild all
match, which strikes me as a good thing.

Can I also encourage people in the HFS team to put their hand up and
volunteer for leading the next release, with a goal of getting something
out later this year.



On Thu, 2 Mar 2023 at 00:27, Erik Krogen  wrote:

> Hi folks, apologies for being late to the release conversation, but I think
> we need to get HDFS-16923
>  into
> 3.3.5. HDFS-16732 ,
> which
> also went into 3.3.5, introduced an issue whereby Observer NameNodes will
> throw NPE upon any getListing call on a directory that doesn't exist. It
> will make Observer NN pretty much unusable in 3.3.5. Zander put up a patch
> for this and it has been merged to trunk/branch-3.3 as of a few minutes
> ago. I'd like to see about merging to branch-3.3.5 as well.
>
> Thanks for the consideration and sorry for not bringing this up in RC1 or
> earlier.
>
> On Mon, Feb 27, 2023 at 9:59 AM Steve Loughran  >
> wrote:
>
> > Mukund and I have put together a release candidate (RC2) for Hadoop
> 3.3.5.
> >
> > We need anyone who can to verify the source and binary artifacts,
> > including those JARs staged on maven, the site documentation and the
> arm64
> > tar file.
> >
> > The RC is available at:
> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/
> >
> > The git tag is release-3.3.5-RC2, commit 72f8c2a4888
> >
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1369/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > Change log
> >
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/CHANGELOG.md
> >
> > Release notes
> >
> >
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/RELEASENOTES.md
> >
> > This is off branch-3.3 and is the first big release since 3.3.2.
> >
> > As to what changed since the RC1 attempt last week
> >
> >
> >1. Version fixup in JIRA (credit due to Takanobu Asanuma there)
> >2. HADOOP-18470. Remove HDFS RBF text in the 3.3.5 index.md file
> >3. Revert "HADOOP-18590. Publish SBOM artifacts (#5281)" (creating
> build
> >issues in maven 3.9.0)
> >4. HADOOP-18641. Cloud connector dependency and LICENSE fixup. (#5429)
> >
> >
> > Note, because the arm64 binaries are built separately on a different
> > platform and JVM, their jar files may not match those of the x86
> > release -and therefore the maven artifacts. I don't think this is
> > an issue (the ASF actually releases source tarballs, the binaries are
> > there for help only, though with the maven repo that's a bit blurred).
> >
> > The only way to be consistent would actually untar the x86.tar.gz,
> > overwrite its binaries with the arm stuff, retar, sign and push out
> > for the vote. Even automating that would be risky.
> >
> > Please try the release and vote. The vote will run for 5 days.
> >
> > Steve and Mukund
> >
>