Restrict Frequency of BlockReport To Namenode startup and failover

2020-02-04 Thread Ayush Saxena
Hi All,
Surendra and I have lately been trying to minimise the impact of block reports 
on the Namenode in huge clusters. We observed that in a huge cluster, of about 
10k datanodes, the periodic block reports adversely impact Namenode performance.
We have been thinking of restricting block reports to be triggered only during 
Namenode startup or in case of failover, eliminating the periodic block report.
The main purpose of the block report is to get corrupt blocks recognised, so as 
a follow-up we can maintain a service at the datanode that runs periodically to 
check whether the block size in memory is the same as that reported to the 
namenode, and the datanode can alert the namenode in case of any suspect block. 
(We still need to plan this.)

On the datanode side, a datanode would still send a BlockReport, or restore its 
normal frequency, if during the configured time period the datanode was shut 
down or lost its connection with the namenode. For example, if the datanode is 
due to send a BR at 2100 hrs: if during the last 6 hrs there has been any 
failover or loss of connection between the namenode and the datanode, it will 
trigger the BR normally; otherwise it shall skip sending the BR.
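The skip-or-send decision above could be sketched roughly as follows. This is a 
hypothetical illustration only; the class, field names, and six-hour window are 
assumptions for the sketch, not actual HDFS code:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the datanode-side decision described above.
// Skip the scheduled block report unless a disruptive event (DN restart,
// connection loss, or NN failover) fell inside the configured window.
class BlockReportPolicySketch {
    // Assumed configured period (the 6 hrs from the example above).
    final long windowMs = TimeUnit.HOURS.toMillis(6);

    long lastRestartMs;        // when this DN last started up
    long lastConnectionLossMs; // when the DN last lost contact with the NN
    long lastFailoverMs;       // when the DN last observed an NN failover

    boolean shouldSendBlockReport(long nowMs) {
        long windowStart = nowMs - windowMs;
        // Send the BR normally if any disruptive event is in the window;
        // otherwise skip this scheduled report.
        return lastRestartMs >= windowStart
                || lastConnectionLossMs >= windowStart
                || lastFailoverMs >= windowStart;
    }
}
```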

Let us know your thoughts, challenges, or improvements on this.

-Ayush



-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-02-04 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335] 
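The warning above flags a redundant unbox/rebox round trip. The following is an 
illustrative reproduction of the pattern and its fix, not the actual 
ColumnRWHelper code:

```java
// Illustrative example of the FindBugs "boxed value is unboxed and then
// immediately reboxed" pattern -- not the actual ColumnRWHelper code.
class ReboxExample {
    // Flagged: the Long is unboxed to a primitive, and the primitive is
    // immediately auto-boxed again on return, a redundant round trip.
    static Long flagged(Long value) {
        long primitive = value; // unbox
        return primitive;       // immediately rebox
    }

    // Fix: keep the boxed reference (or stay primitive throughout).
    static Long fixed(Long value) {
        return value;
    }
}
```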

Failed junit tests :

   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.registry.secure.TestSecureLogins 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 
   hadoop.yarn.client.api.impl.TestAMRMProxy 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [328K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-compile-cc-root-jdk1.8.0_232.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-compile-javac-root-jdk1.8.0_232.txt
  [308K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-patch-shellcheck.txt
  [56K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/whitespace-tabs.txt
  [1.3M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/xml.txt
  [12K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_232.txt
  [1.1M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [240K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/587/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-registry.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x

Introduce Read Write Lock to Datanode

2020-02-04 Thread Stephen O'Donnell
I would like to reopen an old topic, which is to introduce a Read Write
lock to the datanode.

In the current trunk, a ReentrantLock is used, so it is always exclusive.
However, there are many code paths in the DN where an exclusive lock is not
necessary and a read lock would suffice.

We know the ReentrantReadWriteLock scales fine, as it is used extensively
in the namenode, so the performance of the lock should not be a concern.

My proposal in https://issues.apache.org/jira/browse/HDFS-15150 is to start
small on this, and simply replace the ReentrantLock with a
ReentrantReadWriteLock, and then make all lock acquisitions take the write
lock. That would keep the locking exactly as it is now, and hopefully
result in a patch that is relatively easy to review.

If we can agree on a patch for that, we can create follow-up Jiras to switch
various code paths to use the read lock over time.
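The first step described above can be sketched as below. This is a minimal 
illustration of the idea, assuming hypothetical method names, not the actual 
DataNode code or the HDFS-15150 patch:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal sketch of the plan: swap the exclusive ReentrantLock for a
// ReentrantReadWriteLock, but initially route every acquisition through
// the write lock, so the locking behaviour is unchanged. Read-only code
// paths can then migrate to the read lock in follow-up work.
class DatanodeLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    // Step 1: every existing caller uses these, preserving exclusivity.
    void acquireDatasetLock()  { lock.writeLock().lock(); }
    void releaseDatasetLock()  { lock.writeLock().unlock(); }

    // Step 2 (follow-up Jiras): read-only paths switch to these, allowing
    // multiple readers to proceed concurrently.
    void acquireDatasetReadLock()  { lock.readLock().lock(); }
    void releaseDatasetReadLock()  { lock.readLock().unlock(); }

    boolean tryReadLock() { return lock.readLock().tryLock(); }
}
```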

I have a patch available on HDFS-15150, so I would appreciate any thoughts
on the patch and this idea in general.

Thanks,

Stephen.


Alternative Decommission Monitor Implementation

2020-02-04 Thread Stephen O'Donnell
Hi All,

In https://issues.apache.org/jira/browse/HDFS-14854 we committed a new
decommission monitor to trunk, which is disabled by default. The new
implementation hopes to be an improvement over the original monitor, but
without running it on a real cluster it is hard to know for sure.

I would like to ask if anyone has tried to use this new monitor in a
production cluster, and if so did you find any problems or did it work as
expected etc? I would be very interested in any feedback from anyone who
has tried to use it.

Thanks,

Stephen.


Re: Introduce Read Write Lock to Datanode

2020-02-04 Thread Wei-Chiu Chuang
Thanks for initiating this discussion here. I am +1 to the general approach
proposed.
With DNs getting denser, this is needed more than ever.

On Tue, Feb 4, 2020 at 10:33 AM Stephen O'Donnell
 wrote:

> I would like to reopen an old topic, which is to introduce a Read Write
> lock to the datanode.
>
> In the current trunk, a ReentrantLock is used, so it is always exclusive.
> However, there are many code paths in the DN where an exclusive lock is not
> necessary and a read lock would suffice.
>
> We know the ReentrantReadWriteLock scales fine, as it is used extensively
> in the namenode, so the performance of the lock should not be a concern.
>
> My proposal in https://issues.apache.org/jira/browse/HDFS-15150 is to
> start
> small on this, and simply replace the ReentrantLock with a
> ReentrantReadWriteLock, and then make all lock acquisitions take the write
> lock. That would keep the locking exactly as it is now, and hopefully
> result in a patch that is relatively easy to review.
>
> If we can agree on a patch for that, we can create follow-up Jiras to switch
> various code paths to use the read lock over time.
>
> I have a patch available on HDFS-15150, so I would appreciate any thoughts
> on the patch and this idea in general.
>
> Thanks,
>
> Stephen.
>


Re: Alternative Decommission Monitor Implementation

2020-02-04 Thread Wei-Chiu Chuang
@Akira Ajisaka  you said you'd be interested, right?
Are you planning to adopt this feature?

On Tue, Feb 4, 2020 at 10:41 AM Stephen O'Donnell
 wrote:

> Hi All,
>
> In https://issues.apache.org/jira/browse/HDFS-14854 we committed a new
> decommission monitor to trunk, which is disabled by default. The new
> implementation hopes to be an improvement over the original monitor, but
> without running it on a real cluster it is hard to know for sure.
>
> I would like to ask if anyone has tried to use this new monitor in a
> production cluster, and if so did you find any problems or did it work as
> expected etc? I would be very interested in any feedback from anyone who
> has tried to use it.
>
> Thanks,
>
> Stephen.
>


[jira] [Created] (HDFS-15153) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT fails intermittently

2020-02-04 Thread Chen Liang (Jira)
Chen Liang created HDFS-15153:
-

 Summary: 
TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT fails 
intermittently
 Key: HDFS-15153
 URL: https://issues.apache.org/jira/browse/HDFS-15153
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chen Liang
Assignee: Chen Liang


The unit test TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT is 
failing consistently. This seems to be due to a log message change; we should 
fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




Re: [VOTE] Apache Hadoop Ozone 0.4.2-alpha RC0

2020-02-04 Thread Dinesh Chitlangia
The vote ended with a lone vote of 0, from Marton.

As highlighted by Marton, artifacts with SNAPSHOT dependencies can't be
uploaded to Maven Central, so we will roll out a new RC after ratis-0.5 has
been released.
The issue of the missing README will also be addressed in the new RC.

Thank you.
Dinesh

On Thu, Jan 30, 2020 at 9:27 AM Elek, Marton  wrote:

>
>
> Thank you very much for working on this release, Dinesh (and sorry for
> checking it at the last minute).
>
> * I checked the package, the signatures, the sha sums, and the placement of
> the LICENSE files, and it all looks good.
>
> I can build it from the source (with a workaround, see later) and start
> the smoke tests: it works well. I executed the (standard) smoke tests and
> they all passed.
>
> I have two minor questions:
>
>   1. README seems to be missing from the src package, which is required
> to build the package. (It can't be built without 'touch README.md')
>
>   2. We still use a fixed snapshot from ratis. I am not sure how important
> it is (as we are still in alpha). Artifacts with SNAPSHOT dependencies
> can't be uploaded to Maven Central.
>
> I am not sure if these are blockers. I would be interested in others'
> opinions...
>
> Marton
>
> On 1/24/20 1:08 AM, Dinesh Chitlangia wrote:
> > Hi Folks,
> >
> > We have put together RC0 for Apache Hadoop Ozone 0.4.2-alpha.
> >
> > The RC artifacts are at:
> > https://home.apache.org/~dineshc/ozone-0.4.2-alpha-rc0/
> >
> > The public key used for signing the artifacts can be found at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > The maven artifacts are staged at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1256
> >
> > The RC tag in git is at:
> > https://github.com/apache/hadoop-ozone/tree/ozone-0.4.2-alpha-RC0
> >
> > This release contains 671 fixes/improvements [1].
> > Thanks to everyone who put in the effort to make this happen.
> >
> > *The vote will run for 7 days, ending on Jan 30th 2020 at 11:59 pm PST.*
> > Note: This release is alpha quality; it’s not recommended for use in
> > production, but we believe that it’s stable enough to try out the feature
> > set and collect feedback.
> >
> >
> > [1] https://s.apache.org/ozone-0.4.2-fixed-issues
> >
> > Thanks,
> > Dinesh
> >
>
> -
> To unsubscribe, e-mail: ozone-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: ozone-dev-h...@hadoop.apache.org
>
>
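The package checks Marton describes above (checksums, signatures) can be 
sketched as below. This is a self-contained illustration of the checksum step: 
the file names are placeholders, not the real Ozone artifacts, and the gpg 
steps are shown commented since they require the actual downloads:

```shell
# Illustrative sketch of verifying a release candidate artifact.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in for the downloaded artifact and its published .sha512 file.
printf 'pretend this is the release tarball' > artifact.tar.gz
sha512sum artifact.tar.gz > artifact.tar.gz.sha512

# Against a real RC, the same check runs on the downloaded files:
sha512sum -c artifact.tar.gz.sha512

# Signature check (needs the real .asc file and the Hadoop KEYS file):
# gpg --import KEYS
# gpg --verify artifact.tar.gz.asc artifact.tar.gz
echo "checksum OK"
```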