[DISCUSS] Move our official slack channel to the one in the-asf.slack.com

2024-07-07 Thread Duo Zhang
As I mentioned in another thread, Slack now hides messages older than
90 days in the current apache-hbase.slack.com workspace, which makes
it hard to find useful past discussions.

According to the documentation here

https://infra.apache.org/slack.html

We could invite people who do not have an @apache.org email address
as guests to the Slack channel, so there is no concern about only
committers being able to join.

Thoughts?

Thanks.


Re: [ANNOUNCE] Please welcome Pankaj Kumar to the HBase PMC

2024-07-03 Thread Duo Zhang
Congratulations!

Viraj Jasani  于2024年7月3日周三 12:36写道:
>
> On behalf of the Apache HBase PMC I am pleased to announce that Pankaj
> Kumar has accepted our invitation to become a PMC member on the Apache
> HBase project. We appreciate Pankaj Kumar stepping up to take more
> responsibility in the HBase project.
>
> Please join me in welcoming Pankaj Kumar to the HBase PMC!


[NOTICE] The minimum supported java version will be 17 starting from 3.0.0

2024-06-18 Thread Duo Zhang
After several rounds of discussions[1][2], we finally decided to set
the minimum supported java version for HBase 3.0.0+ to 17.

Note that HBase 2.x will keep working with Java 8, Java 11, and Java
17, so the suggested path is to first upgrade to Java 17 on HBase 2.x,
and then upgrade to HBase 3.x.

If you have any questions or difficulties, feel free to contact us
through the mailing lists or to open JIRA issues.
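As a quick way to check which JDK an installation is running before attempting the upgrade path above, a small shell helper can parse the `java -version` banner; this is only a sketch, and the sample version strings are illustrative:

```shell
# Extract the major version from a `java -version` banner line, so it can
# be compared against the HBase 3.x minimum of 17.
java_major() {
  # Handles both the old scheme ("1.8.0_292" -> 8) and the new one
  # ("17.0.9" -> 17). The version is the second double-quoted field.
  echo "$1" | awk -F'"' '{split($2, v, "."); print (v[1] == 1 ? v[2] : v[1])}'
}

java_major 'openjdk version "17.0.9" 2023-10-17'   # prints 17
java_major 'java version "1.8.0_292"'              # prints 8
```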

Thanks.

 Below is the Chinese version 以下是中文版本 

经过多轮讨论[1][2],我们最终决定将 HBase 3.0.0+ 的最小支持 java 版本设定为 17。

注意,我们会确保 HBase 2.x 版本在 java 8,java 11,java 17 下都能够运行,所以我们建议用户首先在 HBase
2.x 上升级为使用 java 17,然后再升级到 HBase 3.x。

如果你有任何问题或遇到任何困难,请随时使用邮件列表联系我们或者创建 jira issue 来描述具体的问题。

谢谢!

1. https://lists.apache.org/thread/y0pc3n1go0t26hjnp53dwcnkfxhffqx5
2. https://lists.apache.org/thread/n5ln0g2cgq7qm4ryp2gqwdbfrnh0jkyb


Re: [DISCUSS] Bump minimum support jdk version to 11 or 17 for HBase 3.x

2024-06-12 Thread Duo Zhang
Thanks Udo.

In the previous discussion thread, we also discussed this topic.

In general, we do not expect, and do not suggest, that users upgrade
the major HBase version and the JDK version at the same time.

We will make branch-2.x work with JDK 8, JDK 11, and JDK 17, so users
can upgrade the JDK first on 2.x and then try to upgrade to 3.x.
Usually, moving from JDK 8 to JDK 11 is the painful part; moving from
JDK 11 to JDK 17 is much easier.

Udo Offermann  于2024年6月10日周一 03:04写道:
>
> My opinion on this: If you choose jdk 11 as the minimum, it will be easier 
> for many to migrate from HBase 2.3+ to 3.0.
>
> Best Regards
>
> > Am 09.06.2024 um 16:08 schrieb Duo Zhang :
> >
> > HBase 3.0.0-beta-2 is about to come, it will be the last beta release
> > and then we will release our first 3.x GA HBase.
> >
> > There is a discussion thread about dropping jdk8 support in HBase
> > 3.x[1], we all agree to do this but there is no consensus on which jdk
> > version should be the minimum supported version yet, although there
> > were some suggestions on jumping to jdk17 directly.
> >
> > So here we open a new discussion thread to decide which version to
> > choose: JDK 11 or JDK 17?
> >
> > Suggestions are welcome.
> >
> > Thanks.
> >
> > 1. https://lists.apache.org/thread/y0pc3n1go0t26hjnp53dwcnkfxhffqx5
>


[DISCUSS] Bump minimum support jdk version to 11 or 17 for HBase 3.x

2024-06-09 Thread Duo Zhang
HBase 3.0.0-beta-2 is about to come. It will be the last beta release,
after which we will publish our first 3.x GA release of HBase.

There is a discussion thread about dropping JDK 8 support in HBase
3.x[1]. We all agree on dropping it, but there is no consensus yet on
which JDK version should be the minimum supported one, although there
were some suggestions to jump to JDK 17 directly.

So here we open a new discussion thread to decide which version to
choose: JDK 11 or JDK 17?

Suggestions are welcome.

Thanks.

1. https://lists.apache.org/thread/y0pc3n1go0t26hjnp53dwcnkfxhffqx5


Re: Intermittent Null Data Returns Despite Data Presence in HBase

2024-06-05 Thread Duo Zhang
It seems like some regions are assigned to at least 2 region servers,
so you will get different results if you connect to different region
servers while trying to fetch the same row.

Usually disabling all the tables and then enabling them can fix the problem.

And hbase-1.x has been EOL for about two years; please consider
upgrading to a recent HBase version such as 2.5.x or 2.6.x.

Thanks.
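For reference, the disable/enable cycle suggested above can be driven from the HBase shell; this is only a sketch, and `my_table` is a placeholder table name:

```shell
# Sketch only: bounce a table to force a clean re-assignment of its regions.
# Run this for each affected table during a maintenance window, since the
# table is briefly unavailable between disable and enable.
echo "disable 'my_table'
enable 'my_table'" | hbase shell -n
```

Running this against a table that is double-assigned forces the master to close the regions everywhere and reassign each one to exactly one region server.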

Roshan  于2024年6月5日周三 14:08写道:
>
> Dear HBase Community,
>
> We are experiencing an intermittent issue in our HBase cluster (version
> 1.4.14, HDFS 2.7.3, Zookeeper 3.4.10, 9 region servers, 2 masters).
>
> Issue Details:
>
>- Symptoms: Get operations intermittently return null for certain row
>keys despite data presence.
>- Duration: The issue persisted for two days and resolved on its own
>without intervention.
>- Timeline:
>   - Event: Full GC occurred on a region server.
>   - Action: Restarted the region server, leading to region assignment
>   issues.
>   - Troubleshooting: Used hbck fixAssignments but issues persisted.
>   Eventually restarted all region servers, stabilizing the cluster.
>   - Post-Stabilization: For two days, random Get queries returned null,
>   with no exceptions in region server, master, Zookeeper, or client logs.
>   - Resolution: Issue resolved itself after two days.
>
> Logs Reviewed:
>
>- Searched for keywords "WAL", "HLog", "flush", "replay", "corruption"
>in region server and master logs.
>- Checked Zookeeper logs for connectivity issues.
>
> Questions:
>
>1. What could cause intermittent null returns despite data presence?
>2. Are there specific WAL or region server configurations to check?
>3. What additional logs or steps should we review?
>
> Any guidance would be appreciated.
>
> Regards, Roshan B


[ANNOUNCE] New HBase committer Andor Molnár

2024-05-29 Thread Duo Zhang
On behalf of the Apache HBase PMC, I am pleased to announce that
Andor Molnár(andor) has accepted the PMC's invitation to become a
committer on the project. We appreciate all of Andor Molnár's
generous contributions thus far and look forward to his continued
involvement.

Congratulations and welcome, Andor Molnár!

我很高兴代表 Apache HBase PMC 宣布 Andor Molnár 已接受我们的邀请,成
为 Apache HBase 项目的 Committer。感谢 Andor Molnár 一直以来为 HBase 项目
做出的贡献,并期待他在未来继续承担更多的责任。

欢迎 Andor Molnár!


[ANNOUNCE] Apache HBase 2.4.18 is now available for download

2024-05-25 Thread Duo Zhang
The HBase team is happy to announce the immediate availability of HBase
2.4.18.

Apache HBase™ is an open-source, distributed, versioned, non-relational
database. Apache HBase gives you low latency random access to billions of rows
with millions of columns atop non-specialized hardware. To learn more about
HBase, see https://hbase.apache.org/.

HBase 2.4.18 is the last patch release for the HBase 2.4.x release line. It
includes 186 resolved issues since 2.4.17. Users on the 2.4.x release line are
encouraged to upgrade to our current stable release line, 2.5.x, for a longer
official support window, or to try our newest minor release line, 2.6.x, for
new features such as TLS support.

The full list of issues and release notes can be found here:

CHANGELOG: https://downloads.apache.org/hbase/2.4.18/CHANGES.md
RELEASENOTES: https://downloads.apache.org/hbase/2.4.18/RELEASENOTES.md

or via our issue tracker:

  https://issues.apache.org/jira/projects/HBASE/versions/12353080

To download please follow the links and instructions on our website:

  https://hbase.apache.org/downloads.html

Questions, comments, and problems are always welcome at:

  d...@hbase.apache.org
  u...@hbase.apache.org
  user-zh@hbase.apache.org

Thanks to all who contributed and made this release possible.

Cheers,
The HBase Dev Team


Re: [ANNOUNCE] Apache HBase 2.6.0 is now available for download

2024-05-20 Thread Duo Zhang
Congratulations!

But it seems you missed the replacement for the first '_version_' placeholder...

Bryan Beaudreault  于2024年5月21日周二 00:44写道:
>
> The HBase team is happy to announce the immediate availability of HBase
> _version_.
>
> Apache HBase™ is an open-source, distributed, versioned, non-relational
> database.
> Apache HBase gives you low latency random access to billions of rows with
> millions of columns atop non-specialized hardware. To learn more about
> HBase,
> see https://hbase.apache.org/.
>
> HBase 2.6.0 is the 1st release in the HBase 2.6.x line, which aims to
> improve the stability and reliability of HBase. This release includes
> roughly
> 560 resolved issues not covered by previous 2.x releases.
>
> Notable new features include:
> - Built-in support for full and incremental backups
> - Built-in support for TLS encryption and authentication
> - Erasure Coding support
> - Various improvements to Quotas
>
> The full list of issues can be found here:
>
> CHANGELOG: https://downloads.apache.org/hbase/2.6.0/CHANGES.md
> RELEASENOTES: https://downloads.apache.org/hbase/2.6.0/RELEASENOTES.md
>
> or via our issue tracker:
>   https://issues.apache.org/jira/projects/HBASE/versions/12350930
>
> To download please follow the links and instructions on our website:
>
> https://hbase.apache.org/downloads.html
>
> Question, comments, and problems are always welcome at:
>   d...@hbase.apache.org
>   user@hbase.apache.org
>   user...@hbase.apache.org
>
> Thanks to all who contributed and made this release possible.
>
> Cheers,
> The HBase Dev Team


HBase Quarterly report Jan-Mar 2024

2024-05-18 Thread Duo Zhang
Hi all,

HBase submits a report to the ASF board once a quarter, to inform the board
about project health. I'm sending the report to the user@ and dev@ mailing
lists because you are the project, and for transparency. If you have any
questions about the report or the running of the project, you can post them
to any PMC member or committer, or send an email to priv...@hbase.apache.org,
which every PMC member subscribes to.

## Description:
Apache HBase is an open-source, distributed, versioned, non-relational
database. Apache HBase gives you low latency random access to billions of rows
with millions of columns atop non-specialized hardware.
hbase-thirdparty is a set of internal artifacts used by the project to
mitigate the impact of our dependency choices on the wider ecosystem.
hbase-connectors is a collection of integration points with other projects.
The initial release includes artifacts for use with Apache Kafka and Apache
Spark.
hbase-filesystem contains HBase project-specific implementations of the Apache
Hadoop FileSystem API. It is currently experimental and internal to the
project.
hbase-operator-tools is a collection of tools for HBase operators. Now it is
mainly for hosting HBCK2.
hbase-native-client is a client library in C/C++, in its early days.
hbase-kustomize is for deploying HBase on kubernetes, still under development.


## Project Status:
Current project status: ongoing
Issues for the board:
While resolving the issue about adding the 'TM' superscript on hbase.a.o, the
trademark team said it should be '®' instead of 'TM', and also asked us to add
some notes to the footer of the page.
The notes are already present on the index page of hbase.a.o; we replied to
the email to ask whether this is enough but have received no response yet. No
further action has been taken.


## Membership Data:
Apache HBase was founded 2010-04-21 (14 years ago)
There are currently 105 committers and 59 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- No new PMC members. Last addition was Bryan Beaudreault on 2023-10-17.
- Istvan Toth was added as committer on 2024-04-02


## Project Activity:
2.5.8 was released on 2024-03-12.
hbase-thirdparty-4.1.6 was released on 2024-03-04.
3.0.0-beta-1 was released on 2024-01-14.


We have started the vote for releasing 2.6.0. The first RC sank because of a
compatibility issue, and then we faced the xz backdoor problem. We are still
discussing how to deal with the hbase-compression-xz module.
https://lists.apache.org/thread/g26lgy9z840ovo02vq75hys6krbmvxon
https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g


3.0.0-beta-1 has been released. This is the first beta release for 3.0.0
release line.
https://lists.apache.org/thread/lq19rwgy7q668ps8b4lz53my8m16t3yc

For 3.0.0-beta-2, we found a place where we still leak the internal ZooKeeper
details to end users, so we are still discussing how to deal with it. We need
to let users specify the HBase cluster they want to connect to with something
like a URL/URI, instead of a ZooKeeper connection string and path.
https://lists.apache.org/thread/233p65w5hj80m9plzjgvr8v16zff2d7y


## Community Health:
d...@hbase.apache.org:
962 subscribers (960 in the previous quarter)
519 emails sent to list (420 in the previous quarter)

u...@hbase.apache.org:
1991 subscribers (1986 in the previous quarter)
36 emails sent to list (31 in the previous quarter)

user-zh@hbase.apache.org:
79 subscribers (77 in the previous quarter)
11 emails sent to list (17 in the previous quarter)


- JIRA activity:
187 issues opened in JIRA, past quarter (20% increase)
132 issues closed in JIRA, past quarter (8% increase)


- Commit activity:
568 commits in the past quarter (21% increase)
41 code contributors in the past quarter (2% increase)


- GitHub PR activity:
193 PRs opened on GitHub, past quarter (19% increase)
183 PRs closed on GitHub, past quarter (24% increase)


The community is overall healthy. We will release 2.6.0 soon, and work towards
the 3.0.0-beta-2 release.


Re: HBase Master hangs on startup during upgrade from 2.2.5 to 2.5.7

2024-05-04 Thread Duo Zhang
OK, so in the end the problem was that the Snappy compression codec was
not loaded, so none of the regions with Snappy compression enabled
could come online?

It is good that you finally found the root cause.

For the tarballs with or without "hadoop3" in the name, the difference
is the version of the Hadoop jars bundled in the tarball.

HBase makes use of some internal Hadoop classes, so even though we have
done some reflection work to make the code run against different Hadoop
versions, a drop-in replacement is impossible; in particular, if you
replace the hadoop2 jars with hadoop3 jars, you will hit errors when
starting the HBase cluster.

So starting from HBase 2.5.x, we decided to publish two types of
tarballs: the tarballs without "hadoop3" in the name are built with
Hadoop 2, typically Hadoop 2.10.2, and the hadoop3 tarballs are built
with Hadoop 3 (for branch-2.5 it is Hadoop 3.2.4, and for the upcoming
2.6 it is Hadoop 3.3.5).

Note that later branch-2.x releases will keep this pattern, but future
HBase 3.x releases will not have special hadoop3 tarballs, as we have
dropped Hadoop 2 support in HBase 3.x.

Thanks.

Udo Offermann  于2024年5月2日周四 22:13写道:
>
> The problem was actually with the Snappy codec or the native Snappy 
> libraries. After configuring the Snappy
> Java implementation, the cluster started without any problems.
>
> I have a final question regarding the Hbase distributions. Can you please 
> tell me the difference between the distributions:
> bin: https://www.apache.org/dyn/closer.lua/hbase/2.5.8/hbase-2.5.8-bin.tar.gz
> and
> hadoop3-bin: 
> https://www.apache.org/dyn/closer.lua/hbase/2.5.8/hbase-2.5.8-hadoop3-bin.tar.gz
>
> I can't find a description of this. The same applies to the client libraries 
> client-bin and hadoop3-client-bin.
>
>
> Best regards
> Udo
>
>
>
> > Am 30.04.2024 um 04:42 schrieb 张铎(Duo Zhang) :
> >
> > Oh, there is a typo, I mean the ServerCrashProcedure should not block other
> > procedures if it is in claim replication queue stage.
> >
> > 张铎(Duo Zhang) 于2024年4月30日 周二10:41写道:
> >
> >> Sorry to be a pain as the procedure store is a big problem before HBase
> >> 2.3 so we have done a big refactoring on HBase 2.3+ so we have a migration
> >> which makes the upgrading a bit complicated.
> >>
> >> And on the upgrading, you do not need to mix up HBase and Hadoop, you can
> >> upgrading them separately. Second, rolling upgrading is also a bit
> >> complicated, so I suggest you try fully down/up upgrading first, if you
> >> have successfully done an upgrading, then you can start to try rolling
> >> upgrading.
> >>
> >> To your scenario, I suggest, you first upgrading Hadoop, including
> >> namenode and datanode, HBase should be functional after the upgrading. And
> >> then, as discussed above, turn off the balancer, view the master page to
> >> make sure there are no RITs and no procedures, then shutdown master, and
> >> then shutdown all the region servers. And then, start master(do not need to
> >> wait the master finishes start up, as it relies on meta region online,
> >> where we must have at least one region server), and then all the region
> >> servers, to see if the cluster can go back to normal.
> >>
> >> On the ServerCrashProcedure, it is blocked in claim replication queue,
> >> which should be blocked other procedures as the region assignment should
> >> have already been finished. Does your cluster has replication peers? If
> >> not, it is a bit strange that why your procedure is blocked in the claim
> >> replication queue stage…
> >>
> >> Thanks.
> >>
> >> Udo Offermann 于2024年4月29日 周一21:26写道:
> >>
> >>> This time we made progress.
> >>> I first upgraded the Master Hadoop and HBase wise (after making sure that
> >>> there are no regions in transition and no running procedures) with keeping
> >>> Zookeeper running. Master was started with new version 2.8.5 telling that
> >>> there are 6 nodes with inconsistent version (what was to be expected). Now
> >>> the startup process completes with "Starting cluster schema service
> >>> COMPLETE“,
> >>> all regions were assigned and the cluster seemed to be stable.
> >>>
> >>> Again there were no regions in transitions and no procedures running and
> >>> so I started to upgrade the data nodes one by one.
> >>> The problem now is that the new region servers are not assigned regions
> >>> except of 3: hbase:namespace, hbase:meta and one of our application level
> >>> tables (which is empty most of the time).
> >>> The 

Re: HBase Master hangs on startup during upgrade from 2.2.5 to 2.5.7

2024-04-29 Thread Duo Zhang
Oh, there is a typo, I mean the ServerCrashProcedure should not block other
procedures if it is in claim replication queue stage.

张铎(Duo Zhang) 于2024年4月30日 周二10:41写道:

> Sorry to be a pain as the procedure store is a big problem before HBase
> 2.3 so we have done a big refactoring on HBase 2.3+ so we have a migration
> which makes the upgrading a bit complicated.
>
> And on the upgrading, you do not need to mix up HBase and Hadoop, you can
> upgrading them separately. Second, rolling upgrading is also a bit
> complicated, so I suggest you try fully down/up upgrading first, if you
> have successfully done an upgrading, then you can start to try rolling
> upgrading.
>
> To your scenario, I suggest, you first upgrading Hadoop, including
> namenode and datanode, HBase should be functional after the upgrading. And
> then, as discussed above, turn off the balancer, view the master page to
> make sure there are no RITs and no procedures, then shutdown master, and
> then shutdown all the region servers. And then, start master(do not need to
> wait the master finishes start up, as it relies on meta region online,
> where we must have at least one region server), and then all the region
> servers, to see if the cluster can go back to normal.
>
> On the ServerCrashProcedure, it is blocked in claim replication queue,
> which should be blocked other procedures as the region assignment should
> have already been finished. Does your cluster has replication peers? If
> not, it is a bit strange that why your procedure is blocked in the claim
> replication queue stage…
>
> Thanks.
>
> Udo Offermann 于2024年4月29日 周一21:26写道:
>
>> This time we made progress.
>> I first upgraded the Master Hadoop and HBase wise (after making sure that
>> there are no regions in transition and no running procedures) with keeping
>> Zookeeper running. Master was started with new version 2.8.5 telling that
>> there are 6 nodes with inconsistent version (what was to be expected). Now
>> the startup process completes with "Starting cluster schema service
>>  COMPLETE“,
>>  all regions were assigned and the cluster seemed to be stable.
>>
>> Again there were no regions in transitions and no procedures running and
>> so I started to upgrade the data nodes one by one.
>> The problem now is that the new region servers are not assigned regions
>> except of 3: hbase:namespace, hbase:meta and one of our application level
>> tables (which is empty most of the time).
>> The more data nodes I migrated, the more regions were accumulated on the
>> nodes running the old version until the last old data node has managed all
>> regions except for 3.
>>
>>
>>
>> After all regions have been transitioned I migrated the last node which
>> yields that all regions are in transition and look like this one:
>>
>> 2185  2184  WAITING_TIMEOUT  seritrack
>>  TransitRegionStateProcedure table=tt_items,
>> region=d7a411647663dd9e0fc972c7e14088a5, ASSIGN Mon Apr 29 14:12:36
>> CEST 2024   Mon Apr 29 14:59:44 CEST 2024   pid=2185, ppid=2184,
>> state=WAITING_TIMEOUT:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE,
>> locked=true; TransitRegionStateProcedure table=tt_items,
>> region=d7a411647663dd9e0fc972c7e14088a5, ASSIGN
>>
>> They are all waiting on this one:
>>
>> 2184  WAITING  seritrack   ServerCrashProcedure
>> datanode06ct.gmd9.intern,16020,1714378085579   Mon Apr 29 14:12:36 CEST
>> 2024   Mon Apr 29 14:12:36 CEST 2024   pid=2184,
>> state=WAITING:SERVER_CRASH_CLAIM_REPLICATION_QUEUES, locked=true;
>> ServerCrashProcedure datanode06ct.gmd9.intern,16020,1714378085579,
>> splitWal=true, meta=false
>>
>> Again „ServerCrashProcedure“! Why are they not processed?
>> Why is it so hard to upgrade the cluster? Is it worthwhile to take the
>> next stable version 2.5.8?
>> And - btw- what is the difference between the two distributions „bin“ and
>> „hadoop3-bin“?
>>
>> Best regards
>> Udo
>>
>>
>>
>>
>>
>> > Am 28.04.2024 um 03:03 schrieb 张铎(Duo Zhang) :
>> >
>> > Better turn it off, and observe the master page until there is no RITs
>> > and no other procedures, then call hbase-daemon.sh stop master, and
>> > then hbase-daemon.sh stop regionserver.
>> >
>> > I'm not 100% sure about the shell command, you'd better search try it
>> > by yourself. The key here is to stop master first and make sure there
>> > is no procedure, so we can safely remove MasterProcWALs, and then stop
>> > all region servers.
>> >
>> > Thanks.

Re: HBase Master hangs on startup during upgrade from 2.2.5 to 2.5.7

2024-04-29 Thread Duo Zhang
Sorry to be a pain. The procedure store was a big problem before HBase 2.3,
so we did a big refactoring in HBase 2.3+, which involves a migration that
makes upgrading a bit complicated.

On the upgrade itself, you do not need to mix up HBase and Hadoop; you can
upgrade them separately. Also, rolling upgrades are a bit complicated, so I
suggest you try a full shutdown upgrade first; once you have successfully
done one, you can start to try a rolling upgrade.

For your scenario, I suggest you first upgrade Hadoop, including the
namenode and datanodes; HBase should remain functional afterwards. Then, as
discussed above, turn off the balancer, check the master page to make sure
there are no RITs and no procedures, then shut down the master, and then
shut down all the region servers. After that, start the master (no need to
wait for the master to finish starting up, as it relies on the meta region
being online, which requires at least one region server), then all the
region servers, and see if the cluster goes back to normal.

On the ServerCrashProcedure, it is blocked in claim replication queue,
which should be blocked other procedures as the region assignment should
have already been finished. Does your cluster has replication peers? If
not, it is a bit strange that why your procedure is blocked in the claim
replication queue stage…

Thanks.
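The pre-shutdown check described above (no RITs, no pending procedures) can be scripted. Below is a minimal, hypothetical sketch: on a live cluster the procedure list would come from `echo "list_procedures" | hbase shell -n`, but here canned sample output stands in for it, so the filtering logic itself is runnable:

```shell
# Hypothetical quiesce check before stopping the masters (not an official tool).
# On a real cluster, replace the canned text with:
#   procs=$(echo "list_procedures" | hbase shell -n)
procs='2184 WAITING ServerCrashProcedure datanode06ct,16020,1714378085579
2185 WAITING_TIMEOUT TransitRegionStateProcedure table=tt_items
9999 SUCCESS CreateTableProcedure table=foo'

# Count procedures that have not finished (anything not SUCCESS/FAILED/ROLLEDBACK).
pending=$(printf '%s\n' "$procs" | grep -cEv 'SUCCESS|FAILED|ROLLEDBACK')
if [ "$pending" -eq 0 ]; then
  echo "quiesced: safe to stop the masters"
else
  echo "NOT quiesced: $pending pending procedures"
fi
```

With the sample data above the check reports two pending procedures and refuses; only an all-finished listing (plus an empty RIT list on the master page) should be treated as safe.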

Udo Offermann 于2024年4月29日 周一21:26写道:

> This time we made progress.
> I first upgraded the master, both Hadoop- and HBase-wise (after making sure
> that there were no regions in transition and no running procedures), while
> keeping Zookeeper running. The master was started with the new version 2.5.7,
> reporting that there were 6 nodes with an inconsistent version (which was to
> be expected). The startup process completed with "Starting cluster schema
> service  COMPLETE“, all regions were assigned and the cluster seemed
> to be stable.
>
> Again there were no regions in transition and no procedures running, and
> so I started to upgrade the data nodes one by one.
> The problem now is that the new region servers are not assigned any regions
> except for 3: hbase:namespace, hbase:meta and one of our application-level
> tables (which is empty most of the time).
> The more data nodes I migrated, the more regions accumulated on the
> nodes running the old version, until the last old data node was managing all
> regions except for those 3.
>
>
>
> After all regions had been transitioned, I migrated the last node, which
> resulted in all regions being in transition, looking like this one:
>
> 21852184WAITING_TIMEOUT seritrack
>  TransitRegionStateProcedure table=tt_items,
> region=d7a411647663dd9e0fc972c7e14088a5, ASSIGN Mon Apr 29 14:12:36
> CEST 2024   Mon Apr 29 14:59:44 CEST 2024   pid=2185, ppid=2184,
> state=WAITING_TIMEOUT:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE,
> locked=true; TransitRegionStateProcedure table=tt_items,
> region=d7a411647663dd9e0fc972c7e14088a5, ASSIGN
>
> They are all waiting on this one:
>
> 2184WAITING seritrack   ServerCrashProcedure
> datanode06ct.gmd9.intern,16020,1714378085579   Mon Apr 29 14:12:36 CEST
> 2024   Mon Apr 29 14:12:36 CEST 2024   pid=2184,
> state=WAITING:SERVER_CRASH_CLAIM_REPLICATION_QUEUES, locked=true;
> ServerCrashProcedure datanode06ct.gmd9.intern,16020,1714378085579,
> splitWal=true, meta=false
>
> Again „ServerCrashProcedure“! Why are they not processed?
> Why is it so hard to upgrade the cluster? Is it worthwhile to take the
> next stable version 2.5.8?
> And - btw- what is the difference between the two distributions „bin“ and
> „hadoop3-bin“?
>
> Best regards
> Udo
>
>
>
>
>
> > Am 28.04.2024 um 03:03 schrieb 张铎(Duo Zhang) :
> >
> > Better turn it off, and observe the master page until there is no RITs
> > and no other procedures, then call hbase-daemon.sh stop master, and
> > then hbase-daemon.sh stop regionserver.
> >
> > I'm not 100% sure about the shell command, you'd better search and try it
> > by yourself. The key here is to stop master first and make sure there
> > is no procedure, so we can safely remove MasterProcWALs, and then stop
> > all region servers.
> >
> > Thanks.
> >
> > Udo Offermann  于2024年4月26日周五 23:34写道:
> >>
> >> I know, but is it necessary or beneficial to turn it off - and if so -
> when?
> >> And what is your recommendation about stopping the region servers? Just
> >> hbase-daemon.sh stop regionserver
> >> or
> >> graceful_stop.sh localhost
> >> ?
> >>
> >>> Am 26.04.2024 um 17:22 schrieb 张铎(Duo Zhang) :
> >>>
> >>> Turning off balancer is to make sure that the balancer will not

Re: HBase Master hangs on startup during upgrade from 2.2.5 to 2.5.7

2024-04-23 Thread Duo Zhang
Strange, I checked the code, it seems we get NPE on this line

https://github.com/apache/hbase/blob/4d7ce1aac724fbf09e526fc422b5a11e530c32f0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L2872

Could you please confirm that you connect to the correct active master
which is hanging? It seems that you are connecting the backup
master...

Thanks.

张铎(Duo Zhang)  于2024年4月23日周二 15:31写道:
>
> Ah, NPE usually means a code bug, then there is no simple way to fix
> it, need to take a deep look on the code :(
>
> Sorry.
>
> Udo Offermann  于2024年4月22日周一 15:32写道:
> >
> > Unfortunately not.
> > I’ve found the node hosting the meta region and was able to run the HBCK2
> > scheduleRecoveries command using hbase-operator-tools-1.2.0.
> > The tool however stops with an NPE:
> >
> > 09:22:00.532 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable 
> > to load native-hadoop library for your platform... using builtin-java 
> > classes where applicable
> > 09:22:00.703 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation 
> > - hbase.client.pause.cqtbe is deprecated. Instead, use 
> > hbase.client.pause.server.overloaded
> > 09:22:00.765 [ReadOnlyZKClient-HBaseMaster:2181@0x7d9f158f] INFO  
> > org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client 
> > environment:zookeeper.version=3.8.3-6ad6d364c7c0bcf0de452d54ebefa3058098ab56,
> >  built on 2023-10-05 10:34 UTC
> > 09:22:00.765 [ReadOnlyZKClient-HBaseMaster:2181@0x7d9f158f] INFO  
> > org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client 
> > environment:host.name=HBaseMaster.gmd9.intern
> > 09:22:00.765 [ReadOnlyZKClient-HBaseMaster:2181@0x7d9f158f] INFO  
> > org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client 
> > environment:java.version=1.8.0_402
> > 09:22:00.766 [ReadOnlyZKClient-HBaseMaster:2181@0x7d9f158f] INFO  
> > org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client 
> > environment:java.vendor=Red Hat, Inc.
> > 09:22:00.766 [ReadOnlyZKClient-HBaseMaster:2181@0x7d9f158f] INFO  
> > org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client 
> > environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.402.b06-2.el8.x86_64/jre
> > 09:22:00.766 [ReadOnlyZKClient-HBaseMaster:2181@0x7d9f158f] INFO  
> > org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client 
> > environment:java.class.path=hbase-operator-tools-1.2.0/hbase-hbck2/hbase-hbck2-1.2.0.jar:hbase/conf:/opt/seritrack/tt/jdk/lib/tools.jar:/opt/seritrack/tt/nosql/hbase:/opt/seritrack/tt/nosql/hbase/lib/shaded-clients/hbase-shaded-mapreduce-2.5.7.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/audience-annotations-0.13.0.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/commons-logging-1.2.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/htrace-core4-4.1.0-incubating.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/jcl-over-slf4j-1.7.33.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/jul-to-slf4j-1.7.33.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/opentelemetry-api-1.15.0.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/opentelemetry-context-1.15.0.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/opentelemetry-semconv-1.15.0-alpha.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/slf4j-api-1.7.33.jar:/opt/seritrack/tt/nosql/hbase/lib/shaded-clients/hbase-shaded-client-2.5.7.jar:/opt/seritrack/tt/nosql/pl_nosql_ext/libs/pl_nosql_ext-3.0.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/log4j-1.2-api-2.17.2.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/log4j-api-2.17.2.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/log4j-core-2.17.2.jar:/opt/seritrack/tt/nosql/hbase/lib/client-facing-thirdparty/log4j-slf4j-impl-2.17.2.jar:/opt/seritrack/tt/prometheus_exporters/jmx_exporter/jmx_prometheus_javaagent.jar
> > 09:22:00.766 [ReadOnlyZKClient-HBaseMaster:2181@0x7d9f158f] INFO  
> > org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client 
> > environment:java.library.path=/opt/seritrack/tt/nosql/hadoop/lib/native
> > 09:22:00.766 [ReadOnlyZKClient-HBaseMaster:2181@0x7d9f158f] INFO  
> > org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client 
> > environment:java.io.tmpdir=/tmp
> > 09:22:00.766 [ReadOnlyZKClient-HBaseMaster:2181@0x7d9f158f] INFO  
> > org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client 
> > environment:java.compiler=
> > 09:22:00.766 [ReadOnlyZKClient-HBaseMaster:2181@0x7d9f158f] INFO  
> > org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client 

Re: HBase Master hangs on startup during upgrade from 2.2.5 to 2.5.7

2024-04-23 Thread Duo Zhang
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
> at 
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:750)
>
>
>
>
> > Am 20.04.2024 um 15:53 schrieb 张铎(Duo Zhang) :
> >
> > OK, it was waitForMetaOnline.
> >
> > Maybe the problem is that you do have some correct procedures before
> > upgrading, like ServerCrashProcedure, but then you delete all the
> > procedure wals so the ServerCrashProcedure is also gone, so meta can
> > never be online.
> >
> > Please check the /hbase/meta-region-server znode on zookeeper, dump
> > its content, it is protobuf based but anyway, you could see the
> > encoded server name which hosts meta region.
> >
> > Then use HBCK2, to schedule a SCP for this region server, to see if it
> > can fix the problem.
> >
> > https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/README.md
> >
> > This is the document for HBCK2, you should use the scheduleRecoveries 
> > command.
> >
> > Hope this could fix your problem.
> >
> > Thread 92 (master/masterserver:16000:becomeActiveMaster):
> >  State: TIMED_WAITING
> >  Blocked count: 165
> >  Waited count: 404
> >  Stack:
> >java.lang.Thread.sleep(Native Method)
> >org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:125)
> >org.apache.hadoop.hbase.master.HMaster.isRegionOnline(HMaster.java:1358)
> >
> > org.apache.hadoop.hbase.master.HMaster.waitForMetaOnline(HMaster.java:1328)
> >
> > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1069)
> >
> > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2405)
> >org.apache.hadoop.hbase.master.HMaster.lambda$null$0(HMaster.java:565)
> >
> > org.apache.hadoop.hbase.master.HMaster$$Lambda$265/1598878738.run(Unknown
> > Source)
> >org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
> >org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:177)
> >org.apache.hadoop.hbase.master.HMaster.lambda$run$1(HMaster.java:562)
> >
> > org.apache.hadoop.hbase.master.HMaster$$Lambda$264/1129144214.run(Unknown
> > Source)
> >java.lang.Thread.run(Thread.java:750)
> >
> > Udo Offermann mailto:udo.offerm...@zfabrik.de>> 
> > 于2024年4月20日周六 21:13写道:
> >>
> >> Master status for masterserver.gmd9.intern,16000,1713515965162 as of Fri
> >> Apr 19 10:55:22 CEST 2024
> >>
> >>
> >> Version Info:
> >> ===
> >> HBase 2.5.7
> >> Source code repository
> >> git://buildbox.localdomain/home/apurtell/tmp/RM/hbase
> >> revision=6788f98356dd70b4a7ff766ea7a8298e022e7b95
> >> Compiled by apurtell on Thu Dec 14 15:59:16 PST 2023
> >> From source with checksum
> >> 1501d7fdf72398791ee335a229d099fc972cea7c2a952da7622eb087ddf975361f107cbbbee5d0ad6f603466e9afa1f4fd242ffccbd4371eb0b56059bb3b5402
> >> Hadoop 2.10.2
> >> Source code repository Unknown
> > revision=965fd380006fa78b2315668fbc7eb432e1d8200f

Re: HBase Master hangs on startup during upgrade from 2.2.5 to 2.5.7

2024-04-21 Thread Duo Zhang
Before upgrading, disable the balancer, make sure there are no regions
in transition, and check that there are no dead region servers currently
being processed, i.e., no ServerCrashProcedure.
This is to make sure that there are no pending procedures before shutting
down, so it is safe to just remove all the MasterProcWALs.

Then you can stop both masters, active and standby, remove the
MasterProcWALs if it is too big, and then start master with new code.

Usually in this way the new master could start successfully.

Thanks.
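The ordering above (stop both masters first, clear MasterProcWALs only once no procedures remain, then restart with the new code) can be sketched as a dry-run script. Command names follow the stock hbase-daemon.sh scripts and the standard /hbase root directory; treat this as an illustration, not an official procedure:

```shell
# Dry-run sketch of the stop/clean/start order described above (hypothetical).
# Leave DRY_RUN set to just print the commands; unset it to actually run them.
DRY_RUN=1
run() {
  if [ -n "$DRY_RUN" ]; then echo "would run: $*"; else "$@"; fi
}

run hbase-daemon.sh stop master        # active master
run hbase-daemon.sh stop master        # standby master (on its own host)
run hdfs dfs -mv /hbase/MasterProcWALs /hbase/MasterProcWALs.old  # safe only with zero procedures
run hbase-daemon.sh start master       # new version; meta comes online once a RS is up
run hbase-daemon.sh start regionserver # then each region server
```

The move (rather than delete) of MasterProcWALs keeps the old procedure WALs around in case a rollback is needed.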

Udo Offermann  于2024年4月20日周六 23:32写道:
>
> Thank you, I can check on Monday.
>
> This is the upgrade of the test system and serves as training for the upgrade 
> of the production system. What do we need to do to prevent this problem?
>
> We had some problems starting zookeeper after the upgrade and I had to start 
> it with "zookeeper.snapshot.trust.empty=true“.
> BTW, is it ok to delete the zookeeper directory?
>
> Best regards
> Udo
>
>
> > Am 20.04.2024 um 15:53 schrieb 张铎(Duo Zhang) :
> >
> > OK, it was waitForMetaOnline.
> >
> > Maybe the problem is that you do have some correct procedures before
> > upgrading, like ServerCrashProcedure, but then you delete all the
> > procedure wals so the ServerCrashProcedure is also gone, so meta can
> > never be online.
> >
> > Please check the /hbase/meta-region-server znode on zookeeper, dump
> > its content, it is protobuf based but anyway, you could see the
> > encoded server name which hosts meta region.
> >
> > Then use HBCK2, to schedule a SCP for this region server, to see if it
> > can fix the problem.
> >
> > https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/README.md
> >
> > This is the document for HBCK2, you should use the scheduleRecoveries 
> > command.
> >
> > Hope this could fix your problem.
> >
> > Thread 92 (master/masterserver:16000:becomeActiveMaster):
> >  State: TIMED_WAITING
> >  Blocked count: 165
> >  Waited count: 404
> >  Stack:
> >java.lang.Thread.sleep(Native Method)
> >org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:125)
> >org.apache.hadoop.hbase.master.HMaster.isRegionOnline(HMaster.java:1358)
> >
> > org.apache.hadoop.hbase.master.HMaster.waitForMetaOnline(HMaster.java:1328)
> >
> > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1069)
> >
> > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2405)
> >org.apache.hadoop.hbase.master.HMaster.lambda$null$0(HMaster.java:565)
> >
> > org.apache.hadoop.hbase.master.HMaster$$Lambda$265/1598878738.run(Unknown
> > Source)
> >org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
> >org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:177)
> >org.apache.hadoop.hbase.master.HMaster.lambda$run$1(HMaster.java:562)
> >
> > org.apache.hadoop.hbase.master.HMaster$$Lambda$264/1129144214.run(Unknown
> > Source)
> >java.lang.Thread.run(Thread.java:750)
> >
> > Udo Offermann mailto:udo.offerm...@zfabrik.de>> 
> > 于2024年4月20日周六 21:13写道:
> >>
> >> Master status for masterserver.gmd9.intern,16000,1713515965162 as of Fri
> >> Apr 19 10:55:22 CEST 2024
> >>
> >>
> >> Version Info:
> >> ===
> >> HBase 2.5.7
> >> Source code repository
> >> git://buildbox.localdomain/home/apurtell/tmp/RM/hbase
> >> revision=6788f98356dd70b4a7ff766ea7a8298e022e7b95
> >> Compiled by apurtell on Thu Dec 14 15:59:16 PST 2023
> >> From source with checksum
> >> 1501d7fdf72398791ee335a229d099fc972cea7c2a952da7622eb087ddf975361f107cbbbee5d0ad6f603466e9afa1f4fd242ffccbd4371eb0b56059bb3b5402
> >> Hadoop 2.10.2
> >> Source code repository Unknown
> >> revision=965fd380006fa78b2315668fbc7eb432e1d8200f
> >> Compiled by ubuntu on 2022-05-25T00:12Z
> >>
> >>
> >> Tasks:
> >> ===
> >> Task: Master startup
> >> Status: RUNNING:Starting assignment manager
> >> Running for 954s
> >>
> >> Task: Flushing master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> >> Status: COMPLETE:Flush successful flush result:CANNOT_FLUSH_MEMSTORE_EMPTY,
> >> failureReason:Nothing to flush,flush seq id14
> >> Completed 49s ago
> >> Ran for 0s
> >>
> >> Task: RpcServer.priority.RWQ.Fifo.write.handler=0,queue=0,port=16000
> >> Status: WAITING:Waiting for a call
>

Re: HBase Master hangs on startup during upgrade from 2.2.5 to 2.5.7

2024-04-20 Thread Duo Zhang
OK, it was waitForMetaOnline.

Maybe the problem is that you do have some correct procedures before
upgrading, like ServerCrashProcedure, but then you delete all the
procedure wals so the ServerCrashProcedure is also gone, so meta can
never be online.

Please check the /hbase/meta-region-server znode on zookeeper, dump
its content, it is protobuf based but anyway, you could see the
encoded server name which hosts meta region.

Then use HBCK2, to schedule a SCP for this region server, to see if it
can fix the problem.

https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/README.md

This is the document for HBCK2, you should use the scheduleRecoveries command.

Hope this could fix your problem.

Thread 92 (master/masterserver:16000:becomeActiveMaster):
  State: TIMED_WAITING
  Blocked count: 165
  Waited count: 404
  Stack:
java.lang.Thread.sleep(Native Method)
org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:125)
org.apache.hadoop.hbase.master.HMaster.isRegionOnline(HMaster.java:1358)

org.apache.hadoop.hbase.master.HMaster.waitForMetaOnline(HMaster.java:1328)

org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1069)

org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2405)
org.apache.hadoop.hbase.master.HMaster.lambda$null$0(HMaster.java:565)

org.apache.hadoop.hbase.master.HMaster$$Lambda$265/1598878738.run(Unknown
Source)
org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:177)
org.apache.hadoop.hbase.master.HMaster.lambda$run$1(HMaster.java:562)

org.apache.hadoop.hbase.master.HMaster$$Lambda$264/1129144214.run(Unknown
Source)
java.lang.Thread.run(Thread.java:750)

Udo Offermann  于2024年4月20日周六 21:13写道:
>
> Master status for masterserver.gmd9.intern,16000,1713515965162 as of Fri
> Apr 19 10:55:22 CEST 2024
>
>
> Version Info:
> ===
> HBase 2.5.7
> Source code repository
> git://buildbox.localdomain/home/apurtell/tmp/RM/hbase
> revision=6788f98356dd70b4a7ff766ea7a8298e022e7b95
> Compiled by apurtell on Thu Dec 14 15:59:16 PST 2023
> From source with checksum
> 1501d7fdf72398791ee335a229d099fc972cea7c2a952da7622eb087ddf975361f107cbbbee5d0ad6f603466e9afa1f4fd242ffccbd4371eb0b56059bb3b5402
> Hadoop 2.10.2
> Source code repository Unknown
> revision=965fd380006fa78b2315668fbc7eb432e1d8200f
> Compiled by ubuntu on 2022-05-25T00:12Z
>
>
> Tasks:
> ===
> Task: Master startup
> Status: RUNNING:Starting assignment manager
> Running for 954s
>
> Task: Flushing master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> Status: COMPLETE:Flush successful flush result:CANNOT_FLUSH_MEMSTORE_EMPTY,
> failureReason:Nothing to flush,flush seq id14
> Completed 49s ago
> Ran for 0s
>
> Task: RpcServer.priority.RWQ.Fifo.write.handler=0,queue=0,port=16000
> Status: WAITING:Waiting for a call
> Running for 951s
>
> Task: RpcServer.priority.RWQ.Fifo.write.handler=1,queue=0,port=16000
> Status: WAITING:Waiting for a call
> Running for 951s
>
>
>
> Servers:
> ===
> servername1ct.gmd9.intern,16020,1713514863737: requestsPerSecond=0.0,
> numberOfOnlineRegions=0, usedHeapMB=37.0MB, maxHeapMB=2966.0MB,
> numberOfStores=0, numberOfStorefiles=0, storeRefCount=0,
> maxCompactedStoreFileRefCount=0, storefileUncompressedSizeMB=0,
> storefileSizeMB=0, memstoreSizeMB=0, readRequestsCount=0,
> filteredReadRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0,
> totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0,
> currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
> servername2ct.gmd9.intern,16020,1713514925960: requestsPerSecond=0.0,
> numberOfOnlineRegions=0, usedHeapMB=20.0MB, maxHeapMB=2966.0MB,
> numberOfStores=0, numberOfStorefiles=0, storeRefCount=0,
> maxCompactedStoreFileRefCount=0, storefileUncompressedSizeMB=0,
> storefileSizeMB=0, memstoreSizeMB=0, readRequestsCount=0,
> filteredReadRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0,
> totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0,
> currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
> servername3ct.gmd9.intern,16020,1713514937151: requestsPerSecond=0.0,
> numberOfOnlineRegions=0, usedHeapMB=67.0MB, maxHeapMB=2966.0MB,
> numberOfStores=0, numberOfStorefiles=0, storeRefCount=0,
> maxCompactedStoreFileRefCount=0, storefileUncompressedSizeMB=0,
> storefileSizeMB=0, memstoreSizeMB=0, readRequestsCount=0,
> filteredReadRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0,
> totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0,
> currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
> servername4ct.gmd9.intern,16020,1713514968019: requestsPerSecond=0.0,
> numberOfOnlineRegions=0, usedHeapMB=24.0MB, maxHeapMB=2966.0MB,
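The znode check described in this thread can be approximated from the command line. A hypothetical sketch: on a live cluster the bytes would come from `echo "get /hbase/meta-region-server" | zkCli.sh -server <zk-quorum>`; here canned bytes stand in for the protobuf-framed znode content (the framing shown is illustrative only), and a strings(1)-style filter pulls out the encoded server name:

```shell
# Hypothetical sketch: spot the server name inside the protobuf-framed
# /hbase/meta-region-server znode by extracting long printable runs.
# Canned bytes stand in for the real znode content here.
raw=$(printf 'PBUF\010\002\022#datanode06ct.gmd9.intern,16020,1714378085579')

meta_server=$(printf '%s' "$raw" \
  | tr -c '[:alnum:].,:' '\n' \
  | grep -E '.{12,}' \
  | head -1)
echo "meta appears to be on: $meta_server"
# With that server name in hand, HBCK2 can schedule recovery, e.g.:
#   hbase hbck -j hbase-hbck2-1.2.0.jar scheduleRecoveries <host>,<port>,<startcode>
```

This only eyeballs the printable server name, as Duo suggests; it does not decode the protobuf properly.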

Re: HBase Master hangs on startup during upgrade from 2.2.5 to 2.5.7

2024-04-20 Thread Duo Zhang
Just post it somewhere so we can check it.

Udo Offermann  于2024年4月20日周六 20:25写道:
>
> I do have the dump file from the web UI. I can send it all, or you can tell
> me which threads you are interested in. Fortunately they all have meaningful
> names.
>
> 张铎(Duo Zhang)  schrieb am Sa., 20. Apr. 2024, 14:13:
>
> > What is the jstack result for HMaster while hanging? Is it waiting on the
> > namespace table or the meta table to come online?
> >
> > Udo Offermann  于2024年4月20日周六 19:43写道:
> > >
> > > Hello everyone,
> > >
> > > We are upgrading our Hadoop/HBase cluster from Hadoop 2.8.5 & HBase 2.2.5
> > > to Hadoop 3.3.6 & HBase 2.5.7
> > >
> > > The Hadoop upgrade worked well, but unfortunately we have problems with
> > > the HBase upgrade, because the master hangs on startup inside the
> > > „Starting assignment manager“ task.
> > >
> > > After 15 minutes the following message appears in the log file:
> > >
> > > Master failed to complete initialization after 900000ms. Please
> > > consider submitting a bug report including a thread dump of this
> > > process.
> > >
> > >
> > > We face the same problem as Adam a couple of weeks ago: "Rolling upgrade
> > > from HBase 2.2.2 to 2.5.8 [typo corrected]: There are 2336 corrupted
> > > procedures“ and we fixed it in the same way by deleting the
> > > MasterProcWALs-folder
> > > in HDFS.
> > >
> > > I can provide HMaster dump and a dump of one data nodes!
> > >
> > > How can we proceed with the upgrade?
> > >
> > > Thanks and best regards
> > > Udo
> >


Re: HBase Master hangs on startup during upgrade from 2.2.5 to 2.5.7

2024-04-20 Thread Duo Zhang
What is the jstack result for HMaster while hanging? Is it waiting on the
namespace table or the meta table to come online?

Udo Offermann  于2024年4月20日周六 19:43写道:
>
> Hello everyone,
>
> We are upgrading our Hadoop/HBase cluster from Hadoop 2.8.5 & HBase 2.2.5
> to Hadoop 3.3.6 & HBase 2.5.7
>
> The Hadoop upgrade worked well, but unfortunately we have problems with the
> HBase upgrade, because the master hangs on startup inside the „Starting
> assignment manager“ task.
>
> After 15 minutes the following message appears in the log file:
>
> Master failed to complete initialization after 900000ms. Please
> consider submitting a bug report including a thread dump of this
> process.
>
>
> We face the same problem as Adam a couple of weeks ago: "Rolling upgrade
> from HBase 2.2.2 to 2.5.8 [typo corrected]: There are 2336 corrupted
> procedures“ and we fixed it in the same way by deleting the
> MasterProcWALs-folder
> in HDFS.
>
> I can provide HMaster dump and a dump of one data nodes!
>
> How can we proceed with the upgrade?
>
> Thanks and best regards
> Udo


Re: Considering deprecation and removal of XZ compression (hbase-compression-xz)

2024-04-02 Thread Duo Zhang
Personally, I've never seen people actually use XZ compression.

For size, people usually choose gzip; for speed, in the past people
chose LZO, and now they choose Snappy or zstd.

So I would prefer that we deprecate XZ compression immediately
and remove it in 2.6.0.

Thanks.
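For users who do need to migrate a table off XZ, the codec is a per-column-family setting. A hypothetical sketch (the table `my_table` and family `cf` are made-up names) that only builds and prints the shell commands rather than executing them:

```shell
# Hypothetical sketch: commands to move a column family from XZ to ZSTD.
# Table/family names are made up; commands are printed, not executed.
alter_cmd="alter 'my_table', {NAME => 'cf', COMPRESSION => 'ZSTD'}"
compact_cmd="major_compact 'my_table'"

echo "$alter_cmd"
echo "$compact_cmd"
# On a live cluster, pipe them through the shell, e.g.:
#   printf '%s\n%s\n' "$alter_cmd" "$compact_cmd" | hbase shell -n
```

The major compaction rewrites the existing store files with the new codec; until it runs, old XZ-compressed files remain on disk and still need the XZ codec to be readable.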

Andrew Purtell  于2024年4月2日周二 08:02写道:
>
> Red Hat filed CVE-2024-3094 late last week on 2024-03-29. This implicates
> recent releases of the native liblzma library as a vector for malicious
> code.
>
> This is not the pure Java version that we depend upon for HBase's support
> for the LZMA algorithm (
> https://github.com/apache/hbase/tree/master/hbase-compression/hbase-compression-xz).
> We depend on version 1.9 of xz-java, which was published in 2021, well
> before maintenance changes in the project and the involvement of a person
> who is now believed to be a malicious actor. Projects like HBase that
> depend on xz-java have no reason to be concerned about the issues affecting
> the native xz library.
>
> How the backdoor was introduced calls into question the trustworthiness and
> viability of the XZ project. GitHub has disabled all repositories related
> to XZ and liblzma, even xz-java. The webpage for XZ and xz-java is down.
> The open source software community is responding vigorously. CVE-2024-3094
> has a CVSS score 10, the highest possible score. Your security team may
> become interested in HBase because of hbase-compression-xz's dependency on
> xz-java. It is likely any discovered dependency on any LZMA implementation
> will at least raise questions.
>
> For now xz-java remains available in Maven central. (See
> https://central.sonatype.com/artifact/org.tukaani/xz/versions) We may have
> no choice but to immediately remove hbase-compression-xz if Maven blocks or
> drops xz-java too, because that will break our builds.
>
> There is no immediate cause for concern. Still, we believe XZ compression
> provides little to no value over more modern alternatives, like ZStandard,
> that can also achieve similar compression ratios. XZ, and alternatives like
> ZStandard with the compression level set to a high value, are also suitable
> only for archival use cases and unsuitable for compression of flush files
> or for use in minor compactions. Given how niche any use of XZ
> compression could
> be, we are wondering if there are actually any users of it.
>
> If we have no users of hbase-compression-xz, then it provides little to no
> value and continued maintenance of hbase-compression-xz given the issues
> with its dependency does not make sense.
>
> Do you use XZ compression, or are you planning to?
>
> If we deprecate XZ compression immediately and then remove it in 2.6, would
> this present a problem? In a private discussion we reached consensus on
> this approach, but, of course, that is not yet a plan, and something that
> could easily change based on feedback.
>
> From https://nvd.nist.gov/vuln/detail/CVE-2024-3094:
> "Malicious code was discovered in the upstream tarballs of xz, starting
> with version 5.6.0. Through a series of complex obfuscations, the liblzma
> build process extracts a prebuilt object file from a disguised test file
> existing in the source code, which is then used to modify specific
> functions in the liblzma code. This results in a modified liblzma library
> that can be used by any software linked against this library, intercepting
> and modifying the data interaction with this library."
>
> --
> Best regards,
> Andrew


[ANNOUNCE] New HBase committer Istvan Toth

2024-04-02 Thread Duo Zhang
On behalf of the Apache HBase PMC, I am pleased to announce that
Istvan Toth(stoty)
has accepted the PMC's invitation to become a committer on the
project. We appreciate all
of Istvan Toth's generous contributions thus far and look forward to
his continued involvement.

Congratulations and welcome, Istvan Toth!

我很高兴代表 Apache HBase PMC 宣布 Istvan Toth 已接受我们的邀请,成
为 Apache HBase 项目的 Committer。感谢 Istvan Toth 一直以来为 HBase 项目
做出的贡献,并期待他在未来继续承担更多的责任。

欢迎 Istvan Toth!



Re: Rolling upgrade from HBase 2.2.2 to 2.5.8 [typo corrected]: There are 2336 corrupted procedures

2024-03-25 Thread Duo Zhang
OK, glad to hear that. Hope the rolling upgrade goes well.

Adam Sjøgren  于2024年3月25日周一 21:34写道:
>
> 张铎(Duo Zhang) writes:
>
> > Is it OK if you just restart with the 2.2.2 master?
>
> Yes, it was.
>
> After finding this StackOverflow question:
> https://stackoverflow.com/questions/59718227/hbase-masterprocwals-issue
> I realized that my MasterProcWALs-folder had files stretching all the
> way back to 2021 - and since there haven't been any DDL-changes
> recently, I assumed it would be fine to move the files away. I did and
> the 2.5.8 master started up fine.
>
> So now I have both active- and standby-master updated, and I am
> rolling through the regionservers.
>
>
>   Best regards,
>
> Adam
>
> --
>  "I wonder if you can refuse to inherit the world." Adam Sjøgren
>  "I think if you're born, it's too late."  a...@koldfront.dk
>


Re: Rolling upgrade from HBase 2.2.2 to 2.5.4: There are 2336 corrupted procedures

2024-03-25 Thread Duo Zhang
Is it OK if you just restart with the 2.2.2 master?

Adam Sjøgren  于2024年3月25日周一 20:14写道:
>
>   Hi,
>
> I am trying to do a rolling upgrade of an HBase 2.2.2 cluster to
> 2.5.4.
>
> When I try switching over to the new master, it processes the WAL
> files and then ends up saying:
>
>   [... many lines elided ...]
>
>   2024-03-25T12:44:29,374 ERROR [master/s4:16000:becomeActiveMaster]
>   region.RegionProcedureStore: Corrupted procedure pid=221160,
>   ppid=219241, state=RUNNABLE; OpenRegionProcedure
>   47b69716553a2e2587f22e0e7c3d268a, server=node13,16020,1640161141416
>
>   2024-03-25T12:44:29,374 ERROR [master/s4:16000:becomeActiveMaster]
>   region.RegionProcedureStore: Corrupted procedure pid=221161,
>   ppid=219258, state=RUNNABLE; OpenRegionProcedure
>   1071a48e8ab7730a160cf55391d1670a, server=node23,16020,1639980983954
>
>   2024-03-25T12:44:29,374 ERROR [master/s4:16000:becomeActiveMaster]
>   region.RegionProcedureStore: Corrupted procedure pid=4215645,
>   ppid=4215619, state=SUCCESS, bypass=true; OpenRegionProcedure
>   17c468cbf507e548b1c34c74b14b4c40, server=node09,16020,1700398035068
>
>   2024-03-25T12:44:29,375 ERROR
>   [master/s4:16000:becomeActiveMaster] master.HMaster: Failed to
>   become active master
>   java.io.IOException: There are 2336 corrupted procedures when
>   migrating from the old WAL based store to the new region based
>   store, please fix them before upgrading again.
>
> and then it quits.
>
> How do I fix these 2336 corrupted procedures?
>
>   Best regards,
>
> Adam
>
> --
>  "Why does the sky turn red as the sun sets?"   Adam Sjøgren
>  "That's all the oxygen in the atmosphere catching a...@koldfront.dk
>   fire."
>


HBase Quarterly report Oct-Dec 2023

2024-01-28 Thread Duo Zhang
Hi all,

HBase submits a report to the ASF board once a quarter, to inform the board
about project health. I'm sending the report to the user@ and dev@ mailing
lists because you are the project, and for transparency. If you have any
questions about the report or the running of the project, you can post them
to any PMC member or committer, or send an email to priv...@hbase.apache.org,
which every PMC member subscribes to.



## Description:
Apache HBase is an open-source, distributed, versioned, non-relational
database. Apache HBase gives you low latency random access to billions of rows
with millions of columns atop non-specialized hardware.

hbase-thirdparty is a set of internal artifacts used by the project to
mitigate the impact of our dependency choices on the wider ecosystem.

hbase-connectors is a collection of integration points with other projects.
The initial release includes artifacts for use with Apache Kafka and Apache
Spark.

hbase-filesystem contains HBase project-specific implementations of the Apache
Hadoop FileSystem API. It is currently experimental and internal to the
project.

hbase-operator-tools is a collection of tools for HBase operators. Now it is
mainly for hosting HBCK2.

hbase-native-client is a client library in C/C++, in its early days.

hbase-kustomize is for deploying HBase on kubernetes, still under development.

## Project Status:
Current project status: Ongoing

Issues for the board:
There is a request from the trademarks team about missing the 'TM' superscript
for some web pages on hbase.a.o. The issue has already been resolved.

## Membership Data:
Apache HBase was founded 2010-04-21 (14 years ago)
There are currently 104 committers and 59 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- Bryan Beaudreault was added to the PMC on 2023-10-17
- No new committers. Last addition was Hui Ruan on 2023-09-15.

## Project Activity:
2.5.7 was released on 2023-12-24.
hbase-connectors-1.0.1 was released on 2023-10-27.
2.5.6 was released on 2023-10-20.

We have cut branch-2.6 and started to prepare the 2.6.0 release. There were
some discussions around the hadoop versions we want to support
https://lists.apache.org/thread/rt2ht8xmmr9vp77trsbs1db6f90pbpm8
We decided to follow the hadoop community, move the minimum hadoop 3 support
to 3.3.x, but for hbase-2.x, we still have support for 2.10.2, though there
are some CVEs.

We upgraded zookeeper dependencies to 3.8.x in a patch release due to some
CVEs, and it should be much less disruptive than upgrading hadoop.
https://lists.apache.org/thread/pbjfs75kpkd3y9dcydhmr2lotnv72s8w

There will be an HBase meetup in India.
https://lists.apache.org/thread/n8bb8tzghcxgg7w8fd80542fo9x6mmr9

## Community Health:
d...@hbase.apache.org:
960 subscribers(958 in the previous quarter)
420 emails sent to list(360 in the previous quarter)

user@hbase.apache.org:
1986 subscribers(1988 in the previous quarter)
31 emails sent to list(39 in the previous quarter)

user...@hbase.apache.org
77 subscribers(78 in the previous quarter)
17 emails sent to list(19 in the previous quarter)

- JIRA activity:
108 issues opened in JIRA, past quarter (-36% change)
75 issues closed in JIRA, past quarter (-32% change)

- Commit activity:
464 commits in the past quarter (20% increase)
40 code contributors in the past quarter (-4% change)

- GitHub PR activity:
160 PRs opened on GitHub, past quarter (4% increase)
145 PRs closed on GitHub, past quarter (no change)

The community is healthy overall. The numbers decreased recently because of
the Christmas period. We have cut branch-2.6 and plan to put up a 2.6.0
release candidate soon, and the vote for 3.0.0-beta-1 is ongoing.


[ANNOUNCE] Apache HBase 3.0.0-beta-1 is now available for download

2024-01-19 Thread Duo Zhang
The HBase team is happy to announce the immediate availability of HBase
3.0.0-beta-1.

Apache HBase™ is an open-source, distributed, versioned, non-relational
database. Apache HBase gives you low latency random access to billions of rows
with millions of columns atop non-specialized hardware. To learn more about
HBase, see https://hbase.apache.org/.

HBase 3.0.0-beta-1 is the first beta release in the HBase 3.x line, which is
the fifth release of our next major release line. It includes more than 4000
resolved issues since 2.0.0.

A beta release usually means that there will be no new big features for the
3.0.0 release line any more, and that the release is much more stable than the
earlier alpha releases, as we have passed several 1B ITBLL runs and one 10B
ITBLL run. So you can try it in more critical environments, but you still need
to use it with caution and report back anything you find unusual.

Notable new features include:
  Synchronous Replication
  OpenTelemetry Tracing
  Distributed MOB Compaction
  Backup and Restore
  Decouple region replication and general replication framework, and also
make region replication work when SKIP_WAL is used
  A new file system based replication peer storage
  Use hbase table instead of zookeeper for tracking hbase replication queue

For other important changes:
  We do not support hadoop 2.x any more
  Move RSGroup balancer to core
  Reimplement sync client on async client
  CPEPs on shaded proto
  Move the logging framework from log4j to log4j2
  Yield SCP and TRSP to improve MTTR when restarting large cluster

The full list of issues and release notes can be found here:

CHANGELOG: https://downloads.apache.org/hbase/3.0.0-beta-1/CHANGES.md
RELEASENOTES: https://downloads.apache.org/hbase/3.0.0-beta-1/RELEASENOTES.md

or via our issue tracker:

  https://issues.apache.org/jira/projects/HBASE/versions/12353291
  https://issues.apache.org/jira/projects/HBASE/versions/12351845
  https://issues.apache.org/jira/projects/HBASE/versions/12350942
  https://issues.apache.org/jira/projects/HBASE/versions/12350250
  https://issues.apache.org/jira/projects/HBASE/versions/12332342

To download please follow the links and instructions on our website:

  https://hbase.apache.org/downloads.html

Questions, comments, and problems are always welcome at:

  d...@hbase.apache.org
  user@hbase.apache.org
  user...@hbase.apache.org

Thanks to all who contributed and made this release possible.

Cheers,
The HBase Dev Team


Re: Engineering blog post on HBase/Phoenix major upgrade without downtime

2024-01-12 Thread Duo Zhang
Great article!

And congratulations on the successful upgrade!

Viraj Jasani  于2024年1月13日周六 08:21写道:
>
> Hello,
>
> We have completed major upgrade of HBase/Phoenix from 1.6/4.16 to 2.4/5.1
> versions and continuing journey to 2.5 release in production. Would like to
> share some insights on how we achieved in-place rolling upgrade of major
> release without any downtime.
>
> Here is the Engineering Blog on Implementing Salesforce’s Largest Database
> Upgrade:
> https://engineering.salesforce.com/implementing-salesforces-largest-database-upgrade-inside-the-migration-to-hbase-2/
>
>
> Detailed insights available on
> https://engineering.salesforce.com/wp-content/uploads/2023/12/SFDC-HBase2-Phoenix5-Paper-2023.pdf


Re: version of haddop in hbase binary download

2023-12-18 Thread Duo Zhang
In general, HBase bundles the oldest hadoop binaries it can support by
default.

This does not mean that your hadoop servers, like HDFS or YARN, must
run the same version. You are free to set up HDFS or YARN with a newer
version of hadoop, as a lower-version hadoop client can communicate
with a higher-version hadoop server.

If you want to use a higher-version hadoop client, it is recommended
to build the HBase binaries on your own. You can check the
'Building Apache HBase' section in our ref guide

https://hbase.apache.org/book.html#build

The command is like this

mvn -DskipTests -Dhadoop.profile=3.0 -Dhadoop-three.version=3.3.6
clean install && mvn -DskipTests -Dhadoop.profile=3.0
-Dhadoop-three.version=3.3.6 package assembly:single

Thanks.

Michael  于2023年12月18日周一 19:44写道:
>
> Hi,
>
> the version of hadoop which is included in the binary download of hbase
> 2.5.6 is slightly older than that of hadoop on the apache webside.
>
> Does it matter? It differs in the minor version.
> Or is it recommended to use the same version of hadoop as is included in
> the hbase binary?
>
> Thanks
>   Michael


Re: [DISCUSS] End support for hadoop 2.10?

2023-12-09 Thread Duo Zhang
If a hadoop minor release line is EOL, we can drop its support in the
next hbase minor release, but hadoop 2.10 is still a bit different
compared to other hadoop release lines, as it is the last release line
for hadoop2.

https://lists.apache.org/thread/nkrxx0glxz939162qrkc6d5nc72qxbfw

This is the discussion thread in the hadoop community; they all agree
to EOL 3.2.x, but for 2.10.x they have concerns about EOLing it too...

So I think we'd better drop 3.2.x support for hbase 2.6.x, but still
leave the hadoop2 profile as is.

Once the hadoop community officially decides to EOL hadoop2
completely, we could discuss again.

Thanks.

Bryan Beaudreault  于2023年12月6日周三 23:21写道:

>
> Thanks for the input.
>
> My concern with waiting on hbase 3.x is that it's already been pending for
> years, and comes with many big architectural changes. It will probably be a
> risky upgrade for users, and we will end up supporting hbase 2.x for years
> to come. This is probably a separate discussion, but I do wonder if we
> should target a specific major release cadence (yearly) so that we can move
> forward on deprecations, etc. Not every major release has to be huge
> (ideally isn't).
>
> I agree we need to support hadoop-2.x for a while, but we can keep that
> support in hbase 2.5. This is how we've handled other hadoop versions
> according to our compatibility matrix.
>
> On Wed, Dec 6, 2023 at 1:53 AM 张铎(Duo Zhang)  wrote:
>
> > Better also send the email to user@hbase to see what our users think.
> >
> > I think we could change the default profile to hadoop3, but better
> > still have the hadoop2 profile as there could still be users on
> > hadoop-2.x.
> >
> > We will completely drop the hadoop2 support in hbase 3.x.
> >
> > Tak Lon (Stephen) Wu  于2023年12月6日周三 12:08写道:
> > >
> > > When Wei-Chiu and I were working on Ozone support via HBASE-27769, we
> > asked
> > > once when we could supporting hadoop-3.3+, the answer from Duo was HBase
> > > community supports the oldest version of hadoop
> > > https://hadoop.apache.org/releases.html (it was 2.10, 3.2.4 and 3.3.6).
> > >
> > > If this strategy remains and once 2.10 becomes EOL then HBase 2.6 should
> > be
> > > able to support 3.2.x and 3.3.x. At the same time, IMO 3.2.x is also an
> > > inactive release version, we can discuss if we should just change our
> > base
> > > of hadoop to 3.3.6 maybe starting from HBase 3.0+
> > >
> > > -Stephen
> > >
> > > On Tue, Dec 5, 2023 at 7:51 AM Bryan Beaudreault <
> > bbeaudrea...@apache.org>
> > > wrote:
> > >
> > > > On the hdfs dev list, they are talking about EOL Hadoop 2.10 (and thus
> > > > 2.x). They may cherry-pick back critical CVE fixes but not create any
> > more
> > > > releases. Of course, the decision is not final yet, but I wonder if we
> > > > should make a similar decision for supporting 2.10 in hbase.
> > > >
> > > > Given that 2.6 is soon, we could mark the end of support in that
> > release.
> > > > While it may seem like a major change, there is some precedent for
> > this.
> > > > Looking at our compatibility matrix, we have dropped support for Hadoop
> > > > releases in minor releases in the past.
> > > >
> > > > Dropping support for Hadoop 2 in HBase 2.6 would allow us to start
> > cleaning
> > > > up our POMs and some of the hacks we've had to do to reflect around
> > Hadoop
> > > > releases. It may also free up Jenkins capacity since we can turn off
> > some
> > > > builds for our primary branches.
> > > >
> >


Re: [DISCUSS] End support for hadoop 2.10?

2023-12-05 Thread Duo Zhang
Better also send the email to user@hbase to see what our users think.

I think we could change the default profile to hadoop3, but better
still have the hadoop2 profile as there could still be users on
hadoop-2.x.

We will completely drop the hadoop2 support in hbase 3.x.

Tak Lon (Stephen) Wu  于2023年12月6日周三 12:08写道:
>
> When Wei-Chiu and I were working on Ozone support via HBASE-27769, we asked
> once when we could supporting hadoop-3.3+, the answer from Duo was HBase
> community supports the oldest version of hadoop
> https://hadoop.apache.org/releases.html (it was 2.10, 3.2.4 and 3.3.6).
>
> If this strategy remains and once 2.10 becomes EOL then HBase 2.6 should be
> able to support 3.2.x and 3.3.x. At the same time, IMO 3.2.x is also an
> inactive release version, we can discuss if we should just change our base
> of hadoop to 3.3.6 maybe starting from HBase 3.0+
>
> -Stephen
>
> On Tue, Dec 5, 2023 at 7:51 AM Bryan Beaudreault 
> wrote:
>
> > On the hdfs dev list, they are talking about EOL Hadoop 2.10 (and thus
> > 2.x). They may cherry-pick back critical CVE fixes but not create any more
> > releases. Of course, the decision is not final yet, but I wonder if we
> > should make a similar decision for supporting 2.10 in hbase.
> >
> > Given that 2.6 is soon, we could mark the end of support in that release.
> > While it may seem like a major change, there is some precedent for this.
> > Looking at our compatibility matrix, we have dropped support for Hadoop
> > releases in minor releases in the past.
> >
> > Dropping support for Hadoop 2 in HBase 2.6 would allow us to start cleaning
> > up our POMs and some of the hacks we've had to do to reflect around Hadoop
> > releases. It may also free up Jenkins capacity since we can turn off some
> > builds for our primary branches.
> >


Re: Some questions about the Stripe Compaction policy

2023-11-28 Thread Duo Zhang
Indeed:

https://github.com/apache/hbase/blob/4d90b918a3702b4e4ae2f9ee890c14665e821c01/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StripeStoreEngine.java#L84

That method never uses the forceMajor parameter at all.

Please open an issue so we can look at how to change this. At the very
least the forceMajor parameter needs to be passed down, so that it is
available when compacting inside a stripe.

张铎(Duo Zhang)  于2023年11月29日周三 14:18写道:
>
> As I recall this mechanism should exist; perhaps StripeCompaction just does not take this parameter into account.
>
> leojie  于2023年11月21日周二 11:42写道:
> >
> > Hello Mr. Zhang,
> > I have an HBase question. We have a table in production that uses the Stripe
> > Compaction policy. Each region averages 40G and is divided into 8 stripes of
> > 5G each. After the business deleted a large amount of data, manually
> > triggering a major compaction, on the whole table or on a single region, has
> > no effect: no suitable files are selected for compaction.
> >
> > Reading the source code, the compaction policy applied inside each stripe is
> > ExploringCompactionPolicy. A key point of this policy is how it filters a
> > stripe's store file list: if any one file in the candidate list is too large,
> > i.e. fileSize > (totalFileSize - fileSize) *
> > (hbase.hstore.compaction.ratio, default 1.2), no files are selected for the
> > major compaction.
> > Would it be reasonable to support a forced-compaction mechanism? For
> > heavy-delete or bulkload scenarios, where a lot of data is marked deleted, we
> > could explicitly pass something like a forceMajor parameter when manually
> > triggering a major compaction, skipping the selection policy and simply
> > compacting all files. Does such a feature exist today, or would it be a
> > reasonable addition? Beyond that, is there a better approach for this
> > situation?
> > Looking forward to your answer, many thanks.


Re: Some questions about the Stripe Compaction policy

2023-11-28 Thread Duo Zhang
As I recall this mechanism should exist; perhaps StripeCompaction just
does not take this parameter into account.

leojie  于2023年11月21日周二 11:42写道:
>
> Hello Mr. Zhang,
> I have an HBase question. We have a table in production that uses the Stripe
> Compaction policy. Each region averages 40G and is divided into 8 stripes of
> 5G each. After the business deleted a large amount of data, manually
> triggering a major compaction, on the whole table or on a single region, has
> no effect: no suitable files are selected for compaction.
>
> Reading the source code, the compaction policy applied inside each stripe is
> ExploringCompactionPolicy. A key point of this policy is how it filters a
> stripe's store file list: if any one file in the candidate list is too large,
> i.e. fileSize > (totalFileSize - fileSize) *
> (hbase.hstore.compaction.ratio, default 1.2), no files are selected for the
> major compaction.
> Would it be reasonable to support a forced-compaction mechanism? For
> heavy-delete or bulkload scenarios, where a lot of data is marked deleted, we
> could explicitly pass something like a forceMajor parameter when manually
> triggering a major compaction, skipping the selection policy and simply
> compacting all files. Does such a feature exist today, or would it be a
> reasonable addition? Beyond that, is there a better approach for this
> situation?
> Looking forward to your answer, many thanks.
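
The exclusion condition quoted above, fileSize > (totalFileSize - fileSize)
* ratio, can be sketched as a tiny standalone check. This is a hypothetical
illustration of the described behavior, not HBase's actual
ExploringCompactionPolicy code; the class and method names are made up:

```java
// Hypothetical sketch of the per-file ratio check described in the thread.
public class RatioCheck {
    // Returns true if one file is "too large" relative to the rest of the
    // candidate selection: fileSize > (totalFileSize - fileSize) * ratio
    static boolean tooLarge(long fileSize, long totalFileSize, double ratio) {
        return fileSize > (totalFileSize - fileSize) * ratio;
    }

    public static void main(String[] args) {
        double ratio = 1.2; // default hbase.hstore.compaction.ratio
        // One 4 GiB file in a 5 GiB stripe: 4G > (5G - 4G) * 1.2, so the
        // selection is disqualified.
        System.out.println(tooLarge(4L << 30, 5L << 30, ratio));
        // A 1 GiB file in the same stripe passes the check.
        System.out.println(tooLarge(1L << 30, 5L << 30, ratio));
    }
}
```

With the default ratio of 1.2, a single file holding most of a stripe's bytes
disqualifies the whole selection, which matches the behavior reported above.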


Re: Undeleted replication queue for removed peer found

2023-11-18 Thread Duo Zhang
I guess the problem is you exceeded the maximum size limit for
zookeeper multi operation.

I searched the code base of branch-1; you could try setting
'hbase.zookeeper.useMulti' to false in your hbase-site.xml to disable
multi so the operation can succeed. But it may introduce
inconsistency, so you'd better find out why there are so many files
that need to be claimed or deleted, fix the problem, and then switch
hbase.zookeeper.useMulti back to true.
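
For reference, a hedged sketch of the hbase-site.xml fragment (the property
name is as found in branch-1; verify the default for your exact release):

```xml
<!-- hbase-site.xml: temporarily disable ZooKeeper multi so large
     replication-queue cleanups are not rejected by the multi size
     limit; switch back to true once the root cause is fixed. -->
<property>
  <name>hbase.zookeeper.useMulti</name>
  <value>false</value>
</property>
```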

Also, the 1.4.x release line is already EOL; we suggest you upgrade to
the current stable release line, 2.5.x.

Thanks.

Manimekalai  于2023年11月18日周六 20:21写道:
>
> Dear Team,
>
> In one of our HBase clusters, some of the replication queues have not been
> properly removed, even though the corresponding peerId has been removed from
> list_peers.
>
> Due to this, frequent region server restarts are occurring in the cluster
> to which the replication data is written.
>
> I have tried to use hbase hbck -fixReplication. But it didn't work.
>
> The HBase Version is 1.4.14
>
> Below is the exception from Master and Regionserver respectively
> *Master Exception*
>
> 2023-11-18 13:01:30,815 ERROR
> > [172.XX.XX.XX,16020,1700289063450_ChoreService_2]
> > zookeeper.RecoverableZooKeeper: ZooKeeper multi failed after 4 attempts
> > 2023-11-18 13:01:30,815 WARN  
> > [172.XX.XX.XX,,16020,1700289063450_ChoreService_2]
> > cleaner.ReplicationZKNodeCleanerChore: Failed to clean replication zk node
> > java.io.IOException: Failed to delete queue, replicator:
> > 172.XX.XX.XX,,16020,1655822657566, queueId: 3
> > at
> > org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleaner$ReplicationQueueDeletor.
> > removeQueue(ReplicationZKNodeCleaner.java:160)
> > at
> > org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleaner.
> > removeQueues(ReplicationZKNodeCleaner.java:197)
> > at
> > org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleanerChore.chore(ReplicationZKNodeCleanerChore.java:49)
> > at
> > org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:189)
> > at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> > at
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> > at
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> > at
> > org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> > at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
>
>
>
> *RegionServer Exception*
>
> 2023-11-18 13:17:52,200 WARN  [main-SendThread(10.XX.XX.XX:2171)]
> > zookeeper.ClientCnxn: Session 0xXXX for server
> > 10.XX.XX.XX/10.XX.XX.XX:2171, unexpected error, closing socket connection
> > and attempting reconnect
> > java.io.IOException: Broken pipe
> > at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> > at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> > at
> > org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
> > at
> > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
> > at
> > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
> > 2023-11-18 13:17:52,300 ERROR [ReplicationExecutor-0]
> > zookeeper.RecoverableZooKeeper: ZooKeeper multi failed after 4 attempts
> > 2023-11-18 13:17:52,300 WARN  [ReplicationExecutor-0]
> > replication.ReplicationQueuesZKImpl: Got exception in
> > copyQueuesFromRSUsingMulti:
> > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > KeeperErrorCode = ConnectionLoss
> > at
> > org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> > at
> > org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:992)
> > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
> > at
> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:672)
> > at
> > org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1685)
> > at
> > org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.moveQueueUsingMulti(ReplicationQueuesZKImpl.java:410)
> > at
> > org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.claimQueue(ReplicationQueuesZKImpl.java:257)
> > at
> > 

Re: How to proactively update the information of hdfs blocks in hbase cache

2023-10-30 Thread Duo Zhang
See https://issues.apache.org/jira/browse/HDFS-13571.

The problem for HBase is that the region server keeps a large number
of DFSInputStreams open for random access. If a DN node is completely
gone, i.e., cannot be reached through the network, you will face a
connection timeout, and it usually takes a long time (60 seconds in
your case) for HBase to find out that the connection is broken.

It is not easy for HBase to fix this problem itself. In HDFS-13571 the
hdfs community introduced a way to mark a DN as dead before sending
requests to it, and the dead node information is shared across all
DFSInputStreams, so after the first connection timeout to a DN, all
DFSInputStreams will soon give up requesting the dead DN, which mostly
solves the above problem.

You can try building HBase with hadoop 3.3.x and redeploying it to fix
the problem, as the code changes are mostly on the client side.
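
Assuming the HDFS-13571 dead node detector is available in your hadoop 3.3.x
client, it is enabled through client-side configuration. A hedged sketch (the
property name comes from the HDFS-13571 work; verify it against your hadoop
release's hdfs-default.xml):

```xml
<!-- hdfs-site.xml on the HBase (client) side: enable the shared dead
     node detector so a timed-out DN is skipped by all DFSInputStreams. -->
<property>
  <name>dfs.client.deadnode.detection.enabled</name>
  <value>true</value>
</property>
```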

Thanks.

Miao Wang  于2023年10月30日周一 18:49写道:
>
> Hi, community
>
>
> We have an HBase cluster using version 2.4.6 and Hadoop version
> 3.0.0-cdh6.3.2. I have two questions; could you please answer them? Thank you.
>
>
> 1. When a datanode is down for an extended period of time and a client
> request misses the cache, the region server will still send requests to the
> abnormal datanode. Is this cached block location information only updated
> when the table undergoes a major compaction? When the DN service encounters
> an exception, how can I proactively refresh this information in the region
> server's cache?
> The error message is as follows:
>
>
> Failed to connect to /10.11.34.29:9866 for file 
> /hbase/data/default/tj_idcard_sha_02/429606d2083c287a13d0ebe43774938f/cf/dc063bb106b34039a014690f86b1c22c
>  for block 
> BP-1561188726-10.11.39.10-1684756341563:blk_1078186539_4445860:java.io.IOException:
>  Connection reset by peer
>
>
> 2. When the datanode encounters an exception, the regionserver generates the
> following error log, which seriously affects the client's request time; from
> the logs the impact lasts roughly ten minutes. Why are these two error
> messages inconsistent? How can I avoid this?
>
>
>
>
>  Connection failure: Failed to connect to /10.11.34.29:9866 for file 
> /hbase/archive/data/default/tj_soical_v05_feature_hb_rhsh/500ea510085d36ce0efa33ac63a15a88/cf/e6003a114c5d415ba720ac7657538120
>  for block 
> BP-1561188726-10.11.39.10-1684756341563:blk_1078131727_4391047:java.net.SocketTimeoutException:
>  6 millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/10.11.2.142:14154 
> remote=/10.11.34.29:9866]
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/10.11.2.142:14154 remote=/10.11.34.29:9866]
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:537)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:848)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:744)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:645)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1050)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1002)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1361)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1325)
> at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:93)
> at 
> 

[ANNOUNCE] Please welcome Bryan Beaudreault to the HBase PMC

2023-10-16 Thread Duo Zhang
On behalf of the Apache HBase PMC I am pleased to announce that
Bryan Beaudreault has accepted our invitation to become a PMC member
on the Apache HBase project. We appreciate Bryan Beaudreault
stepping up to take more responsibility in the HBase project.

Please join me in welcoming Bryan Beaudreault to the HBase PMC!

我很高兴代表 Apache HBase PMC 宣布 Bryan Beaudreault 已接受我们的邀请,
成为 Apache HBase 项目的 PMC 成员。感谢 Bryan Beaudreault 愿意在 HBase
项目中承担更大的责任。

欢迎 Bryan Beaudreault!


Re: Hbase Download Question

2023-10-07 Thread Duo Zhang
Is this problem reproducible?
I think lots of developers and users in the community use M1 mac now...

Harry Jamison  于2023年10月7日周六 10:30写道:
>
>  I figured out what the problem was. I was untarring this on a Mac, and it was 
> somehow corrupting the file. It was fixed when I did it on a Linux machine
>
> On Wednesday, October 4, 2023 at 08:41:12 PM PDT, Harry Jamison 
>  wrote:
>
>  I am trying to set up HBase on a cluster that I am setting up.
> I am going through this setup guide: https://hbase.apache.org/book.html
>
> I am looking at this page: https://hbase.apache.org/downloads.html
> I am curious what the difference is between the -hadoop3- packages and the 
> -client-bin packages. If I am using hadoop3 I assume that I should use that 
> artifact.
>
> I download "hbase-2.5.5-hadoop3-bin.tar.gz"
> I got to the section of that guide where it says
> Example extract from hbase-env.sh where JAVA_HOME is set
>
> But when I look at that file, it appears to be a binary file. So I am not sure 
> if I am doing something wrong.
>
> Thanks


[ANNOUNCE] New HBase committer Hui Ruan(阮辉)

2023-09-15 Thread Duo Zhang
On behalf of the Apache HBase PMC, I am pleased to announce that Hui
Ruan(frostruan)
has accepted the PMC's invitation to become a committer on the
project. We appreciate all
of Hui's generous contributions thus far and look forward to his
continued involvement.

Congratulations and welcome, Hui Ruan!

我很高兴代表 Apache HBase PMC 宣布阮辉已接受我们的邀请,成
为 Apache HBase 项目的 Committer。感谢阮辉一直以来为 HBase 项目
做出的贡献,并期待他在未来继续承担更多的责任。

欢迎阮辉!


Re: Replication Lag Issue in HBase DR Cluster after Upgrade

2023-08-22 Thread Duo Zhang
Replication can also replicate the WAL file which is currently being
written, so the sizeOfLogQueue is usually at least 1, even if there are no
writes to the region server.

You can see how to calculate timeStampNextToReplicate on branch-1; I think
this is the correct way to fix the metrics issue.
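For clarity, the 1.4.14 formula quoted below can be condensed into a small self-contained sketch (a hypothetical illustration with made-up values, not HBase's actual ReplicationLoad class). It shows why an idle DR cluster keeps reporting a growing lag: the WAL currently being written keeps sizeOfLogQueue at 1, so the first branch always wins and the reported lag simply tracks wall-clock time since the last shipped edit.

```java
// Hypothetical condensed sketch of the branch-1.4 replication lag formula
// (not the real ReplicationLoad class); all values are invented for
// illustration.
public class ReplicationLagSketch {
    static long computeLag(long now, long timeStampOfLastShippedOp,
                           long ageOfLastShippedOp, int sizeOfLogQueue) {
        long timePassedAfterLastShippedOp = now - timeStampOfLastShippedOp;
        if (sizeOfLogQueue != 0) {
            // err on the large side
            return Math.max(ageOfLastShippedOp, timePassedAfterLastShippedOp);
        } else if (timePassedAfterLastShippedOp < 2 * ageOfLastShippedOp) {
            return ageOfLastShippedOp; // last shipped happened recently
        } else {
            return 0; // last ship was long ago, so no real lag
        }
    }

    public static void main(String[] args) {
        long lastShip = 1_000_000L; // last edit shipped at t = 1,000,000 ms
        // One hour later with no new edits: the current WAL keeps the queue
        // size at 1, so the reported lag grows with wall-clock time.
        System.out.println(computeLag(lastShip + 3_600_000L, lastShip, 50L, 1));
        // prints 3600000
    }
}
```

With sizeOfLogQueue pinned at 1, only new writes (which advance timeStampOfLastShippedOp) ever bring the reported lag back down, matching the behavior described in this thread.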

Valli  于2023年8月22日周二 20:17写道:

> Hi Duo
>
> [image: Screenshot 2023-08-22 at 5.39.58 PM.png]
>
>
> In the above metrics, sizeOfLogQueue is always 1 even though we don't
> have any entry for the regionserver in the oldWAL folder.
>
>
> https://github.com/apache/hbase/blob/rel/1.4.14/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationLoad.java
>
> long timePassedAfterLastShippedOp =
>   EnvironmentEdgeManager.currentTime() - timeStampOfLastShippedOp;
> if (sizeOfLogQueue != 0) {
>   // err on the large side
>   replicationLag = Math.max(ageOfLastShippedOp, timePassedAfterLastShippedOp);
> } else if (timePassedAfterLastShippedOp < 2 * ageOfLastShippedOp) {
>   replicationLag = ageOfLastShippedOp; // last shipped happen recently
> } else {
>   // last shipped may happen last night,
>   // so NO real lag although ageOfLastShippedOp is non-zero
>   replicationLag = 0;
> }
>
> Above is the code extract from 1.4.14. Here, if sizeOfLogQueue is not
> equal to 0, the larger of timePassedAfterLastShippedOp and
> ageOfLastShippedOp is displayed as the replication lag. How can I find
> where the sizeOfLogQueue value is set?
>
>
> On Sun, 20 Aug 2023 at 12:56, 张铎(Duo Zhang)  wrote:
>
>> If it is just a metrics issue then HBASE-22784 won't help. I guess the
>> problem is that the replication lag is calculated by comparing the
>> current time and the time when we ship the last edit, so if there is
>> no new edit, the replication lag will keep growing.
>>
>> Looking at the current code
>>
>>
>> https://github.com/apache/hbase/blob/dae078e5bc342012b49cd066027eb53ae9a21280/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSource.java#L341
>>
>>   public long getReplicationDelay() {
>>     if (getTimestampOfLastShippedOp() >= timeStampNextToReplicate) {
>>       return 0;
>>     } else {
>>       return EnvironmentEdgeManager.currentTime() - timeStampNextToReplicate;
>>     }
>>   }
>>
>> It has a if condition to check whether there are actual edits to
>> replicate to avoid false alarming, which is added by HBASE-21505.
>>
>> The code for branch-1.4 is completely different, and since all hbase
>> 1.x version have been EOL for quite some time, I'm not sure what is
>> the easier way to fix the problem, maybe you need to read the code a
>> bit more carefully to see how to add the above check in 1.x code line.
>>
>> Thanks.
>>
>> Valli  于2023年8月17日周四 23:14写道:
>> >
>> > Hi Duo Zhang
>> >
>> > Its just metrics. Because in that cluster, there is no active write. So
>> we
>> > don't have any data to replicate to the another cluster.
>> >
>> >
>> > On Wed, 16 Aug 2023 at 08:01, 张铎(Duo Zhang) 
>> wrote:
>> >
>> > > Is this just a metrics issue or is there an actual replication lag?
>> > >
>> > > Valli  于2023年8月11日周五 22:51写道:
>> > > >
>> > > > Hello HBase Community,
>> > > >
>> > > > We recently upgraded our HBase cluster from version 1.2.6 to 1.4.14
>> and
>> > > > have encountered an issue with replication lag in our Disaster
>> Recovery
>> > > > (DR) cluster. We have two clusters in our setup: an active write
>> cluster
>> > > > and a DR cluster that receives replication from the active cluster.
>> The
>> > > > replication lag in the DR cluster has been building up, even though
>> there
>> > > > are no direct writes to it.
>> > > >
>> > > > Here's a brief overview of the problem:
>> > > > - We have an active write cluster with no replication lag.
>> > > > - The DR cluster only receives replication from the active cluster
>> and
>> > > > doesn't have direct writes.
>> > > > - Replication lag builds up in the DR cluster over time, even though
>> > > there
>> > > > is no active write.
>> > > > - When a 'put' call is made in the DR cluster, the replication lag
>> > > reduces
>> > > > momentarily, but then starts building up .
>> > > >
>> > > > We have experienced similar kind of issue in 1.4.9 version in
>> anothe

Re: Higher availability for writes

2023-08-22 Thread Duo Zhang
AFAIK, currently we can not promote a secondary replica to a primary
replica, they are assigned separately.

Promoting secondary replica to primary replica has been discussed in
the past but hasn't been implemented yet.

Thanks.

Michiel de Jong  于2023年8月22日周二 18:25写道:
>
> Hello,
>
> We are looking to increase the write availability of our HBase when a 
> RegionServer fails.
> Will using region replication for high available reads help with this? 
> (https://hbase.apache.org/book.html#arch.timelineconsistent.reads)
>
> In the design document for this feature, and the comments on the issue there 
> are some mentions of a future feature where a secondary replica can be 
> promoted to a primary replica. This should lower the time for restoring a 
> region. Has this ever been implemented?
>
> Kind regards,
> Michiel de Jong


Re: Replication Lag Issue in HBase DR Cluster after Upgrade

2023-08-20 Thread Duo Zhang
If it is just a metrics issue then HBASE-22784 won't help. I guess the
problem is that the replication lag is calculated by comparing the
current time and the time when we ship the last edit, so if there is
no new edit, the replication lag will keep growing.

Looking at the current code

https://github.com/apache/hbase/blob/dae078e5bc342012b49cd066027eb53ae9a21280/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/MetricsSource.java#L341

  public long getReplicationDelay() {
    if (getTimestampOfLastShippedOp() >= timeStampNextToReplicate) {
      return 0;
    } else {
      return EnvironmentEdgeManager.currentTime() - timeStampNextToReplicate;
    }
  }

It has an if condition to check whether there are actual edits to
replicate, to avoid false alarms; this check was added by HBASE-21505.

The code for branch-1.4 is completely different, and since all HBase
1.x versions have been EOL for quite some time, I'm not sure what the
easiest way to fix the problem is; maybe you need to read the code a
bit more carefully to see how to add the above check to the 1.x code line.

Thanks.
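To make the guard above concrete, here is a small self-contained sketch (hypothetical names mirroring MetricsSource, not the real class; timestamps are invented): when everything up to the next edit waiting to be replicated has already been shipped, the delay is reported as 0 instead of growing with wall-clock time on an idle source.

```java
// Hypothetical standalone sketch of the HBASE-21505-style check
// (not HBase's actual MetricsSource); timestamps are invented.
public class ReplicationDelaySketch {
    static long getReplicationDelay(long now, long timestampOfLastShippedOp,
                                    long timeStampNextToReplicate) {
        if (timestampOfLastShippedOp >= timeStampNextToReplicate) {
            return 0; // nothing pending, so no delay to report
        }
        // An edit queued at timeStampNextToReplicate is still unshipped.
        return now - timeStampNextToReplicate;
    }

    public static void main(String[] args) {
        // Idle source: the last shipped edit is the next one to replicate.
        System.out.println(getReplicationDelay(5_000L, 2_000L, 2_000L)); // 0
        // An edit queued at t = 2,000 ms is still unshipped at t = 5,000 ms.
        System.out.println(getReplicationDelay(5_000L, 1_000L, 2_000L)); // 3000
    }
}
```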

Valli  于2023年8月17日周四 23:14写道:
>
> Hi Duo Zhang
>
> Its just metrics. Because in that cluster, there is no active write. So we
> don't have any data to replicate to the another cluster.
>
>
> On Wed, 16 Aug 2023 at 08:01, 张铎(Duo Zhang)  wrote:
>
> > Is this just a metrics issue or is there an actual replication lag?
> >
> > Valli  于2023年8月11日周五 22:51写道:
> > >
> > > Hello HBase Community,
> > >
> > > We recently upgraded our HBase cluster from version 1.2.6 to 1.4.14 and
> > > have encountered an issue with replication lag in our Disaster Recovery
> > > (DR) cluster. We have two clusters in our setup: an active write cluster
> > > and a DR cluster that receives replication from the active cluster. The
> > > replication lag in the DR cluster has been building up, even though there
> > > are no direct writes to it.
> > >
> > > Here's a brief overview of the problem:
> > > - We have an active write cluster with no replication lag.
> > > - The DR cluster only receives replication from the active cluster and
> > > doesn't have direct writes.
> > > - Replication lag builds up in the DR cluster over time, even though
> > there
> > > is no active write.
> > > - When a 'put' call is made in the DR cluster, the replication lag
> > reduces
> > > momentarily, but then starts building up .
> > >
> > > We have experienced similar kind of issue in 1.4.9 version in another
> > > cluster.  We used the below patch for it.
> > >
> > > https://issues.apache.org/jira/browse/HBASE-22784
> > >
> > > But 1.4.14 version contains above patch but still we experience issue.
> > >
> > > If there are any specific configurations or adjustments we should be
> > making
> > > to address this problem. It's important for us to maintain a reliable DR
> > > setup, and any guidance or insights you can provide would be greatly
> > > appreciated.
> > >
> > > If anyone has experienced a similar issue after upgrading HBase or has
> > any
> > > recommendations on how to troubleshoot and resolve replication lag in a
> > DR
> > > cluster, please share your thoughts.
> > >
> > > Thank you in advance for your time and assistance. Your expertise and
> > > insights are invaluable to us as we work to resolve this issue and
> > maintain
> > > the stability of our HBase setup.
> > >
> > > Best regards,
> > > Manimekalai K
> > > --
> > > *Regards,*
> > > *Manimekalai K*
> >


Re: Replication Lag Issue in HBase DR Cluster after Upgrade

2023-08-15 Thread Duo Zhang
Is this just a metrics issue or is there an actual replication lag?

Valli  于2023年8月11日周五 22:51写道:
>
> Hello HBase Community,
>
> We recently upgraded our HBase cluster from version 1.2.6 to 1.4.14 and
> have encountered an issue with replication lag in our Disaster Recovery
> (DR) cluster. We have two clusters in our setup: an active write cluster
> and a DR cluster that receives replication from the active cluster. The
> replication lag in the DR cluster has been building up, even though there
> are no direct writes to it.
>
> Here's a brief overview of the problem:
> - We have an active write cluster with no replication lag.
> - The DR cluster only receives replication from the active cluster and
> doesn't have direct writes.
> - Replication lag builds up in the DR cluster over time, even though there
> is no active write.
> - When a 'put' call is made in the DR cluster, the replication lag reduces
> momentarily, but then starts building up .
>
> We have experienced similar kind of issue in 1.4.9 version in another
> cluster.  We used the below patch for it.
>
> https://issues.apache.org/jira/browse/HBASE-22784
>
> But 1.4.14 version contains above patch but still we experience issue.
>
> If there are any specific configurations or adjustments we should be making
> to address this problem. It's important for us to maintain a reliable DR
> setup, and any guidance or insights you can provide would be greatly
> appreciated.
>
> If anyone has experienced a similar issue after upgrading HBase or has any
> recommendations on how to troubleshoot and resolve replication lag in a DR
> cluster, please share your thoughts.
>
> Thank you in advance for your time and assistance. Your expertise and
> insights are invaluable to us as we work to resolve this issue and maintain
> the stability of our HBase setup.
>
> Best regards,
> Manimekalai K
> --
> *Regards,*
> *Manimekalai K*


Re: [DISCUSS] Reschedule July meetup?

2023-07-25 Thread Duo Zhang
We'd better discuss the topics we want to discuss in the online
meeting on the mailing list first, and post them out when scheduling
the meeting. In this way I think we can attract more attendees.

And for me, I prefer a night session, as I have to take care of my
children in the morning...

Thanks.

Tak Lon (Stephen) Wu  于2023年7月25日周二 14:23写道:
>
> Hi guys,
>
> We missed the July 12th meetup and I assumed no one showed up at the event.
>
> Questions to be discussed:
>
> 1. the next one will be September 27th? Would anyone like a reschedule for
> July? e.g. the first or second week of August?
> 2. How about the time zone problem? Would folks in Asia like to have a
> morning session? If so, I think we could give it a try between 8 till 10
> am Beijing time?
>
> Please feel free to leave any comments. (And sorry for sending multiple
> invites... I updated the invites and thought gmail would only send to the
> newly invited guests.)
>
> Thanks,
> Stephen


[ANNOUNCE] Apache HBase 3.0.0-alpha-4 is now available for download

2023-06-08 Thread Duo Zhang
The HBase team is happy to announce the immediate availability of HBase
3.0.0-alpha-4.

Apache HBase™ is an open-source, distributed, versioned, non-relational
database. Apache HBase gives you low latency random access to billions of rows
with millions of columns atop non-specialized hardware. To learn more about
HBase, see https://hbase.apache.org/.

HBase 3.0.0-alpha-4 is the fourth alpha release in the HBase 3.x line, which is
the fourth release of our next major release line. It includes roughly 4000+
resolved issues since 2.0.0.


HBase 3.0.0-alpha-4 release will mark the final alpha version for the HBase 3.x
line. This indicates that the major features for 3.x have been finalized. The
subsequent release, 3.0.0-beta-1, will prioritize improving stability and
cleaning up the API.

Notice that this is not a production ready release. It is used to let our
users try and test the new major release, to get feedback before the final GA
release is out. So please do NOT use it in production. Just try it and report
back everything you find unusual.

Notable new features include:
  Synchronous Replication
  OpenTelemetry Tracing
  Distributed MOB Compaction
  Backup and Restore
  Decouple region replication and the general replication framework, and also
make region replication work when SKIP_WAL is used
  A new file system based replication peer storage
  Use hbase table instead of zookeeper for tracking hbase replication queue

For other important changes:
  We do not support hadoop 2.x any more
  Move RSGroup balancer to core
  Reimplement sync client on async client
  CPEPs on shaded proto
  Move the logging framework from log4j to log4j2

The full list of issues and release notes can be found here:

CHANGELOG: https://downloads.apache.org/hbase/3.0.0-alpha-4/CHANGES.md
RELEASENOTES: https://downloads.apache.org/hbase/3.0.0-alpha-4/RELEASENOTES.md

or via our issue tracker:

  https://issues.apache.org/jira/projects/HBASE/versions/12351845
  https://issues.apache.org/jira/projects/HBASE/versions/12350942
  https://issues.apache.org/jira/projects/HBASE/versions/12350250
  https://issues.apache.org/jira/projects/HBASE/versions/12332342

To download please follow the links and instructions on our website:

  https://hbase.apache.org/downloads.html

Questions, comments, and problems are always welcome at:

  d...@hbase.apache.org
  user@hbase.apache.org
  user...@hbase.apache.org

Thanks to all who contributed and made this release possible.

Cheers,
The HBase Dev Team


Re: Hmaster starting issue.

2023-06-07 Thread Duo Zhang
Do you want to start a single-node pseudo-distributed HBase cluster, or a
fully distributed cluster?

For the former, you do not need to start HDFS or ZooKeeper; just running
./start-hbase.sh with the default configuration is enough. If you want a
fully distributed cluster, and want it to use a pre-deployed ZooKeeper and
HDFS, please read this section in our ref guide.

https://hbase.apache.org/book.html#standalone_dist

And make sure you have subscribed to the user@hbase or user-zh@hbase
mailing list, otherwise you may not receive the response.

Thanks.

Nick dan  于2023年6月7日周三 21:48写道:

> Hi,
> I'm doing an HBase 2.4.17 configuration on Hadoop 3.2.4. After all the
> configuration, when I start start-hbase.sh on the master node I'm able to
> see this:
> jps
> 15299 Jps
> 13764 NameNode
> 14981 HMaster
> 14296 ResourceManager
> 15177 HRegionServer
> 14876 HQuorumPeer
> 14063 SecondaryNameNode
> But after going into the shell, if I create some table it gives me the error
> *ERROR: KeeperErrorCode = NoNode for /hbase/master.*
> Can you help me with how to start HBase?
>


Re: Call for Presentations, Community Over Code Asia 2023

2023-06-05 Thread Duo Zhang
BTW, I've already submitted a presentation about the new features
introduced in HBase recently.

Title: What's new in the recent and upcoming HBase releases
Description:
Apache HBase™ is the Hadoop database, a distributed, scalable, big data
store.
The HBase community is preparing the new major release 3.0.0 and the new minor
release 2.6.0, with some brand new features. In this presentation, we will
introduce these new features: how they benefit our users and how we
implement them in HBase. For example:
- OpenTelemetry Tracing: HBASE-22120, HBASE-26419
- The new region replication framework: HBASE-26233
- Move the logging framework from log4j to log4j2: HBASE-19577
- Cloud native supports
  - Introduce StoreFileTracker for better object storage support:
HBASE-26067, HBASE-26584
  - K8s deployment support: HBASE-27827, in progress
  - No persistent data on zookeeper
- Move meta location off zookeeper: HBASE-26193
- Table based replication queue storage: HBASE-27109
- File system based replication peer storage: HBASE-27110
  - redeploy cluster with only root directory on object storage: HBASE-26245
- Future?
  - HBase on Ozone
- WAL on Ozone: HBASE-27740
  - Fully cloud native, no other self deployed services other than HBase
itself
- New WAL implementation: BookKeeper? Embedded WAL service?

I plan to create a Google doc online so others in the community can also
help polish the slides, as I do not know all the details about these
features.

Thanks.

Duo Zhang  于2023年6月6日周二 10:36写道:

> FYI. The CFP deadline has been extended to 6.18, please submit your
> representations :)
>
> 会议的中文介绍见:https://www.bagevent.com/event/8409854
>
>
>
> -- Forwarded message -
> 发件人: Rich Bowen 
> Date: 2023年6月6日周二 00:09
> Subject: Call for Presentations, Community Over Code Asia 2023
> To: rbo...@apache.org 
>
>
> You are receiving this message because you are subscribed to one more
> more developer mailing lists at the Apache Software Foundation.
>
> The call for presentations is now open at
> "https://apachecon.com/acasia2023/cfp.html;, and will be closed by
> Sunday, Jun 18th, 2023 11:59 PM GMT.
>
> The event will be held in Beijing, China, August 18-20, 2023.
>
> We are looking for presentations about anything relating to Apache
> Software Foundation projects, open-source governance, community, and
> software development.
> In particular, this year we are building content tracks around the
> following specific topics/projects:
>
> AI / Machine learning
> API / Microservice
> Community
> CloudNative
> Data Storage & Computing
> DataOps
> Data Lake & Data Warehouse
> OLAP & Data Analysis
> Performance Engineering
> Incubator
> IoT/IIoT
> Messaging
> RPC
> Streaming
> Workflow / Data Processing
> Web Server / Tomcat
>
> If your proposed presentation falls into one of these categories,
> please select that topic in the CFP entry form. Or select Others if
> it’s related to another topic or project area.
>
> Looking forward to hearing from you!
>
> Willem Jiang, and the Community Over Code planners
>
>


Fwd: Call for Presentations, Community Over Code Asia 2023

2023-06-05 Thread Duo Zhang
FYI. The CFP deadline has been extended to June 18th; please submit your
presentations :)

会议的中文介绍见:https://www.bagevent.com/event/8409854



-- Forwarded message -
发件人: Rich Bowen 
Date: 2023年6月6日周二 00:09
Subject: Call for Presentations, Community Over Code Asia 2023
To: rbo...@apache.org 


You are receiving this message because you are subscribed to one or
more developer mailing lists at the Apache Software Foundation.

The call for presentations is now open at
"https://apachecon.com/acasia2023/cfp.html", and will be closed by
Sunday, Jun 18th, 2023 11:59 PM GMT.

The event will be held in Beijing, China, August 18-20, 2023.

We are looking for presentations about anything relating to Apache
Software Foundation projects, open-source governance, community, and
software development.
In particular, this year we are building content tracks around the
following specific topics/projects:

AI / Machine learning
API / Microservice
Community
CloudNative
Data Storage & Computing
DataOps
Data Lake & Data Warehouse
OLAP & Data Analysis
Performance Engineering
Incubator
IoT/IIoT
Messaging
RPC
Streaming
Workflow / Data Processing
Web Server / Tomcat

If your proposed presentation falls into one of these categories,
please select that topic in the CFP entry form. Or select Others if
it’s related to another topic or project area.

Looking forward to hearing from you!

Willem Jiang, and the Community Over Code planners


Re: HBase2 Orphan Regions on RegionServer

2023-06-05 Thread Duo Zhang
I suggest grepping the master logs. A region like this is most likely one
that has already been merged but was not correctly taken offline; you can
check how the region in the current table that overlaps with it was
produced by the merge.

This situation happens much less often after 2.3, mainly because AM-v2 and
the procedure storage have received quite a few improvements and
optimizations.

leojie  于2023年6月5日周一 16:09写道:

> hi all
> I'd like to ask the community about some Orphan Region issues in HBase 2; the hbase version in use is 2.2.6 and the hadoop version is 3.3.2.
> Some Orphan Regions appeared on HBase's hbck report page:
> Orphan Regions on RegionServer
>
> 126 region(s) in set.
> Region NameReported Online RegionServer
> newptc_log,09c83c3e,1628548002412.5db648586560d01bcc5e4ae26348f14c.
> node27.hadoop,60020,1679982363818 
> Orphan Regions on FileSystem
>
> 164 region(s) in set.
> Region Encoded NameFileSystem Path
> 5db648586560d01bcc5e4ae26348f14c
> hdfs://hadoop-namenode/hbase/data/default/newptc_log
> /5db648586560d01bcc5e4ae26348f14c
>
> These regions have the following characteristics:
>
> 1. The orphan region's metadata does not exist in the hbase:meta table, but it is still reported by some RSes, and the region's information can be found on the RS web UI.
> 2. The HBase table regions the orphan region belongs to are complete; there is no region overlap or region hole, and the table is still read and written normally.
> 3. The region's hdfs directory structure is as follows:
> [root@hadoop-operator~]# sudo -uhbase hdfs dfs -ls -R
> /hbase/data/default/newptc_log/5db648586560d01bcc5e4ae26348f14c
> -rw-r--r--   3 hbase hbase109 2023-04-06 10:37
> /hbase/data/default/newptc_log/5db648586560d01bcc5e4ae26348f14c/.regioninfo
> drwxr-xr-x   - hbase hbase  0 2023-04-06 10:38
> /hbase/data/default/newptc_log/5db648586560d01bcc5e4ae26348f14c/.tmp
> drwxr-xr-x   - hbase hbase  0 2023-04-17 18:10
> /hbase/data/default/newptc_log/5db648586560d01bcc5e4ae26348f14c/.tmp/task
> drwxr-xr-x   - hbase hbase  0 2023-04-06 10:37
>
> /hbase/data/default/newptc_log/5db648586560d01bcc5e4ae26348f14c/recovered.edits
> -rw-r--r--   3 hbase hbase  0 2023-04-06 10:37
>
> /hbase/data/default/newptc_log/5db648586560d01bcc5e4ae26348f14c/recovered.edits/3461430.seqid
> drwxr-xr-x   - hbase hbase  0 2023-04-17 18:10
> /hbase/data/default/newptc_log/5db648586560d01bcc5e4ae26348f14c/task
> -rw-r--r--   3 hbase hbase 121250 2023-04-17 17:06
>
> /hbase/data/default/newptc_log/5db648586560d01bcc5e4ae26348f14c/task/046f0b1f772e40e59131251eb6d6e44f
>
> The hfile sizes under these regions are not 0 (a few are empty directories). I used
>
> HFile.Reader to read out these hfiles and compared them with the data in the table: the data belonging to these orphan regions does not exist in the table (the table has no TTL set), and the cells in these orphan regions'
> HFILEs all carry Put markers (not delete), so I am not sure whether the data under these orphan regions is still needed.
> I replayed the orphan region directories following the steps suggested on the page, but it did not succeed:
> First make sure *hbase:meta* is in a healthy state; run *hbck2 fixMeta* to
> be sure. Once this is done, per Region below, run a bulk load -- *$ hbase
> completebulkload REGION_DIR_PATH TABLE_NAME* -- and then delete the
> desiccated directory content (HFiles are removed upon successful load; all
> that is left are empty directories and occasionally a seqid marking file).
>
>
> Could you experts explain the root cause of orphan regions appearing on such a large scale (before this happened, we only performed a large batch of merges of small regions on the table, which may be related)? Is there a related ISSUE that fixes it? And how can I determine whether these orphan regions' hfiles are redundant data?
> When the RS is restarted, these orphan regions disappear, and I am not sure whether that causes data loss.
>


Re: TimeoutIOException: Failed to get sync result after 300000 ms for txid=16920651960, WAL system stuck?

2023-05-31 Thread Duo Zhang
The first issue is the case where one DN responds faster than all the others and then dies; that can cause a hang.
The second issue is about the WAL possibly hanging during shutdown, which mainly prevents the RegionServer from exiting.

Both should be fixed in releases after 2.4.10, so you can give those a try.

leojie wrote on Thu, Jun 1, 2023 at 09:34:

> Thank you very much for your earlier answers, Duo. I found the following patches in the ISSUE list:
> https://issues.apache.org/jira/browse/HBASE-26679 (its test case stably reproduces the blocking exception on the version we use)
> https://issues.apache.org/jira/browse/HBASE-25905 (the fix you committed: when a wal
> sync blocks, the RegionServer process is interrupted immediately). Please let me know if I misunderstood. I will apply these two fixes first and then keep watching the cluster.
>
> leojie wrote on Mon, May 15, 2023 at 15:51:
>
> > Thanks for the reply, Duo. While this problem was occurring, a fairly large table was being snapshot-scanned with a high resource concurrency setting, which drove read block
> > ops extremely high and put pressure on the DNs (compared with history, the XceiverCount metric of individual DNs stayed abnormally high).
> > After stopping that job, the cluster metrics have been stable for the past two days, and no more "WAL system stuck?" exceptions have shown up.
> >
> > DN pressure feels like a trigger for this phenomenon, but probably not the direct reason the RS died only after the sync
> > wal operation had hung for an hour or two. I will try upgrading the jdk and study this part of the code further.
> >
> > 张铎 (Duo Zhang) wrote on Sat, May 13, 2023 at 18:58:
> >
> >> This still looks like network jitter breaking the connection? I'd suggest checking the cluster's various metrics at the time this log appeared.
> >>
> >> You could also try upgrading the JDK:
> >>
> >> https://bugs.openjdk.org/browse/JDK-8215355
> >>
> >> This was only fixed after 8u251. Although the report says it only occurs with jemalloc, I suspect I hit it before too, and after upgrading it never happened again.
> >>
> >> leojie wrote on Thu, May 11, 2023 at 12:03:
> >>
> >> > The java version is: 1.8.0_131
> >> >
> >> > Looking back through the logs before the time of the exception, I found the following related errors:
> >> > 2023-05-11 10:59:32,711 INFO  [HBase-Metrics2-1]
> impl.MetricsSystemImpl:
> >> > Stopping HBase metrics system...
> >> > 2023-05-11 10:59:32,728 INFO  [ganglia] impl.MetricsSinkAdapter:
> ganglia
> >> > thread interrupted.
> >> > 2023-05-11 10:59:32,728 INFO  [HBase-Metrics2-1]
> impl.MetricsSystemImpl:
> >> > HBase metrics system stopped.
> >> > 2023-05-11 10:59:33,038 WARN
> >> >  [AsyncFSWAL-0-hdfs://hadoop-bdxs-hb1-namenode/hbase] wal.AsyncFSWAL:
> >> sync
> >> > failed
> >> > java.io.IOException: stream already broken
> >> > at
> >> > org.apache.hadoop.hbase.io
> >> >
> >>
> .asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:423)
> >> > at
> >> > org.apache.hadoop.hbase.io
> >> >
> >>
> .asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:512)
> >> > at
> >> >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.sync(AsyncProtobufLogWriter.java:148)
> >> > at
> >> >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:379)
> >> > at
> >> >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume(AsyncFSWAL.java:566)
> >> > at
> >> >
> >> >
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> > at
> >> >
> >> >
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> > at java.lang.Thread.run(Thread.java:748)
> >> > 2023-05-11 10:59:33,040 WARN
> >> >  [AsyncFSWAL-0-hdfs://hadoop-bdxs-hb1-namenode/hbase] wal.AsyncFSWAL:
> >> sync
> >> > failed
> >> > java.io.IOException: stream already broken
> >> > at
> >> > org.apache.hadoop.hbase.io
> >> >
> >>
> .asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:423)
> >> > at
> >> > org.apache.hadoop.hbase.io
> >> >
> >>
> .asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:512)
> >> > at
> >> >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.sync(AsyncProtobufLogWriter.java:148)
> >> > at
> >> >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:379)
> >> > at
> >> >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume(AsyncFSWAL.java:566)
> >> > at
> >> >
> >> >
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> > at
> >> >
> >> >
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> > at java.lang.Thread.run(Thread.java:748)
> >> > 2023-05-11 10:59:33,040 WARN
> >> >  [AsyncFSWAL-0-hdfs://hadoop-bdxs-hb1-namenode/hbase] wal.AsyncFSWAL:
> >> sync
> >> > failed
> >> >
> >>
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoEx

Re: TimeoutIOException: Failed to get sync result after 300000 ms for txid=16920651960, WAL system stuck?

2023-05-13 Thread Duo Zhang
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException:
> syscall:read(..) failed: Connection reset by peer
> at
>
> org.apache.hbase.thirdparty.io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown
> Source)
> 2023-05-11 10:59:33,040 WARN
>  [AsyncFSWAL-0-hdfs://hadoop-bdxs-hb1-namenode/hbase] wal.AsyncFSWAL: sync
> failed
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException:
> syscall:read(..) failed: Connection reset by peer
> at
>
> org.apache.hbase.thirdparty.io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown
> Source)
> 2023-05-11 10:59:33,040 WARN
>  [AsyncFSWAL-0-hdfs://hadoop-bdxs-hb1-namenode/hbase] wal.AsyncFSWAL: sync
> failed
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException:
> syscall:read(..) failed: Connection reset by peer
> at
>
> org.apache.hbase.thirdparty.io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown
> Source)
> 2023-05-11 10:59:33,040 WARN
>  [AsyncFSWAL-0-hdfs://hadoop-bdxs-hb1-namenode/hbase] wal.AsyncFSWAL: sync
> failed
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException:
> syscall:read(..) failed: Connection reset by peer
> at
>
> org.apache.hbase.thirdparty.io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown
> Source)
> 2023-05-11 10:59:33,040 WARN
>  [AsyncFSWAL-0-hdfs://hadoop-bdxs-hb1-namenode/hbase] wal.AsyncFSWAL: sync
> failed
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException:
> syscall:read(..) failed: Connection reset by peer
> at
>
> org.apache.hbase.thirdparty.io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown
> Source)
> 2023-05-11 10:59:33,040 WARN
>  [AsyncFSWAL-0-hdfs://hadoop-bdxs-hb1-namenode/hbase] wal.AsyncFSWAL: sync
> failed
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException:
> syscall:read(..) failed: Connection reset by peer
> at
>
> org.apache.hbase.thirdparty.io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown
> Source)
> 2023-05-11 10:59:33,229 INFO  [HBase-Metrics2-1] impl.MetricsConfig: Loaded
> properties from hadoop-metrics2-hbase.properties
> 2023-05-11 10:59:33,231 INFO  [HBase-Metrics2-1] impl.MetricsSinkAdapter:
> Sink ganglia started
> 2023-05-11 10:59:33,257 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl:
> Scheduled Metric snapshot period at 20 second(s).
> 2023-05-11 10:59:33,257 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl:
> HBase metrics system started
>
>
>
> 张铎 (Duo Zhang) wrote on Thu, May 11, 2023 at 10:25:
>
> > Can you scroll up and check for other exceptions? This looks like an AsyncFSWAL bug causing it to hang. I searched, though, and there seem to be no related fixes after 2.2.6;
> > there were some before it.
> >
> > Also, what jdk version are you on? I recall that early jdk8 builds had a synchronized bug that could scramble execution order.
> >
> > leojie wrote on Tue, May 9, 2023 at 14:45:
> >
> > > hi all
> > > I'd like to ask the community for help with a problem: over the last two days we keep hitting an exception around 12:50, described as follows:
> > > hbase version: 2.2.6
> > > hadoop version: 3.3.1
> > > Symptom: one node in an isolation group (hosting only one table) had its write call
> > > queue block at some moment; from then on this table's read and write qps both dropped to 0 and clients could not read or write the table. From the moment the RS call
> queue started blocking, the log kept printing the following error:
> > > 2023-05-08 12:42:27,310 ERROR [MemStoreFlusher.2]
> > > regionserver.MemStoreFlusher: Cache flush failed for region
> > >
> > >
> >
> user_feature_v2,eacf_1658057555,1660314723816.2376cc2326b5372131cc530b115d959a.
> > > org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get
> sync
> > > result after 30 ms for txid=16920651960, WAL system stuck?
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:155)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:743)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:625)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:602)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2754)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2691)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2549)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2523)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2409)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:611)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:580)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:360)
> > > at java.lang.Thread.run(Thread.java:748)
> > >
> > >
> >
> The node's memstore could not flush data to the WAL file; the node's other metrics were all normal and HDFS was under no pressure either. After restarting the blocked node, the table returned to normal. I put the jstack captured during the incident in the attachment.
> > > Could someone from the community help locate the cause when you have time?
> > > For the jstack file, see the attachment of ISSUE https://issues.apache.org/jira/browse/HBASE-27850
> > >
> >
>


Re: TimeoutIOException: Failed to get sync result after 300000 ms for txid=16920651960, WAL system stuck?

2023-05-10 Thread Duo Zhang
Can you scroll up and check for other exceptions? This looks like an AsyncFSWAL bug causing it to hang. I searched, though, and there seem to be no related fixes after 2.2.6;
there were some before it.

Also, what jdk version are you on? I recall that early jdk8 builds had a synchronized bug that could scramble execution order.

leojie wrote on Tue, May 9, 2023 at 14:45:

> hi all
> I'd like to ask the community for help with a problem: over the last two days we keep hitting an exception around 12:50, described as follows:
> hbase version: 2.2.6
> hadoop version: 3.3.1
> Symptom: one node in an isolation group (hosting only one table) had its write call
> queue block at some moment; from then on this table's read and write qps both dropped to 0 and clients could not read or write the table. From the moment the RS call queue started blocking, the log kept printing the following error:
> 2023-05-08 12:42:27,310 ERROR [MemStoreFlusher.2]
> regionserver.MemStoreFlusher: Cache flush failed for region
>
> user_feature_v2,eacf_1658057555,1660314723816.2376cc2326b5372131cc530b115d959a.
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync
> result after 30 ms for txid=16920651960, WAL system stuck?
> at
>
> org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:155)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:743)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:625)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:602)
> at
>
> org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2754)
> at
>
> org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2691)
> at
>
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2549)
> at
>
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2523)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2409)
> at
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:611)
> at
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:580)
> at
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
> at
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:360)
> at java.lang.Thread.run(Thread.java:748)
>
> The node's memstore could not flush data to the WAL file; the node's other metrics were all normal and HDFS was under no pressure either. After restarting the blocked node, the table returned to normal. I put the jstack captured during the incident in the attachment.
> Could someone from the community help locate the cause when you have time?
> For the jstack file, see the attachment of ISSUE https://issues.apache.org/jira/browse/HBASE-27850
>

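The failure mode discussed in this thread can be modeled in a few lines: treat a WAL sync as a future completed by a single background consumer thread, with the flush thread waiting on it under a timeout. If the consumer stalls (as when the stream to the DataNodes breaks), the waiter times out with the same kind of "WAL system stuck?" error seen above. This is an illustrative toy, not HBase's actual SyncFuture/AsyncFSWAL code; every name below is invented:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WalSyncSketch {
    // Single "consumer" thread, standing in for the WAL's consume loop.
    static final ExecutorService consumer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "wal-consumer");
        t.setDaemon(true);
        return t;
    });

    // Model a sync: the consumer completes the future after delayMs.
    // A broken DN stream corresponds to a delay that never ends.
    static CompletableFuture<Long> sync(long txid, long delayMs) {
        CompletableFuture<Long> f = new CompletableFuture<>();
        consumer.submit(() -> {
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
            f.complete(txid);
        });
        return f;
    }

    // The flush thread blocks on the sync result with a timeout,
    // mirroring the "Failed to get sync result after N ms" message.
    static long blockOnSync(CompletableFuture<Long> f, long txid, long timeoutMs) {
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new RuntimeException("Failed to get sync result after " + timeoutMs
                + " ms for txid=" + txid + ", WAL system stuck?");
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The thread above describes HBASE-25905 as interrupting the RegionServer process when a wal sync blocks; in this toy model that corresponds to treating the timeout exception as fatal rather than retrying.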

Re: Questions about offPeakCompactionTracker

2023-05-04 Thread Duo Zhang
Sure, file an issue; no need to rush into changing the actual code yet.
Or you can make the change internally first and test it, then post the results and such to the issue too.

章啸 wrote on Thu, May 4, 2023 at 13:35:

> What I found is that OffPeak compaction cannot make full use of the cluster's off-peak resources to compact the hfiles generated during peak hours, because only one off peak
> compact may run at a time.
> That does not seem very reasonable, so I want to try changing this (removing static, so that there is only one at a time per store
> ), but I do not understand the original intent of this design, nor whether such a change would cause problems.
> I think I can create a jira and put my proposed change there.
>
> > On May 4, 2023, at 11:25, 张铎 (Duo Zhang) wrote:
> >
> > I dug through the blame:
> >
> >
> https://github.com/apache/hbase/blob/5998a0f349824adf823f79a52530e97dfc624b92/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/OffPeakCompactions.java
> >
> > This AtomicBoolean actually replaces some of the logic in that file; the HBASE-7437 change removed the file:
> >
> >
> https://github.com/apache/hbase/commit/c9d33bef3f74cc771be1574db191666c2bc043d2#diff-bb21d9a53c6b006a954b4a981483fae7dae1c635298f24d208c6be80df1153a4
> >
> > You can read the comments explaining it. The point is that the count of OffPeak compactions
> > is global and only one may run at a time; see the tryStartOffPeakRequest implementation below.
> >
> > This code is over ten years old; if you think it no longer fits, we can discuss changing it. What problem exactly did you run into?
> >
> >  wrote on Sun, Apr 23, 2023 at 09:30:
> >
> >>
> >>
> Hi, community experts. I have a question about offPeakCompaction: HStore has a static member, introduced by HBASE-7437 to optimize around the bug in HBASE-7822.
> >>
> >> private static final AtomicBoolean offPeakCompactionTracker = new
> >> AtomicBoolean();
> >>
> >>
> Then, when a compaction is requested, the different stores in the same rs have to compete for this offPeakCompactionTracker, so during off-peak hours only one store at a time can run a compaction with the offpeak
> >> compaction parameter settings.
> >>
> >> // Normal case - coprocessor is not overriding file selection.
> >> if (!compaction.hasSelection()) {
> >>  boolean isUserCompaction = priority == Store.PRIORITY_USER;
> >>  boolean mayUseOffPeak =
> >>offPeakHours.isOffPeakHour() &&
> >> offPeakCompactionTracker.compareAndSet(false, true);
> >>  try {
> >>compaction.select(this.filesCompacting, isUserCompaction,
> >> mayUseOffPeak,
> >>  forceMajor && filesCompacting.isEmpty());
> >>  } catch (IOException e) {
> >>if (mayUseOffPeak) {
> >>  offPeakCompactionTracker.set(false);
> >>}
> >>throw e;
> >>  }
> >>  assert compaction.hasSelection();
> >>  if (mayUseOffPeak && !compaction.getRequest().isOffPeak()) {
> >>// Compaction policy doesn't want to take advantage of off-peak.
> >>offPeakCompactionTracker.set(false);
> >>  }
> >> }
> >>
> >>
> >> About this, I have a few questions:
> >>
> >> 1. Why does offpeak compaction need to be mutually exclusive between different stores at the rs level? (I could not find any related jira or design doc on this.)
> >> 2. What problems would removing the static modifier cause?
> >>
> >>
>


Re: Questions about offPeakCompactionTracker

2023-05-03 Thread Duo Zhang
I dug through the blame:

https://github.com/apache/hbase/blob/5998a0f349824adf823f79a52530e97dfc624b92/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/OffPeakCompactions.java

This AtomicBoolean actually replaces some of the logic in that file; the HBASE-7437 change removed the file:

https://github.com/apache/hbase/commit/c9d33bef3f74cc771be1574db191666c2bc043d2#diff-bb21d9a53c6b006a954b4a981483fae7dae1c635298f24d208c6be80df1153a4

You can read the comments explaining it. The point is that the count of OffPeak compactions
is global and only one may run at a time; see the tryStartOffPeakRequest implementation below.

This code is over ten years old; if you think it no longer fits, we can discuss changing it. What problem exactly did you run into?

 wrote on Sun, Apr 23, 2023 at 09:30:

>
> Hi, community experts. I have a question about offPeakCompaction: HStore has a static member, introduced by HBASE-7437 to optimize around the bug in HBASE-7822.
>
> private static final AtomicBoolean offPeakCompactionTracker = new
> AtomicBoolean();
>
> Then, when a compaction is requested, the different stores in the same rs have to compete for this offPeakCompactionTracker, so during off-peak hours only one store at a time can run a compaction with the offpeak
> compaction parameter settings.
>
> // Normal case - coprocessor is not overriding file selection.
> if (!compaction.hasSelection()) {
>   boolean isUserCompaction = priority == Store.PRIORITY_USER;
>   boolean mayUseOffPeak =
> offPeakHours.isOffPeakHour() &&
> offPeakCompactionTracker.compareAndSet(false, true);
>   try {
> compaction.select(this.filesCompacting, isUserCompaction,
> mayUseOffPeak,
>   forceMajor && filesCompacting.isEmpty());
>   } catch (IOException e) {
> if (mayUseOffPeak) {
>   offPeakCompactionTracker.set(false);
> }
> throw e;
>   }
>   assert compaction.hasSelection();
>   if (mayUseOffPeak && !compaction.getRequest().isOffPeak()) {
> // Compaction policy doesn't want to take advantage of off-peak.
> offPeakCompactionTracker.set(false);
>   }
> }
>
>
> About this, I have a few questions:
>
> 1. Why does offpeak compaction need to be mutually exclusive between different stores at the rs level? (I could not find any related jira or design doc on this.)
> 2. What problems would removing the static modifier cause?
>
>

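The mutual exclusion debated in this thread boils down to a very small pattern: a single static AtomicBoolean acts as a process-wide slot, and compareAndSet(false, true) hands it to exactly one store at a time. The sketch below is a toy illustration of that pattern, not HBase's HStore code; the class and method names are made up:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// One static AtomicBoolean per process: at most one "store" in the whole
// RegionServer can hold the off-peak compaction slot at any moment.
public class OffPeakGate {
    private static final AtomicBoolean offPeakCompactionTracker = new AtomicBoolean();

    // Returns true only for the first caller; everyone else compacts with
    // normal (peak) settings until finish() releases the slot.
    public static boolean tryStart() {
        return offPeakCompactionTracker.compareAndSet(false, true);
    }

    public static void finish() {
        offPeakCompactionTracker.set(false);
    }

    public static void main(String[] args) {
        System.out.println(tryStart()); // true: first store takes the slot
        System.out.println(tryStart()); // false: a second store is refused
        finish();                       // slot released after the compaction completes
    }
}
```

Dropping static, as proposed above, would turn the slot from one-per-RegionServer into one-per-store, since each store instance would then own its own AtomicBoolean.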

Re: [ANNOUNCE] New HBase committer Nihal Jain

2023-05-03 Thread Duo Zhang
Congratulations!

Viraj Jasani wrote on Wed, May 3, 2023 at 23:47:

> Congratulations Nihal!! Very well deserved!!
>
> On Wed, May 3, 2023 at 5:12 AM Nick Dimiduk  wrote:
>
> > Hello!
> >
> > On behalf of the Apache HBase PMC, I am pleased to announce that Nihal
> Jain
> > has accepted the PMC's invitation to become a committer on the project.
> We
> > appreciate all of Nihal's generous contributions thus far and look
> forward
> > to his continued involvement.
> >
> > Congratulations and welcome, Nihal Jain!
> >
> > Thanks,
> > Nick
> >
>


Re: Re: Re: Help needed: single region flush taking too long on an HBase cluster

2023-04-02 Thread Duo Zhang
I don't remember the 2.1 code very clearly, but the region-assign related procedures were rewritten in 2.2; before that there really were bugs
that were hard to pin down. The common ones were a region not going offline, or one region being assigned to two RSes, fixable only by a restart. Later, in 2.3, the procedure
storage part was reimplemented as well, and only then did it become reasonably stable.

So running merges on 2.1, especially on a large and somewhat overloaded cluster, can indeed hit bugs. Your hbck handling was fine; that is about all you can do...

邢* wrote on Fri, Mar 31, 2023 at 15:15:

>
> Thanks for the reply, Duo. We will follow your advice on that large table and keep watching for this kind of problem. Earlier, because the region count was too high, we also manually merged regions on the hbase cluster, but we ran into a hole problem, and would like your help taking a look.
>
>
>
> 1. Problem description:
> After manually merging two regions a hole appeared, and during the hbck2 repair a region double-assignment problem appeared. At the time, reads and writes on the hbase cluster were both fine, but an extra region showed up on another rs. This region had no data in meta; a directory existed for it on hdfs but its regioninfo was missing, and compact and snapshot on the table owning the extra region would fail. We migrated the regions off the rs hosting the extra region and then restarted that rs; after the restart the extra region was gone.
>
>
>
>
>
> 2. Steps taken:
>
> Region merge command:
>
> merge_region
> 'sdhz_phone_info_realtime,363:2H9S49dd1UNDbnX+3vVj7g==,1627405790553.ba670bd86dc51211e50bf05995f9b8c5.','sdhz_phone_info_realtime,371:1xxhPgZ9BuzE29InP4QrWA==,1627405790553.3f8854e2e437b0469ef735b08e3abd05.'
>
>
>
> Hole repair process:
>
> Repaired with hbck2
>
> - Find the cause of the inconsistency: hbase hbck sdhz_phone_info_realtime
>
> - Fix the metadata: hbase hbck -j hbase-hbck2-1.0.0-SNAPSHOT.jar
> addFsRegionsMissingInMeta default:sdhz_phone_info_realtime
>
> - Restart the master
>
> - Re-assign the region: hbase hbck -j hbase-hbck2-1.0.0-SNAPSHOT.jar assigns
> 23341c80a5bff8834bf1c6cdcc06
>
>
>
> Extra-region repair process:
>
> 1. Turn off balance in production
>
> 2. Move the regions off rs-13 one by one
>
> 3. Test that the migrated regions are usable
>
> 4. Restart rs-13, which hosted the extra region; after the restart the extra region was gone
>
> 5. Decommission rs-13 and take the machine offline
>
>
>
> 3. Questions we hope to get answered:
>
> Why did manually merging regions cause problems?
>
> Was the way we fixed the hole with hbck2 wrong?
>
>
>
>
>
>
>
>
>


Re: Snapshot deleted, but hfiles referenced by the snapshot not deleted, in a large-table snapshot scan scenario

2023-04-02 Thread Duo Zhang
Yes, the logic is fine; it's just a matter of style. You can apply the fix in your own repo first and test it.

leojie wrote on Mon, Apr 3, 2023 at 09:04:

> Thanks for the reply, Duo. This PR was filed by a former colleague on our team. The fix itself looks fine; it should be directly applicable, right?
>
> 张铎 (Duo Zhang) wrote on Thu, Mar 30, 2023 at 12:09:
>
> > https://issues.apache.org/jira/browse/HBASE-25510
> >
> > This is the one, but it hasn't been merged and has been quiet for a long time; I'll follow up on it.
> >
> > leojie wrote on Thu, Mar 30, 2023 at 11:43:
> >
> > > OK, thanks for the reply, Duo. I'll search the ISSUE list.
> > >
> > > 张铎 (Duo Zhang) wrote on Wed, Mar 29, 2023 at 21:44:
> > >
> > > > These are all cleaner threads; getting this far means it is still scanning to check whether files can be deleted, rather than skipping the run outright just because a snapshot is in progress, no?
> > > >
> > > > You could check whether deletion is simply not keeping up. As I recall, the performance of that cache in TableName is rather problematic; there is a related issue.
> > > >
> > > > leojie wrote on Wed, Mar 29, 2023 at 15:06:
> > > >
> > > > > hi Duo
> > > > > I dumped the blocked threads with arthas; the stuck snapshot-cleaner thread looks like this:
> > > > > "dir-scan-pool4-thread-10" Id=1448 RUNNABLE
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:377)
> > > > > at
> org.apache.hadoop.hbase.TableName.valueOf(TableName.java:505)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toTableName(ProtobufUtil.java:2175)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toRegionInfo(ProtobufUtil.java:3114)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitRegionStoreFiles(SnapshotReferenceUtil.java:134)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:121)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:348)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:331)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:108)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:285)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:215)
> > > > > -  locked
> > > > > org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache@ab834f8
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:69)
> > > > > -  locked
> > > > >
> org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner@5c750233
> > > > > < but blocks 9 other threads!
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:295)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$traverseAndDelete$1(CleanerChore.java:387)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$323/299183520.act(Unknown
> > > > > Source)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.deleteAction(CleanerChore.java:442)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.traverseAndDelete(CleanerChore.java:387)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$null$2(CleanerChore.java:396)
> > > > > at
> > > > >
&

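For context on the hot frame in the stack traces above (TableName.createTableNameIfNecessary): a valueOf like that is doing string interning, returning one canonical instance per name. The sketch below shows a hypothetical O(1) intern built on ConcurrentHashMap.computeIfAbsent; it is not HBase's actual TableName code, only an illustration of the pattern whose cache performance the reply calls problematic. A cache that instead scans a list of existing instances would cost O(n) per lookup and could make a dir-scan pool look stuck:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interning cache: one canonical instance per name string.
// computeIfAbsent runs the mapping function at most once per key, so
// concurrent lookups of the same name all get the same object.
public class NameInterner {
    private static final Map<String, NameInterner> CACHE = new ConcurrentHashMap<>();
    private final String name;

    private NameInterner(String name) {
        this.name = name;
    }

    public static NameInterner valueOf(String name) {
        return CACHE.computeIfAbsent(name, NameInterner::new);
    }

    public String getName() {
        return name;
    }
}
```

Because lookups return the canonical instance, callers can compare with == instead of equals, which is one reason such interning caches sit on very hot paths.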
Re: Re: Help needed: single region flush taking too long on an HBase cluster

2023-03-30 Thread Duo Zhang
e: BLOCK*
> allocate blk_1109782685_36041896, replicas=x.x.x.x:9866, x.x.x.x:9866,
> x.x.x.x:9866 for
> /hbase/WALs/hbase-virgo-rs-18.bigdata.bj.tx.lan,16020,1679431900739/hbase-virgo-rs-18.bigdata.bj.tx.lan%2C16020%2C1679431900739.hbase-virgo-rs-18.bigdata.bj.tx.lan%2C16020%2C1679431900739.regiongroup-0.1680150172590
> 2023-03-30 12:22:53,301 DEBUG
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to choose from local rack (location = /TX/T02/02), retry with the rack of
> the next replica (location = /TX/T01/02)
>
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:827)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:715)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:622)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:506)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:416)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:465)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:292)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:159)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2094)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2673)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:872)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at
> java.base/java.security.AccessController.doPrivileged(AccessController.java:691)
> at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> 2023-03-30 12:22:53,301 DEBUG org.apache.hadoop.net.NetworkTopology:
> Choosing random from 1 available nodes on node /TX/T01/02,
> scope=/TX/T01/02, excludedScope=null, excludeNodes=[x.x.x.x:9866,
> x.x.x.x:9866]. numOfDatanodes=2.
> 2023-03-30 12:22:53,301 DEBUG org.apache.hadoop.net.NetworkTopology:
> nthValidToReturn is 0
> 2023-03-30 12:22:53,301 DEBUG org.apache.hadoop.net.NetworkTopology:
> chooseRandom returning x.x.x.x:9866
> 2023-03-30 12:22:53,301 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> allocate blk_1109782686_36041897, replicas=x.x.x.x:9866, x.x.x.x:9866,
> x.x.x.x:9866 for
> /hbase/WALs/hbase-virgo-rs-12.bigdata.bj.tx.lan,16020,1679299081702/hbase-virgo-rs-12.bigdata.bj.tx.lan%2C16020%2C1679299081702.hbase-virgo-rs-12.bigdata.bj.tx.lan%2C16020%2C1679299081702.regiongroup-0.1680150171428
> 2023-03-30 12:22:53,311 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile:
> /hbase/WALs/hbase-virgo-rs-18.bigdata.bj.tx.lan,16020,1679431900739/hbase-virgo-rs-18.bigdata.bj.tx.lan%2C16020%2C1679431900739.hbase-virgo-rs-18.bigdata.bj.tx.lan%2C16020%2C1679431900739.regiongroup-0.1680149941088
> is closed by DFSClient_NONMAPREDUCE_-1935786627_1
> 2023-03-30 12:22:53,311 DEBUG org.apache.hadoop.net.NetworkTopology:
> Choosing random from 13 available nodes on node /, scope=,
> excludedScope=/TX/T01/03, excludeNodes=[x.x.x.x:9866]. numOfDatanodes=13.
> 2023-03-30 12:22:53,311 DEBUG org.apache.h

Re: Snapshot deleted, but hfiles referenced by the snapshot not deleted, in a large-table snapshot scan scenario

2023-03-29 Thread Duo Zhang
https://issues.apache.org/jira/browse/HBASE-25510

This is the one, but it hasn't been merged and has been quiet for a long time; I'll follow up on it.

leojie wrote on Thu, Mar 30, 2023 at 11:43:

> OK, thanks for the reply, Duo. I'll search the ISSUE list.
>
张铎 (Duo Zhang) wrote on Wed, Mar 29, 2023 at 21:44:
>
> > These are all cleaner threads; getting this far means it is still scanning to check whether files can be deleted, rather than skipping the run outright just because a snapshot is in progress, no?
> >
> > You could check whether deletion is simply not keeping up. As I recall, the performance of that cache in TableName is rather problematic; there is a related issue.
> >
> > leojie wrote on Wed, Mar 29, 2023 at 15:06:
> >
> > > hi Duo
> > > I dumped the blocked threads with arthas; the stuck snapshot-cleaner thread looks like this:
> > > "dir-scan-pool4-thread-10" Id=1448 RUNNABLE
> > > at
> > >
> >
> org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:377)
> > > at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:505)
> > > at
> > >
> >
> org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toTableName(ProtobufUtil.java:2175)
> > > at
> > >
> >
> org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toRegionInfo(ProtobufUtil.java:3114)
> > > at
> > >
> >
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitRegionStoreFiles(SnapshotReferenceUtil.java:134)
> > > at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:121)
> > > at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:348)
> > > at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:331)
> > > at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:108)
> > > at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:285)
> > > at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:215)
> > > - locked org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache@ab834f8
> > > at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:69)
> > > - locked org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner@5c750233
> > > < but blocks 9 other threads!
> > > at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:295)
> > > at org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$traverseAndDelete$1(CleanerChore.java:387)
> > > at org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$323/299183520.act(Unknown Source)
> > > at org.apache.hadoop.hbase.master.cleaner.CleanerChore.deleteAction(CleanerChore.java:442)
> > > at org.apache.hadoop.hbase.master.cleaner.CleanerChore.traverseAndDelete(CleanerChore.java:387)
> > > at org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$null$2(CleanerChore.java:396)
> > > at org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$319/2057648148.run(Unknown Source)
> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > at java.lang.Thread.run(Thread.java:748)
> > >
> > > Number of locked synchronizers = 2
> > > - java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@358dbc1b
> > > - java.util.concurrent.ThreadPoolExecutor$Worker@236be816
> > >
> > > The snapshot-related thread stacks from jstack are as follows:
> > >
> > > "SnapshotHandlerChoreCleaner" #855 daemon prio=5 os_prio=0 tid=0x00c20800 nid=0x15a4 waiting on condition [0x7f7b1803c000]
> > >    java.lang.Thread.State: TIMED_WAITING (parking)
> > > at sun.misc.Unsafe.park(Native Method)
> > > - parking to wait for <0x7f8123cefc98> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > > at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> > > at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(

Re: Help needed: individual region flushes taking too long in an HBase cluster

2023-03-29 Thread Duo Zhang
That depends on how much data each flush writes out, and on the HDFS write speed. A dozen or so seconds is usually acceptable and not especially slow. Do you have monitoring of the cluster's overall load and IO while the snapshot is being taken?
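To put numbers behind that answer, the per-flush durations can be pulled straight out of the RegionServer log. A minimal sketch, assuming a log message shape modeled loosely on HBase 2.x flush completion lines (the exact wording varies by version, so adjust the pattern to your own logs):

```python
import re

# Assumed format of a flush completion line; real HBase logs differ in detail.
SAMPLE_LOG = """
2023-03-29 16:24:40,100 INFO regionserver.HRegion: Finished flush of memstore, in 14350ms
2023-03-29 16:25:02,512 INFO regionserver.HRegion: Finished flush of memstore, in 9870ms
"""

def flush_durations_ms(log_text):
    """Return the duration in ms of every flush completion line found."""
    return [int(ms) for ms in re.findall(r"Finished flush of memstore.* in (\d+)ms", log_text)]

durations = flush_durations_ms(SAMPLE_LOG)
print(durations)       # [14350, 9870]
print(max(durations))  # slowest flush in the sample: 14350
```

Sorting or histogramming these values over a snapshot window makes it easy to see whether flush time correlates with flush size or with cluster-wide IO pressure.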

邢* wrote on Wed, Mar 29, 2023 at 19:20:

> hi all,
>
>
> Asking the community for help with an HBase issue: taking a snapshot of one of the large tables in our HBase cluster has recently been taking too long, about half an hour. Going through the logs, we found that each region
> flush takes around a dozen seconds; why would a flush take this long?
>
> Cluster status: 14 RegionServer machines, each with 96 CPU cores and 384 GB of memory, plus 3 ZooKeeper machines, each with 8 cores and 32 GB of memory, running HBase 2.1.9.
>
> The log entries found so far:
>
> RegionServer DFSClient log:
>
> 2023-03-29 16:24:55,395 INFO org.apache.hadoop.hdfs.DFSClient: Could
> not complete
> /hbase/.hbase-snapshot/.tmp/sdhz_user_info_realtime_1680076952828/region-manifest.603b4d8028af279648af4bfaa3889fd0
> retrying...
>
> NameNode log:
>
> 2023-03-29 16:24:54,995 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK*
> blk_1109699571_35958782 is COMMITTED but not COMPLETE(numNodes= 0 <
> minimum = 1) in file
> /hbase/.hbase-snapshot/.tmp/sdhz_user_info_realtime_1680076952828/region-manifest.603b4d8028af279648af4bfaa3889fd0


Re: Snapshot scans on large tables: snapshots deleted but the HFiles they referenced are not cleaned up

2023-03-29 Thread Duo Zhang
ncurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
>
> "dir-scan-pool4-thread-10" #1448 daemon prio=5 os_prio=0
> tid=0x7f89fdbd5800 nid=0x3375 waiting for monitor entry
> [0x7f7aef60d000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at
> org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:69)
> - waiting to lock <0x7f813c8e5e38> (a
> org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:295)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$traverseAndDelete$1(CleanerChore.java:387)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$323/299183520.act(Unknown
> Source)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.deleteAction(CleanerChore.java:442)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.traverseAndDelete(CleanerChore.java:387)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$null$2(CleanerChore.java:396)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$319/2057648148.run(Unknown
> Source)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
>
> .. dir-scan-pool4-thread-(2-10) are all in the BLOCKED state
>
> "dir-scan-pool4-thread-1" #1385 daemon prio=5 os_prio=0
> tid=0x7f89e2579800 nid=0x31b3 runnable [0x7f7b09093000]
>java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:377)
> at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:505)
> at
> org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toTableName(ProtobufUtil.java:2175)
> at
> org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toRegionInfo(ProtobufUtil.java:3114)
> at
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitRegionStoreFiles(SnapshotReferenceUtil.java:134)
> at
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:121)
> at
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:348)
> at
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:331)
> at
> org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:108)
> at
> org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:285)
> at
> org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:215)
> - locked <0x7f8136e18198> (a
> org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache)
> at
> org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:69)
> - locked <0x7f813c8e5e38> (a
> org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:295)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$traverseAndDelete$1(CleanerChore.java:387)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$323/299183520.act(Unknown
> Source)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.deleteAction(CleanerChore.java:442)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.traverseAndDelete(CleanerChore.java:387)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$null$2(CleanerChore.java:396)
> at
> org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$319/2057648148.run(Unknown
> Source)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
>
> Could I trouble you to take another look, Mr. Zhang? Many thanks.
>
> leojie wrote on Wed, Mar 29, 2023 at 14:54:
>
>> Hi Mr. Zhang,
>> I dumped the thread stacks with jstack and listed the blocked threads with arthas; the stuck snapshot-cleaner thread looks like the image below:
>>
>> [image: image.png]
>> The jstack file is attached. I couldn't quite make sense of the thread stacks and would appreciate your guidance.
>>
>>
>> leojie wrote on Tue, Mar 28, 2023 at 19:19:
>>
>>> Thank you very much for the reply. The cluster switched masters today and the problem appeared afterwards; I'll use jstack to check whether any snapshot-related threads are stuck.
>>

Re: Snapshot scans on large tables: snapshots deleted but the HFiles they referenced are not cleaned up

2023-03-27 Thread Duo Zhang
That log message means a snapshot is in progress, so the cleaner will not run. Check whether an in-flight snapshot operation is stuck, or something along those lines.
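One way to check for a stuck operation is to take a jstack of the active HMaster and look for snapshot/cleaner threads parked in BLOCKED. A hedged sketch of sifting such a dump (the thread names and dump layout here are assumptions modeled on the stacks posted later in this thread):

```python
import re

# A tiny, hand-trimmed stand-in for real `jstack <hmaster-pid>` output.
SAMPLE_DUMP = '''"dir-scan-pool4-thread-10" #1448 daemon prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:69)

"dir-scan-pool4-thread-1" #1385 daemon prio=5
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:505)
'''

def blocked_threads(dump):
    """Return names of threads whose state line says BLOCKED."""
    names = []
    for block in dump.split("\n\n"):  # jstack separates threads by blank lines
        m = re.search(r'^"([^"]+)"', block.strip())
        if m and "java.lang.Thread.State: BLOCKED" in block:
            names.append(m.group(1))
    return names

print(blocked_threads(SAMPLE_DUMP))  # ['dir-scan-pool4-thread-10']
```

If many cleaner threads show up BLOCKED behind one long-running holder, the next step is to read the holder's own stack, as was done later in this thread.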

leojie wrote on Mon, Mar 27, 2023 at 20:53:

> hi all,
>
> Asking the community for help with an HBase issue: in our snapshot-scan workloads, some large tables take snapshots fairly frequently (e.g. daily), but after the snapshot metadata is deleted, the HFiles the snapshots referenced do not seem to be cleaned up; the space used by our cluster's archive directory keeps climbing. The only seemingly related log lines we found in the HMaster are:
> [image: image.png]
> 2023-03-27 13:07:10,939 WARN  [dir-scan-pool4-thread-5]
> snapshot.SnapshotFileCache: Not checking unreferenced files since snapshot
> is running, it will skip to clean the HFiles this time
> Other than that, there are no useful logs.
> After an HMaster failover, the cleaner threads appear to start working again and a large amount of archive space is released, which shows up very clearly in our cluster capacity metrics.
> We are running HBase 2.2.6. What could be causing this, and is there a similar PR that could be applied to this version? Many thanks.
>


[ANNOUNCE] New HBase committer Tianhang Tang(唐天航)

2023-03-15 Thread Duo Zhang
On behalf of the Apache HBase PMC, I am pleased to announce that Tianhang
Tang(thangTang)
has accepted the PMC's invitation to become a committer on the project. We
appreciate all
of Tianhang's generous contributions thus far and look forward to his
continued involvement.

Congratulations and welcome, Tianhang Tang!

我很高兴代表 Apache HBase PMC 宣布 唐天航 已接受我们的邀请,成
为 Apache HBase 项目的 Committer。感谢 唐天航 一直以来为 HBase 项目
做出的贡献,并期待他在未来继续承担更多的责任。

欢迎 唐天航!



Re: [DISCUSS] How to deal with the disabling of public sign ups for jira.a.o(enable github issues?)

2023-03-06 Thread Duo Zhang
https://selfserve.apache.org/jira-account.html

The INFRA team has delivered a self serving tool for requesting a jira
account, so contributors could request a jira account by their own.

I filed HBASE-27689 for updating the README.md to mention this change and
the PR is ready

https://github.com/apache/hbase/pull/5088

PTAL.

Thanks.

Guanghao Zhang wrote on Wed, Dec 7, 2022 at 12:35:

> Did other projects have the same solution for this, sync github issues to
> jira issues? Github issues will be useful to get more feedback.
>
> 张铎(Duo Zhang) wrote on Tue, Dec 6, 2022 at 00:13:
>
> > The PR for HBASE-27513 is available
> >
> > https://github.com/apache/hbase/pull/4913
> >
> > Let's at least tell our users to send email to private@hbase for
> > acquiring a jira account.
> >
> > Thanks.
> >
> > 张铎(Duo Zhang) wrote on Fri, Dec 2, 2022 at 12:46:
> > >
> > > Currently all the comment on github PR will be sent to issues@hbase,
> > > like this one
> > >
> > > https://lists.apache.org/thread/jbfm269b4m24xl2r82l8b0t3pmqr44hr
> > >
> > > But I think this can only be used as an archive, to make sure that all
> > > discussions are recorded on asf infrastructure.
> > >
> > > For github issues, I'm afraid we can only do the same thing. As the
> > > format of github comment is different, it will be hard to read if we
> > > just sync the message to jira...
> > >
> > > Thanks.
> > >
> > > > Bryan Beaudreault wrote on Thu, Dec 1, 2022 at 21:30:
> > > >
> > > > Should we have them sent to private@? Just thinking in terms of
> > reducing
> > > > spam to users who put their email and full name on a public list.
> > > >
> > > > One thought I had about bug tracking is whether we could use some
> sort
> > of
> > > > github -> jira sync. I've seen them used before, where it
> automatically
> > > > syncs issues and comments between the two systems. It's definitely
> not
> > > > ideal, but maybe an option? I'm guessing it would require INFRA help.
> > > >
> > > > On Thu, Dec 1, 2022 at 5:47 AM 张铎(Duo Zhang) wrote:
> > > >
> > > > > I've filed HBASE-27513 for changing the readme on github.
> > > > >
> > > > > At least let's reuse the existing mailing list for acquiring jira
> > account.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > 张铎(Duo Zhang) wrote on Tue, Nov 29, 2022 at 22:34:
> > > > >
> > > > > >
> > > > > > Bump and also send this to user@hbase.
> > > > > >
> > > > > > We need to find a way to deal with the current situation where
> > > > > > contributors can not create a Jira account on their own...
> > > > > >
> > > > > > At least, we need to change the readme on github page, web site
> and
> > > > > > also the ref guide to tell users how to acquire a jira account...
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > > 张铎(Duo Zhang) wrote on Sun, Nov 27, 2022 at 22:06:
> > > > > > >
> > > > > > > For me, I think most developers already have a github account,
> so
> > > > > > > enabling it could help us get more feedback. For lots of
> younger
> > > > > > > Chinese developers, they rarely use email in their daily
> life...
> > > > > > > No doubt later we need to modify our readme on github. If we
> > just let
> > > > > > > users go to github issues on the readme, they will soon open an
> > issue
> > > > > > > there. But if we ask users to first send an email to a mailing
> > list,
> > > > > > > for acquiring a jira account, and then wait for a PMC member to
> > submit
> > > > > > > the request, and receive the email response, set up their
> > account, and
> > > > > > > then they can finally open an issue on jira. I'm afraid lots of
> > users
> > > > > > > will just give up, it is not very friendly...
> > > > > > >
> > > > > > > And I do not mean separate issue systems for users and devs.
> > Users can
> > > > > > > still open jira issues or ask in the mailing list if they want,
> > github
> > > > > > > issues is just another channel. If a user asks something in the
> > > > > > &g

Re: ReadOnlyZKClient question

2023-02-08 Thread Duo Zhang
These are all normal logs for ReadOnlyZkClient, but the problem is we
should not output these logs when running hbase shell.

I'm not a ruby expert but we have this in the entry point of hbase
shell(jar-bootstrap.rb)

# Set logging level to avoid verboseness
org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.zookeeper',
log_level)
org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.hadoop',
log_level)

where the log level is

log_level = org.apache.logging.log4j.Level::ERROR

So I'm not sure why you can still see these logs in hbase shell...
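For completeness, if the shell's programmatic overrides are somehow not taking effect in a given setup, the same suppression can be expressed in the client-side Log4j2 configuration. This is only a sketch: the logger names come from the output above, and placing it in a log4j2.properties file on the client classpath is an assumption about the deployment.

```properties
# Raise chatty client loggers to ERROR so hbase shell output stays clean
logger.zookeeper.name = org.apache.zookeeper
logger.zookeeper.level = error
logger.hadoop.name = org.apache.hadoop
logger.hadoop.level = error
```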

LinuxGuy wrote on Wed, Feb 8, 2023 at 10:13:
>
> hello,
>
> i am using the latest hbase version (2.5.3)
> it gets the following logs in hbase shell,
>
> hbase:002:0> 2023-02-08 10:09:08,805 INFO
> [ReadOnlyZKClient-127.0.0.1:2181@0x3611153f] zookeeper.ZooKeeper
> (ZooKeeper.java:close(1422)) - Session: 0x100e7dbb2a40007 closed
>
> 2023-02-08 10:09:08,806 INFO
> [ReadOnlyZKClient-127.0.0.1:2181@0x3611153f-EventThread]
> zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down for
> session: 0x100e7dbb2a40007
>
>
> what does this mean and how to fix it?
>
>
> Thanks


[ANNOUNCE] Apache HBase 2.4.16 is now available for download

2023-02-05 Thread Duo Zhang
The HBase team is happy to announce the immediate availability of HBase
2.4.16.

Apache HBase™ is an open-source, distributed, versioned, non-relational
database.

Apache HBase gives you low latency random access to billions of rows with
millions of columns atop non-specialized hardware. To learn more about
HBase, see https://hbase.apache.org/.

HBase 2.4.16 is the latest patch release in the HBase 2.4.x line. The
full list of issues can be found in the included CHANGES and RELEASENOTES,
or via our issue tracker:

https://s.apache.org/2.4.16-jira

To download please follow the links and instructions on our website:

https://hbase.apache.org/downloads.html

Questions, comments, and problems are always welcome at:
d...@hbase.apache.org.

Thanks to all who contributed and made this release possible.

Cheers,
The HBase Dev Team



HBase Quarterly report Oct-Dec 2022

2023-02-05 Thread Duo Zhang
Hi all,

HBase submits a report to the ASF board once a quarter, to inform the board
about project health. I'm sending the report to the user@ and dev@ mailing
lists because you are the project, and for transparency. If you have any
questions about the report or the running of the project, you can post them
to any PMC member or committer, or send an email to priv...@hbase.apache.org,
which every PMC member subscribes to.

## Description:
Apache HBase is an open-source, distributed, versioned, non-relational
database. Apache HBase gives you low latency random access to billions of rows
with millions of columns atop non-specialized hardware.

hbase-thirdparty is a set of internal artifacts used by the project to
mitigate the impact of our dependency choices on the wider ecosystem.

hbase-connectors is a collection of integration points with other projects.
The initial release includes artifacts for use with Apache Kafka and Apache
Spark.

hbase-filesystem contains HBase project-specific implementations of the Apache
Hadoop FileSystem API. It is currently experimental and internal to the
project.

hbase-operator-tools is a collection of tools for HBase operators. Now it is
mainly for hosting HBCK2.

hbase-native-client is a client library in C/C++, in its early days.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache HBase was founded 2010-04-21 (13 years ago)
There are currently 101 committers and 57 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- No new PMC members. Last addition was Xiaolin Ha on 2022-04-06.
- Liangjun He was added as committer on 2022-11-29
- Rushabh Shah was added as committer on 2022-12-06

## Project Activity:
Recent releases:
2.5.2 was released on 2022-12-03.
hbase-thirdparty-4.1.3 was released on 2022-11-15.
2.4.15 was released on 2022-10-28.
2.5.1 was released on 2022-10-28.
hbase-thirdparty-4.1.2 was released on 2022-10-10.

We finally found a way to publish binaries and artifacts compiled with Hadoop 3.
The folks from the Phoenix community confirmed that the solution worked
perfectly, and the Hudi project could also benefit.
https://lists.apache.org/thread/y05gspk4mnxsz6nk7hc5ots8wt50366b
https://issues.apache.org/jira/browse/HBASE-27434
https://issues.apache.org/jira/browse/HBASE-27442
https://github.com/apache/hbase/pull/4917

Self-registration on jira.a.o was disabled by the infra team due to spam;
unfortunately, HBase uses Jira for issue tracking. After discussion, we decided
to let users send a request email to private@hbase and have PMC members
request a Jira account for the contributor. We also changed our README page to
mention this change.
https://lists.apache.org/thread/vgy1n8brdoxqg9qs7752hg8otjcnhyoz
https://issues.apache.org/jira/browse/HBASE-27513


## Community Health:
- Mailing list activity:
d...@hbase.apache.org:
956 subscribers(952 in the previous quarter)

user@hbase.apache.org:
1987 subscribers(1994 in the previous quarter)

user-zh@hbase.apache.org:
78 subscribers(77 in the previous quarter)

- JIRA activity:
128 issues opened in JIRA, past quarter (-46% change)
120 issues closed in JIRA, past quarter (-42% change)

- Commit activity:
413 commits in the past quarter (-28% change)
42 code contributors in the past quarter (-14% change)

- GitHub PR activity:
132 PRs opened on GitHub, past quarter (-42% change)
126 PRs closed on GitHub, past quarter (-48% change)

We now have more than 400 individual contributors (401 so far) on the GitHub
statistics page!

Development activity decreased again. This is partly because we are still
waiting for the final features to land on master and branch-2, i.e., the
replication improvements and backup support, so that we can cut branch-3 and
branch-2.6 and start making the 3.0.0 and 2.6.0 releases. We hope to cut these
two branches in the next quarter and make more releases.



[ANNOUNCE] Please welcome Tak Lon (Stephen) Wu to the HBase PMC

2023-01-29 Thread Duo Zhang
On behalf of the Apache HBase PMC I am pleased to announce that
Tak Lon (Stephen) Wu has accepted our invitation to become a PMC member
on the Apache HBase project. We appreciate Tak Lon (Stephen) Wu stepping
up to take more responsibility in the HBase project.

Please join me in welcoming Tak Lon (Stephen) Wu to the HBase PMC!

我很高兴代表 Apache HBase PMC 宣布 Tak Lon (Stephen) Wu 已接受我们的邀请,
成为 Apache HBase 项目的 PMC 成员。感谢 Tak Lon (Stephen) Wu 愿意在 HBase
项目中承担更大的责任。

欢迎 Tak Lon (Stephen) Wu!



Re: Uncaught runtime exception from a scan crashes the RegionServer

2023-01-27 Thread Duo Zhang
That check is just a last-resort safety net; if it ever fires, it means there is a bug in the code.
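The design choice here, failing fast on a supposedly impossible state instead of silently serving possibly corrupt cache contents, can be illustrated with a small generic sketch (not HBase code; the names below are invented for illustration):

```python
class CacheCorruptionError(RuntimeError):
    """Raised when two inserts for the same key carry different contents."""

def cache_with_guard(cache, key, block):
    """Insert block; an existing entry must be identical, otherwise fail fast."""
    existing = cache.get(key)
    if existing is not None and existing != block:
        # An impossible state under correct usage: surfacing it loudly is
        # preferable to silently returning differing contents for one key.
        raise CacheCorruptionError(f"Cached block contents differ for key {key!r}")
    cache[key] = block

c = {}
cache_with_guard(c, "blk_1", b"abc")
cache_with_guard(c, "blk_1", b"abc")      # identical re-insert is fine
try:
    cache_with_guard(c, "blk_1", b"xyz")  # differing contents: the guard fires
except CacheCorruptionError as e:
    print("guard fired:", e)
```

As in HBase's validateBlockAddition, the guard should never fire in a healthy system; when it does, it points at a bug elsewhere (here, whoever produced two different blocks for one key).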

leojie wrote on Sat, Jan 28, 2023 at 11:14:
>
> Thanks for the reply, Mr. Zhang. One more question: in the validateBlockAddition method of the BlockCacheUtil class,
>
> public static int validateBlockAddition(Cacheable existing, Cacheable 
> newBlock,
>   BlockCacheKey cacheKey) {
>   int comparison = compareCacheBlock(existing, newBlock, false);
>   if (comparison != 0) {
> throw new RuntimeException(
>   "Cached block contents differ, which should not have happened."
> + "cacheKey:" + cacheKey);
>   }
>   if ((existing instanceof HFileBlock) && (newBlock instanceof HFileBlock)) {
> comparison = ((HFileBlock) existing).getNextBlockOnDiskSize()
>   - ((HFileBlock) newBlock).getNextBlockOnDiskSize();
>   }
>   return comparison;
> }
>
> is the comparison != 0 check here strictly necessary?
>
>
> 张铎(Duo Zhang) wrote on Sat, Jan 28, 2023 at 11:09:
>
> > I'd suggest upgrading to the latest 2.4 or 2.5 release first and trying again. 2.2.6 is a fairly old version by now, and quite a few small BucketCache bugs have been fixed since then.
> >
> > https://issues.apache.org/jira/browse/HBASE-26281
> >
> > For example, this one can corrupt the contents of the BucketCache and lead to all kinds of strange errors.
> >
> > leojie wrote on Sat, Jan 28, 2023 at 10:22:
> >
> > >
> > > Hi all,
> > > Asking the community for help with a scan exception that crashed a RegionServer; the exception log is as follows:
> > >
> > > 2023-01-19 03:19:06,986 ERROR [RpcServer.default.RWQ.Fifo.scan.handler=226,queue=19,port=60020] ipc.RpcServer: Unexpected throwable object
> > > java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:bbec4ed53b6d475cbb8711f183556eb0_14145152
> > > at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:205)
> > > at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:237)
> > > at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:432)
> > > at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:417)
> > > at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:403)
> > > at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:68)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1528)
> > > at java.util.Optional.ifPresent(Optional.java:159)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1526)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:928)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1061)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1055)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1073)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1094)
> > > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:351)
> > > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:244)
> > > at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> > > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:324)
> > > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:267)
> > > at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:1099)
> > > at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:1088)
> > > at org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:823)
> > > at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:730)
> > > at org.apach

Re: Uncaught runtime exception from a scan crashes the RegionServer

2023-01-27 Thread Duo Zhang
I'd suggest upgrading to the latest 2.4 or 2.5 release first and trying again. 2.2.6 is a fairly old version by now, and quite a few small BucketCache bugs have been fixed since then.

https://issues.apache.org/jira/browse/HBASE-26281

For example, this one can corrupt the contents of the BucketCache and lead to all kinds of strange errors.

leojie wrote on Sat, Jan 28, 2023 at 10:22:

>
> Hi all,
> Asking the community for help with a scan exception that crashed a RegionServer; the exception log is as follows:
>
> 2023-01-19 03:19:06,986 ERROR
> [RpcServer.default.RWQ.Fifo.scan.handler=226,queue=19,port=60020]
> ipc.RpcServer: Unexpected throwable object
> java.lang.RuntimeException: Cached block contents differ, which should not
> have happened.cacheKey:bbec4ed53b6d475cbb8711f183556eb0_14145152
> at
> org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:205)
> at
> org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:237)
> at
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:432)
> at
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:417)
> at
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:403)
> at
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:68)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1528)
> at java.util.Optional.ifPresent(Optional.java:159)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1526)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:928)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1061)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1055)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1073)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1094)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:351)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:244)
> at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:324)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:267)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:1099)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:1088)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:823)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:730)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:157)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6681)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6845)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6615)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3238)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3483)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42278)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:379)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
>
> The corresponding source code (HBase 2.2.6):
>
> public static int validateBlockAddition(Cacheable existing, Cacheable newBlock,
>   BlockCacheKey cacheKey) {
>   int comparison = compareCacheBlock(existing, newBlock, false);
>   if (comparison != 0) {
>     throw new RuntimeException(
>       "Cached block contents differ, which should not have happened."
>         + "cacheKey:" + cacheKey);
>   }
>   if ((existing instanceof HFileBlock) && (newBlock instanceof HFileBlock)) {
>     comparison = ((HFileBlock) existing).getNextBlockOnDiskSize()
>       - ((HFileBlock) newBlock).getNextBlockOnDiskSize();
>   }
>   return comparison;
> }
>
>
> During block cache replacement in a scan operation, this uncaught RuntimeException was triggered and the RegionServer crashed abruptly. Could someone explain what conditions can trigger this exception, and how we can avoid it?
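The invariant that validateBlockAddition enforces can be sketched in isolation: re-caching a block under the same key must supply byte-identical contents, otherwise a RuntimeException is thrown. The following is a hedged, self-contained illustration; the class, method, and key names are hypothetical stand-ins and not HBase APIs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in (NOT the real HBase classes) for the check quoted above:
// if a block is cached twice under the same key, the contents must match exactly.
public class BlockCacheInvariantSketch {

    static final Map<String, byte[]> cache = new HashMap<>();

    static void cacheBlock(String key, byte[] block) {
        byte[] existing = cache.get(key);
        if (existing != null && !Arrays.equals(existing, block)) {
            // Mirrors the "Cached block contents differ" failure in the report.
            throw new RuntimeException("Cached block contents differ. cacheKey:" + key);
        }
        cache.put(key, block);
    }

    public static void main(String[] args) {
        cacheBlock("hfile-A@0", new byte[] {1, 2, 3});
        cacheBlock("hfile-A@0", new byte[] {1, 2, 3}); // identical re-add: allowed
        try {
            cacheBlock("hfile-A@0", new byte[] {9, 9, 9}); // divergent contents
        } catch (RuntimeException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In the real code path the two copies should always be identical, which is why divergence is treated as an unrecoverable bug rather than being caught and ignored.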


Re: [DISCUSS] Allow namespace admins to clone snapshots created by them

2023-01-18 Thread Duo Zhang
Approved the PR on github. Please go ahead.

Szabolcs Bukros  于2023年1月16日周一 22:07写道:
>
> I would be fine with adding this only to 2.6 and 3.0. I only proposed the
> other branches because of the perceived low impact of the change.
> This change is not important enough to make a sooner release necessary.
>
> If the change is accepted could I please get a review for the PR?
> https://github.com/apache/hbase/pull/4885
>
> On Wed, Jan 4, 2023 at 3:17 AM 张铎(Duo Zhang)  wrote:
>
> > +1 on releasing 2.6.0 sooner.
> >
> > And I think it is time to EOL 2.4.x after we release 2.6.0?
> >
> > Bryan Beaudreault  于2023年1月3日周二 21:02写道:
> > >
> > > I think development is done on TLS. We are just waiting on requested
> > > testing. Andor was working on that, but I believe he had some stuff come
> > up
> > > at his work.
> > >
> > > I also want to get backups in place, but there is 1 backwards
> > compatibility
> > > issue to work through. Hoping to have that squared away soon.
> > >
> > > On Sat, Dec 31, 2022 at 9:32 PM Andrew Purtell
> > > wrote:
> > >
> > > > +1
> > > >
> > > > If this is needed soon in a release we could start on 2.6.0?
> > > >
> > > > (How is TLS RPC coming along? - that would be the big ticket item.)
> > > >
> > > > > On Dec 23, 2022, at 7:06 AM, 张铎  wrote:
> > > > >
> > > > > This is a behavior change, it makes non admin users can clone
> > snapshot.
> > > > >
> > > > > For me I do not think we should include changes like this in a patch
> > > > > release, unless it is considered as a critical bug which must be
> > > > > fixed.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Szabolcs Bukros  于2022年11月30日周三
> > 00:06写道:
> > > > >>
> > > > >> This should not break any existing use case so I see no reason to
> > not
> > > > add
> > > > >> this to branch-2.5 and
> > > > >> branch-2.4.
> > > > >>
> > > > >>> On Thu, Nov 24, 2022 at 3:03 AM 张铎(Duo Zhang) <
> > palomino...@gmail.com>
> > > > wrote:
> > > > >>>
> > > > >>> I'm OK with this change.
> > > > >>>
> > > > >>> But maybe we still need to determine which branches we can apply
> > this
> > > > >>> change to? Is it OK to include this change for branch-2.5 and
> > > > >>> branch-2.4?
> > > > >>>
> > > > >>> Tak Lon (Stephen) Wu  于2022年11月22日周二 06:31写道:
> > > > >>>>
> > > > >>>> FYI the PR is https://github.com/apache/hbase/pull/4885 and
> > > > >>>> https://issues.apache.org/jira/browse/HBASE-27493.
> > > > >>>>
> > > > >>>> the proposal seems to be, should we allow cloning snapshot to any
> > > > >>>> namespace if they're not the global admin.
> > > > >>>>
> > > > >>>> logically, it should be fine because they're the admin for the
> > > > >>>> namespace, and should be able to do whatever within that
> > namespace.
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>> Stephen
> > > > >>>>
> > > > >>>>
> > > > >>>> On Mon, Nov 21, 2022 at 11:38 AM Szabolcs Bukros
> > > > >>>>  wrote:
> > > > >>>>>
> > > > >>>>> Hi Everyone,
> > > > >>>>>
> > > > >>>>> Creating a snapshot requires table admin permissions. But
> > cloning it
> > > > >>>>> requires global admin permissions unless the user owns the
> > snapshot
> > > > and
> > > > >>>>> wants to recreate the original table the snapshot was based on
> > using
> > > > >>> the
> > > > >>>>> same table name. This puts unnecessary load on the few users
> > having
> > > > >>> global
> > > > >>>>> admin permissions on the cluster. I would like to relax this
> > rule a
> > > > >>> bit and
> > > > >>>>> allow the owner of the snapshot to clone it into any namespace
> > where
> > > > >>> they
> > > > >>>>> have admin permissions regardless of the table name used.
> > > > >>>>>
> > > > >>>>> Please let me know what you think about this proposal. And if you
> > > > find
> > > > >>> it
> > > > >>>>> acceptable which branch do you think this could land on.
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Szabolcs Bukros
> > > > >>>
> > > >
> >
>
>
> --
> *Szabolcs Bukros* | Staff Engineer
> e. szabo...@cloudera.com
> cloudera.com <https://www.cloudera.com>
> --
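The workflow under discussion can be illustrated with a hypothetical HBase shell session. The user, namespace, snapshot, and table names below are invented for illustration; 'A' is the namespace admin permission in the AccessController permission letters (RWXCA).

```
# As a global admin: make 'alice' an admin of the 'analytics' namespace.
hbase> grant 'alice', 'A', '@analytics'

# As 'alice' (snapshot owner and namespace admin, but not a global admin):
# under the proposal, cloning into a namespace she administers is allowed,
# regardless of the target table name.
hbase> clone_snapshot 'events_snap', 'analytics:events_restored'
```

Today the second command requires global admin rights unless the target is the snapshot's original table; the proposal relaxes only that check.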


Re: [DISCUSS] Allow namespace admins to clone snapshots created by them

2023-01-03 Thread Duo Zhang
+1 on releasing 2.6.0 sooner.

And I think it is time to EOL 2.4.x after we release 2.6.0?

Bryan Beaudreault  于2023年1月3日周二 21:02写道:
>
> I think development is done on TLS. We are just waiting on requested
> testing. Andor was working on that, but I believe he had some stuff come up
> at his work.
>
> I also want to get backups in place, but there is 1 backwards compatibility
> issue to work through. Hoping to have that squared away soon.
>
> On Sat, Dec 31, 2022 at 9:32 PM Andrew Purtell 
> wrote:
>
> > +1
> >
> > If this is needed soon in a release we could start on 2.6.0?
> >
> > (How is TLS RPC coming along? - that would be the big ticket item.)
> >
> > > On Dec 23, 2022, at 7:06 AM, 张铎  wrote:
> > >
> > > This is a behavior change, it makes non admin users can clone snapshot.
> > >
> > > For me I do not think we should include changes like this in a patch
> > > release, unless it is considered as a critical bug which must be
> > > fixed.
> > >
> > > Thanks.
> > >
> > > Szabolcs Bukros  于2022年11月30日周三 00:06写道:
> > >>
> > >> This should not break any existing use case so I see no reason to not
> > add
> > >> this to branch-2.5 and
> > >> branch-2.4.
> > >>
> > >>> On Thu, Nov 24, 2022 at 3:03 AM 张铎(Duo Zhang) 
> > wrote:
> > >>>
> > >>> I'm OK with this change.
> > >>>
> > >>> But maybe we still need to determine which branches we can apply this
> > >>> change to? Is it OK to include this change for branch-2.5 and
> > >>> branch-2.4?
> > >>>
> > >>> Tak Lon (Stephen) Wu  于2022年11月22日周二 06:31写道:
> > >>>>
> > >>>> FYI the PR is https://github.com/apache/hbase/pull/4885 and
> > >>>> https://issues.apache.org/jira/browse/HBASE-27493.
> > >>>>
> > >>>> the proposal seems to be, should we allow cloning snapshot to any
> > >>>> namespace if they're not the global admin.
> > >>>>
> > >>>> logically, it should be fine because they're the admin for the
> > >>>> namespace, and should be able to do whatever within that namespace.
> > >>>>
> > >>>> Thanks,
> > >>>> Stephen
> > >>>>
> > >>>>
> > >>>> On Mon, Nov 21, 2022 at 11:38 AM Szabolcs Bukros
> > >>>>  wrote:
> > >>>>>
> > >>>>> Hi Everyone,
> > >>>>>
> > >>>>> Creating a snapshot requires table admin permissions. But cloning it
> > >>>>> requires global admin permissions unless the user owns the snapshot
> > and
> > >>>>> wants to recreate the original table the snapshot was based on using
> > >>> the
> > >>>>> same table name. This puts unnecessary load on the few users having
> > >>> global
> > >>>>> admin permissions on the cluster. I would like to relax this rule a
> > >>> bit and
> > >>>>> allow the owner of the snapshot to clone it into any namespace where
> > >>> they
> > >>>>> have admin permissions regardless of the table name used.
> > >>>>>
> > >>>>> Please let me know what you think about this proposal. And if you
> > find
> > >>> it
> > >>>>> acceptable which branch do you think this could land on.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Szabolcs Bukros
> > >>>
> >


Re: [DISCUSS] Allow namespace admins to clone snapshots created by them

2022-12-23 Thread Duo Zhang
This is a behavior change: it allows non-admin users to clone snapshots.

For me I do not think we should include changes like this in a patch
release, unless it is considered as a critical bug which must be
fixed.

Thanks.

Szabolcs Bukros  于2022年11月30日周三 00:06写道:
>
> This should not break any existing use case so I see no reason to not add
> this to branch-2.5 and
> branch-2.4.
>
> On Thu, Nov 24, 2022 at 3:03 AM 张铎(Duo Zhang)  wrote:
>
> > I'm OK with this change.
> >
> > But maybe we still need to determine which branches we can apply this
> > change to? Is it OK to include this change for branch-2.5 and
> > branch-2.4?
> >
> > Tak Lon (Stephen) Wu  于2022年11月22日周二 06:31写道:
> > >
> > > FYI the PR is https://github.com/apache/hbase/pull/4885 and
> > > https://issues.apache.org/jira/browse/HBASE-27493.
> > >
> > > the proposal seems to be, should we allow cloning snapshot to any
> > > namespace if they're not the global admin.
> > >
> > > logically, it should be fine because they're the admin for the
> > > namespace, and should be able to do whatever within that namespace.
> > >
> > > Thanks,
> > > Stephen
> > >
> > >
> > > On Mon, Nov 21, 2022 at 11:38 AM Szabolcs Bukros
> > >  wrote:
> > > >
> > > > Hi Everyone,
> > > >
> > > > Creating a snapshot requires table admin permissions. But cloning it
> > > > requires global admin permissions unless the user owns the snapshot and
> > > > wants to recreate the original table the snapshot was based on using
> > the
> > > > same table name. This puts unnecessary load on the few users having
> > global
> > > > admin permissions on the cluster. I would like to relax this rule a
> > bit and
> > > > allow the owner of the snapshot to clone it into any namespace where
> > they
> > > > have admin permissions regardless of the table name used.
> > > >
> > > > Please let me know what you think about this proposal. And if you find
> > it
> > > > acceptable which branch do you think this could land on.
> > > >
> > > > Thanks,
> > > > Szabolcs Bukros
> >


[ANNOUNCE] New HBase committer Rushabh Shah

2022-12-14 Thread Duo Zhang
On behalf of the Apache HBase PMC, I am pleased to announce that
Rushabh Shah (shahrs87)
has accepted the PMC's invitation to become a committer on the
project. We appreciate all
of Rushabh's generous contributions thus far and look forward to his
continued involvement.

Congratulations and welcome, Rushabh Shah!

我很高兴代表 Apache HBase PMC 宣布 Rushabh Shah 已接受我们的邀请,成
为 Apache HBase 项目的 Committer。感谢 Rushabh Shah 一直以来为 HBase 项目
做出的贡献,并期待他在未来继续承担更多的责任。

欢迎 Rushabh Shah!


Re: [DISCUSS] How to deal with the disabling of public sign ups for jira.a.o(enable github issues?)

2022-12-05 Thread Duo Zhang
The PR for HBASE-27513 is available

https://github.com/apache/hbase/pull/4913

Let's at least tell our users to send email to private@hbase for
acquiring a jira account.

Thanks.

张铎(Duo Zhang)  于2022年12月2日周五 12:46写道:
>
> Currently all the comment on github PR will be sent to issues@hbase,
> like this one
>
> https://lists.apache.org/thread/jbfm269b4m24xl2r82l8b0t3pmqr44hr
>
> But I think this can only be used as an archive, to make sure that all
> discussions are recorded on asf infrastructure.
>
> For github issues, I'm afraid we can only do the same thing. As the
> format of github comment is different, it will be hard to read if we
> just sync the message to jira...
>
> Thanks.
>
> Bryan Beaudreault  于2022年12月1日周四 21:30写道:
> >
> > Should we have them sent to private@? Just thinking in terms of reducing
> > spam to users who put their email and full name on a public list.
> >
> > One thought I had about bug tracking is whether we could use some sort of
> > github -> jira sync. I've seen them used before, where it automatically
> > syncs issues and comments between the two systems. It's definitely not
> > ideal, but maybe an option? I'm guessing it would require INFRA help.
> >
> > On Thu, Dec 1, 2022 at 5:47 AM 张铎(Duo Zhang)  wrote:
> >
> > > I've filed HBASE-27513 for changing the readme on github.
> > >
> > > At least let's reuse the existing mailing list for acquiring jira account.
> > >
> > > Thanks.
> > >
> > > 张铎(Duo Zhang)  于2022年11月29日周二 22:34写道:
> > >
> > > >
> > > > Bump and also send this to user@hbase.
> > > >
> > > > We need to find a way to deal with the current situation where
> > > > contributors can not create a Jira account on their own...
> > > >
> > > > At least, we need to change the readme on github page, web site and
> > > > also the ref guide to tell users how to acquire a jira account...
> > > >
> > > > Thanks.
> > > >
> > > > 张铎(Duo Zhang)  于2022年11月27日周日 22:06写道:
> > > > >
> > > > > For me, I think most developers already have a github account, so
> > > > > enabling it could help us get more feedback. For lots of younger
> > > > > Chinese developers, they rarely use email in their daily life...
> > > > > No doubt later we need to modify our readme on github. If we just let
> > > > > users go to github issues on the readme, they will soon open an issue
> > > > > there. But if we ask users to first send an email to a mailing list,
> > > > > for acquiring a jira account, and then wait for a PMC member to submit
> > > > > the request, and receive the email response, set up their account, and
> > > > > then they can finally open an issue on jira. I'm afraid lots of users
> > > > > will just give up, it is not very friendly...
> > > > >
> > > > > And I do not mean separate issue systems for users and devs. Users can
> > > > > still open jira issues or ask in the mailing list if they want, github
> > > > > issues is just another channel. If a user asks something in the
> > > > > mailing list and we think it is a bug, we will ask the user to file an
> > > > > issue or we will file an issue for it. It is just the same with github
> > > > > issues.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Nick Dimiduk  于2022年11月24日周四 15:44写道:
> > > > > >
> > > > > > This new situation around JIRA seems very similar to the existing
> > > situation
> > > > > > around Slack. A new community member currently must acquire a Slack
> > > invite
> > > > > > somehow, usually by emailing one of the lists. Mailing lists
> > > themselves
> > > > > > involve a signup process, though it may be possible to email
> > > user/-zh/dev
> > > > > > without first subscribing to the list.
> > > > > >
> > > > > > I have a -0 opinion on using GitHub Issues to manage JIRA
> > > subscription
> > > > > > access. It seems like a comical cascade of complexity. I’d prefer to
> > > keep
> > > > > > GitHub Issues available to us as a future alternative to JIRA for
> > > project
> > > > > > issue tracking. I agree with you that migrating away from JIRA will
> > > be
> > > > > > painful.
>

[ANNOUNCE] Apache HBase 2.5.2 is now available for download

2022-12-04 Thread Duo Zhang
The HBase team is happy to announce the immediate availability of HBase
2.5.2.

Apache HBase™ is an open-source, distributed, versioned, non-relational
database.

Apache HBase gives you low latency random access to billions of rows with
millions of columns atop non-specialized hardware. To learn more about
HBase, see https://hbase.apache.org/.

HBase 2.5.2 is the latest patch release in the HBase 2.5.x line. The
full list of issues can be found in the included CHANGES and RELEASENOTES,
or via our issue tracker:

https://s.apache.org/hbase-2.5.2-jira

To download please follow the links and instructions on our website:

https://hbase.apache.org/downloads.html

Notice that, starting from 2.5.2, we will publish dists and maven artifacts
which are built against hadoop3. You can download the hadoop3 dists from the
download page which has 'hadoop3-' prefix. And for maven artifacts, please
use 2.5.2-hadoop3 as the version of the hbase dependencies in your pom.
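For example, a pom fragment selecting the hadoop3 artifacts might look like the following. hbase-client is shown only as a representative module; any hbase dependency follows the same version pattern.

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>2.5.2-hadoop3</version>
</dependency>
```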

Questions, comments, and problems are always welcome at:
d...@hbase.apache.org.

Thanks to all who contributed and made this release possible.

Cheers,
The HBase Dev Team


Re: [ANNOUNCE] New HBase Committer Liangjun He

2022-12-03 Thread Duo Zhang
Congratulations!

Yu Li  于2022年12月3日周六 21:51写道:
>
> Hi All,
>
> On behalf of the Apache HBase PMC, I am pleased to announce that Liangjun
> He (heliangjun) has accepted the PMC's invitation to become a committer on
> the project. We appreciate all of Liangjun's generous contributions thus
> far and look forward to his continued involvement.
>
> Congratulations and welcome, Liangjun!
>
> 我很高兴代表 Apache HBase PMC 宣布 Liangjun He (何良均) 已接受我们的邀请,成为 Apache HBase 项目的
> Committer。感谢何良均一直以来为 HBase 项目做出的贡献,并期待他在未来继续承担更多的责任。
>
> 欢迎良均!
>
> Best Regards,
> Yu
> --
> Best Regards,
> Yu


Re: [DISCUSS] How to deal with the disabling of public sign ups for jira.a.o(enable github issues?)

2022-12-01 Thread Duo Zhang
Currently all the comment on github PR will be sent to issues@hbase,
like this one

https://lists.apache.org/thread/jbfm269b4m24xl2r82l8b0t3pmqr44hr

But I think this can only be used as an archive, to make sure that all
discussions are recorded on asf infrastructure.

For github issues, I'm afraid we can only do the same thing. As the
format of github comment is different, it will be hard to read if we
just sync the message to jira...

Thanks.

Bryan Beaudreault  于2022年12月1日周四 21:30写道:
>
> Should we have them sent to private@? Just thinking in terms of reducing
> spam to users who put their email and full name on a public list.
>
> One thought I had about bug tracking is whether we could use some sort of
> github -> jira sync. I've seen them used before, where it automatically
> syncs issues and comments between the two systems. It's definitely not
> ideal, but maybe an option? I'm guessing it would require INFRA help.
>
> On Thu, Dec 1, 2022 at 5:47 AM 张铎(Duo Zhang)  wrote:
>
> > I've filed HBASE-27513 for changing the readme on github.
> >
> > At least let's reuse the existing mailing list for acquiring jira account.
> >
> > Thanks.
> >
> > 张铎(Duo Zhang)  于2022年11月29日周二 22:34写道:
> >
> > >
> > > Bump and also send this to user@hbase.
> > >
> > > We need to find a way to deal with the current situation where
> > > contributors can not create a Jira account on their own...
> > >
> > > At least, we need to change the readme on github page, web site and
> > > also the ref guide to tell users how to acquire a jira account...
> > >
> > > Thanks.
> > >
> > > 张铎(Duo Zhang)  于2022年11月27日周日 22:06写道:
> > > >
> > > > For me, I think most developers already have a github account, so
> > > > enabling it could help us get more feedback. For lots of younger
> > > > Chinese developers, they rarely use email in their daily life...
> > > > No doubt later we need to modify our readme on github. If we just let
> > > > users go to github issues on the readme, they will soon open an issue
> > > > there. But if we ask users to first send an email to a mailing list,
> > > > for acquiring a jira account, and then wait for a PMC member to submit
> > > > the request, and receive the email response, set up their account, and
> > > > then they can finally open an issue on jira. I'm afraid lots of users
> > > > will just give up, it is not very friendly...
> > > >
> > > > And I do not mean separate issue systems for users and devs. Users can
> > > > still open jira issues or ask in the mailing list if they want, github
> > > > issues is just another channel. If a user asks something in the
> > > > mailing list and we think it is a bug, we will ask the user to file an
> > > > issue or we will file an issue for it. It is just the same with github
> > > > issues.
> > > >
> > > > Thanks.
> > > >
> > > > Nick Dimiduk  于2022年11月24日周四 15:44写道:
> > > > >
> > > > > This new situation around JIRA seems very similar to the existing
> > situation
> > > > > around Slack. A new community member currently must acquire a Slack
> > invite
> > > > > somehow, usually by emailing one of the lists. Mailing lists
> > themselves
> > > > > involve a signup process, though it may be possible to email
> > user/-zh/dev
> > > > > without first subscribing to the list.
> > > > >
> > > > > I have a -0 opinion on using GitHub Issues to manage JIRA
> > subscription
> > > > > access. It seems like a comical cascade of complexity. I’d prefer to
> > keep
> > > > > GitHub Issues available to us as a future alternative to JIRA for
> > project
> > > > > issue tracking. I agree with you that migrating away from JIRA will
> > be
> > > > > painful.
> > > > >
> > > > > I’m not a big fan of having separate issue systems for users vs.
> > devs. It
> > > > > emphasizes the idea that users and devs are different groups of
> > people with
> > > > > unequal voice in the project direction. I suppose it could be done
> > well,
> > > > > but I think it is more likely to be done poorly.
> > > > >
> > > > > I follow the Infra list, but only casually. It seems there’s a plan
> > to
> > > > > eventually adopt some Atlassian Cloud service, which has better
> > ant

Re: [DISCUSS] How to deal with the disabling of public sign ups for jira.a.o(enable github issues?)

2022-12-01 Thread Duo Zhang
I've filed HBASE-27513 for changing the readme on github.

At least let's reuse the existing mailing list for acquiring jira account.

Thanks.

张铎(Duo Zhang)  于2022年11月29日周二 22:34写道:

>
> Bump and also send this to user@hbase.
>
> We need to find a way to deal with the current situation where
> contributors can not create a Jira account on their own...
>
> At least, we need to change the readme on github page, web site and
> also the ref guide to tell users how to acquire a jira account...
>
> Thanks.
>
> 张铎(Duo Zhang)  于2022年11月27日周日 22:06写道:
> >
> > For me, I think most developers already have a github account, so
> > enabling it could help us get more feedback. For lots of younger
> > Chinese developers, they rarely use email in their daily life...
> > No doubt later we need to modify our readme on github. If we just let
> > users go to github issues on the readme, they will soon open an issue
> > there. But if we ask users to first send an email to a mailing list,
> > for acquiring a jira account, and then wait for a PMC member to submit
> > the request, and receive the email response, set up their account, and
> > then they can finally open an issue on jira. I'm afraid lots of users
> > will just give up, it is not very friendly...
> >
> > And I do not mean separate issue systems for users and devs. Users can
> > still open jira issues or ask in the mailing list if they want, github
> > issues is just another channel. If a user asks something in the
> > mailing list and we think it is a bug, we will ask the user to file an
> > issue or we will file an issue for it. It is just the same with github
> > issues.
> >
> > Thanks.
> >
> > Nick Dimiduk  于2022年11月24日周四 15:44写道:
> > >
> > > This new situation around JIRA seems very similar to the existing 
> > > situation
> > > around Slack. A new community member currently must acquire a Slack invite
> > > somehow, usually by emailing one of the lists. Mailing lists themselves
> > > involve a signup process, though it may be possible to email user/-zh/dev
> > > without first subscribing to the list.
> > >
> > > I have a -0 opinion on using GitHub Issues to manage JIRA subscription
> > > access. It seems like a comical cascade of complexity. I’d prefer to keep
> > > GitHub Issues available to us as a future alternative to JIRA for project
> > > issue tracking. I agree with you that migrating away from JIRA will be
> > > painful.
> > >
> > > I’m not a big fan of having separate issue systems for users vs. devs. It
> > > emphasizes the idea that users and devs are different groups of people 
> > > with
> > > unequal voice in the project direction. I suppose it could be done well,
> > > but I think it is more likely to be done poorly.
> > >
> > > I follow the Infra list, but only casually. It seems there’s a plan to
> > > eventually adopt some Atlassian Cloud service, which has better anti-spam
> > > controls. If that is on the roadmap, I’m content to let JIRA access follow
> > > Slack access: using some existing outreach to request access. Introducing 
> > > a
> > > dedicated list would be fine with me as well.
> > >
> > > -n
> > >
> > > On Thu, Nov 24, 2022 at 03:19 Duo Zhang  wrote:
> > >
> > > > I've forwarded an announcement email from the INFRA team recently
> > > > about disabling the public sign ups for jira.a.o because of spamming.
> > > > And now the rule is finally applied, when you open jira.a.o, you can
> > > > see there is a gray bar on the top to tell you 'Public signup for this
> > > > instance is disabled. Our Jira Guidelines page explains how to get an
> > > > account.'.
> > > >
> > > > For me, I do not think it is easy for us to completely drop jira since
> > > > it is the issue tracker we have been using for years and all our
> > > > release processes are bound to it. So at least we need to find a way
> > > > to let our contributors know how to acquire a jira account.
> > > >
> > > > The hive project decided to use a dedicated mailing list for acquiring
> > > > a jira account.
> > > >
> > > >
> > > > https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA
> > > >
> > > > For me, I think maybe we could enable github issues for our users and
> > > > contributors. They can ask questions and report issues there and if we
> > > > think it is a bug that needs to be fixed, we could open a jira issue
> > > > for it. And we could also create a special issue template for
> > > > acquiring jira accounts.
> > > >
> > > > Thoughts?
> > > >
> > > > Thanks.
> > > >


Re: [DISCUSS] How to deal with the disabling of public sign ups for jira.a.o(enable github issues?)

2022-11-29 Thread Duo Zhang
Bump and also send this to user@hbase.

We need to find a way to deal with the current situation where
contributors can not create a Jira account on their own...

At least, we need to change the readme on github page, web site and
also the ref guide to tell users how to acquire a jira account...

Thanks.

张铎(Duo Zhang)  于2022年11月27日周日 22:06写道:
>
> For me, I think most developers already have a github account, so
> enabling it could help us get more feedback. For lots of younger
> Chinese developers, they rarely use email in their daily life...
> No doubt later we need to modify our readme on github. If we just let
> users go to github issues on the readme, they will soon open an issue
> there. But if we ask users to first send an email to a mailing list,
> for acquiring a jira account, and then wait for a PMC member to submit
> the request, and receive the email response, set up their account, and
> then they can finally open an issue on jira. I'm afraid lots of users
> will just give up, it is not very friendly...
>
> And I do not mean separate issue systems for users and devs. Users can
> still open jira issues or ask in the mailing list if they want, github
> issues is just another channel. If a user asks something in the
> mailing list and we think it is a bug, we will ask the user to file an
> issue or we will file an issue for it. It is just the same with github
> issues.
>
> Thanks.
>
> Nick Dimiduk  于2022年11月24日周四 15:44写道:
> >
> > This new situation around JIRA seems very similar to the existing situation
> > around Slack. A new community member currently must acquire a Slack invite
> > somehow, usually by emailing one of the lists. Mailing lists themselves
> > involve a signup process, though it may be possible to email user/-zh/dev
> > without first subscribing to the list.
> >
> > I have a -0 opinion on using GitHub Issues to manage JIRA subscription
> > access. It seems like a comical cascade of complexity. I’d prefer to keep
> > GitHub Issues available to us as a future alternative to JIRA for project
> > issue tracking. I agree with you that migrating away from JIRA will be
> > painful.
> >
> > I’m not a big fan of having separate issue systems for users vs. devs. It
> > emphasizes the idea that users and devs are different groups of people with
> > unequal voice in the project direction. I suppose it could be done well,
> > but I think it is more likely to be done poorly.
> >
> > I follow the Infra list, but only casually. It seems there’s a plan to
> > eventually adopt some Atlassian Cloud service, which has better anti-spam
> > controls. If that is on the roadmap, I’m content to let JIRA access follow
> > Slack access: using some existing outreach to request access. Introducing a
> > dedicated list would be fine with me as well.
> >
> > -n
> >
> > On Thu, Nov 24, 2022 at 03:19 Duo Zhang  wrote:
> >
> > > I've forwarded an announcement email from the INFRA team recently
> > > about disabling public sign-ups for jira.a.o because of spamming.
> > > The rule is now in effect: when you open jira.a.o, you can
> > > see there is a gray bar on the top to tell you 'Public signup for this
> > > instance is disabled. Our Jira Guidelines page explains how to get an
> > > account.'.
> > >
> > > For me, I do not think it is easy for us to completely drop jira since
> > > it is the issue tracker we have been using for years and all our
> > > release processes are bound to it. So at least we need to find a way
> > > to let our contributors know how to acquire a jira account.
> > >
> > > The hive project decided to use a dedicated mailing list for acquiring
> > > a jira account.
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA
> > >
> > > For me, I think maybe we could enable github issues for our users and
> > > contributors. They can ask questions and report issues there and if we
> > > think it is a bug that needs to be fixed, we could open a jira issue
> > > for it. And we could also create a special issue template for
> > > acquiring jira accounts.
> > >
> > > Thoughts?
> > >
> > > Thanks.
> > >
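
As a side note, the "special issue template for acquiring jira accounts" idea could be expressed with GitHub's issue-forms schema. The fragment below is a hypothetical sketch, not an existing HBase template; the file name, field ids, and wording are all illustrative.

```yaml
# Hypothetical .github/ISSUE_TEMPLATE/jira-account-request.yml
# Field names follow GitHub's issue-forms schema; the content is illustrative.
name: JIRA account request
description: Request an ASF JIRA account to contribute to HBase
labels: ["jira-account"]
body:
  - type: input
    id: preferred-username
    attributes:
      label: Preferred JIRA username
    validations:
      required: true
  - type: textarea
    id: reason
    attributes:
      label: What would you like to work on?
    validations:
      required: true
```

A PMC member triaging such an issue could then forward the request to the INFRA process described in the linked guidelines.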


Re: [DISCUSS] Allow namespace admins to clone snapshots created by them

2022-11-23 Thread Duo Zhang
I'm OK with this change.

But maybe we still need to determine which branches we can apply this
change to? Is it OK to include this change for branch-2.5 and
branch-2.4?

Tak Lon (Stephen) Wu  于2022年11月22日周二 06:31写道:
>
> FYI the PR is https://github.com/apache/hbase/pull/4885 and
> https://issues.apache.org/jira/browse/HBASE-27493.
>
> The proposal seems to be: should we allow cloning a snapshot into any
> namespace even when the user is not a global admin?
>
> Logically, it should be fine: they are the admin for the namespace and
> should be able to do whatever they want within that namespace.
>
> Thanks,
> Stephen
>
>
> On Mon, Nov 21, 2022 at 11:38 AM Szabolcs Bukros
>  wrote:
> >
> > Hi Everyone,
> >
> > Creating a snapshot requires table admin permissions. But cloning it
> > requires global admin permissions unless the user owns the snapshot and
> > wants to recreate the original table the snapshot was based on using the
> > same table name. This puts unnecessary load on the few users who have
> > global admin permissions on the cluster. I would like to relax this rule a bit and
> > allow the owner of the snapshot to clone it into any namespace where they
> > have admin permissions regardless of the table name used.
> >
> > Please let me know what you think about this proposal. And if you find it
> > acceptable which branch do you think this could land on.
> >
> > Thanks,
> > Szabolcs Bukros

