Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-05 Thread Bryan Beaudreault
I'm +0 on hbase-examples, but +100 on any improvements we can make to
ltt/pe/chaos/minicluster/etc. It's extremely frustrating how heavily we
rely on test jars, both in general and specifically for these core test
executables. Unfortunately I haven't had time to dedicate to these
frustrations myself, but I'm happy to help with review, etc.

On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain  wrote:

> Thank you for bringing this up.
>
> +1 for this change.
>
> In fact, some time back, we faced a similar problem. Security scans found
> that we were bundling a vulnerable Hadoop test jar. To deal with that, we
> had to make a change in our internal HBase fork to exclude all HBase and
> Hadoop test jars from the assembly. This helped us get rid of the vulnerable
> jar. (Although I hadn't dealt with test scope dependencies there.)
>
> I have been thinking of pushing this change to Apache HBase, but just
> wasn't sure if it was even acceptable. It's great to see the same has been
> brought up here today.
>
> We hadn't dealt with the ltt, pe, etc. tools, and wrote a script to download
> them on demand to avoid a massive code change in our internal fork. But I am
> +1 on the idea of identifying and moving all such tools to a new module.
> This would be great and make things easier for us as well.
>
> Also, one way we could help new users get started easily, in case we
> completely stop bundling Hadoop jars, is by providing a script which starts
> an HBase cluster in a single-node setup. In fact, I wrote a simple
> script some time back that automates this process, given a release link for
> both. It first downloads the Hadoop and HBase binaries and then starts both
> with the HBase root directory set to be on HDFS. We could provide something
> similar to help new users get started easily.
>
> Although I am also +1 on the idea of providing both variants as mentioned by
> Nick, which might not even need any such script.
>
> Also, I am willing to volunteer to help with this effort. Please let me
> know if anything is needed.
>
> Thanks,
> Nihal
>
>
> On Tue, 5 Mar 2024, 15:35 Nick Dimiduk,  wrote:
>
> > This would be great cleanup, big +1 from me for all three of these
> > adjustments, including the promotion of pe, ltt, and friends out of the
> > test scope.
> >
> > I believe that we included hbase test jars because we used to freely mix
> > classes needed for minicluster between runtime and test jars, which in turn
> > relied on Hadoop minicluster capabilities. The big cleanup around
> > HBaseTestingUtil/it addressed much (or all) of these issues on branch-3.
> >
> > I believe that we include a Hadoop distribution in our assembly because
> > that makes it easy for a new user to download our release bin.tgz and get
> > started immediately with learning. I guess it’s high time that we work out
> > the with- and without-Hadoop variants.
> >
> > Thanks,
> > Nick
> >
> > On Tue, 5 Mar 2024 at 09:14, Istvan Toth  wrote:
> >
> > > DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
> > > to achieve this, this is about discussing whether we even want to make
> > > these changes.
> > > These are also substantial changes, but they could be targeted for HBase
> > > 3.0.
> > >
> > > One issue I have noticed is that we ship test jars and test dependencies in
> > > the assembly.
> > > I can't see anyone using those, but it bloats the assembly and classpath,
> > > and adds unnecessary JARs with possible CVE issues. (for example Kerby
> > > which is a Hadoop minicluster dependency)
> > >
> > > My proposal is to exclude the test jars and the test scope dependencies
> > > from the assembly.
> > >
> > > The advantages would be:
> > > * Smaller distro size
> > > * Faster startup (this is marginal)
> > > * Less CVE-prone JARs in the binary assemblies
> > >
> > > The other issue is that the assembly includes much of the Hadoop
> > > distribution.
> > > The basic assumption in all scripts and instructions is that the node has a
> > > fully configured Hadoop installation, and we include it in the classpath of
> > > HBase.
> > >
> > > If that is true, then there is no reason to include Hadoop in the assembly,
> > > HBase and its direct dependencies should be enough.
> > >
> > > One could argue that it would simplify the client side, which is true to
> > > some extent (though 95% of the client distro use cases are served better by
> > > simply using hbase-shaded-client).
> > >
> > > We could either remove the Hadoop libraries from either or both of the
> > > assemblies unconditionally, or provide two variants for either or both
> > > assemblies, one with Hadoop included, and one without it.
> > > Spark already does this, it has binary distributions both with and without
> > > Hadoop.
> > >
> > > The advantages would be:
> > > * Smaller distro size
> > > * Faster startup (this is marginal)
> > > * Less chance of conflicts with the Hadoop jars
> > > * Less CVE-prone JARs in the binary

Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-05 Thread Nihal Jain
Thank you for bringing this up.

+1 for this change.

In fact, some time back, we faced a similar problem. Security scans found
that we were bundling a vulnerable Hadoop test jar. To deal with that, we
had to make a change in our internal HBase fork to exclude all HBase and
Hadoop test jars from the assembly. This helped us get rid of the vulnerable
jar. (Although I hadn't dealt with test scope dependencies there.)

I have been thinking of pushing this change to Apache HBase, but just
wasn't sure if it was even acceptable. It's great to see the same has been
brought up here today.

We hadn't dealt with the ltt, pe, etc. tools, and wrote a script to download
them on demand to avoid a massive code change in our internal fork. But I am
+1 on the idea of identifying and moving all such tools to a new module.
This would be great and make things easier for us as well.

Also, one way we could help new users get started easily, in case we
completely stop bundling Hadoop jars, is by providing a script which starts
an HBase cluster in a single-node setup. In fact, I wrote a simple
script some time back that automates this process, given a release link for
both. It first downloads the Hadoop and HBase binaries and then starts both
with the HBase root directory set to be on HDFS. We could provide something
similar to help new users get started easily.

Although I am also +1 on the idea of providing both variants as mentioned by
Nick, which might not even need any such script.

Also, I am willing to volunteer to help with this effort. Please let me
know if anything is needed.

Thanks,
Nihal


On Tue, 5 Mar 2024, 15:35 Nick Dimiduk,  wrote:

> This would be great cleanup, big +1 from me for all three of these
> adjustments, including the promotion of pe, ltt, and friends out of the
> test scope.
>
> I believe that we included hbase test jars because we used to freely mix
> classes needed for minicluster between runtime and test jars, which in turn
> relied on Hadoop minicluster capabilities. The big cleanup around
> HBaseTestingUtil/it addressed much (or all) of these issues on branch-3.
>
> I believe that we include a Hadoop distribution in our assembly because
> that makes it easy for a new user to download our release bin.tgz and get
> started immediately with learning. I guess it’s high time that we work out
> the with- and without-Hadoop variants.
>
> Thanks,
> Nick
>
> On Tue, 5 Mar 2024 at 09:14, Istvan Toth  wrote:
>
> > DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
> > to achieve this, this is about discussing whether we even want to make
> > these changes.
> > These are also substantial changes, but they could be targeted for HBase
> > 3.0.
> >
> > One issue I have noticed is that we ship test jars and test dependencies in
> > the assembly.
> > I can't see anyone using those, but it bloats the assembly and classpath,
> > and adds unnecessary JARs with possible CVE issues. (for example Kerby
> > which is a Hadoop minicluster dependency)
> >
> > My proposal is to exclude the test jars and the test scope dependencies
> > from the assembly.
> >
> > The advantages would be:
> > * Smaller distro size
> > * Faster startup (this is marginal)
> > * Less CVE-prone JARs in the binary assemblies
> >
> > The other issue is that the assembly includes much of the Hadoop
> > distribution.
> > The basic assumption in all scripts and instructions is that the node has a
> > fully configured Hadoop installation, and we include it in the classpath of
> > HBase.
> >
> > If that is true, then there is no reason to include Hadoop in the assembly,
> > HBase and its direct dependencies should be enough.
> >
> > One could argue that it would simplify the client side, which is true to
> > some extent (though 95% of the client distro use cases are served better by
> > simply using hbase-shaded-client).
> >
> > We could either remove the Hadoop libraries from either or both of the
> > assemblies unconditionally, or provide two variants for either or both
> > assemblies, one with Hadoop included, and one without it.
> > Spark already does this, it has binary distributions both with and without
> > Hadoop.
> >
> > The advantages would be:
> > * Smaller distro size
> > * Faster startup (this is marginal)
> > * Less chance of conflicts with the Hadoop jars
> > * Less CVE-prone JARs in the binary assemblies
> >
> >
> > Thirdly, we could consider excluding the
> > full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
> > binary assemblies. It is not used by the assembly, and AFAIK it is not
> > included in any of the 'hbase classpath' command variants.
> >
> > This would make sure that no Hadoop libraries are included (even in shaded
> > form) and would make the HBase distribution fully insulated from Hadoop's
> > CVE issues.
> >
> > (The full-fat hbase-shaded-client works best as direct build-time
> > dependency anyway)
> >
> > best regards
> > Istvan
> >
>


[jira] [Created] (HBASE-28422) SplitWalProcedure will attempt SplitWalRemoteProcedure on the same target RegionServer indefinitely

2024-03-05 Thread David Manning (Jira)
David Manning created HBASE-28422:
-

 Summary: SplitWalProcedure will attempt SplitWalRemoteProcedure on 
the same target RegionServer indefinitely
 Key: HBASE-28422
 URL: https://issues.apache.org/jira/browse/HBASE-28422
 Project: HBase
  Issue Type: Bug
  Components: master, proc-v2, wal
Affects Versions: 2.5.5
Reporter: David Manning


Similar to HBASE-28050. If HMaster selects a RegionServer for
SplitWalRemoteProcedure, it will retry this server as long as the server is
alive. I believe this is because even though
{{RSProcedureDispatcher.ExecuteProceduresRemoteCall.run}} calls
{{remoteCallFailed}}, there is no logic after this to select a new target
server. For {{TransitRegionStateProcedure}} there is logic to select a new
server for opening a region, using {{forceNewPlan}}. But
SplitWalRemoteProcedure only has logic to try another server if we receive a
{{DoNotRetryIOException}} in SplitWALRemoteProcedure#complete:
[https://github.com/apache/hbase/blob/780ff56b3f23e7041ef1b705b7d3d0a53fdd05ae/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SplitWALRemoteProcedure.java#L104-L110]

If we receive any other IOException, we will just retry the target server 
forever. Just like in HBASE-28050, if there is a SaslException, this will never 
lead to retrying a SplitWalRemoteProcedure on a new server, which can lead to 
ServerCrashProcedure never finishing until the target server for 
SplitWalRemoteProcedure is restarted. The following log is seen repeatedly, 
always sending to the same host.
{code:java}
2024-01-31 15:59:43,616 WARN  [RSProcedureDispatcher-pool-72846] procedure.SplitWALRemoteProcedure - Failed split of hdfs:///hbase/WALs/,1704984571464-splitting/1704984571464.1706710908543, retry...
java.io.IOException: Call to address= failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
    at sun.reflect.GeneratedConstructorAccessor363.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:239)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:391)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:420)
    at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114)
    at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129)
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection.lambda$sendRequest$4(NettyRpcConnection.java:365)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Can not send request because relogin is in progress.
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection.sendRequest0(NettyRpcConnection.java:321)
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection.lambda$sendRequest$4(NettyRpcConnection.java:363)
    ... 8 more
{code}
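
One possible shape for the missing "select a new target" step, as a minimal sketch only. It assumes a bounded number of attempts per server before rotating to another live server. The class `WalSplitTargetSelector`, the `maxAttemptsPerServer` threshold and the method names are hypothetical and are not part of HBase; a real fix would have to live in the procedure framework and use the master's own view of live servers.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: counts failures on the current target and rotates to another server. */
final class WalSplitTargetSelector {
  private final List<String> liveServers;
  private final int maxAttemptsPerServer; // assumed tunable, not an HBase config key
  private int currentIndex = 0;
  private int failuresOnCurrent = 0;

  WalSplitTargetSelector(List<String> liveServers, int maxAttemptsPerServer) {
    if (liveServers.isEmpty()) {
      throw new IllegalArgumentException("at least one live server is required");
    }
    if (maxAttemptsPerServer < 1) {
      throw new IllegalArgumentException("maxAttemptsPerServer must be >= 1");
    }
    this.liveServers = new ArrayList<>(liveServers);
    this.maxAttemptsPerServer = maxAttemptsPerServer;
  }

  String currentTarget() {
    return liveServers.get(currentIndex);
  }

  /**
   * Records a failed remote call and returns the server to try next.
   * Any IOException counts here, not only DoNotRetryIOException, so a
   * persistent failure such as a SASL relogin error cannot pin the
   * procedure to one host forever.
   */
  String onRemoteCallFailed(IOException cause) {
    failuresOnCurrent++;
    if (failuresOnCurrent >= maxAttemptsPerServer) {
      currentIndex = (currentIndex + 1) % liveServers.size();
      failuresOnCurrent = 0;
    }
    return currentTarget();
  }
}
```

With two servers and `maxAttemptsPerServer = 2`, the first failure keeps the same target and the second failure moves to the other server.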



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28421) Add ofs (Ozone Filesystem) support for acquireDelegationToken

2024-03-05 Thread Pratyush Bhatt (Jira)
Pratyush Bhatt created HBASE-28421:
--

 Summary: Add ofs (Ozone Filesystem) support for 
acquireDelegationToken
 Key: HBASE-28421
 URL: https://issues.apache.org/jira/browse/HBASE-28421
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Pratyush Bhatt


Currently, acquireDelegationToken is hardcoded to check only for the swebhdfs,
webhdfs and hdfs schemes (see below or
[here|https://github.com/apache/hbase/blob/4f97ece9f5ab9288ea44f5842be55a4dbaa866e0/hbase-server/src/main/java/org/apache/hadoop/hbase/security/token/FsDelegationToken.java#L62-L84]).
We should add support for Ozone as well.
{code:java}
public void acquireDelegationToken(final FileSystem fs) throws IOException {
  String tokenKind;
  String scheme = fs.getUri().getScheme();
  if (SWEBHDFS_SCHEME.equalsIgnoreCase(scheme)) {
    tokenKind = SWEBHDFS_TOKEN_KIND.toString();
  } else if (WEBHDFS_SCHEME.equalsIgnoreCase(scheme)) {
    tokenKind = WEBHDFS_TOKEN_KIND.toString();
  } else if (HDFS_URI_SCHEME.equalsIgnoreCase(scheme)) {
    tokenKind = HDFS_DELEGATION_KIND.toString();
  } else {
    LOG.warn("Unknown FS URI scheme: " + scheme);
    // Preserve default behavior
    tokenKind = HDFS_DELEGATION_KIND.toString();
  }

  acquireDelegationToken(tokenKind, fs);
}
{code}
This can impact jobs like BulkLoad on a secure environment.
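
A rough, self-contained sketch of how the scheme check could be extended with an `ofs` entry. It deliberately avoids Hadoop and Ozone classes: `TokenKindResolver` is a hypothetical name, and all token-kind strings below are illustrative stand-ins for the real constants (in particular, `"OzoneToken"` is an assumption that should be checked against Ozone's token identifier class before use).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical scheme-to-token-kind lookup; string values are stand-ins for the real constants. */
final class TokenKindResolver {
  // Fallback kind, mirroring the "preserve default behavior" branch of the existing method.
  static final String DEFAULT_KIND = "HDFS_DELEGATION_TOKEN";

  private static final Map<String, String> KIND_BY_SCHEME = new HashMap<>();

  static {
    KIND_BY_SCHEME.put("hdfs", DEFAULT_KIND);
    KIND_BY_SCHEME.put("webhdfs", "WEBHDFS delegation");
    KIND_BY_SCHEME.put("swebhdfs", "SWEBHDFS delegation");
    // Assumed token kind for the Ozone filesystem; verify against Ozone itself.
    KIND_BY_SCHEME.put("ofs", "OzoneToken");
  }

  /** Returns the token kind for a URI scheme, case-insensitively, or the default kind. */
  static String resolve(String scheme) {
    if (scheme == null) {
      return DEFAULT_KIND;
    }
    return KIND_BY_SCHEME.getOrDefault(scheme.toLowerCase(Locale.ROOT), DEFAULT_KIND);
  }
}
```

A table lookup keeps the fallback behavior in one place, so adding another filesystem later is a one-line change rather than another `else if` branch.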

Thanks [~bszabolcs] for the debug help!

 





Re: [DISCUSS] removing hbase-examples from the assembly

2024-03-05 Thread Istvan Toth
This sounds great to me.
The current PR does this, so I think we are all in agreement.

On Tue, Mar 5, 2024 at 10:49 AM 张铎(Duo Zhang)  wrote:

> I prefer we still have the hbase-examples in the main repo and publish
> it to maven central, but we do not need to ship it in the binary
> releases. The most important thing for hbase-examples is its source
> code, so including it in binary releases does not help.
>
> On Tue, Mar 5, 2024 at 03:28 Istvan Toth  wrote:
> >
> > I don't have a problem with having an examples module in the main repo, it
> > can be useful, and this way it is guaranteed to always work with the latest
> > version, and we don't have to maintain another repo.
> >
> > Publishing the binary artifact to maven (as we do now) doesn't sound very
> > useful, but if nothing depends on it then it doesn't hurt either. It's
> > easier to keep publishing it than it is to disable publishing.
> >
> > I don't really see the need for a separate download (as long as the
> > examples can be found easily via the docs).
> >
> > Thanks,
> > Istvan
> >
> >
> > On Mon, Mar 4, 2024 at 7:24 PM Nick Dimiduk  wrote:
> >
> > > Should we remove hbase-examples from the main repository entirely? Should
> > > it be its own download? Should we even ship it in binary form at all?
> > >
> > > Anyway I’m fine with removing it from the assembly.
> > >
> > > Thanks,
> > > Nick
> > >
> > > On Mon, 4 Mar 2024 at 13:27, Istvan Toth  wrote:
> > >
> > > > hbase assembly (and consequently the binary distributions) now depend on
> > > > hbase-examples.
> > > >
> > > > I think this is problematic, as
> > > > * many of those examples are explicitly not production quality.
> > > > * It adds extra curator dependencies to the assembly and to the various
> > > > HBase classpaths. (which the rest of HBase does not use)
> > > >
> > > > I propose removing hbase-examples and its dependencies from the HBase
> > > > assembly, starting with HBase 3.0.
> > > >
> > > > This would have two effects:
> > > > - The example code will not be present on the classpath
> > > > - Curator libraries will not be added to the HBase classpath. Depending on
> > > > the shaded/non shaded classpath, the Curator from Hadoop in relocated or
> > > > unrelocated form will still be present.
> > > >
> > > > Related tickets:
> > > > HBASE-28416: This proposal
> > > > HBASE-28415: Removing erroneous curator dependency from hbase-endpoint
> > > > (no brainer)
> > > > HBASE-28411: The original proposal to remove curator completely
> > > >
> > > > best regards
> > > > Istvan
> > > >
> > >
> >
> >
> > --
> > *István Tóth* | Sr. Staff Software Engineer
> > *Email*: st...@cloudera.com
> > cloudera.com 
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com 


Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-05 Thread Nick Dimiduk
This would be great cleanup, big +1 from me for all three of these
adjustments, including the promotion of pe, ltt, and friends out of the
test scope.

I believe that we included hbase test jars because we used to freely mix
classes needed for minicluster between runtime and test jars, which in turn
relied on Hadoop minicluster capabilities. The big cleanup around
HBaseTestingUtil/it addressed much (or all) of these issues on branch-3.

I believe that we include a Hadoop distribution in our assembly because
that makes it easy for a new user to download our release bin.tgz and get
started immediately with learning. I guess it’s high time that we work out
the with- and without-Hadoop variants.

Thanks,
Nick

On Tue, 5 Mar 2024 at 09:14, Istvan Toth  wrote:

> DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
> to achieve this, this is about discussing whether we even want to make
> these changes.
> These are also substantial changes, but they could be targeted for HBase
> 3.0.
>
> One issue I have noticed is that we ship test jars and test dependencies in
> the assembly.
> I can't see anyone using those, but it bloats the assembly and classpath,
> and adds unnecessary JARs with possible CVE issues. (for example Kerby
> which is a Hadoop minicluster dependency)
>
> My proposal is to exclude the test jars and the test scope dependencies
> from the assembly.
>
> The advantages would be:
> * Smaller distro size
> * Faster startup (this is marginal)
> * Less CVE-prone JARs in the binary assemblies
>
> The other issue is that the assembly includes much of the Hadoop
> distribution.
> The basic assumption in all scripts and instructions is that the node has a
> fully configured Hadoop installation, and we include it in the classpath of
> HBase.
>
> If that is true, then there is no reason to include Hadoop in the assembly,
> HBase and its direct dependencies should be enough.
>
> One could argue that it would simplify the client side, which is true to
> some extent (though 95% of the client distro use cases are served better by
> simply using hbase-shaded-client).
>
> We could either remove the Hadoop libraries from either or both of the
> assemblies unconditionally, or provide two variants for either or both
> assemblies, one with Hadoop included, and one without it.
> Spark already does this, it has binary distributions both with and without
> Hadoop.
>
> The advantages would be:
> * Smaller distro size
> * Faster startup (this is marginal)
> * Less chance of conflicts with the Hadoop jars
> * Less CVE-prone JARs in the binary assemblies
>
>
> Thirdly, we could consider excluding the
> full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
> binary assemblies. It is not used by the assembly, and AFAIK it is not
> included in any of the 'hbase classpath' command variants.
>
> This would make sure that no Hadoop libraries are included (even in shaded
> form) and would make the HBase distribution fully insulated from Hadoop's
> CVE issues.
>
> (The full-fat hbase-shaded-client works best as direct build-time
> dependency anyway)
>
> best regards
> Istvan
>


Re: [DISCUSS] removing hbase-examples from the assembly

2024-03-05 Thread Duo Zhang
I prefer we still have the hbase-examples in the main repo and publish
it to maven central, but we do not need to ship it in the binary
releases. The most important thing for hbase-examples is its source
code, so including it in binary releases does not help.

On Tue, Mar 5, 2024 at 03:28 Istvan Toth  wrote:
>
> I don't have a problem with having an examples module in the main repo, it
> can be useful, and this way it is guaranteed to always work with the latest
> version, and we don't have to maintain another repo.
>
> Publishing the binary artifact to maven (as we do now) doesn't sound very
> useful, but if nothing depends on it then it doesn't hurt either. It's
> easier to keep publishing it than it is to disable publishing.
>
> I don't really see the need for a separate download (as long as the
> examples can be found easily via the docs).
>
> Thanks,
> Istvan
>
>
> On Mon, Mar 4, 2024 at 7:24 PM Nick Dimiduk  wrote:
>
> > Should we remove hbase-examples from the main repository entirely? Should
> > it be its own download? Should we even ship it in binary form at all?
> >
> > Anyway I’m fine with removing it from the assembly.
> >
> > Thanks,
> > Nick
> >
> > On Mon, 4 Mar 2024 at 13:27, Istvan Toth  wrote:
> >
> > > hbase assembly (and consequently the binary distributions) now depend on
> > > hbase-examples.
> > >
> > > I think this is problematic, as
> > > * many of those examples are explicitly not production quality.
> > > * It adds extra curator dependencies to the assembly and to the various
> > > HBase classpaths. (which the rest of HBase does not use)
> > >
> > > I propose removing hbase-examples and its dependencies from the HBase
> > > assembly, starting with HBase 3.0.
> > >
> > > This would have two effects:
> > > - The example code will not be present on the classpath
> > > - Curator libraries will not be added to the HBase classpath. Depending on
> > > the shaded/non shaded classpath, the Curator from Hadoop in relocated or
> > > unrelocated form will still be present.
> > >
> > > Related tickets:
> > > HBASE-28416: This proposal
> > > HBASE-28415: Removing erroneous curator dependency from hbase-endpoint
> > > (no brainer)
> > > HBASE-28411: The original proposal to remove curator completely
> > > best regards
> > > Istvan
> > >
> >
>
>
> --
> *István Tóth* | Sr. Staff Software Engineer
> *Email*: st...@cloudera.com
> cloudera.com 


Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-05 Thread Istvan Toth
I agree, we don't want to omit those from the binary distro.
We should identify what those tools are. (Should be easy based on the
presence of main() or the Tool interface)
Such tools could either be moved into a new module, like hbase-tools, or
simply moved to the runtime JARs.
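
As a rough illustration of the identification step, the check below uses plain reflection to flag classes that either declare a `public static void main(String[])` or implement an interface with a given name. `ToolDetector` and `SampleTool` are hypothetical names made up for this sketch; the Hadoop `Tool` interface is matched by its fully qualified name so the example compiles without Hadoop on the classpath. How the candidate classes are enumerated from the test jars is left out.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical detector for "tool-like" classes living in test jars. */
final class ToolDetector {
  // Matched by name so this sketch does not need Hadoop at compile time.
  static final String TOOL_INTERFACE = "org.apache.hadoop.util.Tool";

  /** True if the class itself declares public static void main(String[]). */
  static boolean hasMainMethod(Class<?> clazz) {
    try {
      Method m = clazz.getDeclaredMethod("main", String[].class);
      int mods = m.getModifiers();
      return Modifier.isPublic(mods) && Modifier.isStatic(mods)
          && m.getReturnType() == void.class;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  /** True if the class, a superclass, or any super-interface has the given interface name. */
  static boolean implementsInterface(Class<?> clazz, String interfaceName) {
    for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
      for (Class<?> i : c.getInterfaces()) {
        if (i.getName().equals(interfaceName) || implementsInterface(i, interfaceName)) {
          return true;
        }
      }
    }
    return false;
  }

  static boolean looksLikeTool(Class<?> clazz) {
    return hasMainMethod(clazz) || implementsInterface(clazz, TOOL_INTERFACE);
  }
}

/** Stand-in for a tool such as PerformanceEvaluation; it only exists to exercise the check. */
final class SampleTool {
  public static void main(String[] args) {
    System.out.println("looksLikeTool=" + ToolDetector.looksLikeTool(SampleTool.class));
  }
}
```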

Istvan

On Tue, Mar 5, 2024 at 10:34 AM 张铎(Duo Zhang)  wrote:

> There are some tools in the tests jar, such as PerformanceEvaluation.
>
> But anyway, maybe they should be moved to main...
>
> On Tue, Mar 5, 2024 at 16:14 Istvan Toth  wrote:
> >
> > DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
> > to achieve this, this is about discussing whether we even want to make
> > these changes.
> > These are also substantial changes, but they could be targeted for HBase
> > 3.0.
> >
> > One issue I have noticed is that we ship test jars and test dependencies in
> > the assembly.
> > I can't see anyone using those, but it bloats the assembly and classpath,
> > and adds unnecessary JARs with possible CVE issues. (for example Kerby
> > which is a Hadoop minicluster dependency)
> >
> > My proposal is to exclude the test jars and the test scope dependencies
> > from the assembly.
> >
> > The advantages would be:
> > * Smaller distro size
> > * Faster startup (this is marginal)
> > * Less CVE-prone JARs in the binary assemblies
> >
> > The other issue is that the assembly includes much of the Hadoop
> > distribution.
> > The basic assumption in all scripts and instructions is that the node has a
> > fully configured Hadoop installation, and we include it in the classpath of
> > HBase.
> >
> > If that is true, then there is no reason to include Hadoop in the assembly,
> > HBase and its direct dependencies should be enough.
> >
> > One could argue that it would simplify the client side, which is true to
> > some extent (though 95% of the client distro use cases are served better by
> > simply using hbase-shaded-client).
> >
> > We could either remove the Hadoop libraries from either or both of the
> > assemblies unconditionally, or provide two variants for either or both
> > assemblies, one with Hadoop included, and one without it.
> > Spark already does this, it has binary distributions both with and without
> > Hadoop.
> >
> > The advantages would be:
> > * Smaller distro size
> > * Faster startup (this is marginal)
> > * Less chance of conflicts with the Hadoop jars
> > * Less CVE-prone JARs in the binary assemblies
> >
> >
> > Thirdly, we could consider excluding the
> > full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
> > binary assemblies. It is not used by the assembly, and AFAIK it is not
> > included in any of the 'hbase classpath' command variants.
> >
> > This would make sure that no Hadoop libraries are included (even in shaded
> > form) and would make the HBase distribution fully insulated from Hadoop's
> > CVE issues.
> >
> > (The full-fat hbase-shaded-client works best as direct build-time
> > dependency anyway)
> >
> > best regards
> > Istvan
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com 


[jira] [Resolved] (HBASE-28379) Upgrade thirdparty dep to 4.1.6

2024-03-05 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-28379.
--
Resolution: Fixed

This is merged. FYI [~bbeaudreault].

> Upgrade thirdparty dep to 4.1.6
> ---
>
> Key: HBASE-28379
> URL: https://issues.apache.org/jira/browse/HBASE-28379
> Project: HBase
>  Issue Type: Task
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 4.0.0-alpha-1, 3.0.0-beta-2
>
>
> Adopt the next hbase-thirdparty release.





Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-05 Thread Duo Zhang
There are some tools in the tests jar, such as PerformanceEvaluation.

But anyway, maybe they should be moved to main...

On Tue, Mar 5, 2024 at 16:14 Istvan Toth  wrote:
>
> DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
> to achieve this, this is about discussing whether we even want to make
> these changes.
> These are also substantial changes, but they could be targeted for HBase
> 3.0.
>
> One issue I have noticed is that we ship test jars and test dependencies in
> the assembly.
> I can't see anyone using those, but it bloats the assembly and classpath,
> and adds unnecessary JARs with possible CVE issues. (for example Kerby
> which is a Hadoop minicluster dependency)
>
> My proposal is to exclude the test jars and the test scope dependencies
> from the assembly.
>
> The advantages would be:
> * Smaller distro size
> * Faster startup (this is marginal)
> * Less CVE-prone JARs in the binary assemblies
>
> The other issue is that the assembly includes much of the Hadoop
> distribution.
> The basic assumption in all scripts and instructions is that the node has a
> fully configured Hadoop installation, and we include it in the classpath of
> HBase.
>
> If that is true, then there is no reason to include Hadoop in the assembly,
> HBase and its direct dependencies should be enough.
>
> One could argue that it would simplify the client side, which is true to
> some extent (though 95% of the client distro use cases are served better by
> simply using hbase-shaded-client).
>
> We could either remove the Hadoop libraries from either or both of the
> assemblies unconditionally, or provide two variants for either or both
> assemblies, one with Hadoop included, and one without it.
> Spark already does this, it has binary distributions both with and without
> Hadoop.
>
> The advantages would be:
> * Smaller distro size
> * Faster startup (this is marginal)
> * Less chance of conflicts with the Hadoop jars
> * Less CVE-prone JARs in the binary assemblies
>
>
> Thirdly, we could consider excluding the
> full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
> binary assemblies. It is not used by the assembly, and AFAIK it is not
> included in any of the 'hbase classpath' command variants.
>
> This would make sure that no Hadoop libraries are included (even in shaded
> form) and would make the HBase distribution fully insulated from Hadoop's
> CVE issues.
>
> (The full-fat hbase-shaded-client works best as direct build-time
> dependency anyway)
>
> best regards
> Istvan


[jira] [Created] (HBASE-28420) Aborting Active HMaster is not rejecting remote Procedure Reports

2024-03-05 Thread Umesh Kumar Kumawat (Jira)
Umesh Kumar Kumawat created HBASE-28420:
---

          Summary: Aborting Active HMaster is not rejecting remote Procedure Reports
              Key: HBASE-28420
              URL: https://issues.apache.org/jira/browse/HBASE-28420
          Project: HBase
       Issue Type: Bug
       Components: master, proc-v2
 Affects Versions: 2.5.7
         Reporter: Umesh Kumar Kumawat
         Assignee: Umesh Kumar Kumawat


Suppose the active HMaster is in the process of aborting while another HMaster 
is becoming the active HMaster. If a region server reports the completion of a 
remote procedure at that moment, the report generally goes to the old active 
HMaster because of the cached value of rssStub -> 
[code|https://github.com/apache/hbase/blob/branch-2.5/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L2829]
 ([caller 
method|https://github.com/apache/hbase/blob/branch-2.5/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L3941]).
 On the master side 
([code|https://github.com/apache/hbase/blob/branch-2.5/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L2381]),
 the handler does check whether the service is started, but that check still 
returns true while the master is aborting.
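
The check described above can be illustrated with a minimal Java sketch. It is 
not HBase code: MasterState, acceptsReportStartedOnly and acceptsReportStrict 
are hypothetical names, and the stricter check is only one possible way to 
reject reports during an abort, not necessarily the fix chosen for this issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical model of the master-side check described in this report.
 * None of these names are HBase APIs; they only illustrate the race.
 */
final class ProcedureReportGuardSketch {

  /** Minimal stand-in for a master that tracks "started" and "aborting". */
  static final class MasterState {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final AtomicBoolean aborting = new AtomicBoolean(false);

    void start() {
      started.set(true);
    }

    /** An abort begins; the service still counts as started in this model. */
    void beginAbort() {
      aborting.set(true);
    }

    /** Mirrors the reported behavior: only checks that the service started. */
    boolean acceptsReportStartedOnly() {
      return started.get();
    }

    /** Assumed stricter check: also rejects while an abort is in progress. */
    boolean acceptsReportStrict() {
      return started.get() && !aborting.get();
    }
  }

  public static void main(String[] args) {
    MasterState oldActive = new MasterState();
    oldActive.start();
    oldActive.beginAbort();

    // A region server with a cached stub would still reach this master.
    System.out.println("started-only check accepts report: "
        + oldActive.acceptsReportStartedOnly());
    System.out.println("strict check accepts report: "
        + oldActive.acceptsReportStrict());
  }
}
```

With the started-only check, the aborting master accepts the report (the first 
line prints true). With the stricter check it rejects it (the second line 
prints false), so the region server would get a chance to retry against the 
new active master.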

This issue becomes *critical* when a *crash of the RS hosting meta and a master 
failover* happen at the same time: hbase:meta then gets stuck in the offline 
state.

Log from the start of the HMaster abort:
{noformat}
2024-02-02 07:33:11,581 ERROR [PEWorker-6] master.HMaster - * ABORTING 
master server4-1xxx,61000,1705169084562: FAILED persisting 
region=52d36581218e00a2668776cfea897132 state=CLOSING *{noformat}
Log from scheduling the SCP for the host carrying meta:
{noformat}
2024-02-02 07:33:32,622 INFO [aster/server3-1xxx61000:becomeActiveMaster] 
assignment.AssignmentManager - Scheduled ServerCrashProcedure pid=3305546 for 
server5-1xxx61020,1706857451955 (carryingMeta=true) 
server5-1-xxx61020,1706857451955/CRASHED/regionCount=1/lock=java.util.concurrent.locks.ReentrantReadWriteLock@1b0a5293[Write
 locks = 1, Read locks = 0], oldState=ONLINE.{noformat}
Log from the initialization of the remote procedure:
{noformat}
2024-02-02 07:33:33,178 INFO [PEWorker-4] procedure2.ProcedureExecutor - 
Initialized subprocedures=[{pid=3305548, ppid=3305547, state=RUNNABLE; 
SplitWALRemoteProcedure 
server5-1-t%2C61020%2C1706857451955.meta.1706858156058.meta, 
worker=server4-1-,61020,1705169180881}]{noformat}
Log from the handling of the remote procedure report on the old active HMaster 
(server4-1xxx,61000), which was in the process of aborting:
{noformat}
2024-02-02 07:33:37,990 DEBUG 
[r.default.FPBQ.Fifo.handler=243,queue=9,port=61000] master.HMaster - Remote 
procedure done, pid=3305548{noformat}
Log from the HMaster trying to become the active master:
{noformat}
2024-02-02 07:33:43,159 WARN [aster/server3-1-ia2:61000:becomeActiveMaster] 
master.HMaster - hbase:meta,,1.1588230740 is NOT online; state={1588230740 
state=OPEN, ts=1706859212481, server=server5-1-xxx,61020,1706857451955}; 
ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern 
until region onlined.{noformat}
After this, the master was stuck for almost 1 hour. We had to fail over the 
HMaster to get out of this situation.



[DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-05 Thread Istvan Toth
DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
to achieve this; this is about discussing whether we even want to make
these changes.
These are also substantial changes, but they could be targeted for HBase
3.0.

One issue I have noticed is that we ship test jars and test dependencies in
the assembly.
I can't see anyone using those, but they bloat the assembly and classpath,
and add unnecessary JARs with possible CVE issues (for example Kerby,
which is a Hadoop minicluster dependency).

My proposal is to exclude the test jars and the test scope dependencies
from the assembly.

The advantages would be:
* Smaller distro size
* Faster startup (this is marginal)
* Fewer CVE-prone JARs in the binary assemblies

The other issue is that the assembly includes much of the Hadoop
distribution.
The basic assumption in all scripts and instructions is that the node has a
fully configured Hadoop installation, and we include it in the classpath of
HBase.

If that is true, then there is no reason to include Hadoop in the assembly;
HBase and its direct dependencies should be enough.

One could argue that it would simplify the client side, which is true to
some extent (though 95% of the client distro use cases are served better by
simply using hbase-shaded-client).

We could either remove the Hadoop libraries from either or both of the
assemblies unconditionally, or provide two variants for either or both
assemblies, one with Hadoop included, and one without it.
Spark already does this: it has binary distributions both with and without
Hadoop.

The advantages would be:
* Smaller distro size
* Faster startup (this is marginal)
* Less chance of conflicts with the Hadoop jars
* Fewer CVE-prone JARs in the binary assemblies


Thirdly, we could consider excluding the
full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
binary assemblies. It is not used by the assembly, and AFAIK it is not
included in any of the 'hbase classpath' command variants.

This would make sure that no Hadoop libraries are included (even in shaded
form) and would make the HBase distribution fully insulated from Hadoop's
CVE issues.

(The full-fat hbase-shaded-client works best as a direct build-time
dependency anyway.)
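
To get a rough idea of what the three proposals would remove, the JARs in an
existing assembly's lib directory could be grouped by file name. The following
is a minimal sketch, assuming test JARs carry Maven's conventional
"-tests.jar" suffix and Hadoop JARs start with "hadoop-". The class name, the
categories and the sample file names (including the version numbers) are made
up for illustration. It cannot detect test-scope dependencies such as Kerby or
Hadoop's own transitive dependencies by name; those would have to come from
the Maven dependency tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that groups assembly JAR file names by proposal. */
final class AssemblyJarAuditSketch {

  enum Category { TEST_JAR, HADOOP, FULL_FAT_SHADED_CLIENT, KEEP }

  /** Classifies a JAR purely by its file name (a heuristic, not a rule). */
  static Category classify(String fileName) {
    // Maven's "tests" classifier produces <artifact>-<version>-tests.jar.
    if (fileName.endsWith("-tests.jar")) {
      return Category.TEST_JAR;
    }
    // Assumption: Hadoop artifacts are recognizable by their name prefix.
    if (fileName.startsWith("hadoop-")) {
      return Category.HADOOP;
    }
    // The byo-hadoop variant does not bundle Hadoop, so it is not "full-fat".
    if (fileName.startsWith("hbase-shaded-client-")
        && !fileName.startsWith("hbase-shaded-client-byo-hadoop-")) {
      return Category.FULL_FAT_SHADED_CLIENT;
    }
    return Category.KEEP;
  }

  /** Groups the given file names by category, keeping input order. */
  static Map<Category, List<String>> audit(List<String> fileNames) {
    Map<Category, List<String>> groups = new EnumMap<>(Category.class);
    for (Category category : Category.values()) {
      groups.put(category, new ArrayList<>());
    }
    for (String name : fileNames) {
      groups.get(classify(name)).add(name);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Sample names only; versions are invented for the example.
    List<String> sample = Arrays.asList(
        "hbase-server-3.0.0.jar",
        "hbase-server-3.0.0-tests.jar",
        "hadoop-common-3.3.6.jar",
        "hadoop-hdfs-3.3.6-tests.jar",
        "hbase-shaded-client-3.0.0.jar",
        "hbase-shaded-client-byo-hadoop-3.0.0.jar");
    audit(sample).forEach(
        (category, names) -> System.out.println(category + ": " + names));
  }
}
```

Everything outside the KEEP group is a candidate for removal under one of the
three proposals; the sizes of those files would give a first estimate of the
distro size savings.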

best regards
Istvan