[jira] [Resolved] (HBASE-28549) Make shell commands support column qualifiers with colons

2024-06-07 Thread Duo Zhang (Jira)


 [ https://issues.apache.org/jira/browse/HBASE-28549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-28549.
---
Fix Version/s: 2.7.0
   3.0.0-beta-2
   2.6.1
   2.5.9
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2.5+.

Thanks [~junegunn]!

> Make shell commands support column qualifiers with colons
> -
>
> Key: HBASE-28549
> URL: https://issues.apache.org/jira/browse/HBASE-28549
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: Junegunn Choi
>Assignee: Junegunn Choi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> Revisiting the abandoned HBASE-13788.
> h2. Problem
> Shell commands do not support column qualifiers with colons (which are
> actually quite common in practice) because the part of the qualifier after
> its first colon is always treated as a converter expression.
> This can be too restrictive because:
> # Converters are only used by the {{get}} and {{scan}} commands. They are not
> supported by mutating commands such as {{put}}, {{delete}}, {{incr}}, etc.
> # A converter expression must follow a specific pattern: it must either be
> a method name of the {{Bytes}} class, or be in {{c(CLASS).METHOD}} format.
> Yet the part after the first colon is dropped from the qualifier even when it
> does not follow the pattern.
> h2. Suggested solution
> I suggest applying two approaches to make the commands support column 
> qualifiers with colons.
> # Do not interpret column qualifiers when using commands that don't support 
> converters.
> # If the part after the first colon does not follow the pattern, treat it as
> part of the column qualifier.
> h2. Counterargument
> Depending on how you see it, this makes the commands inconsistent in how they 
> handle column qualifiers. For example, a user may want to use the same column 
> expression throughout the commands.
> {code}
> create 't1', 'cf'
> col = 'cf:cq:toLong'
> # Expecting incr command to automatically ignore :toLong part
> incr 't1', 'r1', col, 1
> get 't1', 'r1', COLUMNS => [col]
> {code}
> However, if we see the converter as an option that is supported by only a few
> commands, passing it to a command that doesn't support it can be considered
> a user error. Neither {{help 'put'}} nor {{help 'delete'}} mentions
> converters.
> h2. Alternative solution
> An alternative solution would be to add a global switch that disables the 
> converter interpretation altogether.
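The two rules in the suggested solution can be sketched in Ruby, the language the HBase shell (a JRuby application) is written in. This is a hypothetical illustration, not the code that was merged: parse_column, converter_expression? and BYTES_METHODS are names invented here, and BYTES_METHODS is an assumed subset of Bytes method names rather than a lookup against the real class.

```ruby
# Hypothetical sketch of the parsing rules proposed in HBASE-28549; not the
# actual HBase shell code. Column format: family:qualifier[:converter]

# Assumption: an illustrative subset of org.apache.hadoop.hbase.util.Bytes
# method names; a real check would inspect the Bytes class itself.
BYTES_METHODS = %w[toBoolean toDouble toFloat toInt toLong toShort toString toStringBinary].freeze

# c(CLASS).METHOD, where CLASS may be a fully qualified Java class name.
CUSTOM_CONVERTER = /\Ac\([\w.$]+\)\.\w+\z/

def converter_expression?(expr)
  BYTES_METHODS.include?(expr) || CUSTOM_CONVERTER.match?(expr)
end

# Returns [family, qualifier, converter]; converter is nil when the command
# does not support converters or when the suffix is not a valid expression.
def parse_column(column, supports_converters: true)
  family, rest = column.split(':', 2)
  # Rule 1: commands such as put/delete/incr take the qualifier verbatim.
  return [family, rest, nil] if rest.nil? || !supports_converters

  # Split the qualifier at its first colon to find a converter candidate.
  qualifier, candidate = rest.split(':', 2)
  if candidate && converter_expression?(candidate)
    [family, qualifier, candidate]
  else
    # Rule 2: not a converter expression, so keep the colons in the qualifier.
    [family, rest, nil]
  end
end

p parse_column('cf:cq:toLong')                             # => ["cf", "cq", "toLong"]
p parse_column('cf:a:b:c')                                 # => ["cf", "a:b:c", nil]
p parse_column('cf:cq:toLong', supports_converters: false) # => ["cf", "cq:toLong", nil]
```

Under these proposed rules, the counterargument's incr with col = 'cf:cq:toLong' would target the qualifier cq:toLong rather than cq, since incr does not support converters.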



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Dropping Java 8 support in HBase 3

2024-06-07 Thread Duo Zhang
Bump.

If there are no other big concerns, I will open a new discussion
thread on both the user and dev mailing lists to decide which JDK version
should be the minimum supported version for HBase 3.x.

And then, let's send a final NOTICE email to formally finish this task.

Thanks.

Bryan Beaudreault wrote on Wed, May 8, 2024 at 19:42:
>
> In my experience, there are a few notable areas where core refactoring is
> happening. Most contributions don’t happen in those areas, and as a result
> could be cleanly backported if not for gotchas like HBaseTestingUtil
> rename.
>
> Anyway I agree that just adding a compile check is easier.
>
> That said, I would still advocate for not diverging the jdk version from
> branch-2. In my opinion almost all commits should be backported to
> branch-2. The only exceptions are for specific incompatible/unsafe 3.x
> features. The reason for that is we don’t do major releases nearly often
> enough, so backporting to branch-2 is the only way to get changes into
> users' hands.
>
> So if this change is going to make that much more difficult then personally
> I’d prefer a more aggressive approach of bumping jdk for branch-2, or a
> more conservative approach of not allowing new language features in
> branch-3.
>
> Overall I think more frequent smaller major releases would help us be more
> agile, and aligns more with other modern projects I’ve seen.
>
> On Tue, May 7, 2024 at 10:00 AM Istvan Toth 
> wrote:
>
> > I'd expect the automated backporting process to only work for fairly
> > trivial patches which do not use protobuf, etc.
> > More involved patches would need manual work anyway.
> >
> > If we want to make sure that everything compiles with JDK8, it's easier to
> > just compile the master branch with JDK8 (along with 11/17),
> > and fail the CI check if it doesn't.
> >
> > We need to find a balance between using the new Java features and keeping
> > the workload manageable.
> > We could keep compiling master with JDK8 for a year or two, and when
> > activity on the 2.x branches tapers off, we could remove that restriction.
> >
> >
> > On Tue, May 7, 2024 at 3:56 PM Andrew Purtell 
> > wrote:
> >
> > > I also like the suggestion to have CI help us here too.
> > >
> > > > On May 7, 2024, at 9:42 AM, Bryan Beaudreault wrote:
> > > >
> > > > I'm nervous about creating more big long-term divergences between the
> > > > branches. Already I sometimes get caught up on HBaseTestingUtil vs
> > > > HBaseTestingUtility. And we all know the burden of maintaining the old
> > > > HTable impl.
> > > >
> > > > I'm not sure if this is a useful suggestion since it would require
> > > > someone to do a good deal of work, but I wonder if we could automate
> > > > backport testing a bit. Our yetus checks already check the patch; maybe
> > > > it could also apply the patch to branch-2. This would increase the cost
> > > > of master branch PRs but maybe speed us up overall.
> > > >
> > > >> On Tue, May 7, 2024 at 9:21 AM 张铎(Duo Zhang) wrote:
> > > >>
> > > >> The problem is that, if we only compile and run tests on JDK11+,
> > > >> contributors may implicitly use some JDK11+ only features and
> > > >> introduce differences when backporting to branch-2.x.
> > > >>
> > > >> Maybe a possible policy is that, if a patch should also go into
> > > >> branch-2.x, before merging the master PR we should make sure the
> > > >> contributor opens a PR for branch-2.x as well, so we can catch the
> > > >> differences between the 2 PRs and decide whether to align them.
> > > >>
> > > >> WDYT?
> > > >>
> > > >> Thanks.
> > > >>
> > > >> Andrew Purtell wrote on Tue, May 7, 2024 at 20:20:
> > > >>>
> > > >>> I don’t expect 2.x to wind down for up to several more years. We will
> > > >>> still be using it in production at my employer for a long time, and I
> > > >>> would continue my role as RM for 2.x as needed. HBase 3 is great but not
> > > >>> GA yet, and once it is, some users will want to wait one to a couple of
> > > >>> years before adopting the new major version, especially if migration is
> > > >>> not seamless. (We even faced breaking changes in a minor upgrade from 2.4
> > > >>> to 2.5 that brought down a cluster during a rolling upgrade, so there
> > > >>> should be no expectation of a seamless upgrade.) My plan is to continue
> > > >>> releasing 2.x until, as with 1.x, the commits to branch-2 essentially
> > > >>> stop, or until the PMC stops allowing release of the candidates.
> > > >>>
> > > >>> Perhaps we do not need a total ban on the use of JDK 11 features. We
> > > >>> should allow a case-by-case discussion. We can minimize their scope and
> > > >>> even potentially offer multi-version support like we do with the Unsafe
> > > >>> access utility classes in hbase-thirdparty. There are no planned uses of
> > > >>> new 11+ APIs and features now anyhow.
> > > >>>
> > > >>>
> > > >>> On Tue, May 7, 2024 at 7:40 AM 张铎(Duo Zhang) 
> > > >> wrote:
> > > >>>
> > > 

Re: mTLS / x509 authentication

2024-06-07 Thread Andrew Purtell
Most users who would employ an mTLS authentication scheme would operate with
this trust model. The fact that the client has a valid signed certificate means
it can be trusted, and that trust includes supplied connection metadata like
the username. Or, if not, then not.
So a lot of security engineering effort goes into protecting the trust
established by certificate distribution, such as using short-lived certs and
secure distribution methods.

> On Jun 7, 2024, at 6:34 AM, Bryan Beaudreault  wrote:
> 
> You're sort of correct. We've been using mTLS in prod for a while now, ever
> since the feature was committed. It's true that the actual HBase username
> is not verified with mTLS, however you still can authenticate the
> connection. The idea behind mTLS is that the certificate carries the
> authentication -- so a client will need a certificate which has been signed
> by the same CA (or at least within the CA chain) which signed the server's
> certificate, and vice versa.
> 
> For us, if someone has a valid certificate and the mTLS authentication
> succeeds, then we just trust their username. Based on how we use HBase in
> our environment, this is perfectly secure for our use-case. That may not
> work for everyone, and I did file a jira to add a feature for validating
> the username (perhaps pulling from a custom certificate property). But I
> haven't actually implemented that, and I'm not sure that I will since it works
> as-is for us.
> 
> I'm on mobile now so I can't find it, but it should be findable in jira if
> you search the tls-related tickets.
> 
>> On Fri, Jun 7, 2024 at 8:53 AM Andor Molnar  wrote:
>> 
>> Hi Bryan / HBase devs,
>> 
>> Based on the changes when you added mTLS support in HBASE-27280 [1],
>> only the certificate and hostname verification parts were added to the
>> codebase. HBase doesn't actually authenticate the user when mTLS is
>> being used.
>> 
>> In other words, some other auth method (Simple or Kerberos) is still
>> needed to identify the HBase user, because mTLS doesn't extract
>> identity information from the client certificate and doesn't map it to
>> an active HBase user.
>> 
>> Is that correct?
>> 
>> Regards,
>> Andor
>> 
>> 
>> [1] https://issues.apache.org/jira/browse/HBASE-27280


Re: mTLS / x509 authentication

2024-06-07 Thread Bryan Beaudreault
You're sort of correct. We've been using mTLS in prod for a while now, ever
since the feature was committed. It's true that the actual HBase username
is not verified with mTLS, however you still can authenticate the
connection. The idea behind mTLS is that the certificate carries the
authentication -- so a client will need a certificate which has been signed
by the same CA (or at least within the CA chain) which signed the server's
certificate, and vice versa.

For us, if someone has a valid certificate and the mTLS authentication
succeeds, then we just trust their username. Based on how we use HBase in
our environment, this is perfectly secure for our use-case. That may not
work for everyone, and I did file a jira to add a feature for validating
the username (perhaps pulling from a custom certificate property). But I
haven't actually implemented that, and I'm not sure that I will since it works
as-is for us.

I'm on mobile now so I can't find it, but it should be findable in jira if
you search the tls-related tickets.
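A minimal Ruby sketch of the trust model described above, using Ruby's standard openssl library: a certificate that chains to the trusted CA authenticates the connection, and the claimed username is then accepted as-is. The optional check_username path models the proposed (not yet implemented) username validation under the assumption that the username would come from the certificate's CN; a real implementation might use a custom certificate property instead. issue_cert and authenticated? are hypothetical helpers, and hostname verification is omitted.

```ruby
require 'openssl'

# Hypothetical test helper: issues a short-lived certificate signed by
# issuer_key, or a self-signed one when no issuer is given.
def issue_cert(cn, key, issuer: nil, issuer_key: nil, ca: false)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3
  cert.serial = rand(1..2**31)
  cert.subject = OpenSSL::X509::Name.parse("/CN=#{cn}")
  cert.issuer = issuer ? issuer.subject : cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600 # short-lived certificate
  ext = OpenSSL::X509::ExtensionFactory.new
  ext.subject_certificate = cert
  ext.issuer_certificate = issuer || cert
  cert.add_extension(ext.create_extension('basicConstraints', ca ? 'CA:TRUE' : 'CA:FALSE', true))
  cert.add_extension(ext.create_extension('keyUsage', 'keyCertSign,cRLSign', true)) if ca
  cert.sign(issuer_key || key, OpenSSL::Digest.new('SHA256'))
  cert
end

# Trust model: a certificate chaining to the trusted CA authenticates the
# connection. check_username (hypothetical) also requires CN == username.
def authenticated?(cert, claimed_user, store, check_username: false)
  return false unless store.verify(cert)
  return true unless check_username

  cn = cert.subject.to_a.find { |field, _value, _type| field == 'CN' }
  !cn.nil? && cn[1] == claimed_user
end

ca_key = OpenSSL::PKey::RSA.new(2048)
ca = issue_cert('test-ca', ca_key, ca: true)
store = OpenSSL::X509::Store.new
store.add_cert(ca)

client = issue_cert('alice', OpenSSL::PKey::RSA.new(2048), issuer: ca, issuer_key: ca_key)
puts authenticated?(client, 'bob', store)                       # true: username trusted as-is
puts authenticated?(client, 'bob', store, check_username: true) # false: CN is alice
```

The first call mirrors the "valid certificate, then trust the username" behaviour; the second shows what a CN-based username check would reject.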

On Fri, Jun 7, 2024 at 8:53 AM Andor Molnar  wrote:

> Hi Bryan / HBase devs,
>
> Based on the changes when you added mTLS support in HBASE-27280 [1],
> only the certificate and hostname verification parts were added to the
> codebase. HBase doesn't actually authenticate the user when mTLS is
> being used.
>
> In other words, some other auth method (Simple or Kerberos) is still
> needed to identify the HBase user, because mTLS doesn't extract
> identity information from the client certificate and doesn't map it to
> an active HBase user.
>
> Is that correct?
>
> Regards,
> Andor
>
>
> [1] https://issues.apache.org/jira/browse/HBASE-27280


mTLS / x509 authentication

2024-06-07 Thread Andor Molnar
Hi Bryan / HBase devs,

Based on the changes when you added mTLS support in HBASE-27280 [1],
only the certificate and hostname verification parts were added to the
codebase. HBase doesn't actually authenticate the user when mTLS is
being used.

In other words, some other auth method (Simple or Kerberos) is still
needed to identify the HBase user, because mTLS doesn't extract
identity information from the client certificate and doesn't map it to
an active HBase user.

Is that correct?

Regards,
Andor


[1] https://issues.apache.org/jira/browse/HBASE-27280





[jira] [Resolved] (HBASE-28636) Add UTs for testing copy/sync table between clusters

2024-06-07 Thread Duo Zhang (Jira)


 [ https://issues.apache.org/jira/browse/HBASE-28636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-28636.
---
Fix Version/s: 2.7.0
   3.0.0-beta-2
   2.6.1
   2.5.9
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to all active branches.

Thanks [~sunxin] for reviewing!

> Add UTs for testing copy/sync table between clusters
> 
>
> Key: HBASE-28636
> URL: https://issues.apache.org/jira/browse/HBASE-28636
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> While implementing HBASE-28565, I found that there are no tests covering
> these two tools (CopyTable and SyncTable) between clusters.





[jira] [Resolved] (HBASE-28540) Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner

2024-06-07 Thread Istvan Toth (Jira)


 [ https://issues.apache.org/jira/browse/HBASE-28540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Istvan Toth resolved HBASE-28540.
-
Fix Version/s: 2.7.0
   3.0.0-beta-2
   2.6.1
   2.5.9
   Resolution: Fixed

> Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
> -
>
> Key: HBASE-28540
> URL: https://issues.apache.org/jira/browse/HBASE-28540
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The implementation of org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
> is very inefficient, as the standard next() method makes a separate HTTP
> request for each row.
> Performance can be improved by not specifying the row count in the REST call
> and caching the returned Results.
> The chunk size can still be influenced by scan.setBatch().
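The caching idea can be illustrated with a small Ruby sketch: rows are served from a local buffer and a new request is issued only when the buffer runs dry, so the number of round trips depends on the number of chunks rather than the number of rows. CachingScanner and its fetch block are hypothetical stand-ins for RemoteHTable.Scanner and the REST call, and the fixed chunk size of 4 rows is an arbitrary choice for the simulation.

```ruby
# Hypothetical sketch of the HBASE-28540 idea, not the actual RemoteHTable
# code: buffer each chunk returned by the server and serve next_row from it.
class CachingScanner
  attr_reader :requests # number of (simulated) HTTP round trips so far

  # fetch_chunk stands in for one REST call; it returns an array of rows,
  # or nil/empty once the scan is exhausted (an assumption of this sketch).
  def initialize(&fetch_chunk)
    @fetch_chunk = fetch_chunk
    @cache = []
    @exhausted = false
    @requests = 0
  end

  # Returns the next row, or nil when the scan is finished.
  def next_row
    refill if @cache.empty? && !@exhausted
    @cache.shift
  end

  private

  # Issues one request and appends the returned rows to the local cache.
  def refill
    @requests += 1
    chunk = @fetch_chunk.call
    if chunk.nil? || chunk.empty?
      @exhausted = true
    else
      @cache.concat(chunk)
    end
  end
end

# Simulated server: 10 rows delivered in chunks of 4.
chunks = (1..10).map { |i| "row#{i}" }.each_slice(4).to_a
scanner = CachingScanner.new { chunks.shift }
rows = []
while (row = scanner.next_row)
  rows << row
end
puts rows.size        # 10
puts scanner.requests # 4: three chunks plus the final empty response
```

With one request per row, the same scan would need at least ten round trips; with caching it needs four.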





Re: [DISCUSS] Blockers for 2.6.1

2024-06-07 Thread Nick Dimiduk
HBASE-28539 and HBASE-28562 are merged as well.

On Thu, Jun 6, 2024 at 2:57 PM Ray Mattingly 
wrote:

> I found the bug documented in `HBASE-28643: An unbounded backup failure
> message can cause an irrecoverable state for the given backup` yesterday,
> and I think we should probably aim to fix this in the 2.6.1 release.
> Otherwise that list looks good to me.
>
> On Wed, Jun 5, 2024 at 3:35 PM Bryan Beaudreault 
> wrote:
>
> > Hey all,
> >
> > It's been 2 weeks since 2.6.0 was released. As discussed in the vote
> > thread, there were a few outstanding backup-related issues. I believe
> > we've made some progress on some of those.
> >
> > I'd like to start compiling a list of important backup-related fixes to
> > target for the 2.6.1 release so that we can track progress. Can those of
> > you who are involved (Ray Mattingly, Nick Dimiduck, Dieter De Paepe &
> team,
> > and any others) please list any important jiras here?
> >
> > With a list of jiras in hand, we can check to make sure blockers &
> > fixVersions are set and use that to track what we need to drill down
> before
> > releasing.
> >
> > Here's what I know of so far, let me know if I'm missing anything:
> >
> > Not yet started:
> > - HBASE-28084: incremental backups should be forbidden after deleting
> > backups
> > - HBASE-28602: Incremental backup fails when WALs move
> > - HBASE-28462: (similar to ^, but in a different part of the code)
> > - HBASE-28538: BackupHFileCleaner is very expensive
> >
> > Patch available:
> > - HBASE-28539: backup merging does not work when using cloud storage as
> > filesystem
> > - HBASE-28562: another possible failure cause for incremental backups +
> > possibly cause of overly big backup metadata
> >
> > Resolved:
> > - HBASE-28502: backed up tables are not listed correctly in backup
> > metadata, which causes unreliable backup validation
> > - HBASE-28568: the set of tables included in incremental backups might be
> > too big
> >
>


Re: [DISCUSS] SyncTable is marked as IA.Private?

2024-06-07 Thread Duo Zhang
LimitedPrivate(TOOLS) and LimitedPrivate(CONFIG) are a bit different
from IA.Public.

For LP(TOOLS), we just need to make sure that there is no breaking
change when invoking the tool from the command line or via ToolRunner;
we do not need to keep all the public methods unchanged.

For LP(CONFIG), we just need to make sure that the class name
stays the same.

Here, I think these tools should be marked as
IA.LimitedPrivate(TOOLS), so we do not need to always keep the public
methods unchanged.

Thanks.

Andrew Purtell wrote on Fri, Jun 7, 2024 at 07:11:
>
> LimitedPrivate and Public are functionally the same in that they are not 
> Private, so compatibility breaking changes must go through a process.  At 
> some future time when someone wants to make a breaking change to the LP 
> interface, we will still take the same steps to deprecate and then change it 
> as we do for Public… usually.
>
> The difference might be, then, how hard we would work to avoid breaking 
> changes. Depending on the proposed change, retaining compatibility could be
> more work than the change itself, or might block the change because it is too
> much effort. With Public there is no question but with LP _maybe_ we might 
> make an exception. With this in mind it is in our interest that only the 
> smallest possible set of core interfaces should be Public and everything else 
> either LP or Private.
>
> > On Jun 5, 2024, at 6:14 PM, 张铎  wrote:
> >
> > Thanks Nick.
> >
> > I've filed HBASE-28639 for promoting SyncTable.
> >
> > I agree that maybe IA.LimitedPrivate is better, as we do not
> > expect users to use its public methods in code. But since they are
> > already public, we need a long deprecation cycle to mark them as anything
> > other than IA.Public...
> >
> > Nick Dimiduk wrote on Thu, Jun 6, 2024 at 02:14:
> >>
> >> I agree that if these are tools we ship to users, they should be exposed in
> >> the driver. However, probably we want to keep them LimitedPrivate(Tool)
> >> instead of making them fully IA.Public.
> >>
> >> -n
> >>
> >>> On Wed, 5 Jun 2024 at 12:49, 张铎(Duo Zhang)  wrote:
> >>>
> >>> OK, then let's file an issue to promote it to IA.Public and add it to
> >>> Driver.
> >>>
> >>> Thanks.
> >>>
> >>> Pankaj Kumar wrote on Wed, Jun 5, 2024 at 17:48:
> 
>  It looks like a typo and unintentional.
> 
>  Regards,
>  Pankaj
> 
>  On Wed, 5 Jun, 2024, 2:55 pm Wellington Chevreuil, <
>  wellington.chevre...@gmail.com> wrote:
> 
> > It seems it was marked that way by HBASE-20212, which bulk marked many
> > public classes as I.A. Private as part of the effort to replace the
> > old TestInterfaceAudienceAnnotations validation with the warbucks plugin.
> > However, I don't see any discussion about which I.A. level should be
> > applied to each class, so it looks unintentional.
> >
> > On Wed, Jun 5, 2024 at 09:57, 张铎(Duo Zhang) <palomino...@gmail.com>
> > wrote:
> >
> >> Noticed this when trying to add more UTs for it in HBASE-28636.
> >>
> >> https://hbase.apache.org/book.html#hashtable.synctable
> >>
> >> We do have a section in our ref guide that explains the algorithm of
> >> this tool and how to make use of it. But I noticed that in our
> >> code base it is marked as IA.Private, and it is not exposed in our
> >> MapReduce Driver class.
> >>
> >> Is this intentional?
> >>
> >> Thanks.
> >>
> >
> >>>