[jira] [Created] (HBASE-28625) ExportSnapshot should verify checksums for the source file and the target file

2024-05-28 Thread Liangjun He (Jira)
Liangjun He created HBASE-28625:
---

 Summary: ExportSnapshot should verify checksums for the source 
file and the target file
 Key: HBASE-28625
 URL: https://issues.apache.org/jira/browse/HBASE-28625
 Project: HBase
  Issue Type: Improvement
Reporter: Liangjun He
Assignee: Liangjun He






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-28 Thread Andrew Purtell
Actually, source compatibility with default methods would be fine too. I
forgot that this is the main reason default methods were invented. The code of
derived classes would not need to be changed, unless the returned value of
the new method should be changed, and this is no worse than having a marker
interface, which would also require code changes to implement non-default
behaviors.

A marker interface does remain as an option. It might make a difference in
chained use cases. Consider a chain of filter instances that mixes derived
code that is unaware of isHinting() and base code that is. The filter chain
can be examined for the presence or absence of the marker interface and
would not need to rely on every filter in the chain passing return values
of isHinting back.
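
A minimal sketch of the marker-interface idea, assuming a hypothetical
marker named HintingFilter and a simplified stand-in for the Filter base
class; none of these names exist in HBase today. It only illustrates that
the chain can be examined with instanceof, without derived filters having to
report anything back:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MarkerInterfaceSketch {
    // Simplified stand-in for the HBase Filter base class (assumption).
    static abstract class Filter {}

    // Hypothetical marker interface: it declares no methods at all.
    interface HintingFilter {}

    // Base code that is aware of the marker.
    static class SeekingFilter extends Filter implements HintingFilter {}

    // Derived custom code written before the marker existed; it needs no changes.
    static class LegacyCustomFilter extends Filter {}

    // Examines the chain directly instead of asking each filter to report back.
    static List<Filter> hintingFilters(List<Filter> chain) {
        List<Filter> result = new ArrayList<>();
        for (Filter f : chain) {
            if (f instanceof HintingFilter) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Filter> chain = Arrays.asList(
            new LegacyCustomFilter(), new SeekingFilter(), new LegacyCustomFilter());
        System.out.println(hintingFilters(chain).size());
    }
}
```

The same pattern would extend to a second marker for row stateful filters,
should that distinction turn out to be useful.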

Marker interfaces can also be added to denote stateful or stateless
filters, if distinguishing between them would be useful, perhaps down the
road.

On Tue, May 28, 2024 at 5:13 PM Andrew Purtell  wrote:

> I think you've clearly put a lot of time into the analysis and it is
> plausible.
>
> Adding isHinting as a default method will preserve binary compatibility.
> Source compatibility for derived custom filters would be broken though and
> that probably prevents this going back into a releasing code line.
>
> Have you considered adding a marker interface instead? That would preserve
> both source and binary compatibility. It wouldn't require any changes to
> derived custom filters. A runtime instanceof test would determine if the
> filter is a hinting filter or not. No need for a new method, default or
> otherwise.
-- 
Best regards,
Andrew

Unrest, ignorance distilled, nihilistic imbeciles -
It's what we’ve earned
Welcome, apocalypse, what’s taken you so long?
Bring us the fitting end that we’ve been counting on
   - A23, Welcome, Apo

Re: [DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-28 Thread Andrew Purtell
I think you've clearly put a lot of time into the analysis and it is
plausible.

Adding isHinting as a default method will preserve binary compatibility.
Source compatibility for derived custom filters would be broken though and
that probably prevents this going back into a releasing code line.

Have you considered adding a marker interface instead? That would preserve
both source and binary compatibility. It wouldn't require any changes to
derived custom filters. A runtime instanceof test would determine if the
filter is a hinting filter or not. No need for a new method, default or
otherwise.



-- 
Best regards,
Andrew

Unrest, ignorance distilled, nihilistic imbeciles -
It's what we’ve earned
Welcome, apocalypse, what’s taken you so long?
Bring us the fitting end that we’ve been counting on
   - A23, Welcome, Apocalypse


[jira] [Created] (HBASE-28624) Docs around configuring backups can lead to unexpectedly disabling other features

2024-05-28 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28624:
-

 Summary: Docs around configuring backups can lead to unexpectedly 
disabling other features
 Key: HBASE-28624
 URL: https://issues.apache.org/jira/browse/HBASE-28624
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


In our documentation for enabling backups, we suggest that the user set the 
following:
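{code:java}
<property>
  <name>hbase.master.logcleaner.plugins</name>
  <value>org.apache.hadoop.hbase.backup.master.BackupLogCleaner,...</value>
</property>
<property>
  <name>hbase.master.hfilecleaner.plugins</name>
  <value>org.apache.hadoop.hbase.backup.BackupHFileCleaner,...</value>
</property>
{code}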
A naive user will set these without knowing what to do about the ",..." part. In 
doing so, they will unexpectedly disable all of the default cleaners we 
have. For example, here are the defaults:
{code:java}
<property>
  <name>hbase.master.logcleaner.plugins</name>
  <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveProcedureWALCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveMasterLocalStoreWALCleaner</value>
</property>
<property>
  <name>hbase.master.hfilecleaner.plugins</name>
  <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveMasterLocalStoreHFileCleaner</value>
</property>
{code}
This effectively disables support for hbase.master.logcleaner.ttl and 
hbase.master.hfilecleaner.ttl.

The methods BackupManager.decorateMasterConfiguration and 
BackupManager.decorateRegionServerConfiguration already exist. They are 
currently javadoc'd as being for tests only, but I think we should call these 
in HMaster and HRegionServer. Then we would only need to require the user to 
set "hbase.backup.enable", which would greatly simplify our docs here.
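
A minimal sketch of the append-rather-than-replace behaviour such a 
decoration step would need so that the default cleaners survive. This is not 
the actual BackupManager code: a plain Map stands in for Hadoop's 
Configuration, and the class and method names (CleanerPluginSketch, 
appendPlugin) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CleanerPluginSketch {
    static final String LOG_CLEANER_KEY = "hbase.master.logcleaner.plugins";
    static final String BACKUP_LOG_CLEANER =
        "org.apache.hadoop.hbase.backup.master.BackupLogCleaner";

    // Appends plugin to the comma-separated value stored under key, keeping
    // whatever is already configured and avoiding duplicate entries.
    static void appendPlugin(Map<String, String> conf, String key, String plugin) {
        List<String> plugins = new ArrayList<>();
        for (String entry : conf.getOrDefault(key, "").split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                plugins.add(trimmed);
            }
        }
        if (!plugins.contains(plugin)) {
            plugins.add(plugin);
        }
        conf.put(key, String.join(",", plugins));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Start from one of the default log cleaners listed above.
        conf.put(LOG_CLEANER_KEY,
            "org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner");
        appendPlugin(conf, LOG_CLEANER_KEY, BACKUP_LOG_CLEANER);
        System.out.println(conf.get(LOG_CLEANER_KEY));
    }
}
```

Calling appendPlugin a second time with the same plugin leaves the value 
unchanged, so the step could safely run on every startup.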





[jira] [Resolved] (HBASE-28588) Remove deprecated methods in WAL

2024-05-28 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-28588.
---
Fix Version/s: 3.0.0-beta-2
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to master and branch-3.

Thanks [~heliangjun] for contributing!

> Remove deprecated methods in WAL
> 
>
> Key: HBASE-28588
> URL: https://issues.apache.org/jira/browse/HBASE-28588
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Duo Zhang
>Assignee: Liangjun He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0-beta-2
>
>






Failure: HBase Generate Website

2024-05-28 Thread Apache Jenkins Server

Build status: FAILURE

The HBase website has not been updated to incorporate recent HBase changes.

See https://ci-hbase.apache.org/job/hbase_generate_website/536/console


[jira] [Created] (HBASE-28623) Scan with MultiRowRangeFilter very slow

2024-05-28 Thread chaijunjie (Jira)
chaijunjie created HBASE-28623:
--

 Summary: Scan with MultiRowRangeFilter very slow
 Key: HBASE-28623
 URL: https://issues.apache.org/jira/browse/HBASE-28623
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 2.4.14
Reporter: chaijunjie


When we *scan* a big table ({*}more than 500 regions{*}) with 
{*}MultiRowRangeFilter{*}, the scan is very slow.

It appears to {*}scan all regions{*}.

For example, we scan 3 ranges:

startRow: 097_28220_ stopRow: 097_28220_~

startRow: 098_28221_ stopRow: 098_28221_~

startRow: 099_28222_ stopRow: 099_28222_~

With TRACE logging enabled in the HBase client, we see too many scans being 
issued.





[DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-28 Thread Istvan Toth
I have recently opened HBASE-28622, which has turned out
to be another aspect of the problem discussed in HBASE-20565.

The problem is discussed in detail in HBASE-20565, but it boils down to
the API design decision that the filters returning SEEK_NEXT_USING_HINT
rely on filterCell() getting called.

On the other hand, some filters maintain an internal row state that sets
counters for calls of filterCell(), which interacts with the results of
previous filters in a filterList.

When filters return different results for filterRowKey(), then filters
returning SEEK_NEXT_USING_HINT that have returned false must have
filterCell() called, otherwise the scan will degenerate into a full scan.

On the other hand, filters that maintain an internal row state must only be
called if all previous filters have INCLUDEd the Cell, otherwise their
internal state will be off. (This still has caveats, as described in
HBASE-20565.)

In my opinion, the current code from HBASE-20565 strikes a bad balance
between features, as while it fixes some use cases for row stateful
filters, it also often negates the performance benefits of the filters
providing hints, which in practice makes them unusable in many filter list
combinations.

Without completely re-designing the filter system, I think that the best
solution would be adding a method to distinguish the filters that can
return hints from the rest of them. (This was also suggested in HBASE-20565,
but it was not implemented.)

In theory, we have four combinations of hinting and row stateful filters,
but currently we have no filters that are both hinting and row stateful,
and I don't think that there is a valid use case for those. The ones that are
neither hinting nor stateful could be handled as either, but treating them
as non-hinting seems faster.

Once we have that, we can improve the filterList behaviour a lot:
- in filterRowKey(), if any hinting filter returns false, then we could
return false
- in filterCell(), rather than returning on the first non-include result,
we could process the remaining hinting filters, while skipping the
non-hinting ones.
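
A rough sketch of those two rules for the AND case. Everything here is a
simplified stand-in, not the real HBase classes: rows and cells are plain
strings, ReturnCode is reduced to three values, the most restrictive code
wins when merging, and the sketch does not track which filters rejected the
row key. The names AndFilterList, RejectAll, CountingFilter and SeekToFilter
are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class HintingFilterListSketch {
    // Subset of the real return codes, ordered from least to most
    // restrictive for the purposes of this sketch.
    enum ReturnCode { INCLUDE, SKIP, SEEK_NEXT_USING_HINT }

    static abstract class Filter {
        // Proposed method; the false default leaves existing filters untouched.
        boolean isHinting() { return false; }
        // As in HBase, returning true means "filter this row out".
        boolean filterRowKey(String row) { return false; }
        abstract ReturnCode filterCell(String cell);
    }

    static class AndFilterList {
        private final List<Filter> filters;
        AndFilterList(List<Filter> filters) { this.filters = filters; }

        boolean filterRowKey(String row) {
            boolean anyRejected = false;
            boolean hintingAccepted = false;
            for (Filter f : filters) {
                if (f.filterRowKey(row)) {
                    anyRejected = true;
                } else if (f.isHinting()) {
                    hintingAccepted = true;
                }
            }
            // Rule 1: a hinting filter that returned false must see filterCell().
            return !hintingAccepted && anyRejected;
        }

        ReturnCode filterCell(String cell) {
            ReturnCode result = ReturnCode.INCLUDE;
            for (Filter f : filters) {
                // Rule 2: after the first non-INCLUDE result, only hinting
                // filters are consulted; non-hinting ones are skipped.
                if (result != ReturnCode.INCLUDE && !f.isHinting()) {
                    continue;
                }
                ReturnCode rc = f.filterCell(cell);
                if (rc.compareTo(result) > 0) {
                    result = rc;
                }
            }
            return result;
        }
    }

    // Non-hinting filter that rejects every row and cell.
    static class RejectAll extends Filter {
        boolean filterRowKey(String row) { return true; }
        ReturnCode filterCell(String cell) { return ReturnCode.SKIP; }
    }

    // Row stateful, non-hinting filter: counts the cells it is shown.
    static class CountingFilter extends Filter {
        int seen = 0;
        ReturnCode filterCell(String cell) { seen++; return ReturnCode.INCLUDE; }
    }

    // Hinting filter: asks to seek forward until the cell reaches a target.
    static class SeekToFilter extends Filter {
        private final String target;
        SeekToFilter(String target) { this.target = target; }
        boolean isHinting() { return true; }
        ReturnCode filterCell(String cell) {
            return cell.compareTo(target) < 0
                ? ReturnCode.SEEK_NEXT_USING_HINT : ReturnCode.INCLUDE;
        }
    }

    public static void main(String[] args) {
        CountingFilter counter = new CountingFilter();
        AndFilterList list = new AndFilterList(
            Arrays.asList(new RejectAll(), counter, new SeekToFilter("m")));
        System.out.println(list.filterRowKey("a"));
        System.out.println(list.filterCell("a"));
        System.out.println(counter.seen);
    }
}
```

In this arrangement the row stateful CountingFilter is never shown a cell
that an earlier filter rejected, while the hinting SeekToFilter still gets
the chance to return its hint.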

The code changes are minimal: we just need to add a new method like
isHinting() to the Filter class, and change the above two methods.

We could add this even in 2.5, by defaulting isHinting() to return false in
the Filter class, which would preserve the current API and behaviour for
existing custom filters.

I was looking at it from the AND filter perspective, but if needed, similar
changes could be made to the OR filter.

What do you think?
Is this a good idea?

Istvan