Petr Krch created HBASE-29356:
---------------------------------
Summary: Incorrect split behavior when region information is
missing
Key: HBASE-29356
URL: https://issues.apache.org/jira/browse/HBASE-29356
Project: HBase
Issue Type: Bug
Components: Normalizer
Affects Versions: 2.6.2, 2.5.6
Environment: Not environment-specific; this is a clear logic bug in
{{SimpleRegionNormalizer}}. It occurs deterministically when region size
data is missing and can be reproduced via unit tests.
Reporter: Petr Krch
Attachments: fix-count-unknown-region-size-SimpleRegionNormalizer.patch
We have identified a bug in the {{SimpleRegionNormalizer}} logic that leads to
incorrect region splits when region size information is missing. If the size
cannot be determined for one or more regions (e.g. because RegionServer metrics
are unavailable), those regions are still included in the average region size
calculation, which skews the average downward. As a result, *all* regions with
a known size may be considered too large and get split unintentionally.
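To make the distortion concrete, here is a simplified Java sketch, not HBase
code. It assumes the average is a plain mean over all regions of the table and
that an unknown size is reported as -1 (as with {{getRegionSizeMB()}}); the
class and method names are hypothetical. With four 100 MB regions and six
regions of unknown size, the -1 values are summed and counted, giving
394 / 10 = 39.4 MB instead of 100 MB.

```java
/**
 * Hypothetical sketch of the problematic computation: the -1 "unknown size"
 * sentinel is added to the total and the region is still counted, so every
 * unknown region pulls the average down. Not the actual HBase implementation.
 */
public class NaiveAverageSketch {

    /** Mean over all regions, including those whose size is the -1 sentinel. */
    static double naiveAverageMb(long[] sizesMb) {
        long totalMb = 0;
        for (long size : sizesMb) {
            totalMb += size; // -1 for an unknown region is summed as if it were a size
        }
        return totalMb / (double) sizesMb.length; // unknown regions also inflate the count
    }

    public static void main(String[] args) {
        // Four regions of 100 MB plus six regions with unknown size (-1).
        long[] sizesMb = {100, 100, 100, 100, -1, -1, -1, -1, -1, -1};
        System.out.println(naiveAverageMb(sizesMb)); // prints 39.4
    }
}
```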
*Observed Behavior:*
When region size data is not available (e.g., {{getRegionSizeMB()}} returns
-1), the computed average size does not account for it, so regions with a
valid size may appear excessively large compared to the average, resulting in
multiple unnecessary splits.
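Continuing the same hypothetical example, the sketch below assumes a region is
proposed for a split when it is larger than twice the average; that threshold
and the names are illustrative assumptions, not taken from this report. Against
the distorted 39.4 MB average, every region with a known size is flagged.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical split check: a region is a split candidate when its size exceeds
 * twice the given average (an assumed threshold, for illustration only).
 */
public class SplitCheckSketch {

    /** Returns the indices of regions whose size is more than twice avgMb. */
    static List<Integer> splitCandidates(long[] sizesMb, double avgMb) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < sizesMb.length; i++) {
            if (sizesMb[i] > 2 * avgMb) {
                candidates.add(i);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        long[] sizesMb = {100, 100, 100, 100, -1, -1, -1, -1, -1, -1};
        // 39.4 MB is the average when the six -1 values are summed and counted.
        System.out.println(splitCandidates(sizesMb, 39.4)); // prints [0, 1, 2, 3]
        // With the average taken over the four known sizes only (100 MB), nothing is split.
        System.out.println(splitCandidates(sizesMb, 100.0)); // prints []
    }
}
```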
*Expected Behavior:*
If region size is unknown for some regions, those regions should be skipped
during normalization. The average region size should be computed only from the
regions for which the size is known. No region should be split or merged unless
its size is known.
*Patch:*
We are submitting a patch that:
* Excludes regions with unknown size from the average size computation.
* Prevents split and merge operations on regions with unknown size.
* Adds unit tests for scenarios with partial or total absence of size data.
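A minimal sketch of the expected behavior, written independently of the
attached patch (names are hypothetical; the -1 sentinel and the 2x split
threshold are assumptions): unknown sizes are filtered out before averaging, no
split is proposed for a region of unknown size, and when no size is known at
all the sketch proposes nothing. In a fuller implementation the same
{{isKnown}} guard would also gate merge candidates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.LongStream;

/**
 * Hypothetical sketch: only regions with a known size (>= 0) contribute to the
 * average or become split candidates. Not the attached patch.
 */
public class KnownSizeNormalizerSketch {

    /** A negative value (e.g. the assumed -1 sentinel) means "size unknown". */
    static boolean isKnown(long sizeMb) {
        return sizeMb >= 0;
    }

    /** Average over known sizes only; empty when no region size is known. */
    static OptionalDouble knownAverageMb(long[] sizesMb) {
        return LongStream.of(sizesMb).filter(KnownSizeNormalizerSketch::isKnown).average();
    }

    /** Indices of known-size regions larger than twice the known-size average. */
    static List<Integer> splitCandidates(long[] sizesMb) {
        List<Integer> result = new ArrayList<>();
        OptionalDouble avg = knownAverageMb(sizesMb);
        if (!avg.isPresent()) {
            return result; // no size is known: propose no split at all
        }
        for (int i = 0; i < sizesMb.length; i++) {
            // Regions of unknown size are never split.
            if (isKnown(sizesMb[i]) && sizesMb[i] > 2 * avg.getAsDouble()) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] someUnknown = {100, 100, 100, 100, -1, -1, -1, -1, -1, -1};
        long[] oneLarge = {10, 10, 100, -1};
        long[] allUnknown = {-1, -1, -1};
        System.out.println(knownAverageMb(someUnknown).getAsDouble()); // prints 100.0
        System.out.println(splitCandidates(someUnknown)); // prints []
        System.out.println(splitCandidates(oneLarge)); // prints [2] (average 40, threshold 80)
        System.out.println(splitCandidates(allUnknown)); // prints []
    }
}
```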
*Patch author:* Milan Vymazal
*Tests:*
* {{testSplitOfLargeRegionIfOneIsNotKnow}} verifies correct behavior when one
region has unknown size.
* {{testSplitOfAllUnknownSize}} ensures that no split happens if size data is
missing for all regions.
*Reproduction:*
Unfortunately, we are unable to reliably reproduce this bug in a live
environment, since we cannot easily simulate the condition where RegionServer
metrics are missing. However, we have confirmed the behavior through code
analysis and the added unit tests.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)