Junegunn Choi created HBASE-29357:
-------------------------------------
Summary: PerformanceEvaluation: Read tests should not drop existing table
Key: HBASE-29357
URL: https://issues.apache.org/jira/browse/HBASE-29357
Project: HBase
Issue Type: Bug
Components: PE
Reporter: Junegunn Choi
h2. Problem
A read test such as {{randomRead}} might drop an existing table when the
specified {{--presplit}} value is not consistent with the current number of
regions of the table.
{code:java}
# Generate data
bin/hbase pe --nomapred --size=2 --presplit=30 sequentialWrite 1
# Perform a read test on the table. Forgot to remove the --presplit option, but
# it's okay.
bin/hbase pe --nomapred --size=2 --presplit=30 --sampleRate=0.1 sequentialRead 1
# But if the number of the regions has changed
bin/hbase shell -n <<< "split 'TestTable'"
# The --presplit option will cause recreation of the table.
bin/hbase pe --nomapred --size=2 --presplit=30 --sampleRate=0.1 sequentialRead 1
# Operation: DISABLE, Table Name: default:TestTable completed
# Operation: DELETE, Table Name: default:TestTable completed
# Operation: CREATE, Table Name: default:TestTable completed
{code}
One might argue that passing a {{--presplit}} value to a read test is a mistake
in the first place. Even so, it should not cause the table to be recreated,
because the recreated table is empty and the read test that follows becomes
meaningless.
h2. Analysis
There are currently 4 conditions for recreating the table.
{code:java}
if (
  (exists && opts.presplitRegions != DEFAULT_OPTS.presplitRegions
    && opts.presplitRegions != admin.getRegions(tableName).size())
  || (!isReadCmd && desc != null
    && !StringUtils.equals(desc.getRegionSplitPolicyClassName(),
      opts.splitPolicy))
  || (!(isReadCmd || isDeleteCmd) && desc != null
    && desc.getRegionReplication() != opts.replicas)
  || (desc != null && desc.getColumnFamilyCount() != opts.families)
) {
  needsDelete = true;
  // ...
}
{code}
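But they are inconsistent in how they treat {{isReadCmd}}: the split policy and
replication checks are skipped for read commands, while the region count and
column family count checks are not.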
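
To make the inconsistency concrete, here is a minimal, self-contained sketch that models the four conditions with plain values instead of the real {{Admin}} and {{TableDescriptor}} objects. The class and method names are hypothetical, the table is assumed to exist, and {{DEFAULT_PRESPLIT = 0}} is an assumption standing in for {{DEFAULT_OPTS.presplitRegions}}. It is not the actual PerformanceEvaluation code.

```java
import java.util.Objects;

/** Simplified, hypothetical model of the recreate condition quoted above. */
final class RecreateConditionDemo {
  // Assumption: stands in for DEFAULT_OPTS.presplitRegions.
  static final int DEFAULT_PRESPLIT = 0;

  static boolean needsDelete(boolean isReadCmd, boolean isDeleteCmd,
      int presplit, int currentRegions,
      String splitPolicy, String currentSplitPolicy,
      int replicas, int currentReplicas,
      int families, int currentFamilies) {
    // The table is assumed to exist, so `exists` and `desc != null` are both true.
    boolean regionsDiffer = presplit != DEFAULT_PRESPLIT && presplit != currentRegions;
    boolean policyDiffers = !isReadCmd && !Objects.equals(currentSplitPolicy, splitPolicy);
    boolean replicasDiffer = !(isReadCmd || isDeleteCmd) && currentReplicas != replicas;
    boolean familiesDiffer = currentFamilies != families;
    return regionsDiffer || policyDiffers || replicasDiffer || familiesDiffer;
  }

  public static void main(String[] args) {
    // Read command with --presplit=30 against a table that now has 60 regions.
    System.out.println(needsDelete(true, false, 30, 60, null, null, 1, 1, 1, 1));
    // Read command where only the replication factor differs.
    System.out.println(needsDelete(true, false, 30, 30, null, null, 2, 1, 1, 1));
  }
}
```

In this model the first call returns {{true}} (the read command would recreate the table) while the second returns {{false}}, which is the asymmetry described above.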
h2. Suggestion
*Premise: never drop an existing table unless executing a write command.*
||Condition||Current behavior||Suggested behavior||
|Region count changed|{color:#de350b}Table recreated{color}|Proceed with the test, with a warning|
|Split policy changed|No warning|Proceed with the test, with a warning|
|Replication factor changed|No warning|Proceed with the test, with a warning|
|CF count changed|{color:#de350b}Table recreated{color}|Abort the test with a warning|
* A change in region count or split policy shouldn't affect read tests, so it's better to proceed with the test, but with a warning.
* I can also imagine wanting to perform a read test with {{--replicas=1}} even when the table has a different setting.
* Technically, we can still run a read test if the current number of CFs is greater than the requested number of CFs, but I decided not to allow it to avoid confusion.
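
The suggested behavior for read commands could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: the class and method names are hypothetical, the inputs are plain values rather than {{Admin}} and {{TableDescriptor}} lookups, and {{DEFAULT_PRESPLIT = 0}} is an assumed stand-in for {{DEFAULT_OPTS.presplitRegions}}. The message wording follows the Result section of this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch of the suggested checks for a read command. */
final class ReadTestTableCheck {
  // Assumption: stands in for DEFAULT_OPTS.presplitRegions.
  static final int DEFAULT_PRESPLIT = 0;

  /** Returns warnings for tolerable mismatches; throws if the CF count differs. */
  static List<String> checkForReadCommand(int presplit, int currentRegions,
      String splitPolicy, String currentSplitPolicy,
      int replicas, int currentReplicas,
      int families, int currentFamilies) {
    // A different CF count aborts the test instead of recreating the table.
    if (families != currentFamilies) {
      throw new IllegalStateException("Cannot proceed the test. Run a write command first: "
          + "--families=" + families + ", but found " + currentFamilies + " column families");
    }
    // Every other mismatch is only reported; the table is never dropped.
    List<String> warnings = new ArrayList<>();
    if (presplit != DEFAULT_PRESPLIT && presplit != currentRegions) {
      warnings.add("[--presplit=" + presplit + ", but found " + currentRegions + " regions]");
    }
    if (!Objects.equals(splitPolicy, currentSplitPolicy)) {
      warnings.add("[--splitPolicy=" + splitPolicy + ", but current policy is "
          + currentSplitPolicy + "]");
    }
    if (replicas != currentReplicas) {
      warnings.add("[--replicas=" + replicas + ", but found " + currentReplicas + " replicas]");
    }
    return warnings;
  }

  public static void main(String[] args) {
    List<String> warnings = checkForReadCommand(30, 60, null, null, 2, 1, 1, 1);
    if (!warnings.isEmpty()) {
      System.out.println("Inconsistent table state detected. "
          + "Consider running a write command first: " + String.join(", ", warnings));
    }
  }
}
```

Collecting the warnings into a list before printing keeps all mismatches in a single message, while the CF count check comes first so that the test aborts before anything else is reported.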
h2. Result
{code:java}
bin/hbase pe --nomapred --size=2 --presplit=30 --sampleRate=0.1 sequentialRead 1
# Inconsistent table state detected. Consider running a write command first:
#   [--presplit=30, but found 60 regions]

bin/hbase pe --nomapred --size=2 --presplit=30 --replicas=2 --sampleRate=0.1 \
  sequentialRead 1
# Inconsistent table state detected. Consider running a write command first:
#   [--presplit=30, but found 60 regions], [--replicas=2, but found 1 replicas]

bin/hbase pe --nomapred --size=2 --presplit=30 --replicas=2 \
  --splitPolicy=org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy \
  --sampleRate=0.1 sequentialRead 1
# Inconsistent table state detected. Consider running a write command first:
#   [--presplit=30, but found 60 regions],
#   [--splitPolicy=org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy,
#   but current policy is null], [--replicas=2, but found 1 replicas]

bin/hbase pe --nomapred --size=2 --presplit=30 --replicas=2 \
  --splitPolicy=org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy \
  --families=2 --sampleRate=0.1 sequentialRead 1
# java.lang.IllegalStateException: Cannot proceed the test. Run a write
#   command first: --families=2, but found 1 column families
{code}