Junegunn Choi created HBASE-29136:
-------------------------------------

             Summary: Make ALTER operations safer
                 Key: HBASE-29136
                 URL: https://issues.apache.org/jira/browse/HBASE-29136
             Project: HBase
          Issue Type: Improvement
            Reporter: Junegunn Choi


I'd like to start a discussion on how we can make ALTER operations safer in 
HBase.
h1. Problem Statement

A small mistake can lead to a catastrophic failure.
{code:ruby}
create 't', 'd', { NUMREGIONS => 10, SPLITALGO => 'HexStringSplit' }

# Wanted to specify 512KB, but mistakenly put an extra *,
# ended up passing an astronomical value
alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => 1024 ** 512 } }

# Mistakenly thinking that HBase would understand this notation
alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => '2MB' } }
{code}
What then happens is that all the regions of the table go down and never come 
back up. HBase retries and retries as if more retries would make 
{{NumberFormatException}} go away.
{noformat}
assignment.TransitRegionStateProcedure: Retry=9 of max=2147483647; pid=46, 
ppid=38, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; 
TransitRegionStateProcedure table=t, region=3347e7a93b19edda40c2ce08fe3cbf27, 
REOPEN/MOVE; state=OPENING, location=localhost,16020,1739865304886

assignment.TransitRegionStateProcedure: Failed transition, suspend 32secs 
pid=46, ppid=38, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, 
locked=true; TransitRegionStateProcedure table=t, 
region=3347e7a93b19edda40c2ce08fe3cbf27, REOPEN/MOVE; state=OPENING, 
location=null; waiting on rectified condition fixed by other Procedure or 
operator intervention

org.apache.hadoop.hbase.HBaseIOException: Failed confirm OPEN of state=OPENING, 
location=null, table=t, region=3347e7a93b19edda40c2ce08fe3cbf27 (remote log may 
yield more detail on why).{noformat}
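For scale, the mistyped value is a 1542-digit number, far beyond any fixed-width integer type, which is exactly why the numeric parse fails on every retry. A quick check in Ruby:
{code:ruby}
# The intended value vs. the mistyped one
intended = 1024 * 512   # 524288, i.e. 512KB
mistyped = 1024 ** 512  # Ruby silently promotes this to an arbitrary-precision integer

puts intended              # 524288
puts mistyped.to_s.length  # 1542 (digits) -- no fixed-width integer type can hold it
{code}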
How do you get the service back up? Here's what an ordinary operator would do:
 * First, you kill the HBase shell that is stuck waiting for the response and 
restart it.
 * Then you try to undo the change with a correct {{alter}} command, only to 
learn that the previous operation is holding the lock.
{code:ruby}
alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => 1024 * 512 } }
{code}

 * Restart the shell again and try to {{disable}} the table. But it gets stuck again.
{code:ruby}
disable 't'
{code}

 * You realize that normal operations are not going to work. You reach for 
{{hbck}} and "bypass" the procedures.
{code:ruby}
# Grab the Hbck instance from the shell's internal connection
hbck = @hbase.instance_variable_get(:@connection).hbck
# bypassProcedure(pids, waitTimeMs, override, recursive)
hbck.bypassProcedure([30, 153], 10000, true, true)
{code}

 * Now the procedures are gone, but the regions are still stuck in {{OPENING}} 
state, and the table is in {{DISABLING}} state.
 * Try {{alter}} again. It seems to work!
{code:ruby}
alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => 1024 * 512 } }
  # Updating all regions with the new schema...
  # All regions updated.
  # Done.
  # Took 1.1817 seconds
{code}

 * But you can't {{enable}} it.
{code:ruby}
enable 't'
  # ERROR: Table tableName=t, state=DISABLING should be disabled!
{code}

 * You need to modify the table state using {{hbck}} again.
{code:ruby}
hbck.setTableStateInMeta(
  org.apache.hadoop.hbase.client.TableState.new(
    TableName.valueOf('t'),
    org.apache.hadoop.hbase.client.TableState::State::DISABLED
  )
)
{code}

 * Now you can finally enable it.
{code:ruby}
enable 't'
{code}

This is a painful process for such a small mistake. If you're not familiar with 
{{{}hbck{}}}, you might have to spend hours figuring out how to get the table 
back up.

We should try to make the alter operation safer.
h1. Current ways to alleviate the problem
 * {{hbase.reopen.table.regions.progressive.batch.size.max}} helps a lot in 
this scenario. With the config, only one region will be out of service.
 ** However, you still have to do the whole dance to get the region back up.
 * Having a small {{hbase.assignment.maximum.attempts}} value can help as the 
procedure will fail fast, and we can avoid the "bypass" step.
 ** But it's a global configuration of the HBase master, and setting it to a 
very small number is probably not a good idea.
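For reference, both of these are master-side settings in {{hbase-site.xml}} (the values below are illustrative, not recommendations):
{code:xml}
<!-- Reopen regions in small batches during alter, limiting the blast radius -->
<property>
  <name>hbase.reopen.table.regions.progressive.batch.size.max</name>
  <value>1</value>
</property>
<!-- Stop retrying assignment after a bounded number of attempts; global, so risky -->
<property>
  <name>hbase.assignment.maximum.attempts</name>
  <value>10</value>
</property>
{code}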

h1. Client effort to avoid the problem

Ideally, the operator should verify the alter command beforehand on a test 
cluster, preferably with a small {{hbase.assignment.maximum.attempts}} value, 
and proceed with the production cluster only if it works as expected.

This is probably what everyone should be doing, but mistakes are made in the 
real world, and it shouldn't be this easy to unintentionally bring the table 
down.
h1. Proposals to make alter operation safer
h2. Proposal 1. Make it easier to verify the alter command on a temporary table

To avoid such mistakes, the operator would want to verify an alter command on a temporary table first. But doing that properly is surprisingly hard, for the reasons explained below. This proposal aims to make the process easier.
h3. Proposal 1-1. Table-wise {{hbase.assignment.maximum.attempts}}

The default value of {{hbase.assignment.maximum.attempts}} is {{{}INT_MAX{}}}. 
So HBase will endlessly retry even on non-recoverable errors like 
{{{}NumberFormatException{}}}. And the operator has to rely on {{hbck}} to 
"bypass" the retrying procedure.

If we can make the configuration table-wise, it would be easier to verify an 
alter command on a temporary table and clean up the mess if something goes 
wrong. (i.e. no need for {{{}bypass{}}})
{code:ruby}
create 'temp', 'd', { MAX_ASSIGNMENT_ATTEMPTS => 1 }

alter 'temp', { CONFIGURATION => { 'hbase.hregion.majorcompaction' => 'x' } }
{code}
h3. Proposal 1-2. Allow disabling regions in {{FAILED_OPEN}} state

So with the above configuration, we can avoid using {{hbck}} for bypassing the 
procedure. But there's another problem. The region will be in {{FAILED_OPEN}} 
state, and because of that, we can't disable the table. When you try to disable 
the table, HBase will try to reassign the region.

[https://github.com/apache/hbase/blob/e00f492a216a8827cb1d47ded6e259ddc691d764/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L400-L404]

So to disable and drop the table, we would have to use {{hbck}} to alter the 
state in the meta table.

I don't see why we shouldn't allow disabling (closing) regions in 
{{FAILED_OPEN}} state. They are not open, so we just need to update the meta 
table and mark them as {{{}CLOSED{}}}.
h3. Proposal 1-3. Provide safe alter command in the shell (optional)

With the above enhancements, users can implement a safe alter command that 
works as follows:
 # Create a temp table with the same schema as the target table, but with 
{{MAX_ASSIGNMENT_ATTEMPTS}} set to 1.
 # Apply the alter command to the temp table.
 ## If it succeeds, drop the temp table and apply the alter command to the 
target table.
 ## If it fails, drop the temp table. Abort the operation.

That said, we could provide an official implementation of the scenario above as a new shell command, something like {{alter_safe}}.
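The flow above can be sketched roughly as follows; {{alter_safe}} and the {{admin}} interface here are illustrative stand-ins for the shell's existing create/alter/drop commands, not a real API:
{code:ruby}
# Sketch only: 'admin' is any object exposing create_table/alter_table/drop_table.
def alter_safe(admin, table, schema, alter_spec)
  temp = "#{table}_altersafe_#{Time.now.to_i}"
  # 1. Create a temp table with the same schema, but failing fast on bad opens
  admin.create_table(temp, schema.merge('MAX_ASSIGNMENT_ATTEMPTS' => 1))
  begin
    # 2. Dry-run the alter on the temp table
    admin.alter_table(temp, alter_spec)
  rescue => e
    # 2b. The dry run failed: abort before the target table is ever touched
    raise "alter_safe aborted: #{e.message}"
  ensure
    # 3. Always drop the temp table
    admin.drop_table(temp)
  end
  # 4. The dry run succeeded: apply the change to the real table
  admin.alter_table(table, alter_spec)
end
{code}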
h2. Proposal 2. Add type validation to the configuration

While the method described above can help in many cases, I realized that some configuration parameters can outright crash the HBase server.

Here are some examples:
{code:ruby}
alter 't', { CONFIGURATION => { 'hbase.regionserver.optionalcacheflushinterval' => 'x' } }
alter 't', { CONFIGURATION => { 'hbase.hregion.max.filesize.jitter' => 'x' } }
alter 't', { CONFIGURATION => { 'hbase.hstore.open.and.close.threads.max' => 'x' } }
alter 't', { CONFIGURATION => { 'hbase.hregion.memstore.block.multiplier' => 'x' } }
{code}
{noformat}
handler.AssignRegionHandler: Fatal error occurred while opening region 
t,,1739868021543.c2a35d28432c8869c40d3fe9f464c52e., aborting...
java.lang.IllegalStateException: Could not instantiate a region 
instance.{noformat}
So testing an alter command on a temp table is clearly not enough of a 
safeguard.

To prevent fatal errors like this, we have to properly validate the values of these configuration parameters. We have {{TableDescriptorChecker}} for this purpose, but it is far from complete. We could keep adding checks to the class, but that is a tedious, time-consuming process, and it's easy to forget to update it when a new configuration is added. To increase the validation coverage without too much effort, I propose adding type definitions to the configuration keys, as below:
{code:java}
public static final String HBASE_HSTORE_COMPACTION_RATIO_KEY =
  ConfigKey.FLOAT("hbase.hstore.compaction.ratio");

public static final String HBASE_HSTORE_COMPACTION_MIN_KEY =
  ConfigKey.INT("hbase.hstore.compaction.min");

public static final String HBASE_HSTORE_COMPACTION_MIN_SIZE_KEY =
  ConfigKey.LONG("hbase.hstore.compaction.min.size");

// With additional predicates
public static final String HREGION_MEMSTORE_BLOCK_MULTIPLIER =
  ConfigKey.INT("hbase.hregion.memstore.block.multiplier", v -> v > 0);
{code}
Then {{TableDescriptorChecker}} will perform basic type validation for these parameters before modifying the table.
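To illustrate the idea (sketched here in Ruby for brevity; the actual patch is Java, and the real {{ConfigKey}} API may differ): declaring a key registers a validator in a registry, and the checker simply consults the registry before applying the change.
{code:ruby}
# Sketch: each typed key declaration registers a validator; unknown keys pass.
module ConfigKey
  VALIDATORS = {}

  # Returns the key unchanged, so existing usages of the constant keep working.
  def self.int(key, &predicate)
    VALIDATORS[key] = lambda do |value|
      begin
        v = Integer(value)
      rescue ArgumentError, TypeError
        return false  # not an integer at all
      end
      predicate ? predicate.call(v) : true
    end
    key
  end

  # What a checker would call before modifying the table.
  def self.valid?(key, value)
    validator = VALIDATORS[key]
    validator ? validator.call(value) : true
  end
end

# With an additional predicate, mirroring the Java example above
BLOCK_MULTIPLIER =
  ConfigKey.int('hbase.hregion.memstore.block.multiplier') { |v| v > 0 }
{code}
A bad value like {{'x'}} would then be rejected before any region is reopened, instead of surfacing as an exception during region open.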

The advantages of this approach compared to manually adding checks to {{TableDescriptorChecker}} are as follows.
 * Easier to increase the coverage.
 ** Because it's much easier and faster to add a simple type check this way than to write a separate validation block in {{TableDescriptorChecker}}.
 * Easier to read and understand the code.
 ** {{ConfigKey.TYPE}} serves as a documentation of the expected type of the 
configuration key.
 * Easier to keep track of the configuration keys that need validation.

But this approach has a limitation:
 * If a class is not loaded, the config keys declared in it will not be validated, so we should verify that these classes are loaded by the master server.
 ** {{HRegion}}, {{HStore}}, {{ScanInfo}}, and {{CompactionConfiguration}} are all in the {{regionserver}} package, but they are loaded by the master during startup while initializing {{MasterRegion}}.

*If we can make the validation coverage high enough with this approach, the 
first proposal might not be necessary.*
h3. Alternative considered

I thought about using annotations for this purpose, but because the configuration keys are scattered across the codebase, it would be hard to track them all using reflection. It also has the same limitation as the suggested approach.
h1. Summary

We have patches for each proposal. Except for proposal 1-3, they are 
independent of each other, so I can open subtasks and submit patches. You can 
find them here: [https://github.com/junegunn/hbase/commits/alter-safe/]

But considering that proposal 1 has a fundamental limitation (it cannot deal with server crashes), I think we should go with proposal 2 and try to increase the validation coverage as much as possible. Once we're confident that the coverage is high enough, we won't need what's proposed in proposal 1.

So I'm going to open a pull request for proposal 2 first. I have added type 
validations to some of the configuration keys that are used in alter commands.

I'd like to hear your thoughts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
