Junegunn Choi created HBASE-29136:
-------------------------------------

             Summary: Make ALTER operations safer
                 Key: HBASE-29136
                 URL: https://issues.apache.org/jira/browse/HBASE-29136
             Project: HBase
          Issue Type: Improvement
            Reporter: Junegunn Choi
I'd like to start a discussion on how we can make ALTER operations safer in HBase.

h1. Problem Statement

A small mistake can lead to a catastrophic failure.

{code:ruby}
create 't', 'd', { NUMREGIONS => 10, SPLITALGO => 'HexStringSplit' }

# Wanted to specify 512KB, but mistakenly put an extra *, ended up passing an astronomical value
alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => 1024 ** 512 } }

# Mistakenly thinking that HBase would understand this notation
alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => '2MB' } }
{code}

What then happens is that all the regions of the table go down and never come back up. HBase retries and retries, as if more retries would make the {{NumberFormatException}} go away.

{noformat}
assignment.TransitRegionStateProcedure: Retry=9 of max=2147483647; pid=46, ppid=38, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; TransitRegionStateProcedure table=t, region=3347e7a93b19edda40c2ce08fe3cbf27, REOPEN/MOVE; state=OPENING, location=localhost,16020,1739865304886
assignment.TransitRegionStateProcedure: Failed transition, suspend 32secs pid=46, ppid=38, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; TransitRegionStateProcedure table=t, region=3347e7a93b19edda40c2ce08fe3cbf27, REOPEN/MOVE; state=OPENING, location=null; waiting on rectified condition fixed by other Procedure or operator intervention
org.apache.hadoop.hbase.HBaseIOException: Failed confirm OPEN of state=OPENING, location=null, table=t, region=3347e7a93b19edda40c2ce08fe3cbf27 (remote log may yield more detail on why).
{noformat}

How do you get the service back up? Here's what an ordinary operator would do:

* First, you kill the HBase shell that is stuck waiting for the response and restart it.
* Then you try to undo the change with a corrected {{alter}} command, only to learn that the previous operation is holding the lock.
{code:ruby}
alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => 1024 * 512 } }
{code}

* Restart the shell again and try to {{disable}} the table. But it's stuck again.

{code:ruby}
disable 't'
{code}

* You realize that normal operations are not going to work. You reach for {{hbck}} and "bypass" the procedures.

{code:ruby}
hbck = @hbase.instance_variable_get(:@connection).hbck
hbck.bypassProcedure([30, 153], 10000, true, true)
{code}

* Now the procedures are gone, but the regions are still stuck in {{OPENING}} state, and the table is in {{DISABLING}} state.
* Try {{alter}} again. It seems to work!

{code:ruby}
alter 't', { CONFIGURATION => { 'hbase.storescanner.pread.max.bytes' => 1024 * 512 } }
# Updating all regions with the new schema...
# All regions updated.
# Done.
# Took 1.1817 seconds
{code}

* But you can't {{enable}} it.

{code:ruby}
enable 't'
# ERROR: Table tableName=t, state=DISABLING should be disabled!
{code}

* You need to modify the table state using {{hbck}} again.

{code:ruby}
hbck.setTableStateInMeta(
  org.apache.hadoop.hbase.client.TableState.new(
    TableName.valueOf('t'),
    org.apache.hadoop.hbase.client.TableState::State::DISABLED
  )
)
{code}

* Now you can finally enable it.

{code:ruby}
enable 't'
{code}

This is a painful process for such a small mistake. If you're not familiar with {{hbck}}, you might spend hours figuring out how to get the table back up. We should try to make the alter operation safer.

h1. Current ways to alleviate the problem

* {{hbase.reopen.table.regions.progressive.batch.size.max}} helps a lot in this scenario. With the config, only one region will be out of service.
** However, you still have to do the whole dance to get that region back up.
* Having a small {{hbase.assignment.maximum.attempts}} value can help, as the procedure will fail fast and we can avoid the "bypass" step.
** But it's a global configuration of the HBase master, and setting it to a very small number is probably not a good idea.

h1. Client effort to avoid the problem

Ideally, the operator should verify the alter command beforehand on a test cluster, preferably with a small {{hbase.assignment.maximum.attempts}} value, and proceed with the production cluster only if it works as expected. This is probably what everyone should be doing, but mistakes happen in the real world, and it shouldn't be this easy to unintentionally bring a table down.

h1. Proposals to make the alter operation safer

h2. Proposal 1. Make it easier to verify the alter command on a temporary table

To avoid such mistakes, the operator would want to verify an alter command on a temporary table first. But, as shown above, it's surprisingly hard to do this properly. This proposal aims to make the process easier.

h3. Proposal 1-1. Table-wise {{hbase.assignment.maximum.attempts}}

The default value of {{hbase.assignment.maximum.attempts}} is {{INT_MAX}}, so HBase will retry endlessly even on non-recoverable errors like {{NumberFormatException}}, and the operator has to rely on {{hbck}} to "bypass" the retrying procedure. If we make the configuration table-wise, it becomes easier to verify an alter command on a temporary table and to clean up the mess if something goes wrong (i.e. no need for {{bypass}}).

{code:ruby}
create 'temp', 'd', { MAX_ASSIGNMENT_ATTEMPTS => 1 }
alter 'temp', { CONFIGURATION => { 'hbase.hregion.majorcompaction' => 'x' } }
{code}

h3. Proposal 1-2. Allow disabling regions in {{FAILED_OPEN}} state

With the above configuration, we can avoid using {{hbck}} to bypass the procedure. But there's another problem: the region will be left in {{FAILED_OPEN}} state, and because of that, we can't disable the table. When you try to disable the table, HBase will try to reassign the region.
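As a toy illustration (plain Ruby, not actual HBase code), the close path described above can be modeled like this; today a close request on a {{FAILED_OPEN}} region goes back through assignment, while this proposal would mark the region {{CLOSED}} directly:

{code:ruby}
# Toy model of closing a region, illustrative only.
# Today: disabling a table re-queues an open attempt for a FAILED_OPEN
# region, so the disable procedure gets stuck behind the failing open.
def close_region_today(state)
  state == :FAILED_OPEN ? :OPENING : :CLOSED
end

# Proposed: a FAILED_OPEN region is not open, so it could be marked
# CLOSED directly in the meta table without reassignment.
def close_region_proposed(_state)
  :CLOSED
end

close_region_today(:FAILED_OPEN)    # => :OPENING (stuck retrying the open)
close_region_proposed(:FAILED_OPEN) # => :CLOSED
{code}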
[https://github.com/apache/hbase/blob/e00f492a216a8827cb1d47ded6e259ddc691d764/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L400-L404]

So to disable and drop the table, we would have to use {{hbck}} to alter the state in the meta table. I don't see why we shouldn't allow disabling (closing) regions in {{FAILED_OPEN}} state. They are not open, so we just need to update the meta table and mark them as {{CLOSED}}.

h3. Proposal 1-3. Provide a safe alter command in the shell (optional)

With the above enhancements, users can implement a safe alter command that works as follows:

# Create a temp table with the same schema as the target table, but with {{MAX_ASSIGNMENT_ATTEMPTS}} set to 1.
# Apply the alter command to the temp table.
## If it succeeds, drop the temp table and apply the alter command to the target table.
## If it fails, drop the temp table and abort the operation.

We can also consider providing an official implementation of the scenario above as a new shell command, something like {{alter_safe}}.

h2. Proposal 2. Add type validation to the configuration

While the method described above can help in many cases, I realized there are some configuration parameters that can outright crash the HBase server. Here are some examples:

{code:ruby}
alter 't', { CONFIGURATION => { 'hbase.regionserver.optionalcacheflushinterval' => 'x' } }
alter 't', { CONFIGURATION => { 'hbase.hregion.max.filesize.jitter' => 'x' } }
alter 't', { CONFIGURATION => { 'hbase.hstore.open.and.close.threads.max' => 'x' } }
alter 't', { CONFIGURATION => { 'hbase.hregion.memstore.block.multiplier' => 'x' } }
{code}

{noformat}
handler.AssignRegionHandler: Fatal error occurred while opening region t,,1739868021543.c2a35d28432c8869c40d3fe9f464c52e., aborting...
java.lang.IllegalStateException: Could not instantiate a region instance.
{noformat}

So testing an alter command on a temp table is clearly not enough of a safeguard.
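The common failure mode in these examples is that the value is only parsed on the server, at region-open time. As a rough Ruby analogy for that late-parsing path (illustrative only, not HBase code; the key name and the default of 4 match the real {{hbase.hregion.memstore.block.multiplier}} parameter):

{code:ruby}
# Illustrative only: strict parsing raises at use time, just as the
# region server only discovers a bad value when it opens the region.
def read_int_conf(conf, key, default)
  raw = conf.fetch(key, default.to_s)
  Integer(raw, 10) # raises ArgumentError on values like 'x' or '2MB'
end

conf = { 'hbase.hregion.memstore.block.multiplier' => 'x' }

begin
  read_int_conf(conf, 'hbase.hregion.memstore.block.multiplier', 4)
rescue ArgumentError => e
  puts "fatal at region open: #{e.message}"
end
{code}

Validating the value at alter time, before any region is reopened, is exactly the gap the proposal below aims to close.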
To prevent fatal errors like this, we have to properly validate the values of these configuration parameters. We have {{TableDescriptorChecker}} for this purpose, but it is far from complete. We can start adding checks to the class, but it's a tedious and time-consuming process, and it's easy to forget to update it when a new configuration is added.

To increase the validation coverage without too much effort, I propose adding type definitions to the configuration keys as below:

{code:java}
public static final String HBASE_HSTORE_COMPACTION_RATIO_KEY =
    ConfigKey.FLOAT("hbase.hstore.compaction.ratio");

public static final String HBASE_HSTORE_COMPACTION_MIN_KEY =
    ConfigKey.INT("hbase.hstore.compaction.min");

public static final String HBASE_HSTORE_COMPACTION_MIN_SIZE_KEY =
    ConfigKey.LONG("hbase.hstore.compaction.min.size");

// With additional predicates
public static final String HREGION_MEMSTORE_BLOCK_MULTIPLIER =
    ConfigKey.INT("hbase.hregion.memstore.block.multiplier", v -> v > 0);
{code}

{{TableDescriptorChecker}} will then perform basic type validation for these parameters before modifying the table. The advantages of this approach over manually adding checks to {{TableDescriptorChecker}} are as follows:

* Easier to increase the coverage.
** It's much easier and faster to add simple type checks this way than to add a separate code block to {{TableDescriptorChecker}}.
* Easier to read and understand the code.
** {{ConfigKey.TYPE}} serves as documentation of the expected type of the configuration key.
* Easier to keep track of the configuration keys that need validation.

But this approach has a limitation:

* If a class is not loaded, the config keys listed in it are not going to be validated. We should track whether these classes are loaded by the master server.
** {{HRegion}}, {{HStore}}, {{ScanInfo}}, and {{CompactionConfiguration}} are all in the {{regionserver}} namespace, but they are loaded by the master during startup while initializing {{MasterRegion}}.

*If we can make the validation coverage high enough with this approach, the first proposal might not be necessary.*

h3. Alternative considered

I thought about using annotations for the purpose, but because the configuration keys are scattered across the codebase, it would be hard to track them all using reflection. It also has the same limitation as the suggested approach.

h1. Summary

We have patches for each proposal. Except for proposal 1-3, they are independent of each other, so I can open subtasks and submit patches. You can find them here:

[https://github.com/junegunn/hbase/commits/alter-safe/]

But considering that proposal 1 has a fundamental limitation (it cannot deal with server crashes), I think we should go with proposal 2 and try to increase the validation coverage as much as possible. Once we are confident that the coverage is high enough, we will no longer need what's proposed in proposal 1.

So I'm going to open a pull request for proposal 2 first. I have added type validations to some of the configuration keys that are used in alter commands. I'd like to hear your thoughts.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)