[jira] [Resolved] (HBASE-28050) RSProcedureDispatcher to fail-fast for krb auth failures
[ https://issues.apache.org/jira/browse/HBASE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Jasani resolved HBASE-28050.

Fix Version/s: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
Hadoop Flags: Reviewed
Resolution: Fixed

> RSProcedureDispatcher to fail-fast for krb auth failures
>
> Key: HBASE-28050
> URL: https://issues.apache.org/jira/browse/HBASE-28050
> Project: HBase
> Issue Type: Sub-task
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Fix For: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
>
> As discussed on the parent Jira, let's mark remote procedures as failed when
> we encounter SaslException (GSS initiate failed), as this belongs to the
> category of known IOExceptions where we are certain that the request has not
> yet reached the target regionserver.
> This should help release dispatcher threads for other
> ExecuteProceduresRemoteCall executions.
>
> Example log:
> {code:java}
> 2023-08-25 02:21:02,821 WARN [ispatcher-pool-40777]
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed
> due to java.io.IOException: Call to address=rs1:61020 failed on local
> exception: java.io.IOException:
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException:
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS
> initiate failed, try=0, retrying...
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
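The fail-fast idea in the description above can be sketched roughly as follows. This is only an illustration, not the committed patch: the class and method names (SaslFailFastSketch, isSaslFailure, shouldRetry) are hypothetical, and the message-based check is an assumption drawn from the example log, where the SaslException appears only as a class name inside a wrapped RemoteException rather than as a typed cause.

```java
import java.io.IOException;
import javax.security.sasl.SaslException;

/** Hypothetical helper; none of these names come from the HBase code base. */
final class SaslFailFastSketch {
  private static final String SASL_CLASS_NAME = SaslException.class.getName();
  // Guard against pathological (cyclic or very deep) cause chains.
  private static final int MAX_DEPTH = 32;

  private SaslFailFastSketch() {}

  /**
   * Returns true if the error, or anything in its cause chain, is a
   * SaslException or mentions the SaslException class name in its message.
   */
  static boolean isSaslFailure(Throwable error) {
    Throwable current = error;
    for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
      if (current instanceof SaslException) {
        return true;
      }
      // Assumption: a remote SASL failure may survive only as text, e.g.
      // "RemoteException(javax.security.sasl.SaslException): GSS initiate failed".
      String message = current.getMessage();
      if (message != null && message.contains(SASL_CLASS_NAME)) {
        return true;
      }
      current = current.getCause();
    }
    return false;
  }

  /** A dispatcher could stop retrying as soon as this returns false. */
  static boolean shouldRetry(IOException error) {
    return !isSaslFailure(error);
  }
}
```

With a check like this, a dispatcher thread that hits a Kerberos authentication failure could mark the remote procedure as failed immediately instead of looping on "retrying...", which is the thread-release effect the issue asks for.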
[jira] [Created] (HBASE-28118) Web UI of Thrift, REST and RegionServer are partially broken
Dmitry Zavodnikov created HBASE-28118:

Summary: Web UI of Thrift, REST and RegionServer are partially broken
Key: HBASE-28118
URL: https://issues.apache.org/jira/browse/HBASE-28118
Project: HBase
Issue Type: Bug
Components: UI
Affects Versions: 2.5.5, 2.4.17, 2.3.7
Reporter: Dmitry Zavodnikov
Attachments: REST Service Web UI (broken).png, RegionServer Web UI (broken).png, Thrift Server Web UI (broken).png

When I open the Web UI of:
* Thrift
* REST
* RegionServer

the page content is partially overlapped (see screenshots).
[jira] [Resolved] (HBASE-28068) Add hbase.normalizer.merge.merge_request_max_number_of_regions property to limit max number of regions in a merge request for merge normalization
[ https://issues.apache.org/jira/browse/HBASE-28068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk resolved HBASE-28068.

Resolution: Fixed

Addendums applied.

> Add hbase.normalizer.merge.merge_request_max_number_of_regions property to
> limit max number of regions in a merge request for merge normalization
>
> Key: HBASE-28068
> URL: https://issues.apache.org/jira/browse/HBASE-28068
> Project: HBase
> Issue Type: Improvement
> Components: Normalizer
> Affects Versions: 2.4.0, 2.5.0, 2.6.0, 3.0.0-alpha-4, 4.0.0-alpha-1
> Reporter: Ravi Kishore Valeti
> Assignee: Rahul Kumar
> Priority: Minor
> Fix For: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1, 4.0.0-alpha-1
>
> In our production environment, while investigating an issue, we observed that
> the Normalizer had scheduled a single merge procedure on an RS that asked it
> to merge 27K+ empty regions of one table (the regions were left behind by a
> failed copy table job).
> This caused the procedure to get stuck, and the procedure framework
> eventually bailed out after ~40 mins. This happened on every normalizer run
> until we deleted the table manually.
>
> Logs
>
> Normalizer triggers a merge procedure
> normalizer.RegionNormalizerWorker - NormalizationTarget[regionInfo=\{ENCODED
> => 6e8606335a62f6bafceb017dc7edfdf5, NAME => 'TEST.TEST_TABLE,.',
> STARTKEY => '', ENDKEY => ''},{*}regionSizeMb=0{*}],
> NormalizationTarget[regionInfo=\{ENCODED => 79607df308d7618e632abe8a12c1bf6b,
> NAME => 'TEST.TEST_TABLE,', STARTKEY => 'XXYY', ENDKEY =>
> 'YYZZ'},{*}regionSizeMb=0]{*}]] resulting in *pid 21968356*
>
> Procedure immediately gets stuck
> procedure2.ProcedureExecutor - Worker *stuck* PEWorker-56(pid=21968356), run
> time 12.4850 sec
>
> Finally fails after ~40 mins
> procedure2.ProcedureExecutor - Worker *stuck* PEWorker-56(pid=21968356), run
> time *40 mins, 58.055 sec*
>
> Bails out with RuntimeException
> procedure2.ProcedureExecutor - force=false
> java.lang.UnsupportedOperationException: pid=21968356,
> state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, locked=true,
> exception=java.lang.{*}RuntimeException via CODE-BUG: Uncaught runtime
> exception{*}: pid=21968356, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META,
> locked=true; MergeTableRegionsProcedure table=TEST.TEST_TABLE,
> {*}regions={*}{*}[269a1b168af497cce9ba6d3d581568f2{*}
> .
> .
> .
> .
> *27K+ regions printed here]*
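To illustrate what a cap on regions per merge request could look like, here is a minimal sketch. It is not the normalizer's actual implementation: the class name, the method name, and the choice to drop a leftover single region are all assumptions made for the example, and the limit is passed in directly rather than read from hbase.normalizer.merge.merge_request_max_number_of_regions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not taken from the HBase normalizer code. */
final class MergeRequestLimiterSketch {
  private MergeRequestLimiterSketch() {}

  /**
   * Splits one run of adjacent merge candidates into several requests of at
   * most maxRegionsPerRequest regions each. A trailing chunk with fewer than
   * two regions is dropped because a merge needs at least two regions.
   */
  static <T> List<List<T>> splitIntoMergeRequests(List<T> adjacentRegions,
      int maxRegionsPerRequest) {
    if (maxRegionsPerRequest < 2) {
      throw new IllegalArgumentException("a merge request needs at least 2 regions");
    }
    List<List<T>> requests = new ArrayList<>();
    for (int start = 0; start < adjacentRegions.size(); start += maxRegionsPerRequest) {
      int end = Math.min(start + maxRegionsPerRequest, adjacentRegions.size());
      if (end - start >= 2) {
        // Copy the view so each request owns its own list.
        requests.add(new ArrayList<>(adjacentRegions.subList(start, end)));
      }
    }
    return requests;
  }
}
```

Under this sketch, the 27K+ empty regions from the report would be turned into many bounded merge procedures instead of one procedure carrying every region.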
[jira] [Created] (HBASE-28117) [HBCK2] extraRegionsInMeta need supoort delete extra regions when table not exist
chaijunjie created HBASE-28117:

Summary: [HBCK2] extraRegionsInMeta need supoort delete extra regions when table not exist
Key: HBASE-28117
URL: https://issues.apache.org/jira/browse/HBASE-28117
Project: HBase
Issue Type: Improvement
Components: hbck2
Affects Versions: 2.4.14
Reporter: chaijunjie

Sometimes we delete a table directory on HDFS and then need to use HBCK2 to fix the region info in hbase:meta.

But sometimes we also carelessly delete the table state from the HBase shell, for example by executing:

deleteall 'hbase:meta','t1,xxx' (nothing prevents this)

The table state is then lost. When we try to use extraRegionsInMeta to remove the now useless regions from meta, it fails because the table does not exist.

I think extraRegionsInMeta should be supported when the table does not exist. Or is there another way to fix this?