[jira] [Resolved] (HBASE-28050) RSProcedureDispatcher to fail-fast for krb auth failures

2023-09-28 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani resolved HBASE-28050.
--
Fix Version/s: 2.6.0
   2.4.18
   2.5.6
   3.0.0-beta-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

> RSProcedureDispatcher to fail-fast for krb auth failures
> 
>
> Key: HBASE-28050
> URL: https://issues.apache.org/jira/browse/HBASE-28050
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
>
>
> As discussed on the parent Jira, let's mark remote procedures as failed when 
> we encounter SaslException (GSS initiate failed), as this belongs to the 
> category of known IOExceptions where we are certain that the request has not 
> yet reached the target regionserver.
> This should help release dispatcher threads for other 
> ExecuteProceduresRemoteCall executions.
>  
> Example log:
> {code:java}
> 2023-08-25 02:21:02,821 WARN [ispatcher-pool-40777] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=0, retrying...  {code}
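
The fail-fast check described above could be sketched roughly as follows. This is only an illustration, not the patch committed for HBASE-28050: the class SaslFailFastSketch and its methods isSaslFailure and shouldRetry are hypothetical names. It assumes the SaslException may show up either as a real cause in the exception chain or, as in the log above, only by class name inside the message of a wrapping exception.

```java
import java.io.IOException;
import javax.security.sasl.SaslException;

// Hypothetical helper, not part of HBase: decides whether a failed remote
// call should be retried or failed fast.
final class SaslFailFastSketch {

  private SaslFailFastSketch() {
  }

  // Walks the cause chain (bounded, in case of a cyclic chain) and reports
  // whether any link is a SaslException or names one in its message.
  static boolean isSaslFailure(Throwable t) {
    int depth = 0;
    for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
      if (cur instanceof SaslException) {
        return true;
      }
      String msg = cur.getMessage();
      if (msg != null && msg.contains(SaslException.class.getName())) {
        return true;
      }
    }
    return false;
  }

  // Assumption from the issue text: a SASL failure means the request never
  // reached the target regionserver, so retrying only holds the dispatcher thread.
  static boolean shouldRetry(IOException e) {
    return !isSaslFailure(e);
  }
}
```

A dispatcher could call shouldRetry(e) in its failure handler and mark the remote procedure as failed when it returns false, instead of scheduling another attempt.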



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28118) Web UI of Thrift, REST and RegionServer are partially broken

2023-09-28 Thread Dmitry Zavodnikov (Jira)
Dmitry Zavodnikov created HBASE-28118:
-

 Summary: Web UI of Thrift, REST and RegionServer are partially 
broken
 Key: HBASE-28118
 URL: https://issues.apache.org/jira/browse/HBASE-28118
 Project: HBase
  Issue Type: Bug
  Components: UI
Affects Versions: 2.5.5, 2.4.17, 2.3.7
Reporter: Dmitry Zavodnikov
 Attachments: REST Service Web UI (broken).png, RegionServer Web UI 
(broken).png, Thrift Server Web UI (broken).png

When I open the Web UI of:
 * Thrift
 * REST
 * RegionServer

I see that the UI is partially overlapped (see screenshots).





[jira] [Resolved] (HBASE-28068) Add hbase.normalizer.merge.merge_request_max_number_of_regions property to limit max number of regions in a merge request for merge normalization

2023-09-28 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-28068.
--
Resolution: Fixed

Addendums applied.

> Add hbase.normalizer.merge.merge_request_max_number_of_regions property to 
> limit max number of regions in a merge request for merge normalization
> -
>
> Key: HBASE-28068
> URL: https://issues.apache.org/jira/browse/HBASE-28068
> Project: HBase
>  Issue Type: Improvement
>  Components: Normalizer
>Affects Versions: 2.4.0, 2.5.0, 2.6.0, 3.0.0-alpha-4, 4.0.0-alpha-1
>Reporter: Ravi Kishore Valeti
>Assignee: Rahul Kumar
>Priority: Minor
> Fix For: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1, 4.0.0-alpha-1
>
>
> In our production environment, while investigating an issue, we observed that 
> the Normalizer had scheduled a single merge procedure on an RS, asking it to 
> merge 27K+ empty regions of a table (the result of a failed copy table job 
> that left 27K+ empty regions behind).
> This caused the procedure to get stuck, and the procedure framework 
> eventually bailed out after ~40 mins. This happened on each 
> normalizer run until we deleted the table manually.
> Logs:
> Normalizer triggers a merge procedure:
> normalizer.RegionNormalizerWorker - NormalizationTarget[regionInfo={ENCODED 
> => 6e8606335a62f6bafceb017dc7edfdf5, NAME => 'TEST.TEST_TABLE,.', 
> STARTKEY => '', ENDKEY => ''},regionSizeMb=0], 
> NormalizationTarget[regionInfo={ENCODED => 79607df308d7618e632abe8a12c1bf6b, 
> NAME => 'TEST.TEST_TABLE,', STARTKEY => 'XXYY', ENDKEY => 
> 'YYZZ'},regionSizeMb=0]]] resulting in pid 21968356
> The procedure immediately gets stuck:
> procedure2.ProcedureExecutor - Worker stuck PEWorker-56(pid=21968356), run 
> time 12.4850 sec
> It finally fails after ~40 mins:
> procedure2.ProcedureExecutor - Worker stuck PEWorker-56(pid=21968356), run 
> time 40 mins, 58.055 sec
> It bails out with a RuntimeException:
> procedure2.ProcedureExecutor - force=false
> java.lang.UnsupportedOperationException: pid=21968356, 
> state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, locked=true, 
> exception=java.lang.RuntimeException via CODE-BUG: Uncaught runtime 
> exception: pid=21968356, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, 
> locked=true; MergeTableRegionsProcedure table=TEST.TEST_TABLE, 
> regions=[269a1b168af497cce9ba6d3d581568f2
> .
> .
> .
> .
> 27K+ regions printed here]
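
To illustrate the kind of cap that hbase.normalizer.merge.merge_request_max_number_of_regions is meant to provide, here is a minimal sketch that splits a run of adjacent regions into merge requests of bounded size. It is not the HBase implementation: the class MergeRequestCapSketch, the method splitIntoMergeRequests, and the cap of 100 used in the note below are assumptions chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: caps how many regions go into
// a single merge request.
final class MergeRequestCapSketch {

  private MergeRequestCapSketch() {
  }

  // Splits an ordered list of adjacent regions into consecutive groups of at
  // most maxRegionsPerRequest entries. A trailing group with a single region
  // is dropped, because a merge needs at least two regions.
  static <T> List<List<T>> splitIntoMergeRequests(List<T> adjacentRegions,
      int maxRegionsPerRequest) {
    if (maxRegionsPerRequest < 2) {
      throw new IllegalArgumentException("a merge needs at least two regions");
    }
    List<List<T>> requests = new ArrayList<>();
    for (int start = 0; start < adjacentRegions.size(); start += maxRegionsPerRequest) {
      int end = Math.min(start + maxRegionsPerRequest, adjacentRegions.size());
      if (end - start >= 2) {
        requests.add(new ArrayList<>(adjacentRegions.subList(start, end)));
      }
    }
    return requests;
  }
}
```

With an illustrative cap of 100, a run of 27,000 empty regions would be split into 270 bounded requests rather than being handed to one procedure as a single 27K-region merge.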





[jira] [Created] (HBASE-28117) [HBCK2] extraRegionsInMeta need support delete extra regions when table not exist

2023-09-28 Thread chaijunjie (Jira)
chaijunjie created HBASE-28117:
--

 Summary: [HBCK2] extraRegionsInMeta need support delete extra 
regions when table not exist
 Key: HBASE-28117
 URL: https://issues.apache.org/jira/browse/HBASE-28117
 Project: HBase
  Issue Type: Improvement
  Components: hbck2
Affects Versions: 2.4.14
Reporter: chaijunjie


Sometimes we delete a table directory on HDFS, and then we need to use hbck2 to 
fix the region info in hbase:meta.

But sometimes we also carelessly delete the table state using the hbase shell, 
for example by executing:

deleteall 'hbase:meta','t1,xxx' (this is not prevented...)

The table state is then lost.

When we then try to use extraRegionsInMeta to remove these stale regions from 
meta, it fails because the table does not exist.

I think extraRegionsInMeta should be supported when the table does not exist. 
Or is there another way to fix this?


