haridsv commented on PR #8116:
URL: https://github.com/apache/hbase/pull/8116#issuecomment-4314845890

   @Apache9 **I gave your inputs to to Cursor and asked it further diagnose and 
this is what it said:**
   
   ---
     I concur with the contributor. The metrics deadlock is a real bug and 
appears
     to be the primary root cause. Here's why:
   
     What bad-run2 actually shows — two distinct failures in sequence
   
     First attempt (visible in the XML, starting at ~07:24:54):
     - testRegionServerSessionExpired actually ran for 353 seconds and timed out
     - The NameNode entered safe mode due to low disk space ("NameNode low on 
     available disk space. Already in safe mode." at line 645 of the output)
     - Meta region got stuck in "opening" state for ~5 minutes (07:25:13 through
     07:30:xx)
     - The deadlock the contributor describes prevented meta from ever finishing
     open
   
     Retry attempt (visible in the .txt summary, starting at 07:36:54):
     - Surefire retried the test class, but the static TEST_UTIL still has
     clusterTestDir set from the first attempt
     - setupClusterTestDir() is a no-op (returns early because clusterTestDir 
!= 
     null)
     - startMiniDFSCluster(2) at line 71 creates a new NameNode in the same 
     directory that still has stale data from the first attempt
     - Result: InconsistentFSStateException — a secondary failure caused by the
     failed first attempt not cleaning up
   
     The deadlock explained
   
     Two threads hold resources the other needs:
   
     1. RS_OPEN_META thread (opening meta region): needs to register a metrics
     source
     AssignRegionHandler.process → HRegion.openHRegion → replayRecoveredEdits →
     flushCache → StoreFileWriter.close → HFile.updateWriteLatency →
     MetricsIO.getInstance → MetricsIOSourceImpl.<init> →
     MetricsSystemImpl.register()  ← BLOCKED (needs MetricsSystem lock)
     2. HBase-Metrics2 thread (JmxCacheBuster stopping MetricsSystem): holds the
     MetricsSystem lock, needs meta to be readable
     JmxCacheBusterRunnable.run → MetricsSystemImpl.stop() → stopSources() →
     MetricsSourceAdapter.stop() → getMBeanInfo() → updateJmxCache() → 
getMetrics()
      →
     MetricsMasterSourceImpl.getMetrics → getRegionCounts() →
     listTableDescriptors() →
     isTablePresent() → reads meta table →
     CompletableFuture.get()  ← WAITING (meta region can't open)
   
     Classic deadlock: Meta open needs the metrics lock. Metrics shutdown holds 
the
      lock and needs meta.
   
   ---
   
   **I also asked it to propose a fix and got the below response:**
   
   ---
   
     First run (deadline 17:22:56, starting ~17:17):
     - Line 95: testRegionServerSessionExpired — timed out at 349s (meta stuck
     opening due to deadlock)
     - Lines 12845-12880: remaining 4 tests — all failed instantly (deadline
     already exceeded)
     - Line 12881: afterAll — timed out at 360s
   
     Rerun (deadline 17:28:56, starting at 17:28:57):
     - Line 12889: <rerunError> — beforeAll timed out →
     InconsistentFSStateException
   
     The output .txt file only contains the rerun's logs (starts at 17:28:57). 
The
     first run's logs are embedded in the XML's <system-err> CDATA section for
     testRegionServerSessionExpired.
   
     The pattern is identical in both bad-run and bad-run2: deadlock causes 
timeout
      in first run → stale state causes InconsistentFSStateException in rerun.
   
     ---
     Proposed Fix for the Deadlock
   
     The deadlock chain is:
   
     1. RS_OPEN_META thread: opening meta → replay edits → flush →
     HFile.updateWriteLatency() → MetricsIO.getInstance() → 
BaseSourceImpl.<init> →
      MetricsSystemImpl.register() → BLOCKED (MetricsSystem lock held by thread 
2)
     2. HBase-Metrics2 thread: JmxCacheBuster → MetricsSystemImpl.stop() → HOLDS
     MetricsSystem lock → stopSources() → MetricsSourceAdapter.getMBeanInfo() →
     updateJmxCache() → getMetrics() → MetricsMasterSourceImpl.getMetrics() →
     MetricsMasterWrapperImpl.getRegionCounts() → master.listTableDescriptors() 
→
     tableStateManager.isTablePresent() → reads meta → WAITING (meta being 
opened
     by thread 1)
   
     The best fix is Option A: make getRegionCounts() not read meta. The current
     implementation calls master.listTableDescriptors() just to get the list of
     tables, then calls the in-memory getRegionByStateOfTable() per table. But
     listTableDescriptors() calls tableStateManager.isTablePresent() which does 
a
     synchronous meta read. This is unnecessary — the
     AssignmentManager.RegionStates already has the full in-memory state:
   
   ```
   ⏺ 
Update(~/src/apache/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/m
           aster/MetricsMasterWrapperImpl.java)
     ⎿  Added 11 lines, removed 18 lines
         217  
         218    @Override
         219    public PairOfSameType<Integer> getRegionCounts() {
         220 -    try {                                                     
         221 -      if (!master.isInitialized()) {                          
         222 -        return new PairOfSameType<>(0, 0);                    
         223 -      }                                                       
         224 -      Integer onlineRegionCount = 0;                          
         225 -      Integer offlineRegionCount = 0;                         
         226 -                                                              
         227 -      List<TableDescriptor> descriptors = master.listTableDesc
             -riptors(null, null, null, false);                             
         228 -                                                              
         229 -      for (TableDescriptor htDesc : descriptors) {            
         230 -        TableName tableName = htDesc.getTableName();          
         231 -        Map<RegionState.State, List<RegionInfo>> tableRegions 
             -=                                                             
         232 -          master.getAssignmentManager().getRegionStates().getR
             -egionByStateOfTable(tableName);                               
         233 -        onlineRegionCount += tableRegions.get(RegionState.Stat
             -e.OPEN).size();                                               
         234 -        offlineRegionCount += tableRegions.get(RegionState.Sta
             -te.OFFLINE).size();                                           
         235 -      }                                                       
         236 -      return new PairOfSameType<>(onlineRegionCount, offlineRe
             -gionCount);                                                   
         237 -    } catch (IOException e) {                                 
         220 +    if (!master.isInitialized() || master.getAssignmentManager
             +() == null) {                                                 
         221        return new PairOfSameType<>(0, 0);
         222      }
         223 +    int onlineRegionCount = 0;                                
         224 +    int offlineRegionCount = 0;                               
         225 +    for (RegionState rs : master.getAssignmentManager().getReg
             +ionStates().getRegionStates()) {                              
         226 +      if (rs.isOpened()) {                                    
         227 +        onlineRegionCount++;                                  
         228 +      } else if (rs.isOffline()) {                            
         229 +        offlineRegionCount++;                                 
         230 +      }                                                       
         231 +    }                                                         
         232 +    return new PairOfSameType<>(onlineRegionCount, offlineRegi
             +onCount);                                                     
         233    }
         234  
         235    @Override
   ```
   
   ⏺ Now let me verify the RegionState API has isOpened() and isOffline():
   
     Searched for 1 pattern (ctrl+o to expand)
   
   ⏺ Let me also verify we still match the old semantics — the old code checked
     OPEN and OFFLINE states:
     Read 1 file (ctrl+o to expand)
   
   ⏺ Good, isOpened() checks State.OPEN and isOffline() checks State.OFFLINE — 
same
      states as the old code. Now let me clean up unused imports:
   
     Read 1 file (ctrl+o to expand)
   
   ⏺ Let me check what's still used in the file:
   
     Searched for 1 pattern (ctrl+o to expand)
   
   ⏺ Now let me check if IOException, List, RegionInfo, and TableDescriptor are
     still used elsewhere in the file:
   
     Searched for 2 patterns (ctrl+o to expand)
   
   ⏺ Good — IOException, RegionInfo, and TableDescriptor are no longer used. 
Let me
      also check List:
   
     Searched for 1 pattern (ctrl+o to expand)
   
   ⏺ Let me clean up the unused imports and add the new one:
   
   ```
   ⏺ 
Update(~/src/apache/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/m
           aster/MetricsMasterWrapperImpl.java)
     ⎿  Added 1 line, removed 4 lines
         17   */
         18  package org.apache.hadoop.hbase.master;
         19  
         20 -import java.io.IOException;                                    
         20  import java.util.AbstractMap.SimpleImmutableEntry;
         21  import java.util.Collections;
         22  import java.util.HashMap;
         24 -import java.util.List;                                         
         23  import java.util.Map;
         24  import java.util.Map.Entry;
         25  import org.apache.commons.lang3.StringUtils;
         26  import org.apache.hadoop.hbase.ServerName;
         27  import org.apache.hadoop.hbase.TableName;
         30 -import org.apache.hadoop.hbase.client.RegionInfo;              
         31 -import org.apache.hadoop.hbase.client.TableDescriptor;         
         28 +import org.apache.hadoop.hbase.master.RegionState;             
         29  import org.apache.hadoop.hbase.quotas.QuotaObserverChore;
         30  import org.apache.hadoop.hbase.quotas.SpaceQuotaSnapshot;
         31  import org.apache.hadoop.hbase.util.PairOfSameType;
   ```
   
   ⏺ Let me verify the final state of the file:
   
     Read 1 file (ctrl+o to expand)
   
   ⏺ Good. Now let me verify the compilation would work by checking that
     RegionState is accessible from this package:
   
     Searched for 1 pattern (ctrl+o to expand)
   
   ⏺ Let me verify the changes look complete:
   
   ```
   ⏺ Bash(cd /Users/hdara/src/apache/hbase && git diff --stat HEAD)
     ⎿   .../hbase/master/MetricsMasterWrapperImpl.java     | 34 ++++++++-----
        ---------
         1 file changed, 12 insertions(+), 22 deletions(-)
     ⎿  Shell cwd was reset to /Users/hdara/tmp/Debug-TestZooKeeper-Flapper
   
   ⏺ Bash(cd /Users/hdara/src/apache/hbase && git diff HEAD)
     ⎿  diff --git hbase-server/src/main/java/org/apache/hadoop/hbase/master/M
        etricsMasterWrapperImpl.java hbase-server/src/main/java/org/apache/had
        oop/hbase/master/MetricsMasterWrapperImpl.java
        … +35 lines (ctrl+o to expand)
     ⎿  Shell cwd was reset to /Users/hdara/tmp/Debug-TestZooKeeper-Flapper
   ```
   
   ⏺ The fix looks clean. Let me summarize everything:
   
     ---
     Summary
   
     Two Runs Demarcation
   
     Both bad-run and bad-run2 follow the same pattern. In the XML:
   
     Column 1: bad-run
     First Run: testRegionServerSessionExpired timed out (349s), deadline 
17:22:56.
   
       Remaining 4 tests failed instantly. Found in XML lines 95-12880.
     Rerun: <rerunError> at XML line 12889: InconsistentFSStateException. Rerun
     logs
       in the .txt output file.
     ────────────────────────────────────────
     Column 1: bad-run2
     First Run: testRegionServerSessionExpired timed out (353s), deadline 
07:30:54.
   
       Found in XML line 95.
     Rerun: InconsistentFSStateException in .txt summary. Rerun logs in the .txt
       output file.
   
     The first run's output logs are embedded in the XML's <system-err> CDATA 
for
     testRegionServerSessionExpired. The .txt output file only captures the 
rerun.
   
     Fix Applied
   
     Changed MetricsMasterWrapperImpl.getRegionCounts() to use purely in-memory
     RegionStates instead of calling master.listTableDescriptors() which reads 
meta
      via tableStateManager.isTablePresent().
   
     Why this breaks the deadlock: The HBase-Metrics2 thread (JmxCacheBuster) 
will
     no longer attempt to read the meta table while holding the MetricsSystem 
lock.
      Since RegionStates.getRegionStates() iterates an in-memory
     ConcurrentSkipListMap, it cannot block on meta availability, eliminating 
the
     circular dependency.
   
     Semantic equivalence: The old code iterated tables from 
listTableDescriptors()
      and summed OPEN/OFFLINE counts per table. The new code iterates all
     RegionState entries from the same in-memory regionsMap and counts the same
     State.OPEN and State.OFFLINE states directly. The counts will be identical.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to