dlmarion commented on issue #5577:
URL: https://github.com/apache/accumulo/issues/5577#issuecomment-2902377291

   Here is a summary of the problem:
   
   1. The `TabletGroupWatcher` uses information about the TabletServers when making tablet management decisions.
   
   
https://github.com/apache/accumulo/blob/a8b798308492fc4dc660792340a89aae20c05916/server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java#L702-L710
   
   2. `Manager.tserverStatus` is updated in `Manager.StatusThread.updateStatus` (line 784).
   
   
https://github.com/apache/accumulo/blob/a8b798308492fc4dc660792340a89aae20c05916/server/manager/src/main/java/org/apache/accumulo/manager/Manager.java#L782-L806
   
   3. If the cluster is started while the root table needs recovery and `tserverStatus` contains no tablet server information, then `Manager.StatusThread.updateStatus` hangs on line 803, inside the `balanceTablets` call, because it can't read from the metadata table.
   
   ```
   "Status Thread" #35 daemon prio=5 os_prio=0 cpu=145.60ms elapsed=125.45s tid=0x00007f3408199000 nid=0x2dad9 waiting on condition  [0x00007f34784e1000]
      java.lang.Thread.State: TIMED_WAITING (sleeping)
           at java.lang.Thread.sleep([email protected]/Native Method)
           at java.lang.Thread.sleep([email protected]/Thread.java:334)
           at java.util.concurrent.TimeUnit.sleep([email protected]/TimeUnit.java:446)
           at com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(Uninterruptibles.java:405)
           at org.apache.accumulo.core.clientImpl.RootClientTabletCache.findTablet(RootClientTabletCache.java:159)
           at org.apache.accumulo.core.clientImpl.ThriftScanner.getNextScanAddress(ThriftScanner.java:556)
           at org.apache.accumulo.core.clientImpl.ThriftScanner.scan(ThriftScanner.java:659)
           at org.apache.accumulo.core.clientImpl.ScannerIterator.readBatch(ScannerIterator.java:162)
           - locked <0x00000000f7dbfe68> (a org.apache.accumulo.core.clientImpl.ThriftScanner$ScanState)
           at org.apache.accumulo.core.clientImpl.ScannerIterator.getNextBatch(ScannerIterator.java:180)
           at org.apache.accumulo.core.clientImpl.ScannerIterator.hasNext(ScannerIterator.java:112)
           at org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.readRow(IsolatedScanner.java:72)
           at org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.<init>(IsolatedScanner.java:150)
           at org.apache.accumulo.core.client.IsolatedScanner.iterator(IsolatedScanner.java:239)
           at org.apache.accumulo.core.client.RowIterator.<init>(RowIterator.java:126)
           at org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder.lambda$12(TabletsMetadata.java:258)
           - locked <0x00000000f7dc0000> (a org.apache.accumulo.core.client.IsolatedScanner)
           at org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder$$Lambda$477/0x000000084040a440.apply(Unknown Source)
           at org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder.lambda$16(TabletsMetadata.java:274)
           at org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder$$Lambda$478/0x000000084040a840.iterator(Unknown Source)
           at java.lang.Iterable.spliterator([email protected]/Iterable.java:101)
           at org.apache.accumulo.core.metadata.schema.TabletsMetadata.stream(TabletsMetadata.java:599)
           at org.apache.accumulo.manager.Manager.partitionMigrations(Manager.java:650)
           at org.apache.accumulo.manager.Manager$StatusThread.balanceTablets(Manager.java:905)
           at org.apache.accumulo.manager.Manager$StatusThread.updateStatus(Manager.java:803)
           at org.apache.accumulo.manager.Manager$StatusThread.run(Manager.java:769)
           at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
           at java.lang.Thread.run([email protected]/Thread.java:829)
   ```
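
One way to read the three points together is as a circular wait: the status thread publishes a status snapshot and then blocks in the balance step, while the watcher needs a non-empty snapshot before it can do anything that would unblock the balance step. The toy model below is only a sketch of that reading and does not use any Accumulo code. The class `StatusThreadStallSketch`, the method `firstPassStalls`, the latch names, and the server name `tserver-1` are all hypothetical, and it assumes that the first snapshot was empty and that the watcher does nothing until the snapshot is non-empty.

```java
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: none of these names exist in Accumulo.
class StatusThreadStallSketch {

  // Returns true if the simulated status pass does not finish within the timeout.
  static boolean firstPassStalls(boolean tserversSeenOnFirstGather, long timeoutMillis) {
    // Stand-in for Manager.tserverStatus; starts out empty.
    AtomicReference<Set<String>> tserverStatus = new AtomicReference<>(Set.of());
    // Stand-in for "the root tablet is hosted, so metadata reads can proceed".
    CountDownLatch rootTabletHosted = new CountDownLatch(1);
    CountDownLatch firstPassDone = new CountDownLatch(1);

    // Stand-in for the watcher: assumed to act only once it sees tablet servers.
    Thread watcher = new Thread(() -> {
      try {
        while (tserverStatus.get().isEmpty()) {
          Thread.sleep(5);
        }
        rootTabletHosted.countDown();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "toy-watcher");

    // Stand-in for one pass of the status thread: publish a snapshot, then balance.
    Thread status = new Thread(() -> {
      try {
        Set<String> gathered = tserversSeenOnFirstGather ? Set.of("tserver-1") : Set.of();
        tserverStatus.set(gathered);
        // The balance step needs a metadata read, modelled here as waiting on the latch.
        rootTabletHosted.await();
        firstPassDone.countDown();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "toy-status");

    watcher.setDaemon(true);
    status.setDaemon(true);
    watcher.start();
    status.start();
    try {
      return !firstPassDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      // Stop both toy threads so repeated calls do not leak threads.
      watcher.interrupt();
      status.interrupt();
    }
  }

  public static void main(String[] args) {
    System.out.println("empty first snapshot stalls: " + firstPassStalls(false, 300));
    System.out.println("non-empty first snapshot stalls: " + firstPassStalls(true, 2000));
  }
}
```

In this model the stall goes away as soon as the snapshot can be refreshed independently of the blocking step. Whether that maps onto a suitable change in `Manager` is a separate question that the sketch does not answer.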

