dlmarion commented on issue #5577: URL: https://github.com/apache/accumulo/issues/5577#issuecomment-2902377291
Here is a summary of the problem:

1. The TabletGroupWatcher uses information about the TabletServers when making Tablet management decisions.
   https://github.com/apache/accumulo/blob/a8b798308492fc4dc660792340a89aae20c05916/server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java#L702-L710
2. Manager.tserverStatus is updated in Manager.StatusThread.updateStatus (line 784)
   https://github.com/apache/accumulo/blob/a8b798308492fc4dc660792340a89aae20c05916/server/manager/src/main/java/org/apache/accumulo/manager/Manager.java#L782-L806
3. If the cluster is started with the root table needing recovery and no tablet server information in `tserverStatus`, then Manager.StatusThread.updateStatus hangs on line 803 in a `balanceTablets` call because it can't read from the metadata table.

```
"Status Thread" #35 daemon prio=5 os_prio=0 cpu=145.60ms elapsed=125.45s tid=0x00007f3408199000 nid=0x2dad9 waiting on condition [0x00007f34784e1000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep([email protected]/Native Method)
	at java.lang.Thread.sleep([email protected]/Thread.java:334)
	at java.util.concurrent.TimeUnit.sleep([email protected]/TimeUnit.java:446)
	at com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(Uninterruptibles.java:405)
	at org.apache.accumulo.core.clientImpl.RootClientTabletCache.findTablet(RootClientTabletCache.java:159)
	at org.apache.accumulo.core.clientImpl.ThriftScanner.getNextScanAddress(ThriftScanner.java:556)
	at org.apache.accumulo.core.clientImpl.ThriftScanner.scan(ThriftScanner.java:659)
	at org.apache.accumulo.core.clientImpl.ScannerIterator.readBatch(ScannerIterator.java:162)
	- locked <0x00000000f7dbfe68> (a org.apache.accumulo.core.clientImpl.ThriftScanner$ScanState)
	at org.apache.accumulo.core.clientImpl.ScannerIterator.getNextBatch(ScannerIterator.java:180)
	at org.apache.accumulo.core.clientImpl.ScannerIterator.hasNext(ScannerIterator.java:112)
	at org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.readRow(IsolatedScanner.java:72)
	at org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.<init>(IsolatedScanner.java:150)
	at org.apache.accumulo.core.client.IsolatedScanner.iterator(IsolatedScanner.java:239)
	at org.apache.accumulo.core.client.RowIterator.<init>(RowIterator.java:126)
	at org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder.lambda$12(TabletsMetadata.java:258)
	- locked <0x00000000f7dc0000> (a org.apache.accumulo.core.client.IsolatedScanner)
	at org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder$$Lambda$477/0x000000084040a440.apply(Unknown Source)
	at org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder.lambda$16(TabletsMetadata.java:274)
	at org.apache.accumulo.core.metadata.schema.TabletsMetadata$Builder$$Lambda$478/0x000000084040a840.iterator(Unknown Source)
	at java.lang.Iterable.spliterator([email protected]/Iterable.java:101)
	at org.apache.accumulo.core.metadata.schema.TabletsMetadata.stream(TabletsMetadata.java:599)
	at org.apache.accumulo.manager.Manager.partitionMigrations(Manager.java:650)
	at org.apache.accumulo.manager.Manager$StatusThread.balanceTablets(Manager.java:905)
	at org.apache.accumulo.manager.Manager$StatusThread.updateStatus(Manager.java:803)
	at org.apache.accumulo.manager.Manager$StatusThread.run(Manager.java:769)
	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
	at java.lang.Thread.run([email protected]/Thread.java:829)
```
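The dependency described in the three points above can be modelled with a small standalone sketch. This is not Accumulo code: the class `StatusThreadHangSketch` and its methods `gatherStatus`, `balance`, `watcherPass`, and `simulate` are hypothetical stand-ins. It also assumes that the hang becomes permanent because the status thread never gets back to refreshing `tserverStatus`, which is an interpretation of the summary rather than something the stack trace proves. The real call sleeps and retries indefinitely; the sketch uses a timeout so that it terminates.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class StatusThreadHangSketch {
  // Hypothetical stand-ins, not the real Manager fields.
  private final Set<String> liveTservers = ConcurrentHashMap.newKeySet();
  private final Map<String, String> tserverStatus = new ConcurrentHashMap<>();
  private final CountDownLatch rootTabletHosted = new CountDownLatch(1);

  // Stand-in for the status refresh: snapshot the currently known servers.
  void gatherStatus() {
    tserverStatus.clear();
    for (String server : liveTservers) {
      tserverStatus.put(server, "ok");
    }
  }

  // Stand-in for balanceTablets: needs a metadata read, which in this model
  // requires the root tablet to be hosted. Uses a timeout so the demo ends.
  boolean balance(long timeoutMs) {
    try {
      return rootTabletHosted.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  // Stand-in for one watcher pass: it can only assign the root tablet
  // when the status snapshot lists at least one tablet server.
  void watcherPass() {
    if (!tserverStatus.isEmpty()) {
      rootTabletHosted.countDown();
    }
  }

  // Returns true if the status pass completed, false if it timed out.
  static boolean simulate(boolean tserverKnownBeforeGather) {
    StatusThreadHangSketch s = new StatusThreadHangSketch();
    Thread watcher = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        s.watcherPass();
        try {
          Thread.sleep(10);
        } catch (InterruptedException e) {
          return;
        }
      }
    });
    watcher.setDaemon(true);
    watcher.start();

    if (tserverKnownBeforeGather) {
      s.liveTservers.add("tserver1");
    }
    s.gatherStatus();
    // In the failing case the server only becomes known after the snapshot,
    // and the snapshot is not refreshed while balance() is blocked.
    s.liveTservers.add("tserver1");
    boolean finished = s.balance(300);
    watcher.interrupt();
    return finished;
  }

  public static void main(String[] args) {
    System.out.println("tserver known before gather: " + simulate(true));
    System.out.println("tserver registers after gather: " + simulate(false));
  }
}
```

In this model the status pass only completes when the snapshot already contains a tablet server before the blocking call starts. When the snapshot is empty, the watcher has nothing to assign the root tablet to, and the blocking call never returns on its own.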
