keith-turner commented on code in PR #5576:
URL: https://github.com/apache/accumulo/pull/5576#discussion_r2109473622
##########
server/manager/src/main/java/org/apache/accumulo/manager/Manager.java:
##########
@@ -1265,14 +1265,21 @@ public void run() {
context.getTableManager().addObserver(this);
- Thread statusThread = Threads.createThread("Status Thread", new
StatusThread());
+ // TODO KEVIN RATHBUN updating the Manager state seems like a critical
function. However, the
+ // thread already handles, waits, and continues in the case of any
Exception, so critical or
+ // non critical doesn't make a difference here.
+ Thread statusThread = Threads.createCriticalThread("Status Thread", new
StatusThread());
Review Comment:
> so critical or non critical doesn't make a difference here
still seems nice to make this change though, makes the code more resilient
to changes in the function. Like if a bit of code was added outside the try
block in the method and it threw a runtime exception.
##########
server/manager/src/main/java/org/apache/accumulo/manager/Manager.java:
##########
@@ -1616,14 +1625,16 @@ private TServer setupReplication()
Property.MANAGER_REPLICATION_COORDINATOR_THREADCHECK,
maxMessageSizeProperty);
log.info("Started replication coordinator service at " +
replAddress.address);
+ // TODO KEVIN RATHBUN this thread creation exists within a task which is
labeled non-critical
+ // so assuming these are as well.
// Start the daemon to scan the replication table and make units of work
- replicationWorkThread = Threads.createThread("Replication Driver",
+ replicationWorkThread = Threads.createNonCriticalThread("Replication
Driver",
new org.apache.accumulo.manager.replication.ReplicationDriver(this));
replicationWorkThread.start();
// Start the daemon to assign work to tservers to replicate to our peers
var wd = new org.apache.accumulo.manager.replication.WorkDriver(this);
- replicationAssignerThread = Threads.createThread(wd.getName(), wd);
+ replicationAssignerThread = Threads.createNonCriticalThread(wd.getName(),
wd);
Review Comment:
> within a task which is labeled non-critical
Where is it labeled non-critical?
##########
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletClientHandler.java:
##########
@@ -1246,24 +1246,24 @@ public void loadTablet(TInfo tinfo, TCredentials
credentials, String lock,
TabletLogger.loading(extent, server.getTabletSession());
final AssignmentHandler ah = new AssignmentHandler(server, extent);
- // final Runnable ah = new LoggingRunnable(log, );
// Root tablet assignment must take place immediately
if (extent.isRootTablet()) {
- Threads.createThread("Root Tablet Assignment", () -> {
+ // TODO KEVIN RATHBUN I think this should remain non critical. This
method is ultimately
+ // called by TabletGroupWatcher.flushChanges which is always called
within a loop, so will
+ // continue to retry/recreate the thread
+ Threads.createNonCriticalThread("Root Tablet Assignment", () -> {
Review Comment:
Yeah this triggered by an RPC and the manager will keep making that RPC
periodically.
##########
server/monitor/src/main/java/org/apache/accumulo/monitor/Monitor.java:
##########
@@ -514,7 +514,9 @@ public void run() {
}
// need to regularly fetch data so plot data is updated
- Threads.createThread("Data fetcher", () -> {
+ // TODO KEVIN RATHBUN don't think this is a critical function of the
Monitor (and the
+ // RuntimeException is already handled here)
+ Threads.createNonCriticalThread("Data fetcher", () -> {
Review Comment:
Even though it handles any exception, would be good to mark this as
critical. If this thread stops running some data displayed on the monitor
would be be frozen and would never update.
##########
server/tserver/src/main/java/org/apache/accumulo/tserver/log/DfsLogger.java:
##########
@@ -475,7 +475,10 @@ public synchronized void open(String address) throws
IOException {
throw new IOException(ex);
}
- syncThread = Threads.createThread("Accumulo WALog thread " + this, new
LogSyncingTask());
+ // TODO KEVIN RATHBUN this seems like a vital thread for TabletServer, but
appears that the
+ // thread will continuously be recreated, so probably fine to stay non
critical
+ syncThread =
+ Threads.createNonCriticalThread("Accumulo WALog thread " + this, new
LogSyncingTask());
Review Comment:
Yeah this should probably be critical. I suspect if it dies that writes to
a tserver could freeze indefinitely.
##########
core/src/main/java/org/apache/accumulo/core/util/threads/Threads.java:
##########
@@ -55,22 +55,26 @@ public static Runnable createNamedRunnable(String name,
Runnable r) {
return new NamedRunnable(name, r);
}
- public static Thread createThread(String name, Runnable r) {
- return createThread(name, OptionalInt.empty(), r, UEH);
+ public static Thread createNonCriticalThread(String name, Runnable r) {
Review Comment:
> My goal was to force callers to always make a conscious decision about
which method to use.
One possible way to achieve this is via an enum that must always be passed
to the method instead of encoding the information int the method name. Feel
free to ignore this comment, not sure it would be an improvement was just a
thought based on the goal mentioned.
```java
enum ThreadType {
CRITICAL,
NON_CRITICAL
}
public static Thread createThread(String name, ThreadType type,, Runnable
r) {...}
``
##########
core/src/main/java/org/apache/accumulo/core/clientImpl/ThriftTransportPool.java:
##########
@@ -73,7 +73,8 @@ public class ThriftTransportPool {
private ThriftTransportPool(LongSupplier maxAgeMillis) {
this.maxAgeMillis = maxAgeMillis;
- this.checkThread = Threads.createThread("Thrift Connection Pool Checker",
() -> {
+ // TODO KEVIN RATHBUN all this does is perform some resource cleanup, so
may not be critical.
+ this.checkThread = Threads.createNonCriticalThread("Thrift Connection Pool
Checker", () -> {
Review Comment:
This thread dying could cause idle connections to accumulo server processes
to be left around indefinitely which could make those server processes
unhealthy. One tricky part about this code is that it runs in client and
server code, for the case of client code not sure about halting the client
process if this thread dies (would be a sudden change in behavior, would it be
easy for someone to determine why their client process halted?). Seems like it
would be good to halt an accumuo server process if this thread dies as that
would help keep metadata tablet servers healthy.
Seems like this is the only change in client side code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]