From: Ligade, Shailesh (ITADD) (CON) <slig...@fbi.gov>
Sent: Tuesday, October 11, 2022 8:44 AM
To: u...@accumulo.apache.org; Ligade, Shailesh (ITADD) (CON) <slig...@fbi.gov>
Subject: RE: accumu 1.10.0 master log connection refised error

Looking at the fate print/dump, I do see

    repo: { "org.apache.accumulo.master.tableOps.CompactRange" { tableId: xx namespace: default } }

Does that mean it is stuck on a table compaction operation that it cannot finish for whatever reason, and hence it drops the tserver connection? Is it safe to fail/delete this fate? What are the alternatives, if any?

Appreciate your help.

-S

From: Shailesh Ligade via user <u...@accumulo.apache.org>
Sent: Tuesday, October 11, 2022 8:09 AM
To: u...@accumulo.apache.org
Subject: [EXTERNAL EMAIL] - accumu 1.10.0 master log connection refised error

Hello,

I have a 25-node cluster with two masters. From time to time (every 4-5 hours) I get the following, each time on a different tserver:

    org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    Error closing output stream
    java.io.IOException: The stream is closed
    SocketOutputStream.write(SocketOutputStream.java:118)
    ...
    master.LiveTServerSet$TServerConnection.compact(LiveTServerSet.java:214)
    master.tableOps.CompactionDriver.isReady(CompactionDriver:168)
    master.tableOps.CompactionDriver.isReady(CompactionDriver:54)
    master.tableOps.Tracerepo.isReady(Tracerepo.java:47)
    fate.Fate$TransactionRunner.run(Fate.java:72)

Every time it is the same exception. What might be the issue? Is it stuck in some fate operation? After this, the tserver restarts (I run it as a system service with the auto-restart flag).

How can I debug this further? Appreciate any response.

-S