From: Ligade, Shailesh (ITADD) (CON) <slig...@fbi.gov>
Sent: Tuesday, October 11, 2022 8:44 AM
To: u...@accumulo.apache.org; Ligade, Shailesh (ITADD) (CON) <slig...@fbi.gov>
Subject: RE: Accumulo 1.10.0 master log connection refused error

Looking at the FATE print/dump output, I see:

repo: {
    "org.apache.accumulo.master.tableOps.CompactRange" {
        tableId: xx
        namespace: default
    }
}

Does that mean the master is stuck on a table compaction operation that cannot
finish for some reason, and that is why it drops the tserver connection?
Is it safe to fail/delete this FATE transaction? What are the alternatives, if any?

Appreciate your help

-S

From: Shailesh Ligade via user <u...@accumulo.apache.org>
Sent: Tuesday, October 11, 2022 8:09 AM
To: u...@accumulo.apache.org
Subject: [EXTERNAL EMAIL] - Accumulo 1.10.0 master log connection refused error

Hello,

I have a 25-node cluster with two masters. From time to time (every 4-5 hours) I
get the following error, each time on a different tserver:
org.apache.thrift.transport.TTransportException: java.net.ConnectException:
Connection refused
Error closing output stream
java.io.IOException: The stream is closed
                SocketOutputStream.write(SocketOutputStream.java:118)
                ...
                master.LiveTServerSet$TServerConnection.compact(LiveTServerSet.java:214)
                master.tableOps.CompactionDriver.isReady(CompactionDriver.java:168)
                master.tableOps.CompactionDriver.isReady(CompactionDriver.java:54)
                master.tableOps.TraceRepo.isReady(TraceRepo.java:47)
                fate.Fate$TransactionRunner.run(Fate.java:72)

It is the same exception every time. What might the issue be? Is the master stuck
in some FATE operation?
After this, the tserver restarts (I run it under systemd with the auto-restart flag).

How can I debug this further?
Appreciate any response.

-S
