I just tried to understand the logic. Whenever an IOException/TTransportException is thrown, we mark a Connection as bad. Gradually, once every Connection has hit this, we get the "All connections are bad" error.
Is it a good idea to write a reaper thread that proactively tries to replenish bad Connections, instead of waiting for a search to hit one at the wrong moment?

Also, I just found that the "staleness" check is performed eagerly. Shouldn't it be possible to return a live connection and refresh the stale ones in the background? [*ClientPool.getConnection(Connection conn)*]

--
Ravi

On Sat, Dec 10, 2016 at 3:44 PM, Ravikumar Govindarajan <[email protected]> wrote:

> Often, I find myself bang in the middle of a query, when BlurClientManager
> comes up with this error. Happens both ways. When my app-server talks to
> controller-server as well as controller-server talks to shard-server. This
> is affecting search experience quite a bit nowadays in production!!
>
> BlurException(message:Unknown error during remote call to node
> [AAA.BB.CCC.DD:40020],
> stackTraceStr:org.apache.blur.thrift.BadConnectionException:
> Could not connect to controller/shard server. All connections are bad. at
> org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:243)
> at
> org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:314)
> at
> org.apache.blur.thrift.BlurControllerServer$BlurClientRemote$1.call(BlurControllerServer.java:132)
> at org.apache.blur.thrift.BlurControllerServer$BlurClientRemote.execute(
> BlurControllerServer.java:139)
>
> When do we get such an Exception? In-correct timeout settings or
> shard-server restarts etc...
>
> Any help is much appreciated
>
> --
> Ravi
>
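
To make the reaper idea from the top of this mail concrete, here is a rough sketch of what I have in mind. It is not Blur code: the class name BadConnectionReaper, the methods markBad/isBad/reapOnce, and the probe callback are all made up for illustration, and the sketch assumes bad nodes are tracked by a "host:port" string. The probe stands in for whatever "try to open a fresh connection" would mean inside the real client pool.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch only; none of these names exist in Blur. */
public class BadConnectionReaper implements AutoCloseable {

    // Nodes ("host:port") marked bad after an IOException/TTransportException.
    private final Set<String> badNodes = ConcurrentHashMap.newKeySet();

    // Assumed callback: returns true if a fresh connection to the node succeeds.
    private final Predicate<String> probe;

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "bad-connection-reaper");
            t.setDaemon(true); // do not keep the JVM alive just for the reaper
            return t;
        });

    public BadConnectionReaper(Predicate<String> probe) {
        this.probe = probe;
    }

    /** Called from the request path when a call fails with a transport error. */
    public void markBad(String node) {
        badNodes.add(node);
    }

    public boolean isBad(String node) {
        return badNodes.contains(node);
    }

    /** One pass over the bad set; returns how many nodes were recovered. */
    public int reapOnce() {
        int recovered = 0;
        for (String node : badNodes) { // weakly consistent iteration, safe with concurrent updates
            boolean reachable;
            try {
                reachable = probe.test(node);
            } catch (RuntimeException e) {
                reachable = false; // a failing probe must not kill the scheduled task
            }
            if (reachable && badNodes.remove(node)) {
                recovered++;
            }
        }
        return recovered;
    }

    /** Runs reapOnce() in the background every periodMillis milliseconds. */
    public void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(
            this::reapOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The point of this shape is that the query path only ever calls markBad() and isBad(), so a search never pays for a reconnect attempt; recovery happens on the reaper thread. The same background pass could presumably also refresh stale connections, which would let getConnection return a live one immediately.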
