Thanks, Aaron, for the clarification.

On Sun, Dec 11, 2016 at 8:37 PM, Aaron McCurry <[email protected]> wrote:
> I believe this timer does in fact test the pooled client connections. In my
> experience the "all connections bad" exception usually occurs when a shard
> server is not responding in a timely manner. It could be GCing or blocking
> from HDFS or some other unknown problem.
>
> Timer:
>
> https://github.com/apache/incubator-blur/blob/master/blur-thrift/src/main/java/org/apache/blur/thrift/ClientPool.java#L98
>
> Also there is a test method that will test connections before their use.
>
> https://github.com/apache/incubator-blur/blob/master/blur-thrift/src/main/java/org/apache/blur/thrift/ClientPool.java#L299
>
> Hope this helps.
>
> Aaron
>
> On Sat, Dec 10, 2016 at 5:56 AM, Ravikumar Govindarajan <[email protected]> wrote:
>
> > Just now tried to understand the logic...
> >
> > Whenever an IOException/TTransportException is thrown, we mark a
> > Connection as bad. Slowly, when all Connections are greeted by this, we
> > get "All Connections Bad..."
> >
> > Is it a good idea to write a reaper thread to proactively try & replenish
> > the bad Connection, instead of waiting for search to hit it at the wrong
> > moment?
> >
> > Also, I just found that the "staleness" check is eagerly performed. Should
> > it be possible to return a live connection & refresh stale ones in the
> > background? [*ClientPool.getConnection(Connection conn)*]
> >
> > --
> > Ravi
> >
> > On Sat, Dec 10, 2016 at 3:44 PM, Ravikumar Govindarajan <[email protected]> wrote:
> >
> > > Often, I find myself bang in the middle of a query when
> > > BlurClientManager comes up with this error. It happens both ways: when
> > > my app-server talks to the controller-server, and when the
> > > controller-server talks to the shard-server. This is affecting search
> > > experience quite a bit nowadays in production!!
> > >
> > > BlurException(message:Unknown error during remote call to node
> > > [AAA.BB.CCC.DD:40020], stackTraceStr:org.apache.blur.thrift.BadConnectionException:
> > > Could not connect to controller/shard server. All connections are bad.
> > > at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:243)
> > > at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:314)
> > > at org.apache.blur.thrift.BlurControllerServer$BlurClientRemote$1.call(BlurControllerServer.java:132)
> > > at org.apache.blur.thrift.BlurControllerServer$BlurClientRemote.execute(BlurControllerServer.java:139)
> > >
> > > When do we get such an Exception? Incorrect timeout settings,
> > > shard-server restarts, etc.?
> > >
> > > Any help is much appreciated.
> > >
> > > --
> > > Ravi
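
The reaper-thread idea raised in this thread could be sketched roughly as follows. This is an illustration only, not Blur code: BadConnectionReaper, markBad, isBad, reapOnce and the probe callback are hypothetical names, and the probe stands in for whatever cheap liveness check the pool already has (for example the connection test Aaron links to). It assumes bad nodes are tracked by address.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names come from Blur's ClientPool. */
class BadConnectionReaper implements AutoCloseable {
    // Addresses (e.g. "host:port") that request threads have marked bad.
    private final Set<String> bad = ConcurrentHashMap.newKeySet();
    // Caller-supplied liveness check; returns true if the node answers.
    private final Predicate<String> probe;
    // Single daemon thread so the reaper never blocks JVM shutdown.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "bad-connection-reaper");
            t.setDaemon(true);
            return t;
        });

    BadConnectionReaper(Predicate<String> probe) {
        this.probe = probe;
    }

    /** Called by a request thread after an IOException/TTransportException. */
    void markBad(String address) {
        bad.add(address);
    }

    boolean isBad(String address) {
        return bad.contains(address);
    }

    /** One pass: re-probe every bad address and restore those that respond. */
    void reapOnce() {
        for (String address : bad) {
            boolean alive;
            try {
                alive = probe.test(address);
            } catch (RuntimeException e) {
                alive = false; // a failing probe must not stop later passes
            }
            if (alive) {
                bad.remove(address);
            }
        }
    }

    /** Run reapOnce periodically in the background. */
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(
            this::reapOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With something like this, a query thread would only consult isBad() and skip those nodes, while recovery happens off the request path. The probe period trades detection latency against extra load on a shard server that may already be struggling with GC or HDFS.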
