In reality, if you cannot connect to ZK (and ConnectionLoss is a client-side
error), it means either network issues on the client node itself or issues
with the ZK quorum. In those situations, unless you eventually receive
"Session Expiration" or "Connection reestablished", you don't know what is
going on. It would probably be prudent to time out if, after ConnectionLoss,
you hear nothing back from the ZK server for longer than the ZK client timeout
(30 sec. by default, I think).
And again, it will depend on the client: in your example it is a good idea to
fail; in some other cases it may be a good idea to wait (e.g. if you deal
with non-idempotent operations).
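
A rough sketch of the timeout idea above, in plain Java with no ZK dependency.
The class and method names are made up for illustration (they are not Drill,
Curator or ZooKeeper APIs), and the caller is assumed to forward the client's
connection-state events to it:

```java
/** Hypothetical helper: tracks how long we have been in ConnectionLoss. */
final class ConnectionLossDeadline {
    private final long timeoutMillis;
    // Injectable clock (assumption, for testability); normally System::currentTimeMillis.
    private final java.util.function.LongSupplier clock;
    private long lostAtMillis = -1; // -1 means "not currently disconnected"

    ConnectionLossDeadline(long timeoutMillis, java.util.function.LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    /** Call when the client reports ConnectionLoss; only the first loss starts the timer. */
    synchronized void onConnectionLoss() {
        if (lostAtMillis < 0) {
            lostAtMillis = clock.getAsLong();
        }
    }

    /** Call when the connection is reestablished; clears the timer. */
    synchronized void onReconnected() {
        lostAtMillis = -1;
    }

    /** True once we have heard nothing back for strictly longer than the timeout. */
    synchronized boolean shouldGiveUp() {
        return lostAtMillis >= 0 && clock.getAsLong() - lostAtMillis > timeoutMillis;
    }
}
```

Whether the caller fails or keeps waiting when shouldGiveUp() turns true is
still the per-client decision described above.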

From: Hsuan Yi Chu <hyi...@maprtech.com>
To: dev@drill.apache.org
Sent: Sunday, November 8, 2015 9:36 AM
Subject: Re: Zookeeper down before query starts/after query finishes

I just submitted a pull request to address DRILL-3751, which focuses on the
scenario where the query has already finished and ZooKeeper dies, so the
Foreman cannot delete the profiles of running queries in ZooKeeper.

I think in this case, after a few retries, the Foreman can assume ZooKeeper
is down, and the query is treated as failed, since the client might not be
able to receive the result (see the behavior in DRILL-3751
<https://issues.apache.org/jira/browse/DRILL-3751>).
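
A rough sketch of the retry-then-give-up idea, in plain Java. The names ZkOp
and BoundedRetry are hypothetical and this is not the code in the pull
request; the operation passed in stands for the profile put/delete:

```java
/** Hypothetical stand-in for a ZooKeeper put/delete that may fail. */
interface ZkOp {
    void run() throws Exception;
}

/** Hypothetical helper: try an operation a bounded number of times. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Runs op up to maxAttempts times, pausing pauseMillis between attempts.
     * Returns true on the first success, false if every attempt failed or the
     * thread was interrupted (the caller then assumes ZooKeeper is down).
     */
    static boolean run(ZkOp op, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.run();
                return true;
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
                if (attempt == maxAttempts) {
                    break; // out of attempts, no point in sleeping again
                }
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

If run(...) returns false, the Foreman would stop waiting and mark the query
as failed instead of blocking on the put/delete.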

Does this make sense?




On Fri, Nov 6, 2015 at 10:43 AM, Hsuan Yi Chu <hyi...@maprtech.com> wrote:

> My understanding is:
> Before query starts/After query finishes, Foreman will put/delete running
> query profiles in zookeeper.
>
> However, if zookeeper is down before the put/delete is successful, Drill
> would be blocked at the put/delete operation.
>
> See https://issues.apache.org/jira/browse/DRILL-3751
>
> I think it is not quite right to let Drill just wait for Zookeeper to
> respond. Does it make sense to use "time-out" here?
>
>
>


  
