On 25 July 2014 12:05, Gustavo Niemeyer <gustavo.nieme...@canonical.com> wrote:
> On Fri, Jul 25, 2014 at 1:02 AM, Ian Booth <ian.bo...@canonical.com> wrote:
>> We've transitioned to using Session.Copy() to address the situation whereby
>> Juju would create a mongo collection instance and then continue to make db
>> calls against that collection without realising the underlying socket may
>> have become disconnected. This resulted in Juju components failing, logging
>> "i/o timeout" errors talking to mongo, even though mongo itself was still up
>> and running.
>
> Sounds sane, as I indicated in previous discussions about the topic in
> these last two weeks and also about a year ago when we covered that.
> Serializing every single request to a concurrent server via a single
> database connection seems like a pretty bad idea for anything but
> simplistic servers.
>
>> As an aside - I'm wondering whether the mgo driver shouldn't transparently
>> catch an i/o error associated with a dead socket and retry using a fresh
>> connection rather than imposing that responsibility on the caller?
>
> The evidence so far indicates that this will likely not happen. The
> current design was purposefully put in place so that harsh connection
> errors are not swept under the rug, and this seems to be working well
> so far. I'd rather not have juju proceeding over a harsh problem such
> as a master re-election midway through the execution of an algorithm
> without any indication that the failure has happened, let alone
> silently retry operations that in most cases are not idempotent.
>
> That said, the goal is of course not to make the developer's life
> miserable. All the driver wants is an acknowledgement that the error
> was perceived and taken care of. This is done trivially by calling:
>
>     session.Refresh()
>
> Done. The driver will happily drop the error notice, and proceed with
> further operations, blocking if waiting for a re-election to take
> place is necessary.

The bug Ian cites and is trying to work around has sessions failing
with an i/o error after some time (I'm guessing resource starvation in
MongoDB or TCP networking issues). session.Copy() is pulling things
from a pool, so it might be handing out sessions doomed to fail with
exactly the same issue. The connections in the pool could even have
been perfectly functional when they went in, with no way at the Go
level of knowing they have since failed without trying them.

If this is the case, then Ian would need to handle the failure by
ensuring the failed connection does not go back in the pool and
grabbing a new one (the deferred Close() will return it, I think),
repeating until it works, or until the pool has been exhausted and we
know Mongo is actually down rather than just having a polluted pool.

-- 
Stuart Bishop <stuart.bis...@canonical.com>

