Re: proton Messenger error handling/recovery REQUEST FEEDBACK!

Alan Conway Tue, 09 Sep 2014 08:23:32 -0700

On Tue, 2014-09-09 at 08:34 -0400, Ken Giusti wrote:
> I'm also interested; personally never felt comfortable with the lack of 
> visibility regarding things like connection failures that Messenger's api 
> currently provides.
> 
> Tangentially related, perhaps - I'd like to see errors reported via the event 
> collector interface.  While my issue is engine related, perhaps Messenger 
> should provide applications access to the event "bus"?
> 
> I've opened a bug against the engine event model to include errors (at least 
> for the transport/connection objects):
> 
> https://issues.apache.org/jira/browse/PROTON-656
>


See my previous mail - I'm thinking of moving to engine and helping to
make that API easier to use rather than putting the effort into Messenger.
See https://github.com/grs/examples/blob/master/proton_utils.py

The issue in Messenger is that it aims to hide connections and let the
user think about messages - that is a laudable goal. The problem is that
it still needs to report errors in terms that make sense to the API.
"connection X broke" doesn't make sense because messenger offers no
notion of connection X. Things that would make sense are "Message X will
never be delivered because of network problems" and "Subscription Y will
never receive messages because of network problems". The trick is where
and when to report these conditions. On message trackers? Exceptions on
put/send? Exceptions on get/recv? Some other event source?
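
To make the tracker option a bit more concrete, here is a rough sketch in Python. None of this is existing Messenger API: Status, Tracker, Outbox and the "route" key are all names I made up for illustration. The point is only that a lost connection gets translated into per-message outcomes the application can inspect.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    ABORTED = "aborted"  # the message will never be delivered


class Tracker:
    """Hypothetical handle the application keeps for each message it sent."""

    def __init__(self, message_id, route):
        self.message_id = message_id
        self.route = route  # internal detail, never shown to the user
        self.status = Status.PENDING
        self.reason = None


class Outbox:
    """Hypothetical stand-in for the sending half of a Messenger-like API."""

    def __init__(self):
        self._trackers = []

    def put(self, message_id, route):
        tracker = Tracker(message_id, route)
        self._trackers.append(tracker)
        return tracker

    def on_accepted(self, message_id):
        for t in self._trackers:
            if t.message_id == message_id and t.status is Status.PENDING:
                t.status = Status.ACCEPTED

    def on_route_lost(self, route, reason):
        """Translate 'connection broke' into per-message outcomes."""
        for t in self._trackers:
            if t.route == route and t.status is Status.PENDING:
                t.status = Status.ABORTED
                t.reason = reason


outbox = Outbox()
first = outbox.put("m1", route="host-a")
second = outbox.put("m2", route="host-a")
outbox.on_accepted("m1")
outbox.on_route_lost("host-a", "network problems")
print(first.status.value, second.status.value, second.reason)
```

The application only ever sees "message m2 was aborted because of network problems"; the connection itself stays hidden.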

Of course it would be great if Messenger did transparent failover for
you, and it can in the future - but you still need error notification.
Failover can itself fail - all nodes in the cluster are down, the local
network connection is dead etc.
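
A minimal sketch of what I mean, assuming a caller-supplied connect function (connect_with_failover, AllEndpointsFailed and fake_connect are hypothetical names, not proton API): failover tries each endpoint in turn, and when every one of them fails the application still gets an error it can act on.

```python
class AllEndpointsFailed(Exception):
    """Raised when failover itself fails: no endpoint could be reached."""

    def __init__(self, failures):
        self.failures = failures  # list of (endpoint, error) pairs
        details = "; ".join(f"{ep}: {err}" for ep, err in failures)
        super().__init__(f"no endpoint reachable ({details})")


def connect_with_failover(endpoints, connect):
    """Try each endpoint in order and return the first successful connection.

    `connect` is any callable that takes an endpoint and either returns a
    connection object or raises OSError.
    """
    failures = []
    for endpoint in endpoints:
        try:
            return connect(endpoint)
        except OSError as err:
            failures.append((endpoint, err))
    raise AllEndpointsFailed(failures)


def fake_connect(endpoint):
    # Stand-in for a real network connect; only "node-b" is up.
    if endpoint != "node-b":
        raise ConnectionRefusedError(f"{endpoint} refused")
    return f"connection to {endpoint}"


print(connect_with_failover(["node-a", "node-b"], fake_connect))
```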

None of these problems are unsolvable, but there's some work to be done.
Given that engine is already a more complete and flexible API (in the
sense of offering full low-level access to the entire AMQP protocol),
and that people are demonstrating that it can be made easier to use by
layering tools on top, perhaps that is where we should focus our efforts
rather than splitting them over two APIs.

> -K
> 
> ----- Original Message -----
> > From: "Fraser Adams" <fraser.ad...@blueyonder.co.uk>
> > To: users@qpid.apache.org
> > Sent: Monday, September 8, 2014 2:07:23 PM
> > Subject: Re: proton Messenger error handling/recovery REQUEST FEEDBACK!
> > 
> > Messenger gurus seem to be keeping their heads down a bit.
> > 
> > Is it *really* just Alan and I who are interested to understand the
> > error handling/reconnection behaviour of Messenger?
> > 
> > Is anybody using it in "industrial strength" applications or is it just
> > being used in quick and dirty demos? Without error handling and
> > reconnection mechanisms I'm struggling to see how it can be used for the
> > former.
> > 
> > I can likely hack things and Alan also mentioned that he "cheats", but
> > I'd really like to know from people who really understand messenger how
> > to do it *properly*.
> > 
> > Frase
> > 
> > 
> > On 05/09/14 14:17, Alan Conway wrote:
> > > On Thu, 2014-09-04 at 18:28 +0100, Fraser Adams wrote:
> > >> On 03/09/14 23:29, Alan Conway wrote:
> > >>> On Wed, 2014-09-03 at 20:05 +0100, Fraser Adams wrote:
> > >>>> Hello,
> > >>>> I've probably missed something, but I don't know how to reliably detect
> > >>>> failures and reconnect.
> > >>>>
> > > >>>> So if I send to an address with a freshly stood up Messenger instance
> > > >>>> and the address can't be found, things aren't too bad and I wind up with
> > > >>>> an ECONNREFUSED that I could do something with. However, if I've been
> > > >>>> sending messages to a valid address and then kill off the consumer, I
> > > >>>> see:
> > >>>>
> > >>>> [0x513380]:ERROR amqp:connection:framing-error connection aborted
> > >>>> [0x513380]:ERROR[-2] connection aborted
> > >>>>
> > >>>> CONNECTION ERROR connection aborted (remote)
> > >>>>
> > >>>> The thing is that all of these are *internally* generated messages sent
> > >>>> to the console via fprintf, so my *application* doesn't really know
> > > >>>> about them (though I could be crafty and interpose my own cheeky
> > > >>>> fprintf to intercept them). That doesn't quite sound like the desired
> > > >>>> behaviour for a robust system?
> > >>>>
> > >>>>
> > > >>>> Similarly, should I actually trap an error, what's the correct way to
> > > >>>> continue? As it happens, currently my app carries on silently doing
> > > >>>> nothing useful, and continues to do so even when the peer restarts (so
> > > >>>> there is no magic internal reconnection logic as far as I can see).
> > >>>>
> > > >>>> Do I have to do a
> > > >>>> messenger.stop()
> > > >>>> messenger.start()
> > > >>>>
> > > >>>> cycle to get things going again? I'm guessing so, but I'd like to know
> > > >>>> the "correct"/expected way to create Messenger code that is robust
> > > >>>> against remote failures. As far as I can see there are no examples of
> > > >>>> that sort of thing.
> > >>> I've come up against similar problems, I think it's an area that needs
> > >>> some work in Proton. Is anybody already working on/thinking about this
> > >>> area?
> > >>>
> > >>> Cheers,
> > >>> Alan.
> > >>>
> > >> I'd definitely like to know how others deal with this sort of thing.
> > > I cheat. I've been using proton in dispatch system tests, I come up
> > > against these issues when I start up some proton/dispatch network and
> > > try to use it too quickly before things have settled down. I have some
> > > tweaks in my test harness to wait till things are ready so there are no
> > > errors :) That's not a solution for general non-test situations -
> > > although knowing how to wait till things are ready is always useful.
> > >
> > > https://svn.apache.org/repos/asf/qpid/dispatch/trunk/tests/system_test.py
> > >
> > > class Messenger adds a "flush" method that pumps the Messenger event
> > > loop till there is no more work to do. Otherwise subscribe() in
> > > particular gives no way to tell when the subscription is active.
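
The idea behind "flush" can be sketched as below. This is not the code from system_test.py: FakePump is a hypothetical stand-in, and the sketch assumes a work(timeout) method that returns a true value as long as it made progress and a false value once it was idle.

```python
class FakePump:
    """Hypothetical stand-in for a Messenger-like object with queued work."""

    def __init__(self, pending):
        self.pending = pending

    def work(self, timeout):
        # Returns True if one unit of work was done, False if idle.
        if self.pending == 0:
            return False
        self.pending -= 1
        return True


def flush(pump, timeout=0.01):
    """Call work() until it reports that nothing was left to do."""
    rounds = 0
    while pump.work(timeout):
        rounds += 1
    return rounds


pump = FakePump(pending=3)
print(flush(pump), pump.pending)
```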
> > >
> > > Note: My situation is a bit special in that dispatch creates addresses
> > > dynamically on subscribe and my tests involve slow stuff like waypoints
> > > to brokers etc. That introduces a delay in subscribe that probably isn't
> > > visible when the address is created beforehand.
> > >
> > > There's also Qpidd.wait_ready and Qdrouterd.wait_ready that wait for
> > > qpidd and dispatch router to be ready respectively so I can be sure that
> > > > when I connect with proton they'll be listening. Those wait for the
> > > > expected listening ports to be connectable and, in the case of dispatch,
> > > > also do a qmf check to make sure that all expected outgoing connectors
> > > > are there.
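
The "wait for the port to be connectable" part can be sketched roughly like this. It is not the actual wait_ready code; wait_port and its parameters are hypothetical, and the qmf check is left out.

```python
import socket
import time


def wait_port(port, host="127.0.0.1", timeout=5.0, interval=0.1):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening; close at once.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage: open a listener on an ephemeral port and wait for it.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
listening_port = server.getsockname()[1]
ready = wait_port(listening_port, timeout=2.0)
server.close()
print(ready)
```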
> > >
> > > >> For info: quite apart from not necessarily being able to trap all the
> > > >> errors without being devious around fprintf (which to be fair works,
> > > >> but it's a bit sneaky and if you have multiple Messenger instances won't
> > > >> tell you which one the error relates to), when I do get an error I
> > > >> appear to have to start from scratch - in other words:
> > >>
> > >> message.free();
> > >> messenger.free();
> > >> message = new proton.Message();
> > >> messenger = new proton.Messenger();
> > >> messenger.start();
> > >>
> > > >> If I try to restart the original messenger or use the existing queue I get
> > >> no joy. It's not the end of the world but I've no idea what robust
> > >> Messenger code is *supposed* to look like.
> > >>
> > >> Presumably Alan and I aren't the only people who might like to be able
> > > >> to trap errors and restart? Or does everyone else write code that never
> > >> fails ;->
> > > I always wondered how everybody but me can do that. Sigh. For you and me
> > > I think we need to do some work on proton's error handling.
> > >
> > > > - proton (or any library!) should NEVER EVER write anything directly to
> > > stdout or stderr. It needs a (very simple) logging facility that can
> > > write to stderr by default but can be redirected elsewhere.
> > > - proton should never log an error without also returning some useful
> > > error condition to the application.
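
Both rules fit in a few lines. The following is only a sketch in Python with made-up names (set_log_sink, log, TransportError, send_frame); it is not proton's logging API. Logging goes to stderr by default, can be redirected, and the failing call both logs and reports the error to its caller.

```python
import sys


def _stderr_sink(level, message):
    print(f"[{level}] {message}", file=sys.stderr)


_sink = _stderr_sink


def set_log_sink(sink):
    """Redirect library logging; pass None to restore the stderr default."""
    global _sink
    _sink = sink if sink is not None else _stderr_sink


def log(level, message):
    _sink(level, message)


class TransportError(Exception):
    """Hypothetical error type returned to the application."""


def send_frame(frame, connected):
    """Hypothetical library call: logs AND reports failure to the caller."""
    if not connected:
        log("ERROR", "connection aborted")
        raise TransportError("connection aborted")
    return len(frame)


# Usage: the application collects log records instead of reading stderr.
captured = []
set_log_sink(lambda level, message: captured.append((level, message)))
try:
    send_frame(b"payload", connected=False)
except TransportError as err:
    print("application saw:", err)
set_log_sink(None)
```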
> > >
> > > > Proton has some useful pn_error_* functions, they just need to be used
> > > > more widely. In dispatch I introduced an errno-style thread-local error
> > > > code/message (in proton it would be a pn_error_t*). That allows sensible
> > > > error messages out of functions that want to return something else (e.g.
> > > > return a pointer or null and set the thread error). It also allows you to
> > > > work around lazy error handling (temporarily of course (hahahaha)) - a
> > > > caller a couple of stack frames up can detect an error even if
> > > > intermediate functions didn't check & propagate errors properly. I'm not
> > > > advocating lazy error checking but in C it is hard to get everything.
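
For illustration, the errno-style idea looks roughly like this in Python. It is a sketch only: the dispatch version is C, and set_error, last_error, parse_port and lazy_middle_layer are names invented here. A function returns None and sets the thread error, a sloppy intermediate function drops the result, and the caller further up can still find out what went wrong.

```python
import threading

_state = threading.local()  # one error slot per thread


def set_error(code, message):
    _state.error = (code, message)


def clear_error():
    _state.error = None


def last_error():
    return getattr(_state, "error", None)


def parse_port(text):
    """Return an int, or None with the thread error set."""
    try:
        return int(text)
    except ValueError:
        set_error(1, f"bad port: {text!r}")
        return None


def lazy_middle_layer(text):
    # Ignores the None return and does not propagate anything.
    parse_port(text)
    return "done"


clear_error()
lazy_middle_layer("amqp")
print(last_error())
```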
> > >
> > > FEEDBACK PLEASE: anyone think this is a great/horrible idea? Does proton
> > > already do things I've missed that would make this unnecessary?
> > >
> > > Cheers,
> > > Alan.
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
> > > For additional commands, e-mail: users-h...@qpid.apache.org
> > >
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
> > For additional commands, e-mail: users-h...@qpid.apache.org
> > 
> > 
> 



