On 09/09/2014 10:59 AM, Alan Conway wrote:
> On Mon, 2014-09-08 at 19:07 +0100, Fraser Adams wrote:
>> Messenger gurus seem to be keeping their heads down a bit.
>> Is it *really* just Alan and me who are interested in understanding the
>> error handling/reconnection behaviour of Messenger?
>> Is anybody using it in "industrial strength" applications or is it just 
>> being used in quick and dirty demos? Without error handling and 
>> reconnection mechanisms I'm struggling to see how it can be used for the 
>> former.
>> I can likely hack things and Alan also mentioned that he "cheats", but 
>> I'd really like to know from people who really understand messenger how 
>> to do it *properly*.
> I've been looking at this and error handling in Messenger is not just a
> matter of fixing implementation, there are some pretty big API questions
> to be answered about when and how you can report errors. It's not
> unfixable but I'm starting to think about moving away from Messenger and
> towards using the proton Engine API.
> The original tradeoff was that engine is more complete and flexible but
> harder to use, whereas Messenger is easy but not as complete/flexible.
> However if you look at the toolkit & examples at
>  https://github.com/grs/examples
>
> it makes engine a lot more appealing. The idea is to provide blocks of
> "normal default" behavior in a toolkit to get going quickly (and to keep
> you going for many/most uses) but allow those to be modified or replaced
> as things get more complex. The nice thing about this is that you know
> you can peel back the toolkit if you need to and get full access to the
> proton event machine, so anything proton knows you can react to.
> If we can make the engine API approachable enough for general messaging
> use (while keeping it powerful enough for integration use) then it might
> make more sense to focus on doing that than on maintaining two different
> APIs for proton.
> Cheers,
> Alan.
>> Frase
>> On 05/09/14 14:17, Alan Conway wrote:
>>> On Thu, 2014-09-04 at 18:28 +0100, Fraser Adams wrote:
>>>> On 03/09/14 23:29, Alan Conway wrote:
>>>>> On Wed, 2014-09-03 at 20:05 +0100, Fraser Adams wrote:
>>>>>> Hello,
>>>>>> I've probably missed something, but I don't know how to reliably detect
>>>>>> failures and reconnect.
>>>>>> If I send to an address with a freshly stood up Messenger instance
>>>>>> and the address can't be found, things aren't too bad: I wind up with
>>>>>> an ECONNREFUSED that I could do something with. However, if I've been
>>>>>> sending messages to a valid address and then kill off the consumer, I see:
>>>>>> [0x513380]:ERROR amqp:connection:framing-error connection aborted
>>>>>> [0x513380]:ERROR[-2] connection aborted
>>>>>> CONNECTION ERROR connection aborted (remote)
>>>>>> The thing is that all of these are *internally* generated messages sent
>>>>>> to the console via fprintf, so my *application* doesn't really know
>>>>>> about them (though I could be crafty and interpose my own cheeky fprintf
>>>>>> to intercept them). That doesn't quite sound like the desired behaviour
>>>>>> for a robust system?
>>>>>> Similarly, if I do trap an error, what's the correct way to
>>>>>> continue? As it happens, my app currently carries on silently doing
>>>>>> nothing useful, and continues to do so even when the peer restarts (so
>>>>>> there is no magic internal reconnection logic as far as I can see).
>>>>>> Do I have to do a
>>>>>> messenger.stop()
>>>>>> messenger.start()
>>>>>> cycle to get things going again? I'm guessing so, but I'd like to know
>>>>>> the "correct"/expected way to write Messenger code that is robust
>>>>>> against remote failures. As far as I can see there are no examples of
>>>>>> that sort of thing.
>>>>> I've come up against similar problems, I think it's an area that needs
>>>>> some work in Proton. Is anybody already working on/thinking about this
>>>>> area?
>>>>> Cheers,
>>>>> Alan.
>>>> I'd definitely like to know how others deal with this sort of thing.
>>> I cheat. I've been using proton in dispatch system tests, and I come up
>>> against these issues when I start up some proton/dispatch network and
>>> try to use it too quickly, before things have settled down. I have some
>>> tweaks in my test harness to wait till things are ready so there are no
>>> errors :) That's not a solution for general non-test situations -
>>> although knowing how to wait till things are ready is always useful.
>>> https://svn.apache.org/repos/asf/qpid/dispatch/trunk/tests/system_test.py
>>> class Messenger adds a "flush" method that pumps the Messenger event
>>> loop till there is no more work to do. Otherwise subscribe() in
>>> particular gives no way to tell when the subscription is active.
>>> Note: My situation is a bit special in that dispatch creates addresses
>>> dynamically on subscribe and my tests involve slow stuff like waypoints
>>> to brokers etc. That introduces a delay in subscribe that probably isn't
>>> visible when the address is created beforehand.
>>> There's also Qpidd.wait_ready and Qdrouterd.wait_ready that wait for
>>> qpidd and dispatch router to be ready respectively so I can be sure that
>>> when I connect with proton they'll be listening. Those wait for the
>>> expected listening ports to be connectable and, in the case of dispatch,
>>> also do a qmf check to make sure that all expected outgoing connectors
>>> are there.
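
The "wait for the expected listening ports to be connectable" step could be
sketched roughly as below. This is not the code from system_test.py: the
function name wait_port and its parameters are made up for illustration, and
it only covers the TCP check, not the qmf check.

```python
import socket
import time


def wait_port(port, host="127.0.0.1", timeout=10.0, interval=0.1):
    """Block until a TCP connect to (host, port) succeeds.

    Hypothetical helper: raises TimeoutError if the port is still not
    connectable once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Raises OSError (e.g. ECONNREFUSED) while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return
        except OSError as err:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{host}:{port} not connectable after {timeout}s"
                ) from err
            time.sleep(interval)
```

A test harness would call something like wait_port(5672) after starting the
broker or router and before creating any client objects.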
>>>> For info: leaving aside that I can't necessarily trap all the errors
>>>> without being devious around fprintf (which to be fair works, but it's
>>>> a bit sneaky, and if you have multiple Messenger instances it won't
>>>> tell you which one the error relates to), when I do get an error I
>>>> appear to have to start from scratch - in other words:
>>>> message.free();
>>>> messenger.free();
>>>> message = new proton.Message();
>>>> messenger = new proton.Messenger();
>>>> messenger.start();
>>>> If I try to restart the original messenger or use the existing queue I
>>>> get no joy. It's not the end of the world but I've no idea what robust
>>>> Messenger code is *supposed* to look like.
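
For what it's worth, the "throw everything away and start from scratch"
workaround can be wrapped up generically, as in the sketch below. It
deliberately does not use the proton API: make_client is a hypothetical
factory returning any object with send() and close(), and ConnectionError
stands in for whatever error the application manages to trap.

```python
import time


class RebuildingSender:
    """Send through a client, replacing the client whenever a send fails.

    Hypothetical wrapper: `make_client` is any zero-argument callable that
    returns an object with send(payload) and close() methods.
    """

    def __init__(self, make_client, attempts=3, delay=0.0):
        self._make = make_client
        self._attempts = attempts
        self._delay = delay
        self._client = make_client()

    def send(self, payload):
        for attempt in range(1, self._attempts + 1):
            try:
                return self._client.send(payload)
            except ConnectionError:
                # Do not try to restart the failed object: free it and
                # build a completely fresh one.
                self._client.close()
                self._client = self._make()
                if attempt == self._attempts:
                    raise  # out of attempts, let the caller see the error
                time.sleep(self._delay)
```

This only hides the workaround behind one call; it is not a claim about how
robust Messenger code is supposed to look.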
>>>> Presumably Alan and I aren't the only people who might like to be able
>>>> to trap errors and restart? Or does everyone else write code that never
>>>> fails ;->
>>> I always wondered how everybody but me can do that. Sigh. For you and me
>>> I think we need to do some work on proton's error handling.
>>> - proton (or any library!) should NEVER EVER write anything directly to
>>> stdout or stderr. It needs a (very simple) logging facility that can
>>> write to stderr by default but can be redirected elsewhere.
>>> - proton should never log an error without also returning some useful
>>> error condition to the application.
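
The two rules above can be illustrated with Python's standard logging module,
which already writes errors to stderr when nothing is configured. This is a
sketch, not proton code: the logger name "mylib" and the names LibError,
report and redirect are invented for the example.

```python
import io
import logging

# Library-wide logger. With no handler configured, Python's logging falls
# back to writing WARNING and above to stderr.
log = logging.getLogger("mylib")  # "mylib" is a placeholder name


class LibError(Exception):
    """Error condition handed back to the application."""


def report(message):
    """Log an error AND return the same condition to the caller."""
    log.error(message)
    return LibError(message)


def redirect(stream):
    """Send the library's log output to `stream` instead of stderr."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s:%(levelname)s %(message)s"))
    log.handlers[:] = [handler]
    log.propagate = False  # keep the root logger from printing it as well
    return handler


if __name__ == "__main__":
    buf = io.StringIO()
    redirect(buf)
    err = report("connection aborted")
    print(repr(err), "| captured:", buf.getvalue().strip())
```

The point is that the application decides where the text goes, and it always
gets an error object it can act on rather than just a line on the console.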
>>> Proton has some useful pn_error_* functions, they just need to be used
>>> more widely. In dispatch I introduced an errno-style thread-local error
>>> code/message (in proton it would be a pn_error_t*). That allows sensible
>>> error messages out of functions that want to return something else (e.g.
>>> return a pointer or null and set the thread error). It also allows you to
>>> work around lazy error handling (temporarily of course (hahahaha)) - a
>>> caller a couple of stack frames up can detect an error even if
>>> intermediate functions didn't check & propagate errors properly. I'm not
>>> advocating lazy error checking but in C it is hard to get everything.
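
The errno-style thread-local error idea can be shown in a few lines of Python.
This is an illustration of the pattern only, not the dispatch or proton
implementation, and every name in it (set_error, last_error, parse_port,
connect_string) is hypothetical.

```python
import errno
import threading

_state = threading.local()  # each thread sees its own `error` attribute


def set_error(code, message):
    _state.error = (code, message)


def clear_error():
    _state.error = None


def last_error():
    """Return (code, message) for the calling thread, or None."""
    return getattr(_state, "error", None)


def parse_port(text):
    """Return the port as an int, or None with the thread error set."""
    clear_error()
    if not (text.isascii() and text.isdigit() and 0 < int(text) < 65536):
        set_error(errno.EINVAL, f"invalid port: {text!r}")
        return None
    return int(text)


def connect_string(host, port_text):
    # A "lazy" intermediate function: it neither checks nor propagates
    # the failure reported by parse_port.
    return f"{host}:{parse_port(port_text)}"


if __name__ == "__main__":
    connect_string("localhost", "bad")
    # Two frames up from the failure, the error is still visible.
    print(last_error())
```

Because the state is per thread, an error set in one thread is never seen by
another, which is what makes the pattern usable in a threaded library.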
>>> FEEDBACK PLEASE: anyone think this is a great/horrible idea? Does proton
>>> already do things I've missed that would make this unnecessary?
>>> Cheers,
>>> Alan.
