[ 
https://issues.apache.org/jira/browse/PROTON-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Seagrave updated PROTON-2933:
------------------------------------
    Attachment: main.cpp

> Windows/SChannel: AMQP 1.0 SASL auth-failure lost in handshake
> --------------------------------------------------------------
>
>                 Key: PROTON-2933
>                 URL: https://issues.apache.org/jira/browse/PROTON-2933
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: cpp-binding
>    Affects Versions: proton-c-0.40.0
>         Environment: OS: Windows 11 (x64)
> Proton: qpid-proton from vcpkg
> Compiler: MSVC
> Broker: RabbitMQ 4.3.0 with native AMQP 1.0
>            Reporter: Joshua Seagrave
>            Priority: Major
>         Attachments: CMakeLists.txt, main.cpp, vcpkg.json
>
>
> On Windows, when Proton C++ is built against the SChannel TLS backend and 
> connects to an AMQP 1.0 broker over {{amqps://}} with invalid PLAIN 
> credentials, the broker's auth-failure response is silently lost. No 
> {{messaging_handler}} callbacks are dispatched while the container is 
> running. The events only flush when {{container::stop()}} is forced, and even 
> then {{on_transport_close}} arrives with an empty {{error_condition}}: the 
> {{amqp:unauthorized-access}} condition was discarded somewhere in the 
> teardown path.
> The same scenario behaves correctly under the OpenSSL backend (Proton-Python 
> on Windows), strongly suggesting the bug is in the SChannel binding's 
> handling of the close-immediately-after-{{sasl-outcome}} race.
> The application-visible consequence is severe: a plugin/service cannot tell 
> that authentication failed. It just hangs, with no events to act on, no way 
> to surface the error to the user, and no way to trigger a credential refresh.
>  
> *Reproducer*
> A minimal standalone reproducer is attached. It traces every 
> {{messaging_handler}} callback at top-of-function (so swallowed exceptions in 
> error-condition accessors can't be confused for "callback didn't fire") and 
> includes a 30-second watchdog that forces {{container::stop()}} if no 
> terminal callback has arrived.
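> The watchdog idea can be sketched as a timed wait on a flag that the terminal 
> callbacks set. This is not the attached {{main.cpp}}: the class name 
> {{Watchdog}} and its members are hypothetical, and it uses only the standard 
> library. In a reproducer like the one described, {{on_timeout}} would be the 
> place to call {{container::stop()}}.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: invokes on_timeout once if notify_terminal() has not
// been called before the deadline. Destroying the object cancels the wait.
class Watchdog {
public:
    Watchdog(std::chrono::milliseconds timeout, std::function<void()> on_timeout)
        : worker_([this, timeout, cb = std::move(on_timeout)] {
              std::unique_lock<std::mutex> lock(mutex_);
              // wait_for returns the predicate's value when it stops waiting:
              // false means the deadline passed without a terminal callback.
              const bool seen =
                  cv_.wait_for(lock, timeout, [this] { return terminal_seen_; });
              lock.unlock();
              if (!seen) cb();
          }) {}

    ~Watchdog() {
        notify_terminal();  // cancel a pending wait so join() returns promptly
        worker_.join();
    }

    Watchdog(const Watchdog&) = delete;
    Watchdog& operator=(const Watchdog&) = delete;

    // Call from every terminal callback (error, close, disconnect).
    void notify_terminal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            terminal_seen_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool terminal_seen_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

> With a 30-second timeout this gives the behaviour seen in the trace further 
> down: silence until the deadline, then a forced stop.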
> *Steps to reproduce*
>  # Build qpid-proton-cpp on Windows with the SChannel backend (the vcpkg 
> default).
>  # Stand up an AMQP 1.0 broker that rejects bad PLAIN credentials and closes 
> the TCP socket immediately after {{sasl-outcome}} (RabbitMQ 4.x is one such 
> broker).
>  # Add the appropriate credentials to {{main.cpp}} (lines 171-173).
>  # Run the reproducer pointed at that broker with credentials known to be 
> invalid.
> *Expected behaviour*
> {{on_transport_error}} (and/or {{on_transport_close}}) fires with 
> {{error_condition.name() == "amqp:unauthorized-access"}} and a description 
> along the lines of {{Authentication failed [mech=PLAIN]}}. The 
> application can read the condition synchronously inside the callback. The 
> container returns from {{run()}} shortly afterwards. This is the behaviour 
> observed under Proton-Python on Windows.
>  
> *Sample output (Proton-Python on Windows):*
> {code:java}
> [..] connect() issued: amqps://...:5671
> [..] on_transport_error: amqp:unauthorized-access - Authentication failed [mech=PLAIN]
> [..] [transport_error] SASL outcome=1 user='valid-user' mech='PLAIN'
> [..] on_disconnected
> Total elapsed: ~3 seconds.
> {code}
>  
> *Observed behaviour (the bug)*
> Against the same broker with the same credentials, on Windows + SChannel:
> {code:java}
> [..] === Proton C++ SChannel SASL race reproducer ===
> [..] connecting: amqps://...:5671
> [..] on_container_start
> <-- 30 seconds of total silence: no callbacks fire -->
> [..] WATCHDOG: 30s elapsed with no terminal callback; forcing container.stop()
> [..] on_transport_open
> [..] on_transport_close
> [..] cond.name='' cond.desc=''
> [..] sasl.outcome=1 sasl.user='valid-user' sasl.mech='PLAIN'
> [..] === container.run() returned ===
> {code}
> Note specifically:
>  - {{on_transport_error}} is never fired.
>  - {{on_transport_open}} and {{on_transport_close}} only arrive in the flush 
> triggered by the watchdog's {{container::stop()}} call. Without an external 
> mechanism forcing the stop, an application sits silently forever.
>  - {{on_transport_close}}'s {{error_condition}} is empty. The 
> {{amqp:unauthorized-access}} condition the broker sent was not propagated 
> onto the transport before the close-handling path tore it down.
>  - The SASL accessor ({{transport.sasl()}}) does still report the outcome 
> correctly when queried in {{on_transport_close}}. So the SASL state is 
> preserved internally; it just isn't surfaced as an {{error_condition}} on the 
> transport, and no error event is dispatched.
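> Because the SASL outcome survives, an application could fall back on it when 
> the transport condition is empty. A minimal sketch of that decision, kept 
> free of Proton headers so it stands alone: {{SaslOutcome}} mirrors the values 
> of proton-c's {{pn_sasl_outcome_t}} (1 is {{PN_SASL_AUTH}}, matching 
> {{sasl.outcome=1}} in the trace), while {{CloseReason}} and 
> {{classify_close}} are hypothetical names.

```cpp
#include <string>

// Mirrors the numeric values of proton-c's pn_sasl_outcome_t.
enum class SaslOutcome { None = -1, Ok = 0, Auth = 1, Sys = 2, Perm = 3, Temp = 4 };

// Hypothetical application-level classification of a transport close.
enum class CloseReason { NoError, AuthFailed, OtherError };

// cond_name is the transport error condition name ("" when the condition is
// empty); outcome is the value read from the SASL accessor.
CloseReason classify_close(const std::string& cond_name, SaslOutcome outcome) {
    if (cond_name == "amqp:unauthorized-access") return CloseReason::AuthFailed;
    if (!cond_name.empty()) return CloseReason::OtherError;

    // Empty condition: the case reported here. Fall back on the SASL outcome.
    switch (outcome) {
        case SaslOutcome::Auth:
            return CloseReason::AuthFailed;
        case SaslOutcome::None:
        case SaslOutcome::Ok:
            return CloseReason::NoError;  // no evidence of a SASL failure
        default:
            return CloseReason::OtherError;  // Sys, Perm, Temp
    }
}
```

> This is only a stopgap: it helps once {{on_transport_close}} is delivered, 
> which in the trace above still needs the watchdog to force the flush.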
>  
> Broker-side log (RabbitMQ):
> {code:java}
> [error] closing AMQP connection (...) (duration: '6s'):
> [error] {handshake_error,waiting_sasl_init,
> [error] {'v1_0.error',{symbol,<<"amqp:unauthorized-access">>},
> [error] {utf8,<<"PLAIN login refused: user 'valid-user' - invalid credentials">>},
> [error] undefined}}
> {code}
> So the auth failure is unambiguously reaching the client over the wire. The 
> bytes are processed by Proton (the SASL accessor reflects them after the 
> eventual flush). They simply do not produce a {{messaging_handler}} event in 
> the SChannel-backed configuration.
> *Probable mechanism*
> AMQP 1.0 lets a server send {{sasl-outcome}} and immediately close the TCP 
> connection without waiting for a client acknowledgement, because the spec 
> does not require one. RabbitMQ does this. The {{sasl-outcome}} frame and the TCP FIN 
> can therefore arrive at the client in the same OS-level read. Clients have to 
> handle this gracefully: the outcome event must be dispatched even though the 
> transport is simultaneously transitioning to closed.
> The OpenSSL-backed code path appears to handle this race correctly. The 
> SChannel-backed path appears to lose the outcome event during the 
> near-simultaneous teardown: the transport's {{error_condition}} is never 
> populated with {{amqp:unauthorized-access}}, no {{transport_error}} event 
> is queued, and the eventual {{transport_close}} carries an empty condition.
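> The ordering requirement can be illustrated with a toy model. It is not 
> Proton's I/O code and makes no claim about where the SChannel path actually 
> goes wrong; every type and function below is hypothetical. One read delivers 
> both a frame and end-of-stream. A handler that drains the frames before 
> acting on EOF records the condition, while one that acts on EOF first ends 
> up with an empty condition and only a close event, which resembles the 
> symptom described.

```cpp
#include <string>
#include <vector>

// Toy model of the result of one OS-level read.
struct ReadResult {
    std::vector<std::string> frames;  // decoded frames carried by this read
    bool eof = false;                 // peer's FIN arrived in the same read
};

// Toy transport: a condition, a closed flag, and the events it dispatched.
struct Transport {
    std::string condition;            // empty means "no error"
    bool closed = false;
    std::vector<std::string> events;  // dispatched events, in order
};

void apply_frame(Transport& t, const std::string& frame) {
    if (t.closed) return;  // frames seen after close are dropped
    if (frame == "sasl-outcome:auth") {
        t.condition = "amqp:unauthorized-access";
        t.events.push_back("transport_error");
    }
}

void close_transport(Transport& t) {
    if (t.closed) return;
    t.closed = true;
    t.events.push_back("transport_close");
}

// Drains the frames from the read before honouring EOF.
void handle_read_drain_first(Transport& t, const ReadResult& r) {
    for (const auto& f : r.frames) apply_frame(t, f);
    if (r.eof) close_transport(t);
}

// Honours EOF first, so frames from the same read never take effect.
void handle_read_eof_first(Transport& t, const ReadResult& r) {
    if (r.eof) close_transport(t);
    for (const auto& f : r.frames) apply_frame(t, f);
}
```

> Feeding both handlers a read that carries {{sasl-outcome:auth}} together 
> with EOF shows the difference: the drain-first variant dispatches an error 
> event followed by a close, the EOF-first variant dispatches only the close.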



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
