Joshua Seagrave created PROTON-2933:
---------------------------------------

             Summary: Windows/SChannel: AMQP 1.0 SASL auth-failure lost in 
handshake
                 Key: PROTON-2933
                 URL: https://issues.apache.org/jira/browse/PROTON-2933
             Project: Qpid Proton
          Issue Type: Bug
          Components: cpp-binding
    Affects Versions: proton-c-0.40.0
         Environment: OS: Windows 11 (x64)
Proton: qpid-proton from vcpkg
Compiler: MSVC
Broker: RabbitMQ 4.3.0 with native AMQP 1.0
            Reporter: Joshua Seagrave
         Attachments: CMakeLists.txt, main.cpp, vcpkg.json

On Windows, when Proton C++ is built against the SChannel TLS backend and 
connects to an AMQP 1.0 broker over {{amqps://}} with invalid PLAIN 
credentials, the broker's auth-failure response is silently lost. No 
{{messaging_handler}} callbacks are dispatched while the container is running. 
The events only flush when {{container::stop()}} is forced, and even then 
{{on_transport_close}} arrives with an empty {{error_condition}}: the 
{{amqp:unauthorized-access}} condition was discarded somewhere in the teardown path.

The same scenario behaves correctly under the OpenSSL backend (Proton-Python on 
Windows), strongly suggesting the bug is in the SChannel binding's handling of 
the close-immediately-after-{{sasl-outcome}} race.

The application-visible consequence is severe: a plugin or service cannot tell 
that authentication failed. It simply hangs, with no events to act on, no way to 
surface the error to the user, and no way to trigger a credential refresh.

 

*Reproducer*

A minimal standalone reproducer is attached. It traces every 
{{messaging_handler}} callback at top-of-function (so swallowed exceptions in 
error-condition accessors can't be confused for "callback didn't fire") and 
includes a 30-second watchdog that forces {{container::stop()}} if no terminal 
callback has arrived.
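
The watchdog idea described above can be sketched with the standard library alone. This is a minimal illustration, not the code from the attached {{main.cpp}}: the {{Watchdog}} class, its {{disarm()}} method and the timeout callback are hypothetical names, and the callback is only assumed to be where a call to {{container::stop()}} would go.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper (not a Proton class): runs on_timeout once unless
// disarm() is called before the deadline.
class Watchdog {
public:
    Watchdog(std::chrono::milliseconds timeout, std::function<void()> on_timeout)
        : on_timeout_(std::move(on_timeout)),
          worker_([this, timeout] { wait_for_deadline(timeout); }) {}

    Watchdog(const Watchdog&) = delete;
    Watchdog& operator=(const Watchdog&) = delete;

    ~Watchdog() {
        disarm();
        worker_.join();
    }

    // Call from every terminal callback, e.g. on_transport_close.
    void disarm() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            disarmed_ = true;
        }
        cv_.notify_all();
    }

    // True once the deadline passed without disarm() being called.
    bool fired() const { return fired_.load(); }

private:
    void wait_for_deadline(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for returns false when the deadline passes without disarm().
        if (!cv_.wait_for(lock, timeout, [this] { return disarmed_; })) {
            fired_ = true;
            lock.unlock();  // allow callbacks triggered by on_timeout to call disarm()
            on_timeout_();
        }
    }

    std::function<void()> on_timeout_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool disarmed_ = false;
    std::atomic<bool> fired_{false};
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

In the reproducer's terms, the timeout callback would force {{container::stop()}} and each terminal {{messaging_handler}} callback would call {{disarm()}}; that wiring is an assumption of this sketch.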

*Steps to reproduce*
 # Build qpid-proton-cpp on Windows with the SChannel backend (the vcpkg 
default).
 # Stand up an AMQP 1.0 broker that rejects bad PLAIN credentials and closes 
the TCP socket immediately after {{sasl-outcome}} (RabbitMQ 4.x is one such 
broker).
 # Add the appropriate credentials to {{main.cpp}} (lines 171-173).
 # Run the reproducer pointed at that broker with credentials known to be 
invalid.

*Expected behaviour*

{{on_transport_error}} (and/or {{on_transport_close}}) fires with 
{{error_condition.name() == "amqp:unauthorized-access"}} and a description 
along the lines of {{Authentication failed [mech=PLAIN]}}. The application 
can read the condition synchronously inside the callback. The container returns 
from {{run()}} shortly afterwards. This is the behaviour observed under 
Proton-Python on Windows.

 

*Sample output:*
{code:java}
[..] connect() issued: amqps://...:5671
[..] on_transport_error: amqp:unauthorized-access - Authentication failed 
[mech=PLAIN]
[..] [transport_error] SASL outcome=1 user='047905' mech='PLAIN'
[..] on_disconnected

Total elapsed: ~3 seconds.
{code}
 

*Observed behaviour (the bug)*

Against the same broker with the same credentials, on Windows + SChannel:
{code:java}
[..] === Proton C++ SChannel SASL race reproducer ===
[..] connecting: amqps://...:5671
[..] on_container_start
<-- 30 seconds of total silence: no callbacks fire -->
[..] WATCHDOG: 30s elapsed with no terminal callback; forcing container.stop()
[..] on_transport_open
[..] on_transport_close
[..] cond.name='' cond.desc=''
[..] sasl.outcome=1 sasl.user='valid-user' sasl.mech='PLAIN'
[..] === container.run() returned ===
{code}
Note specifically:
 - {{on_transport_error}} is never fired.
 - {{on_transport_open}} and {{on_transport_close}} only arrive in the flush 
triggered by the watchdog's {{container::stop()}} call. Without an external 
mechanism forcing the stop, an application sits silently forever.
 - {{on_transport_close}}'s {{error_condition}} is empty. The 
{{amqp:unauthorized-access}} condition the broker sent was not propagated onto 
the transport before the close-handling path tore it down.
 - The SASL accessor ({{transport.sasl()}}) does still report the outcome 
correctly when queried in {{on_transport_close}}. So the SASL state is 
preserved internally; it just isn't surfaced as an {{error_condition}} on the 
transport, and no error event is dispatched.
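
Because the SASL outcome survives even when the condition is empty, an application could in principle fall back to it inside {{on_transport_close}}. The sketch below illustrates that fallback with stand-in types so it compiles without Proton headers; {{CloseInfo}}, {{Condition}} and {{effective_condition}} are hypothetical names, and the value 1 for a credentials failure is assumed to match proton-c's {{pn_sasl_outcome_t}} numbering (it is the value shown in the trace above). It is a possible application-side mitigation, not a fix for the underlying bug, and it only helps once the close callback is actually delivered.

```cpp
#include <string>

// Stand-ins for the values the reproducer prints in on_transport_close.
// With Proton they would be read from the transport's error condition and
// from transport.sasl(); plain types keep this sketch self-contained.
struct CloseInfo {
    std::string cond_name;   // empty in the buggy case
    std::string cond_desc;
    int sasl_outcome;        // numeric outcome as printed in the trace
    std::string sasl_mech;
};

struct Condition {
    std::string name;
    std::string description;
};

// Assumption: mirrors PN_SASL_AUTH (authentication failed, bad credentials).
constexpr int kSaslAuthFailed = 1;

// Hypothetical helper: prefer the transport's own condition, otherwise
// synthesise one from the SASL outcome.
Condition effective_condition(const CloseInfo& info) {
    if (!info.cond_name.empty()) {
        return {info.cond_name, info.cond_desc};
    }
    if (info.sasl_outcome == kSaslAuthFailed) {
        return {"amqp:unauthorized-access",
                "Authentication failed [mech=" + info.sasl_mech + "]"};
    }
    return {"", ""};  // nothing to report
}
```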

 

Broker-side log (RabbitMQ):
{code:java}
[error] closing AMQP connection (...) (duration: '6s'):
[error] {handshake_error,waiting_sasl_init,
[error] {'v1_0.error',{symbol,<<"amqp:unauthorized-access">>},
[error] {utf8,<<"PLAIN login refused: user '047905' - invalid credentials">>},
[error] undefined}}
{code}
So the auth failure is unambiguously reaching the client over the wire. The 
bytes are processed by Proton (the SASL accessor reflects them after the 
eventual flush). They simply do not produce a {{messaging_handler}} event in 
the SChannel-backed configuration.

*Probable mechanism*

AMQP 1.0 lets a server send {{sasl-outcome}} and immediately close the TCP 
connection without waiting for any acknowledgement from the client, because the 
spec does not require one. RabbitMQ does this. The {{sasl-outcome}} frame and the TCP FIN can 
therefore arrive at the client in the same OS-level read. Clients have to 
handle this gracefully: the outcome event must be dispatched even though the 
transport is simultaneously transitioning to closed.

The OpenSSL-backed code path appears to handle this race correctly. The 
SChannel-backed path appears to lose the outcome event during the 
near-simultaneous teardown: the transport's {{error_condition}} is never 
populated with {{amqp:unauthorized-access}}, no {{transport_error}} event 
is queued, and the eventual {{transport_close}} carries an empty condition.
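
The suspected ordering problem can be illustrated with a toy model. This is not Proton's transport code, and the faulty ordering it shows is only a hypothesis consistent with the observed trace: {{ToyTransport}}, {{ReadResult}} and the placeholder payload string are all invented for the example. A single read delivers both the outcome and the end-of-stream flag; if the close is registered before the buffered bytes are decoded, the SASL outcome is still recorded but the error condition and event are suppressed.

```cpp
#include <string>
#include <vector>

// One OS-level read: payload bytes plus a flag saying the peer closed.
struct ReadResult {
    std::string bytes;
    bool eof;
};

// Toy model only; the names and logic are illustrative assumptions.
class ToyTransport {
public:
    explicit ToyTransport(bool eof_first) : eof_first_(eof_first) {}

    void on_read(const ReadResult& r) {
        if (eof_first_ && r.eof) {
            closed_ = true;  // hypothesised faulty ordering: close seen first
        }
        decode(r.bytes);     // buffered bytes are decoded either way
        if (r.eof) {
            closed_ = true;
            events_.push_back("transport_close");
        }
    }

    const std::vector<std::string>& events() const { return events_; }
    const std::string& condition() const { return condition_; }
    int sasl_outcome() const { return sasl_outcome_; }

private:
    void decode(const std::string& bytes) {
        if (bytes != "sasl-outcome:auth") return;  // placeholder for a real frame
        sasl_outcome_ = 1;    // SASL state is always recorded
        if (closed_) return;  // error is suppressed once the transport is closed
        condition_ = "amqp:unauthorized-access";
        events_.push_back("transport_error");
    }

    bool eof_first_;
    bool closed_ = false;
    int sasl_outcome_ = -1;
    std::string condition_;
    std::vector<std::string> events_;
};
```

With {{eof_first}} set to false, the model queues {{transport_error}} followed by {{transport_close}} and keeps the condition. With it set to true, only {{transport_close}} is queued and the condition stays empty while the SASL outcome is still 1, which matches the shape of the SChannel trace. The model does not attempt to explain why no callbacks are delivered until {{container::stop()}} is forced.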



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
