Joshua Seagrave created PROTON-2933:
---------------------------------------
Summary: Windows/SChannel: AMQP 1.0 SASL auth-failure lost in
handshake
Key: PROTON-2933
URL: https://issues.apache.org/jira/browse/PROTON-2933
Project: Qpid Proton
Issue Type: Bug
Components: cpp-binding
Affects Versions: proton-c-0.40.0
Environment: OS: Windows 11 (x64)
Proton: qpid-proton from vcpkg
Compiler: MSVC
Broker: RabbitMQ 4.3.0 with native AMQP 1.0
Reporter: Joshua Seagrave
Attachments: CMakeLists.txt, main.cpp, vcpkg.json
On Windows, when Proton C++ is built against the SChannel TLS backend and
connects to an AMQP 1.0 broker over {{amqps://}} with invalid PLAIN
credentials, the broker's auth-failure response is silently lost. No
{{messaging_handler}} callbacks are dispatched while the container is running.
The events only flush when {{container::stop()}} is forced, and even then
{{on_transport_close}} arrives with an empty {{error_condition}} — the
{{amqp:unauthorized-access}} was discarded somewhere in the teardown path.
The same scenario behaves correctly under the OpenSSL backend (Proton-Python on
Windows), strongly suggesting the bug is in the SChannel binding's handling of
the close-immediately-after-{{sasl-outcome}} race.
The application-visible consequence is severe: a plugin/service can't tell that
authentication failed. It simply hangs, with no events to act on, no way to
surface the error to the user, and no way to trigger a credential refresh.
*Reproducer*
A minimal standalone reproducer is attached. It traces every
{{messaging_handler}} callback at top-of-function (so swallowed exceptions in
error-condition accessors can't be confused for "callback didn't fire") and
includes a 30-second watchdog that forces {{container::stop()}} if no terminal
callback has arrived.
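For readers without the attachment, the watchdog idea can be sketched roughly as follows. This is a standard-library-only illustration, not the code in {{main.cpp}}: the {{watchdog}} class and its {{disarm()}} and {{fired()}} members are hypothetical names, and the timeout action is assumed to wrap a {{container::stop()}} call that is safe to make from another thread.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical watchdog helper; illustrative only, not copied from main.cpp.
class watchdog {
public:
    // on_timeout would typically wrap container::stop() (assumption).
    watchdog(std::chrono::milliseconds timeout, std::function<void()> on_timeout)
        : on_timeout_(std::move(on_timeout)),
          thread_([this, timeout] { run(timeout); }) {}

    ~watchdog() {
        disarm();
        if (thread_.joinable()) thread_.join();
    }

    watchdog(const watchdog&) = delete;
    watchdog& operator=(const watchdog&) = delete;

    // Call from every terminal callback so the timeout action is skipped.
    void disarm() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }

    // True once the timeout elapsed and on_timeout has returned.
    bool fired() const { return fired_.load(); }

private:
    void run(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for returns false when the timeout expires with done_ still false.
        if (!cv_.wait_for(lock, timeout, [this] { return done_; })) {
            done_ = true;
            lock.unlock();
            on_timeout_();
            fired_.store(true);
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
    std::atomic<bool> fired_{false};
    std::function<void()> on_timeout_;
    std::thread thread_;  // declared last so the other members exist before it starts
};
```

In a handler, {{disarm()}} would be called from each terminal callback (for example {{on_transport_close}}), so the forced stop only happens when nothing terminal arrived within the timeout.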
*Steps to reproduce*
# Build qpid-proton-cpp on Windows with the SChannel backend (the vcpkg
default).
# Stand up an AMQP 1.0 broker that rejects bad PLAIN credentials and closes
the TCP socket immediately after {{sasl-outcome}} (RabbitMQ 4.x is one such
broker).
# Add the appropriate credentials to {{main.cpp}} (lines 171-173).
# Run the reproducer pointed at that broker with credentials known to be
invalid.
*Expected behaviour*
{{on_transport_error}} (and/or {{on_transport_close}}) fires with
{{error_condition.name() == "amqp:unauthorized-access"}} and a description
along the lines of {{Authentication failed [mech=PLAIN]}}. The application
can read the condition synchronously inside the callback. The container returns
from {{run()}} shortly afterwards. This is the behaviour observed under
Proton-Python on Windows.
*Sample output:*
{code:java}
[..] connect() issued: amqps://...:5671
[..] on_transport_error: amqp:unauthorized-access - Authentication failed
[mech=PLAIN]
[..] [transport_error] SASL outcome=1 user='047905' mech='PLAIN'
[..] on_disconnected
Total elapsed: ~3 seconds.
{code}
*Observed behaviour (the bug)*
Against the same broker with the same credentials, on Windows + SChannel:
{code:java}
[..] === Proton C++ SChannel SASL race reproducer ===
[..] connecting: amqps://...:5671
[..] on_container_start
<-- 30 seconds of total silence: no callbacks fire -->
[..] WATCHDOG: 30s elapsed with no terminal callback; forcing container.stop()
[..] on_transport_open
[..] on_transport_close
[..] cond.name='' cond.desc=''
[..] sasl.outcome=1 sasl.user='valid-user' sasl.mech='PLAIN'
[..] === container.run() returned ===
{code}
Note specifically:
- {{on_transport_error}} is never fired.
- {{on_transport_open}} and {{on_transport_close}} only arrive in the flush
triggered by the watchdog's {{container::stop()}} call. Without an external
mechanism forcing the stop, an application sits silently forever.
- {{on_transport_close}}'s {{error_condition}} is empty. The
{{amqp:unauthorized-access}} condition the broker sent was not propagated onto
the transport before the close-handling path tore it down.
- The SASL accessor ({{transport.sasl()}}) does still report the outcome
correctly when queried in {{on_transport_close}}. So the SASL state is
preserved internally; it just isn't surfaced as an {{error_condition}} on the
transport, and no error event is dispatched.
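Because the SASL outcome survives, an application could in principle derive a condition from it as a stopgap once {{on_transport_close}} finally arrives. The sketch below is an assumption-laden illustration rather than a fix: {{close_info}}, {{diagnosed}} and {{diagnose_close}} are hypothetical names that stand in for the values the reproducer prints, and outcome value 1 is assumed to mean an authentication failure, as the logs above suggest. It does nothing about the missing event dispatch itself.

```cpp
#include <string>

// Stand-ins for the values the reproducer prints; these are not Proton types.
struct close_info {
    std::string cond_name;  // error_condition name; empty in the failing case
    std::string cond_desc;  // error_condition description
    int sasl_outcome;       // value printed as sasl.outcome (1 assumed = auth failure)
    std::string sasl_mech;  // value printed as sasl.mech
};

struct diagnosed {
    std::string name;
    std::string description;
};

// Hypothetical helper: prefer the transport's own condition and fall back
// to the SASL outcome only when that condition is empty.
diagnosed diagnose_close(const close_info& info) {
    if (!info.cond_name.empty()) {
        return {info.cond_name, info.cond_desc};
    }
    if (info.sasl_outcome == 1) {
        return {"amqp:unauthorized-access",
                "Authentication failed [mech=" + info.sasl_mech + "]"};
    }
    return {"", ""};  // nothing to report
}
```

With the values from the observed output ({{cond.name=''}}, {{sasl.outcome=1}}, {{sasl.mech='PLAIN'}}), this would yield the same name and description that the OpenSSL-backed path reports directly.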
Broker-side log (RabbitMQ):
{code:java}
[error] closing AMQP connection (...) (duration: '6s'):
[error] {handshake_error,waiting_sasl_init,
[error] {'v1_0.error',{symbol,<<"amqp:unauthorized-access">>},
[error] {utf8,<<"PLAIN login refused: user '047905' - invalid credentials">>},
[error] undefined}}
{code}
So the auth failure is unambiguously reaching the client over the wire. The
bytes are processed by Proton (the SASL accessor reflects them after the
eventual flush). They simply do not produce a {{messaging_handler}} event in
the SChannel-backed configuration.
*Probable mechanism*
AMQP 1.0 lets a server send {{sasl-outcome}} and close the TCP connection
immediately, without waiting for any acknowledgement from the client; the spec
doesn't require one. RabbitMQ does this. The {{sasl-outcome}} frame and the TCP FIN can
therefore arrive at the client in the same OS-level read. Clients have to
handle this gracefully: the outcome event must be dispatched even though the
transport is simultaneously transitioning to closed.
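The ordering requirement can be illustrated with a toy model. Nothing below is Proton or SChannel code: {{read_result}}, {{toy_transport}} and both handler functions are invented for illustration, and the lossy variant is only one hypothetical way an event could be dropped, not a diagnosis of the actual binding.

```cpp
#include <string>
#include <vector>

// Toy model only: none of these types or functions exist in Proton.
struct read_result {
    std::string bytes;  // payload delivered by one OS-level read
    bool eof;           // true if the peer's FIN was seen in the same read
};

struct toy_transport {
    std::string condition;            // empty means "no error recorded"
    std::vector<std::string> events;  // events queued for the handler, in order

    // Pretend decoder: a failed sasl-outcome sets the condition and queues an error.
    void decode(const std::string& bytes) {
        if (bytes.find("sasl-outcome:auth-failed") != std::string::npos) {
            condition = "amqp:unauthorized-access";
            events.push_back("transport_error");
        }
    }

    void close() { events.push_back("transport_close"); }
};

// Drains the buffered bytes first, then acts on EOF.
void handle_read_draining(toy_transport& t, const read_result& r) {
    if (!r.bytes.empty()) t.decode(r.bytes);
    if (r.eof) t.close();
}

// Hypothetical lossy variant: EOF short-circuits and the bytes are never decoded.
void handle_read_lossy(toy_transport& t, const read_result& r) {
    if (r.eof) {
        t.close();
        return;
    }
    t.decode(r.bytes);
}
```

Fed a single read that carries both the failed outcome and EOF, the draining variant queues {{transport_error}} followed by {{transport_close}} with the condition set, while the lossy variant queues only {{transport_close}} with an empty condition, which resembles the symptom reported here.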
The OpenSSL-backed code path appears to handle this race correctly. The
SChannel-backed path appears to lose the outcome event during the
near-simultaneous teardown: the transport's {{error_condition}} is never
populated with {{amqp:unauthorized-access}}, no {{transport_error}} event
is queued, and the eventual {{transport_close}} carries an empty condition.