[
https://issues.apache.org/jira/browse/PROTON-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joshua Seagrave updated PROTON-2933:
------------------------------------
Description:
On Windows, when Proton C++ is built against the SChannel TLS backend and
connects to an AMQP 1.0 broker over {{amqps://}} with invalid PLAIN
credentials, the broker's auth-failure response is silently lost. No
{{messaging_handler}} callbacks are dispatched while the container is running.
The events only flush when {{container::stop()}} is forced, and even then
{{on_transport_close}} arrives with an empty {{error_condition}}: the
{{amqp:unauthorized-access}} condition was discarded somewhere in the teardown
path.
The same scenario behaves correctly under the OpenSSL backend (Proton-Python on
Windows), which strongly suggests the bug is in the SChannel binding's handling
of the close-immediately-after-{{sasl-outcome}} race.
The application-visible consequence is severe: a plugin/service can't tell that
authentication failed. It just hangs, with no events to act on, no way to
surface the error to the user, no way to trigger a credential refresh.
*Reproducer*
A minimal standalone reproducer is attached. It traces every
{{messaging_handler}} callback at top-of-function (so swallowed exceptions in
error-condition accessors can't be confused for "callback didn't fire") and
includes a 30-second watchdog that forces {{container::stop()}} if no terminal
callback has arrived.
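For readers without the attachment, the watchdog idea can be sketched as below. This is a minimal illustration, not the code from {{main.cpp}}: the {{watchdog}} class and its {{disarm()}} / {{fired()}} members are hypothetical names, and only the C++ standard library is used.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper (not part of Proton): runs on_timeout exactly once
// unless disarm() is called before the timeout elapses.
class watchdog {
public:
    watchdog(std::chrono::milliseconds timeout, std::function<void()> on_timeout)
        : on_timeout_(std::move(on_timeout)) {
        assert(on_timeout_ && "on_timeout must be callable");
        worker_ = std::thread([this, timeout] { run(timeout); });
    }
    watchdog(const watchdog&) = delete;
    watchdog& operator=(const watchdog&) = delete;

    // Disarms and waits for the worker thread, so the callback never
    // outlives the watchdog object.
    ~watchdog() {
        disarm();
        if (worker_.joinable()) worker_.join();
    }

    // Call from any terminal callback to cancel the pending timeout.
    void disarm() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            disarmed_ = true;
        }
        cv_.notify_all();
    }

    // True once the timeout has elapsed without a disarm().
    bool fired() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return fired_;
    }

private:
    void run(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for returns the predicate value: true means disarmed in time.
        const bool disarmed =
            cv_.wait_for(lock, timeout, [this] { return disarmed_; });
        if (disarmed) return;
        fired_ = true;
        lock.unlock();   // do not hold the lock while running user code
        on_timeout_();
    }

    std::function<void()> on_timeout_;
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    bool disarmed_ = false;
    bool fired_ = false;
    std::thread worker_;
};
```

In a reproducer like the one attached, the callback would presumably call {{container::stop()}} (assuming it is safe to call from another thread, which the reproducer's watchdog evidently relies on), and each terminal {{messaging_handler}} callback would call {{disarm()}}.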
*Steps to reproduce*
# Build qpid-proton-cpp on Windows with the SChannel backend (the vcpkg
default).
# Stand up an AMQP 1.0 broker that rejects bad PLAIN credentials and closes
the TCP socket immediately after {{sasl-outcome}} (RabbitMQ 4.x is one such
broker).
# Add the appropriate credentials to {{main.cpp}} (lines 171-173).
# Run the reproducer pointed at that broker with credentials known to be
invalid.
*Expected behaviour*
{{on_transport_error}} (and/or {{on_transport_close}}) fires with
{{error_condition.name() == "amqp:unauthorized-access"}} and a description
along the lines of {{Authentication failed [mech=PLAIN]}}. The application
can read the condition synchronously inside the callback. The container returns
from {{run()}} shortly afterwards. This is the behaviour observed under
Proton-Python on Windows.
*Sample output:*
{code:java}
[..] connect() issued: amqps://...:5671
[..] on_transport_error: amqp:unauthorized-access - Authentication failed [mech=PLAIN]
[..] [transport_error] SASL outcome=1 user='valid-user' mech='PLAIN'
[..] on_disconnected
Total elapsed: ~3 seconds.
{code}
*Observed behaviour (the bug)*
Against the same broker with the same credentials, on Windows + SChannel:
{code:java}
[..] === Proton C++ SChannel SASL race reproducer ===
[..] connecting: amqps://...:5671
[..] on_container_start
<-- 30 seconds of total silence: no callbacks fire -->
[..] WATCHDOG: 30s elapsed with no terminal callback; forcing container.stop()
[..] on_transport_open
[..] on_transport_close
[..] cond.name='' cond.desc=''
[..] sasl.outcome=1 sasl.user='valid-user' sasl.mech='PLAIN'
[..] === container.run() returned ===
{code}
Note specifically:
- {{on_transport_error}} is never fired.
- {{on_transport_open}} and {{on_transport_close}} only arrive in the flush
triggered by the watchdog's {{container::stop()}} call. Without an external
mechanism forcing the stop, an application sits silently forever.
- {{on_transport_close}}'s {{error_condition}} is empty. The
{{amqp:unauthorized-access}} condition the broker sent was not propagated onto
the transport before the close-handling path tore it down.
- The SASL accessor ({{transport.sasl()}}) does still report the outcome
correctly when queried in {{on_transport_close}}. So the SASL state is
preserved internally; it just isn't surfaced as an {{error_condition}} on the
transport, and no error event is dispatched.
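Because the SASL outcome survives even when the condition is empty, an application could use it as a fallback once {{on_transport_close}} finally arrives. The sketch below is an assumption about how such a fallback might look, not part of the reproducer: {{classify_close}} and both enums are hypothetical names, and the numeric codes follow the AMQP 1.0 {{sasl-code}} values (with -1 for "no outcome", as in Proton's {{PN_SASL_NONE}}). It takes plain values so that it does not depend on the Proton headers.

```cpp
#include <string>

// SASL outcome codes as defined by AMQP 1.0 (sasl-code); -1 = no outcome yet.
enum class sasl_code : int {
    none = -1, ok = 0, auth = 1, sys = 2, sys_perm = 3, sys_temp = 4
};

// Hypothetical application-level classification of a transport close.
enum class close_reason { clean, auth_failed, sasl_system_error, transport_error };

// Hypothetical helper: prefer the transport's error condition, and fall back
// to the SASL outcome when the condition is empty (the case in this report).
close_reason classify_close(const std::string& cond_name, int outcome) {
    if (cond_name == "amqp:unauthorized-access") return close_reason::auth_failed;
    if (!cond_name.empty()) return close_reason::transport_error;

    switch (static_cast<sasl_code>(outcome)) {
        case sasl_code::auth:
            return close_reason::auth_failed;
        case sasl_code::sys:
        case sasl_code::sys_perm:
        case sasl_code::sys_temp:
            return close_reason::sasl_system_error;
        default:
            return close_reason::clean;   // ok, none, or an unknown code
    }
}
```

With the values from the observed log ({{cond.name=''}}, {{sasl.outcome=1}}) this yields {{auth_failed}}. It only mitigates the empty condition; the callback itself still does not fire until something forces {{container::stop()}}.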
Broker-side log (RabbitMQ):
{code:java}
[error] closing AMQP connection (...) (duration: '6s'):
[error] {handshake_error,waiting_sasl_init,
[error] {'v1_0.error',{symbol,<<"amqp:unauthorized-access">>},
[error] {utf8,<<"PLAIN login refused: user 'valid-user' - invalid credentials">>},
[error] undefined}}
{code}
So the auth failure is unambiguously reaching the client over the wire. The
bytes are processed by Proton (the SASL accessor reflects them after the
eventual flush). They simply do not produce a {{messaging_handler}} event in
the SChannel-backed configuration.
*Probable mechanism*
AMQP 1.0 lets a server send {{sasl-outcome}} and immediately close the TCP
connection without waiting for the client to acknowledge it, because the spec
doesn't require the client to reply to the outcome. RabbitMQ does this. The
{{sasl-outcome}} frame and the TCP FIN can therefore arrive at the client in
the same OS-level read. Clients have to handle this gracefully: the outcome
event must be dispatched even though the transport is simultaneously
transitioning to closed.
The OpenSSL-backed code path appears to handle this race correctly. The
SChannel-backed path appears to lose the outcome event during the
near-simultaneous teardown: the transport's {{error_condition}} is never
populated with {{amqp:unauthorized-access}}, no {{transport_error}} event
is queued, and the eventual {{transport_close}} carries an empty condition.
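The ordering requirement can be illustrated with a toy model. Everything here is hypothetical and deliberately simplified ({{read_result}}, {{transport_model}} and {{handle_read}} do not mirror Proton's internals); it only shows the expected order of effects when the outcome frame and the FIN arrive in one read.

```cpp
#include <string>
#include <vector>

// Toy model only: what a single OS-level read delivered to the client.
struct read_result {
    bool has_sasl_outcome;  // a sasl-outcome frame was among the bytes read
    int sasl_code;          // its code (1 = auth failure per AMQP 1.0)
    bool eof;               // the peer's FIN was observed in the same read
};

// Toy model only: the state an application can observe.
struct transport_model {
    std::string condition;            // empty means no error condition set
    std::vector<std::string> events;  // events in dispatch order
};

// Process everything that was buffered BEFORE acting on EOF, so the error
// condition is set and the error event queued ahead of the close event.
void handle_read(transport_model& t, const read_result& r) {
    if (r.has_sasl_outcome && r.sasl_code == 1) {
        t.condition = "amqp:unauthorized-access";
        t.events.push_back("transport_error");
    }
    if (r.eof) {
        t.events.push_back("transport_close");
    }
}
```

In this model, a read carrying both the outcome and EOF produces {{transport_error}} followed by {{transport_close}}, with the condition populated. The observed SChannel behaviour corresponds to the first branch being skipped when {{eof}} is set.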
> Windows/SChannel: AMQP 1.0 SASL auth-failure lost in handshake
> --------------------------------------------------------------
>
> Key: PROTON-2933
> URL: https://issues.apache.org/jira/browse/PROTON-2933
> Project: Qpid Proton
> Issue Type: Bug
> Components: cpp-binding
> Affects Versions: proton-c-0.40.0
> Environment: OS: Windows 11 (x64)
> Proton: qpid-proton from vcpkg
> Compiler: MSVC
> Broker: RabbitMQ 4.3.0 with native AMQP 1.0
> Reporter: Joshua Seagrave
> Priority: Major
> Attachments: CMakeLists.txt, main.cpp, vcpkg.json