[jira] [Resolved] (PROTON-2790) Improve session flow control
[ https://issues.apache.org/jira/browse/PROTON-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2790. - Fix Version/s: proton-c-0.40.0 Resolution: Fixed merged: 60ab050b > Improve session flow control > > > Key: PROTON-2790 > URL: https://issues.apache.org/jira/browse/PROTON-2790 > Project: Qpid Proton > Issue Type: Improvement > Components: proton-c >Affects Versions: proton-c-0.39.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.40.0 > > > Current flow control replenishment for the session incoming window only > occurs when the window reaches 0. This minimizes flow frames on the wire but > introduces a stall in transfer processing. > Switching to using a low watermark for the session incoming window would > allow the application to choose a preferred trade off between transfer stalls > and FLOW frames. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
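The trade-off described above can be illustrated with a small model. This is only a sketch: `window_model_t` and its functions are hypothetical names invented for illustration, not proton-c types or API, and replenishment is treated as instantaneous, so the stall itself is not simulated. The model only shows how far the window drains before a FLOW frame is issued and how many FLOW frames result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a session incoming window (not a proton-c type). */
typedef struct {
  uint32_t capacity;   /* window size granted on each replenishment */
  uint32_t lwm;        /* low watermark; 0 models the old "wait for zero" rule */
  uint32_t window;     /* transfer frames the peer may still send */
  uint32_t flows_sent; /* FLOW frames emitted so far */
  uint32_t lowest;     /* smallest window value observed */
} window_model_t;

static void window_init(window_model_t *w, uint32_t capacity, uint32_t lwm) {
  assert(capacity > 0 && lwm < capacity);
  w->capacity = capacity;
  w->lwm = lwm;
  w->window = capacity;
  w->flows_sent = 0;
  w->lowest = capacity;
}

/* Account for one incoming transfer frame.
 * Returns true when the model emits a FLOW frame to replenish the window. */
static bool window_on_transfer(window_model_t *w) {
  assert(w->window > 0);
  w->window--;
  if (w->window < w->lowest) w->lowest = w->window;
  if (w->window <= w->lwm) {
    w->window = w->capacity; /* replenish before (lwm > 0) or at (lwm == 0) exhaustion */
    w->flows_sent++;
    return true;
  }
  return false;
}
```

In this model, a capacity of 10 with a low watermark of 0 lets the window drain completely before each FLOW frame, while a low watermark of 5 keeps at least 5 frames of headroom at the cost of twice as many FLOW frames.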
[jira] [Commented] (PROTON-2859) Improve performance of pn_buffer_t defrag
[ https://issues.apache.org/jira/browse/PROTON-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892349#comment-17892349 ] Clifford Jansen commented on PROTON-2859: - cj-sender.c used with test-drain.c from PROTON-2857 > Improve performance of pn_buffer_t defrag > - > > Key: PROTON-2859 > URL: https://issues.apache.org/jira/browse/PROTON-2859 > Project: Qpid Proton > Issue Type: Improvement > Components: proton-c >Affects Versions: proton-c-0.39.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Attachments: cj-sender.c > > > Currently the only optimization in defrag is a check in rotate to skip > memory copies if the rotation amount is zero. Otherwise, the full capacity > is rotated one byte at a time, even if there is only one byte of content. > Propose to check if the data in the buffer is currently contiguous and only > move actual content via memmove.
[jira] [Updated] (PROTON-2859) Improve performance of pn_buffer_t defrag
[ https://issues.apache.org/jira/browse/PROTON-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen updated PROTON-2859: Attachment: cj-sender.c > Improve performance of pn_buffer_t defrag > - > > Key: PROTON-2859 > URL: https://issues.apache.org/jira/browse/PROTON-2859 > Project: Qpid Proton > Issue Type: Improvement > Components: proton-c >Affects Versions: proton-c-0.39.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Attachments: cj-sender.c > > > Currently the only optimization in defrag is a check in rotate to skip > memory copies if the rotation amount is zero. Otherwise, the full capacity > is rotated one byte at a time, even if there is only one byte of content. > Propose to check if the data in the buffer is currently contiguous and only > move actual content via memmove.
[jira] [Created] (PROTON-2857) Improve performance of session flow control for senders.
Clifford Jansen created PROTON-2857: --- Summary: Improve performance of session flow control for senders. Key: PROTON-2857 URL: https://issues.apache.org/jira/browse/PROTON-2857 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.39.0 Reporter: Clifford Jansen Assignee: Clifford Jansen Attachments: test-drain.c, test-sender.c The current interaction of pn_link_send and transfer frame generation results in many needless buffer rotate calls that are costly. The attached test programs (courtesy of kgiusti) shine a bright light on the problem. In this case a single 40MB message results in 76 buffer rotates and 5GB of individual 8 bit byte moves that are all busy work.
[jira] [Created] (PROTON-2859) Improve performance of pn_buffer_t defrag
Clifford Jansen created PROTON-2859: --- Summary: Improve performance of pn_buffer_t defrag Key: PROTON-2859 URL: https://issues.apache.org/jira/browse/PROTON-2859 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.39.0 Reporter: Clifford Jansen Assignee: Clifford Jansen Currently the only optimization in defrag is a check in rotate to skip memory copies if the rotation amount is zero. Otherwise, the full capacity is rotated one byte at a time, even if there is only one byte of content. Propose to check if the data in the buffer is currently contiguous and only move actual content via memmove.
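The proposal above might look roughly like the following. This is a minimal sketch on a hypothetical ring buffer (`ring_t` is invented for illustration and is not the `pn_buffer_t` layout). The contiguous branch is the case the issue describes; the wrapped branch, which copies through a temporary buffer, is an assumption added only to make the example complete.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical ring buffer, not the real pn_buffer_t layout. */
typedef struct {
  char *bytes;     /* backing storage */
  size_t capacity; /* total bytes of storage */
  size_t start;    /* index of the first content byte */
  size_t size;     /* number of content bytes */
} ring_t;

/* True when the content does not wrap past the end of the storage. */
static int ring_contiguous(const ring_t *r) {
  return r->start + r->size <= r->capacity;
}

/* Move the content so that it begins at index 0.
 * Returns 0 on success, -1 if a temporary allocation fails. */
static int ring_defrag(ring_t *r) {
  if (r->start == 0 || r->size == 0) {
    r->start = 0; /* nothing to move */
    return 0;
  }
  if (ring_contiguous(r)) {
    /* Only the actual content is moved, not the full capacity. */
    memmove(r->bytes, r->bytes + r->start, r->size);
  } else {
    /* Wrapped: head is [start, capacity), tail is [0, size - head). */
    size_t head = r->capacity - r->start;
    size_t tail = r->size - head;
    char *tmp = malloc(tail);
    if (!tmp) return -1;
    memcpy(tmp, r->bytes, tail);                  /* save the tail */
    memmove(r->bytes, r->bytes + r->start, head); /* head to the front */
    memcpy(r->bytes + head, tmp, tail);           /* tail follows the head */
    free(tmp);
  }
  r->start = 0;
  return 0;
}
```

With one byte of content in a large buffer, the contiguous branch copies a single byte instead of rotating the whole capacity.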
[jira] [Created] (PROTON-2858) Improve scheduling fairness for outgoing streaming messages
Clifford Jansen created PROTON-2858: --- Summary: Improve scheduling fairness for outgoing streaming messages Key: PROTON-2858 URL: https://issues.apache.org/jira/browse/PROTON-2858 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.39.0 Reporter: Clifford Jansen Assignee: Clifford Jansen PROTON-2857 takes special action in the case of an outgoing streaming link delivery during pn_link_send(). At that point, we know that the delivery is the current one for the link and the last for that link that may be on the tpwork queue with message data to send. It could be possible to continually refill and not fully drain the delivery in pni_process_tpwork_sender(). A simple check of whether bytes have been sent on the wire since the last pn_link_send, and further whether the delivery is on the tpwork queue, can bypass this problem by moving the delivery to the back of the queue, allowing other links to progress.
[jira] [Created] (PROTON-2856) Provide TLS support for intermediate CA certificates as trust anchors in OpenSSL
Clifford Jansen created PROTON-2856: --- Summary: Provide TLS support for intermediate CA certificates as trust anchors in OpenSSL Key: PROTON-2856 URL: https://issues.apache.org/jira/browse/PROTON-2856 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.39.0 Environment: Proton-C built with OpenSSL Reporter: Clifford Jansen Assignee: Clifford Jansen

The current implementation of TLS in Proton-C uses the default certificate verification algorithms provided by the OpenSSL library. This makes it difficult to use intermediate CA certificates in Proton-C to provide finer-grained security envelopes for use by, for example, different organizational units in an organization, or to differentiate subnets in cloud environments.

Currently an intermediate CA, by default, cannot be used to anchor a subtree of a parent root CA because the root CA must also be in the trust store, at which point the whole tree flowing from the root CA becomes trusted. This behavior goes against current user expectations and industry norms. See https://github.com/golang/go/issues/24685#issuecomment-1058119312

This makes it difficult for Proton-C users to use certificate chain tooling that they already have in place.

This JIRA proposes to set the X509_V_FLAG_PARTIAL_CHAIN flag when verifying peer certificates in OpenSSL. An additional advantage is a shortened verification sequence.

After this change, existing trust stores for use with Proton-C that contain self-signed root certificates will continue to verify the whole subordinate trees of leaf certificates that flow from those roots. Users will now be able to create new trust stores that limit trust to subtrees anchored to intermediate CA certificates.
[jira] [Commented] (PROTON-2594) Use of HSM for crypto opterations with the private key of a TLS certificate
[ https://issues.apache.org/jira/browse/PROTON-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873030#comment-17873030 ] Clifford Jansen commented on PROTON-2594: - Sorry for the delay in responding. I feel the suggested patch is useful and clear in its goal and implementation. Many thanks for your submission. +1 for using the provider api. I would like to comment on the pull request, however I am having difficulty running a simple C test program. No doubt it is due to my lack of familiarity with the standard, as well as with the layers of tooling to simulate HSM in software for testing. From getting this program to run I hope to better understand the implications of the patch for installed package requirements, documentation changes, CI issues, and other differences (e.g. user password prompts?). I have tried using the pkcs11-provider-qpid-proton-bug-reproduction project as a template to initialize softhsm, populate data with pkcs11-tool along with the OPENSSL_CONF and SOFTHSM2_CONF. I cannot yet get it to run using Fedora 40, which should be new enough to work with your patch. I have also tried your suggested C++ program, but the setup stage hangs at the "openssl storeutl", no matter what pin/password I supply, before even exercising your code. A further indication that I get tripped up merely taking baby steps with pkcs11. I have attached the C program I am trying to use (pn2594.c). It simply makes one client and one server connection. It allows you to specify each argument to the OpenSSL domain setup routines for each side. 
For example, if run with these arguments from qpid-proton/cpp/testdata/certs, you can run with mutual TLS (two ways), server-side TLS, or no TLS:

    /path/to/pn2594 amqps "client-certificate.pem" "client-private-key.pem" "client-password" "ca-certificate.pem" "server-certificate.pem" "server-private-key.pem" "server-password" "ca-certificate.pem"

    /path/to/pn2594 amqps "client-certificate.pem" "client-private-key-no-password.pem" "" "ca-certificate.pem" "server-certificate.pem" "server-private-key.pem" "server-password" "ca-certificate.pem"

    /path/to/pn2594 amqps "" "" "" "ca-certificate.pem" "server-certificate.pem" "server-private-key.pem" "server-password" ""

    /path/to/pn2594 amqp "" "" "" "" "" "" "" ""

I am trying to replace the "client private key" and "client password" arguments of the first two examples with a pkcs11 URI and PIN, i.e.

    pkcs11-tool --module=/usr/lib64/libsofthsm2.so --token-label clitest --pin tclientpw --label test --id --write-object /r4/amqp/p/pkcs11/cj/cjcerts/cj-client-private-key-no-password.pem --type privkey --usage-sign

    pn2594 amqps "client-certificate.pem" "pkcs11:token=clitest;id=%44%44" "tclientpw" "ca-certificate.pem" "server-certificate.pem" "server-private-key.pem" "server-password" "ca-certificate.pem"

I would appreciate it if you could confirm that you can run this test with your pkcs11 patch and get it to work in the way you think it should be run (i.e. not "fixing" my command usage or config files). Step-by-step commands (or a captured terminal session) to reproduce would be appreciated, preferably starting with an empty softhsm, initializing it, and creating/loading the slot+token. Hopefully from this exercise I can help you get the patch integrated. Thanks.
> Use of HSM for crypto opterations with the private key of a TLS certificate > --- > > Key: PROTON-2594 > URL: https://issues.apache.org/jira/browse/PROTON-2594 > Project: Qpid Proton > Issue Type: New Feature > Components: cpp-binding, proton-c >Reporter: Franz Hollerer >Priority: Major > Attachments: pn2594.c > > > We use a Hardware Security Module with PKCS#11 Interface (to be more > specific: OP-TEE) as key store. This key store holds the public and private > key for a TLS certificate for the purpose of client authentication. > Is there a way to instruct proton-qpid to use the HSM for cryptographic > operations with the private key?
[jira] [Updated] (PROTON-2594) Use of HSM for crypto opterations with the private key of a TLS certificate
[ https://issues.apache.org/jira/browse/PROTON-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen updated PROTON-2594: Attachment: pn2594.c > Use of HSM for crypto opterations with the private key of a TLS certificate > --- > > Key: PROTON-2594 > URL: https://issues.apache.org/jira/browse/PROTON-2594 > Project: Qpid Proton > Issue Type: New Feature > Components: cpp-binding, proton-c >Reporter: Franz Hollerer >Priority: Major > Attachments: pn2594.c > > > We use a Hardware Security Module with PKCS#11 Interface (to be more > specific: OP-TEE) as key store. This key store holds the public and private > key for a TLS certificate for the purpose of client authentication. > Is there a way to instruct proton-qpid to use the HSM for cryptographic > operations with the private key?
[jira] [Created] (PROTON-2834) Container stop delayed by canceled work_queue task.
Clifford Jansen created PROTON-2834: --- Summary: Container stop delayed by canceled work_queue task. Key: PROTON-2834 URL: https://issues.apache.org/jira/browse/PROTON-2834 Project: Qpid Proton Issue Type: Bug Components: cpp-binding Affects Versions: proton-c-0.39.0 Reporter: Clifford Jansen Attachments: tc1.cpp Canceling work using the work_handle does not remove the canceled item nor adjust the next proactor timeout forward if necessary. This prevents the container from stopping until the last scheduled work has reached its deadline, even if canceled. My first attempt at a fix fell short. I believe a proper fix requires a combination of checking for a shortened timer in cancel and some sort of reaping of canceled work in cancel() or run_timer_jobs() or both.
[jira] [Commented] (PROTON-2832) Use-after free tsan error in epoll.c::post_event
[ https://issues.apache.org/jira/browse/PROTON-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855982#comment-17855982 ] Clifford Jansen commented on PROTON-2832: - Presumably there is a race between the last incoming epoll IO event (on the poller thread) and the call to stop_polling() (on the worker thread). I suspect you need a raw connection wake (on its own) activating the worker, otherwise there would be no competing epoll activity "armed". One solution is to adopt the current_arm + shutdown() behaviour of the AMQP connection. I'm not sure what the equivalent state machine solution would be, but presumably this extra state will be relevant to an IOCP or (future) io_uring implementation too. > Use-after free tsan error in epoll.c::post_event > > > Key: PROTON-2832 > URL: https://issues.apache.org/jira/browse/PROTON-2832 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.40.0 >Reporter: Ken Giusti >Priority: Major > > Hit only once in github CI running in ubuntu 24.04 container. 
> proton main git ref: 813f87eef9e44682948e390b1d20a68ff283bad1 > Link to github log: > [https://github.com/skupperproject/skupper-router/actions/runs/9550471254/job/26322660920?pr=1524#step:10:5749] > > > {quote}59: E RuntimeError: Process 7654 (name=HTTP1EventChannel) error: > returned error code 66 {quote} > {quote} > 59: E skrouterd -c HTTP1EventChannel.conf -I > /home/runner/work/skupper-router/skupper-router/skupper-router/python > > 59: E > /home/runner/work/skupper-router/skupper-router/skupper-router/build/tests/system_test.dir/tests/system_tests_http1_adaptor/Http1AdaptorEventChannelTest/setUpClass/HTTP1EventChannel-17.cmd > > > 59: E > > 59: E == > > 59: E WARNING: ThreadSanitizer: heap-use-after-free (pid=7654) > > 59: E Write of size 1 at 0x726800030c91 by thread T4 (mutexes: write M0, > write M1): > > 59: E #0 post_event ../c/src/proactor/epoll.c:2349 > (libqpid-proton-proactor.so.1+0x137f8) (BuildId: > 158eaa565e8d209417b7751d724f3f73f8099121) > > 59: E #1 poller_do_epoll ../c/src/proactor/epoll.c:2617 > (libqpid-proton-proactor.so.1+0x137f8) > > 59: E #2 next_event_batch ../c/src/proactor/epoll.c:2501 > (libqpid-proton-proactor.so.1+0x137f8) > > 59: E #3 pn_proactor_wait ../c/src/proactor/epoll.c:2740 > (libqpid-proton-proactor.so.1+0x16265) (BuildId: > 158eaa565e8d209417b7751d724f3f73f8099121) > > 59: E #4 proactor_thread ../src/server.c:168 (skrouterd+0x130421) (BuildId: > 3a2755d79ab408265526faf0567b497811b59975) > > 59: E #5 _thread_init ../src/posix/threading.c:207 (skrouterd+0xc8441) > (BuildId: 3a2755d79ab408265526faf0567b497811b59975) > > 59: E > > 59: E Previous write of size 8 at 0x726800030c90 by thread T6: > > 59: E #0 free > ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:724 > (libtsan.so.2+0x5747c) (BuildId: 64c1e8de04b11a7d960abd7e45f94f3b277b7779) > > 59: E #1 praw_connection_cleanup ../c/src/proactor/epoll_raw_connection.c:171 > (libqpid-proton-proactor.so.1+0x8de4) (BuildId: > 
158eaa565e8d209417b7751d724f3f73f8099121) > > 59: E #2 praw_connection_cleanup ../c/src/proactor/epoll_raw_connection.c:157 > (libqpid-proton-proactor.so.1+0x8de4) > > 59: E #3 pni_raw_connection_done ../c/src/proactor/epoll_raw_connection.c:496 > (libqpid-proton-proactor.so.1+0x174b9) (BuildId: > 158eaa565e8d209417b7751d724f3f73f8099121) > > 59: E #4 pn_proactor_done ../c/src/proactor/epoll.c:2762 > (libqpid-proton-proactor.so.1+0x174b9) > > 59: E #5 proactor_thread ../src/server.c:200 (skrouterd+0x1304d8) (BuildId: > 3a2755d79ab408265526faf0567b497811b59975) > > 59: E #6 _thread_init ../src/posix/threading.c:207 (skrouterd+0xc8441) > (BuildId: 3a2755d79ab408265526faf0567b497811b59975) > > 59: E > > 59: E Mutex M0 (0x726400030a50) created at: > > 59: E #0 pthread_mutex_init > ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1315 > (libtsan.so.2+0x58bfd) (BuildId: 64c1e8de04b11a7d960abd7e45f94f3b277b7779) > > 59: E #1 pmutex_init ../c/src/proactor/epoll-internal.h:336 > (libqpid-proton-proactor.so.1+0xcfb3) (BuildId: > 158eaa565e8d209417b7751d724f3f73f8099121) > > 59: E #2 pn_proactor ../c/src/proactor/epoll.c:1991 > (libqpid-proton-proactor.so.1+0xcfb3) > > 59: E #3 qd_server ../src/server.c:219 (skrouterd+0x13b739) (BuildId: > 3a2755d79ab408265526faf0567b497811b59975) > > 59: E #4 qd_dispatch_prepare ../src/dispatch.c:343 (skrouterd+0xb11cd) > (BuildId: 3a2755d79ab408265526faf0567b497811b59975) > > 59: E #5 (libffi.so.8+0x7b15) (BuildId: > c9149b6e99105aa4321ddd4a10ee4b90de7b7d49) > > 59: E #6 main_process ../router/src/main.c:101 (skrouterd+0x13c57c) (BuildId:
[jira] [Created] (PROTON-2818) Move epoll proactor connection logic to a task thread
Clifford Jansen created PROTON-2818: --- Summary: Move epoll proactor connection logic to a task thread Key: PROTON-2818 URL: https://issues.apache.org/jira/browse/PROTON-2818 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.39.0 Reporter: Clifford Jansen Assignee: Clifford Jansen See PROTON-2812. Implement the first described mitigation.
[jira] [Commented] (PROTON-2812) Epoll proactor blocks thread during DNS lookups in getaddrinfo
[ https://issues.apache.org/jira/browse/PROTON-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837355#comment-17837355 ] Clifford Jansen commented on PROTON-2812: - An additional possible mitigation (with thanks to astitcher): Since the epoll proactor knows when the getaddrinfo calls are needed and also when they are completed, it could regulate a maximum concurrent number of threads committed to servicing such calls. > Epoll proactor blocks thread during DNS lookups in getaddrinfo > -- > > Key: PROTON-2812 > URL: https://issues.apache.org/jira/browse/PROTON-2812 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.39.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Attachments: mitigate01.diff > > > The epoll proactor uses getaddrinfo() to resolve network addresses for > inbound and outbound AMQP and raw connections. These connect and listener > calls are thread safe so may be called from any thread and the expectation is > that they initiate the action without blocking. > Solutions could entail: > 1) using a dedicated DNS thread pool that multiplexes N serialized (blocking) > getaddrinfo calls over the pool (e.g. getaddrinfo_a or self managed like > libuv) > 2) use some custom library that scales DNS requests without blocking > 3) write the simplest custom proactor library that does #2.
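The regulation idea in the comment above could be sketched as a simple admission counter. Everything here is hypothetical (`dns_gate_t` and its functions do not exist in proton-c), and the sketch assumes the caller already holds whatever lock protects the proactor's scheduling state, so no locking is shown.

```c
#include <stdbool.h>

/* Hypothetical admission gate for blocking getaddrinfo() calls.
 * Assumed to be used under an existing lock; it is not thread safe alone. */
typedef struct {
  unsigned in_flight;     /* lookups currently occupying a task thread */
  unsigned max_in_flight; /* cap on threads committed to DNS lookups */
  unsigned deferred;      /* lookups waiting for a slot */
} dns_gate_t;

/* Called when a lookup is needed. Returns true if the calling thread may
 * run getaddrinfo() now; otherwise the lookup is recorded as deferred. */
static bool dns_gate_try_enter(dns_gate_t *g) {
  if (g->in_flight < g->max_in_flight) {
    g->in_flight++;
    return true;
  }
  g->deferred++;
  return false;
}

/* Called when a lookup completes. Returns true if a deferred lookup takes
 * over the freed slot and should be scheduled on a task thread. */
static bool dns_gate_leave(dns_gate_t *g) {
  g->in_flight--;
  if (g->deferred > 0) {
    g->deferred--;
    g->in_flight++;
    return true;
  }
  return false;
}
```

With a cap of 2, a third simultaneous lookup is deferred rather than tying up another task thread, and it is released as soon as one of the first two completes.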
[jira] [Commented] (PROTON-2812) Epoll proactor blocks thread during DNS lookups in getaddrinfo
[ https://issues.apache.org/jira/browse/PROTON-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835482#comment-17835482 ] Clifford Jansen commented on PROTON-2812: - A short term mitigation could be to defer the getaddrinfo call for outgoing connections to run on a proactor task thread provided by the application. The outgoing connection call will not block in this case. If there are sufficient task threads compared to the number of blocked DNS calls during runtime, the performance impact may be greatly reduced. See the attached mitigate01.diff > Epoll proactor blocks thread during DNS lookups in getaddrinfo > -- > > Key: PROTON-2812 > URL: https://issues.apache.org/jira/browse/PROTON-2812 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.39.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Attachments: mitigate01.diff > > > The epoll proactor uses getaddrinfo() to resolve network addresses for > inbound and outbound AMQP and raw connections. These connect and listener > calls are thread safe so may be called from any thread and the expectation is > that they initiate the action without blocking. > Solutions could entail: > 1) using a dedicated DNS thread pool that multiplexes N serialized (blocking) > getaddrinfo calls over the pool (e.g. getaddrinfo_a or self managed like > libuv) > 2) use some custom library that scales DNS requests without blocking > 3) write the simplest custom proactor library that does #2.
[jira] [Updated] (PROTON-2812) Epoll proactor blocks thread during DNS lookups in getaddrinfo
[ https://issues.apache.org/jira/browse/PROTON-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen updated PROTON-2812: Attachment: mitigate01.diff > Epoll proactor blocks thread during DNS lookups in getaddrinfo > -- > > Key: PROTON-2812 > URL: https://issues.apache.org/jira/browse/PROTON-2812 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.39.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Attachments: mitigate01.diff > > > The epoll proactor uses getaddrinfo() to resolve network addresses for > inbound and outbound AMQP and raw connections. These connect and listener > calls are thread safe so may be called from any thread and the expectation is > that they initiate the action without blocking. > Solutions could entail: > 1) using a dedicated DNS thread pool that multiplexes N serialized (blocking) > getaddrinfo calls over the pool (e.g. getaddrinfo_a or self managed like > libuv) > 2) use some custom library that scales DNS requests without blocking > 3) write the simplest custom proactor library that does #2.
[jira] [Created] (PROTON-2812) Epoll proactor blocks thread during DNS lookups in getaddrinfo
Clifford Jansen created PROTON-2812: --- Summary: Epoll proactor blocks thread during DNS lookups in getaddrinfo Key: PROTON-2812 URL: https://issues.apache.org/jira/browse/PROTON-2812 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.39.0 Reporter: Clifford Jansen Assignee: Clifford Jansen The epoll proactor uses getaddrinfo() to resolve network addresses for inbound and outbound AMQP and raw connections. These connect and listener calls are thread safe so may be called from any thread and the expectation is that they initiate the action without blocking. Solutions could entail: 1) using a dedicated DNS thread pool that multiplexes N serialized (blocking) getaddrinfo calls over the pool (e.g. getaddrinfo_a or self managed like libuv) 2) use some custom library that scales DNS requests without blocking 3) write the simplest custom proactor library that does #2.
[jira] [Commented] (PROTON-2792) [cpp] Segmentation fault in container::impl::run_timer_jobs
[ https://issues.apache.org/jira/browse/PROTON-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815749#comment-17815749 ] Clifford Jansen commented on PROTON-2792: - My previous comment is obviously irrelevant to the stated problem. Please ignore. > [cpp] Segmentation fault in container::impl::run_timer_jobs > --- > > Key: PROTON-2792 > URL: https://issues.apache.org/jira/browse/PROTON-2792 > Project: Qpid Proton > Issue Type: Bug > Components: cpp-binding >Affects Versions: proton-c-0.38.0 >Reporter: Martin Zlomek >Priority: Major > > PROTON-2438 introduced a race condition in > [reading|https://github.com/DreamPearl/qpid-proton/blob/8142e3cecd9f668992e76a5448afc09fd7b1030a/cpp/src/proactor_container_impl.cpp#L545] > / > [writing|https://github.com/DreamPearl/qpid-proton/blob/8142e3cecd9f668992e76a5448afc09fd7b1030a/cpp/src/proactor_container_impl.cpp#L547] > {{is_active_}} in > [{{run_timer_jobs()}}|https://github.com/DreamPearl/qpid-proton/blob/8142e3cecd9f668992e76a5448afc09fd7b1030a/cpp/src/proactor_container_impl.cpp#L498] > while modifying it in > [{{schedule()}}|https://github.com/DreamPearl/qpid-proton/blob/8142e3cecd9f668992e76a5448afc09fd7b1030a/cpp/src/proactor_container_impl.cpp#L455] > or > [{{cancel()}}|https://github.com/DreamPearl/qpid-proton/blob/8142e3cecd9f668992e76a5448afc09fd7b1030a/cpp/src/proactor_container_impl.cpp#L473] > at the same time.
[jira] [Commented] (PROTON-2792) [cpp] Segmentation fault in container::impl::run_timer_jobs
[ https://issues.apache.org/jira/browse/PROTON-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815733#comment-17815733 ] Clifford Jansen commented on PROTON-2792: - run_timer_jobs() is only called from the PN_PROACTOR_TIMEOUT callback. The proactor only allows one such callback at a time. There should be no competing thread to GUARD against. Is this JIRA based on an actual runtime failure? If so, do you have a stack trace? > [cpp] Segmentation fault in container::impl::run_timer_jobs > --- > > Key: PROTON-2792 > URL: https://issues.apache.org/jira/browse/PROTON-2792 > Project: Qpid Proton > Issue Type: Bug > Components: cpp-binding >Affects Versions: proton-c-0.38.0 >Reporter: Martin Zlomek >Priority: Major > > PROTON-2438 introduced a race condition in > [reading|https://github.com/DreamPearl/qpid-proton/blob/8142e3cecd9f668992e76a5448afc09fd7b1030a/cpp/src/proactor_container_impl.cpp#L545] > / > [writing|https://github.com/DreamPearl/qpid-proton/blob/8142e3cecd9f668992e76a5448afc09fd7b1030a/cpp/src/proactor_container_impl.cpp#L547] > {{is_active_}} in > [{{run_timer_jobs()}}|https://github.com/DreamPearl/qpid-proton/blob/8142e3cecd9f668992e76a5448afc09fd7b1030a/cpp/src/proactor_container_impl.cpp#L498] > while modifying it in > [{{schedule()}}|https://github.com/DreamPearl/qpid-proton/blob/8142e3cecd9f668992e76a5448afc09fd7b1030a/cpp/src/proactor_container_impl.cpp#L455] > or > [{{cancel()}}|https://github.com/DreamPearl/qpid-proton/blob/8142e3cecd9f668992e76a5448afc09fd7b1030a/cpp/src/proactor_container_impl.cpp#L473] > at the same time.
[jira] [Created] (PROTON-2791) Add MSG_MORE performance boost to raw connections
Clifford Jansen created PROTON-2791: --- Summary: Add MSG_MORE performance boost to raw connections Key: PROTON-2791 URL: https://issues.apache.org/jira/browse/PROTON-2791 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.39.0 Reporter: Clifford Jansen Assignee: Clifford Jansen When multiple buffers are staged for writing, the use of the MSG_MORE send() flag for all but the last buffer can result in significant speed improvement.
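The technique described above can be sketched with plain sockets. This is not the raw connection code: `out_buf_t` and `send_staged` are hypothetical names, the socket is assumed to be blocking, and a real proactor would also have to handle EAGAIN, EINTR and partial progress across wakeups. MSG_MORE is Linux-specific, so the sketch falls back to an ordinary send() where the flag is not defined.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#ifndef MSG_MORE
#define MSG_MORE 0 /* Linux-specific hint; degrade to a plain send() elsewhere */
#endif

/* Hypothetical descriptor for one staged write buffer. */
typedef struct {
  const char *data;
  size_t len;
} out_buf_t;

/* Send n staged buffers in order, passing MSG_MORE for all but the last so
 * the kernel can coalesce them into fewer segments.
 * Returns the total number of bytes sent, or -1 on a send() error. */
static ssize_t send_staged(int fd, const out_buf_t *bufs, size_t n) {
  ssize_t total = 0;
  for (size_t i = 0; i < n; i++) {
    int flags = (i + 1 < n) ? MSG_MORE : 0; /* more data follows this buffer */
    size_t off = 0;
    while (off < bufs[i].len) {
      ssize_t k = send(fd, bufs[i].data + off, bufs[i].len - off, flags);
      if (k < 0) return -1;
      off += (size_t)k;
    }
    total += (ssize_t)bufs[i].len;
  }
  return total;
}
```

The last buffer is sent without the flag so that the kernel is not left holding data back while waiting for more.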
[jira] [Created] (PROTON-2790) Improve session flow control
Clifford Jansen created PROTON-2790: --- Summary: Improve session flow control Key: PROTON-2790 URL: https://issues.apache.org/jira/browse/PROTON-2790 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.39.0 Reporter: Clifford Jansen Assignee: Clifford Jansen Current flow control replenishment for the session incoming window only occurs when the window reaches 0. This minimizes flow frames on the wire but introduces a stall in transfer processing. Switching to using a low watermark for the session incoming window would allow the application to choose a preferred trade off between transfer stalls and FLOW frames.
[jira] [Closed] (PROTON-2545) raw connection: client disconnect is ignored if no read buffers are available.
[ https://issues.apache.org/jira/browse/PROTON-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen closed PROTON-2545. --- Resolution: Won't Fix Opposite approach taken. See 2748. If there is a need to revisit, it would be best to open a new Jira with reference to these older issues with new info on why the decision needs refining. > raw connection: client disconnect is ignored if no read buffers are available. > -- > > Key: PROTON-2545 > URL: https://issues.apache.org/jira/browse/PROTON-2545 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Ken Giusti >Assignee: Clifford Jansen >Priority: Major > > Refer to [https://github.com/skupperproject/skupper-router/issues/477] > TL;DR - if a client closes its TCP connection (full drop - not half close), > the proactor cannot post a PN_RAW_CONNECTION_DISCONNECTED event unless read > buffers have been provided to the raw connection.
[jira] [Resolved] (PROTON-2763) Two final disconnect events possible from a raw connection
[ https://issues.apache.org/jira/browse/PROTON-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2763. - Fix Version/s: proton-c-0.40.0 Resolution: Fixed > Two final disconnect events possible from a raw connection > -- > > Key: PROTON-2763 > URL: https://issues.apache.org/jira/browse/PROTON-2763 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.39.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.40.0 > > > In writing a new threaderciser for raw connections the following scenario can > result in a state machine mixup and second disconnect. > If a pn_raw_connection_wake() occurs around the time that the first > disconnect event is being consumed the task may be added to the global ready > list for processing. The batch done() processing will (correctly) defer the > task cleanup until the task is next scheduled via the ready list. However > the raw connection forgets that it has already done the disconnect and > restarts the state machine at the first disconnect stage, resulting in the > second disconnect event. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2764) Zombie raw connections
[ https://issues.apache.org/jira/browse/PROTON-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2764. - Fix Version/s: proton-c-0.40.0 Resolution: Fixed > Zombie raw connections > -- > > Key: PROTON-2764 > URL: https://issues.apache.org/jira/browse/PROTON-2764 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.39.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.40.0 > > > In writing a new threaderciser for raw connections the following scenario can > result in raw connections that are never scheduled. > If a pn_listener_raw_accept() fails due to a temporary fdlimit shortage or > simultaneous close of the listener by another thread, the new raw connection > is correctly set to an error state but is never scheduled for processing. > The state machine is never advanced and the raw connection resources are not > cleaned up. This also causes the PN_PROACTOR_INACTIVE event to be blocked. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2764) Zombie raw connections
Clifford Jansen created PROTON-2764: --- Summary: Zombie raw connections Key: PROTON-2764 URL: https://issues.apache.org/jira/browse/PROTON-2764 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.39.0 Reporter: Clifford Jansen Assignee: Clifford Jansen In writing a new threaderciser for raw connections the following scenario can result in raw connections that are never scheduled. If a pn_listener_raw_accept() fails due to a temporary fdlimit shortage or simultaneous close of the listener by another thread, the new raw connection is correctly set to an error state but is never scheduled for processing. The state machine is never advanced and the raw connection resources are not cleaned up. This also causes the PN_PROACTOR_INACTIVE event to be blocked. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2763) Two final disconnect events possible from a raw connection
Clifford Jansen created PROTON-2763: --- Summary: Two final disconnect events possible from a raw connection Key: PROTON-2763 URL: https://issues.apache.org/jira/browse/PROTON-2763 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.39.0 Reporter: Clifford Jansen Assignee: Clifford Jansen In writing a new threaderciser for raw connections the following scenario can result in a state machine mixup and second disconnect. If a pn_raw_connection_wake() occurs around the time that the first disconnect event is being consumed the task may be added to the global ready list for processing. The batch done() processing will (correctly) defer the task cleanup until the task is next scheduled via the ready list. However the raw connection forgets that it has already done the disconnect and restarts the state machine at the first disconnect stage, resulting in the second disconnect event. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
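The general shape of a guard against the double disconnect described above can be sketched as follows. This is a toy model, not the Proton code and not necessarily the fix that was applied: `toy_raw_conn_t`, `toy_process` and the field names are all hypothetical. The idea is a sticky flag that is checked whenever the task is scheduled again, so that a late wake cannot restart the disconnect sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much reduced stand-in for a raw connection's state. */
typedef struct {
    bool socket_closed;        /* the connection has reached its final stage */
    bool disconnect_delivered; /* sticky: set once the final event is handed out */
    int  disconnect_events;    /* number of final events delivered (for illustration) */
} toy_raw_conn_t;

/* Called every time the task is scheduled, including after a late wake. */
static void toy_process(toy_raw_conn_t *rc)
{
    assert(rc != 0);
    if (!rc->socket_closed) return;        /* still active: nothing final to report */
    if (rc->disconnect_delivered) return;  /* late wake: only cleanup remains */
    rc->disconnect_delivered = true;
    rc->disconnect_events++;
}
```

Calling `toy_process()` twice on a closed connection, as would happen when a wake races with consumption of the first disconnect event, leaves `disconnect_events` at 1.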
[jira] [Commented] (PROTON-2748) Raw connections do not always complete close operations
[ https://issues.apache.org/jira/browse/PROTON-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754426#comment-17754426 ] Clifford Jansen commented on PROTON-2748: - Related issues: https://issues.apache.org/jira/browse/PROTON-2545 https://issues.apache.org/jira/browse/PROTON-2680 > Raw connections do not always complete close operations > --- > > Key: PROTON-2748 > URL: https://issues.apache.org/jira/browse/PROTON-2748 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.39.0 > Environment: linux epoll >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Attachments: pn2748.patch > > > The Proton raw_connection_t currently requires cooperation from the > application layer to complete a close. There is a baked-in assumption that > the application will always eventually provide a read buffer. A second > assumption is that the peer (not necessarily a Proton raw connection) will > detect a read close on its side, and do a graceful close of its write side > "soon". > These incorrect assumptions can leave the raw connection in a hung state > waiting for non-existent wind-up activity by the application or peer, > respectively.
[jira] [Commented] (PROTON-2545) raw connection: client disconnect is ignored if no read buffers are available.
[ https://issues.apache.org/jira/browse/PROTON-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754425#comment-17754425 ] Clifford Jansen commented on PROTON-2545: - See the opposite reasoning in https://issues.apache.org/jira/browse/PROTON-2748 It looks at detecting/initiating/completing the shutdown and cleanup of socket resources from a wider perspective and is perhaps the better place to discuss/resolve this issue. > raw connection: client disconnect is ignored if no read buffers are available. > -- > > Key: PROTON-2545 > URL: https://issues.apache.org/jira/browse/PROTON-2545 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Ken Giusti >Assignee: Clifford Jansen >Priority: Major > > Refer to [https://github.com/skupperproject/skupper-router/issues/477] > TL;DR - if a client closes its TCP connection (full drop - not half close), > the proactor cannot post a PN_RAW_CONNECTION_DISCONNECTED event unless read > buffers have been provided to the raw connection. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
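The difference between a full drop and a half close that this issue turns on can be seen at the socket level. The sketch below is an illustration only: it uses an AF_UNIX `socketpair()` as a stand-in for a TCP connection, and the function name `half_close_still_accepts_reply` is hypothetical. It shows that a reader seeing end-of-file has only learned that the peer closed its write side; the reply path may still be open.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns true if, after the client half closes, the server sees EOF on read
 * and can still deliver a reply that the client receives. */
static bool half_close_still_accepts_reply(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;
    int client = sv[0], server = sv[1];
    assert(client >= 0 && server >= 0);

    char buf[16];
    bool ok = false;

    /* Client: send a request, then close the write side only. */
    if (write(client, "GET", 3) == 3 && shutdown(client, SHUT_WR) == 0) {
        ssize_t got = read(server, buf, sizeof buf); /* the request bytes */
        ssize_t eof = read(server, buf, sizeof buf); /* 0: peer closed its write side */
        /* Server: EOF has been seen, yet the reply can still be written. */
        if (got == 3 && eof == 0 && write(server, "OK", 2) == 2) {
            ssize_t reply = read(client, buf, sizeof buf);
            ok = (reply == 2 && memcmp(buf, "OK", 2) == 0);
        }
    }
    close(client);
    close(server);
    return ok;
}
```

An end-of-file on read therefore cannot, by itself, be treated as a disconnect of the whole connection.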
[jira] [Commented] (PROTON-2680) [proton-c] PN_RAW_CONNECTION_DISCONNECTED event does not show up when client is disconnected
[ https://issues.apache.org/jira/browse/PROTON-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754421#comment-17754421 ] Clifford Jansen commented on PROTON-2680: -
The test case should be retested against the code changes in https://issues.apache.org/jira/browse/PROTON-2748 once they have been finalized, approved and checked in.
Ultimately, it should be noted that the killed curl process does:
  connect to router
  send http request bytes on socket
  [ kill ]
  OS closes socket -> FIN
and nothing else except an ack to a FIN if the router ever sends one, or an RST if the router sends data.
From the router's perspective, this is identical to some other client which does:
  connect to router
  send http request bytes on socket
  wait some time
  half close socket (write side) -> FIN
  wait a long long time for the http response from the router
The latter is completely valid and should not result in a DISCONNECT. The two are indistinguishable on the wire (or loopback).
> [proton-c] PN_RAW_CONNECTION_DISCONNECTED event does not show up when client > is disconnected > - > > Key: PROTON-2680 > URL: https://issues.apache.org/jira/browse/PROTON-2680 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: Ganesh Murthy >Assignee: Clifford Jansen >Priority: Major > > Steps to reproduce > Start the skupper-router with the following config - > {noformat} > router { > mode: standalone > } > listener { > host: 0.0.0.0 > port: amqp > authenticatePeer: no > saslMechanisms: ANONYMOUS > } > tcpConnector { > name: echo-1 > host: 10.108.50.177 > port: 9090 > address: echo > } > tcpConnector { > name: echo-2 > host: 10.108.50.177 > port: 9090 > address: echo > } > tcpListener { > host: 0.0.0.0 > port: 9000 > address: echo > } > log { > module: DEFAULT > enable: trace+ > outputFile: tcp.log > } {noformat} > > Note that the ip address in the host field of the tcpConnector is bogus. 
> Now connect a curl client to the tcpListener port - > {noformat} > curl http://localhost:9000/api {noformat} > > The curl client will hang. Terminate the curl client and look in the tcp.log > for logged proton events - the PN_RAW_CONNECTION_DISCONNECTED event will be > missing on connection C2 > Here is the full log of the relevant client connection > > {noformat} > 2023-02-01 16:51:57.069705 -0500 ROUTER_CORE (info) [C2] Connection Opened: > dir=in host=127.0.0.1:35348 encrypted=no auth=no user= > container_id=TcpAdaptor props={:"qd.adaptor"="tcp"} > 2023-02-01 16:51:57.069793 -0500 ROUTER_CORE (trace) Core action > 'connection_opened' > 2023-02-01 16:51:57.069986 -0500 TCP_ADAPTOR (info) [C2] > PN_RAW_CONNECTION_CONNECTED Listener ingress accepted to 0.0.0.0:9000 from > 127.0.0.1:35348 (global_id=127.0.0.1:35348) > 2023-02-01 16:51:57.070015 -0500 ROUTER_CORE (trace) Core action > 'link_first_attach' > 2023-02-01 16:51:57.070098 -0500 TCP_ADAPTOR (debug) [C2] > PN_RAW_CONNECTION_NEED_WRITE_BUFFERS listener > 2023-02-01 16:51:57.070148 -0500 TCP_ADAPTOR (debug) [C2] > PN_RAW_CONNECTION_NEED_READ_BUFFERS listener > 2023-02-01 16:51:57.070171 -0500 ROUTER_CORE (info) [C2][L4] Link attached: > dir=out source={(dyn) expire:link} target={ expire:link} > 2023-02-01 16:51:57.070222 -0500 TCP_ADAPTOR (debug) [C2] > qdr_tcp_activate_CT: call pn_raw_connection_wake() > 2023-02-01 16:51:57.070246 -0500 ROUTER_CORE (trace) Core action > 'link_first_attach' > 2023-02-01 16:51:57.070273 -0500 TCP_ADAPTOR (debug) [C2][L4] (listener > outgoing) qdr_tcp_second_attach > 2023-02-01 16:51:57.070347 -0500 DEFAULT (trace) Parse tree search for 'echo' > 2023-02-01 16:51:57.070376 -0500 TCP_ADAPTOR (trace) [C2][L5] handle_incoming > qdr_tcp_second_attach for listener connection. 
read_closed:F, flow_enabled:F > 2023-02-01 16:51:57.070404 -0500 DEFAULT (trace) Parse tree match not found > 2023-02-01 16:51:57.070425 -0500 TCP_ADAPTOR (debug) [C2][L5] Waiting for > credit before initiating listener ingress stream message, returning > 2023-02-01 16:51:57.070456 -0500 TCP_ADAPTOR (debug) [C2][L4] > qdr_tcp_get_credit: NOOP > 2023-02-01 16:51:57.070517 -0500 TCP_ADAPTOR (trace) Listener > tcpListener/0.0.0.0:9000 (0.0.0.0:9000) service address echo consumer count > updates: local=1 in-process=0 remote=0 > 2023-02-01 16:51:57.070553 -0500 ROUTER_CORE (info) [C2][L5] Link attached: > dir=in source={ expire:link} target={echo expire:link} > 2023-02-01 16:51:57.070583 -0500 ROUTER_CORE (trace) Core action > 'add_tcp_connection' > 2023-02-01 16:51:57.070606 -0500 TCP_ADAPTOR (debug) [C2] > PN_RAW_CONNECTION_WAKE listener > 2023-02-01 16:51:57.070646 -0500 TCP_A
[jira] [Commented] (PROTON-2748) Raw connections do not always complete close operations
[ https://issues.apache.org/jira/browse/PROTON-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752818#comment-17752818 ] Clifford Jansen commented on PROTON-2748: - This is the proposed behaviour for normal operation and asynchronous network errors. In normal operation the application may suspend/resume activity on a raw connection by withholding/supplying read and raw buffers as desired. In the absence of network errors, pending input bytes will be available for read before the CLOSED_READ event and pending output bytes will be sent before a CLOSED_WRITE event. READ and READ_CLOSED activity will not be polled/requested of the OS by the raw connection in the absence of read buffers. If there are no queued output buffers for writing, the raw connection will be suspended until a future pn_raw_connection_wake() or network error. Async disconnect (RST) is always immediately detected and leads to DISCONNECTED state and subsequent resource cleanup including close of the underlying socket without further blocking. pn_raw_connection_close() results in progression to DISCONNECTED state without blocking (including resource cleanup). In particular, no acknowledgment of the close operation is required or expected from the peer and the TCP connection is cleaned up by the operating system according to its configured SO_LINGER policy. > Raw connections do not always complete close operations > --- > > Key: PROTON-2748 > URL: https://issues.apache.org/jira/browse/PROTON-2748 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.39.0 > Environment: linux epoll >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Attachments: pn2748.patch > > > The Proton raw_connection_t currently requires cooperation from the > application layer to complete a close. There is a baked in assumption that > the application will always eventually provide a read buffer. 
A second > assumption is that the peer (not necessarily a Proton raw connection) will > detect a read close on its side, and do a graceful close of its write side > "soon". > These incorrect assumptions can leave the raw connection in a hung state > waiting for non-existent wind-up activity by the application or peer, > respectively.
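The polling rule in the proposed behaviour above (no read polling without read buffers, no write polling without queued output) can be sketched as a small function that computes the epoll interest mask. This is an assumption-laden illustration for Linux, not the Proton implementation: `toy_wanted_events` and its parameters are hypothetical names. It relies on the documented epoll behaviour that EPOLLERR and EPOLLHUP are always reported, even when not requested, which is one way an asynchronous error can be noticed while the connection is otherwise suspended.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical helper: which epoll events to ask for, given buffer state.
 * EPOLLERR and EPOLLHUP need not be requested; epoll always reports them. */
static uint32_t toy_wanted_events(bool have_read_buffers, bool have_queued_output)
{
    uint32_t ev = 0;
    if (have_read_buffers)
        ev |= EPOLLIN | EPOLLRDHUP;  /* data and read-side close are of interest */
    if (have_queued_output)
        ev |= EPOLLOUT;              /* wake when the socket can accept more bytes */
    return ev;                       /* 0: suspended until a wake or network error */
}
```

With neither read buffers nor queued output the mask is empty, which corresponds to the suspended state described in the comment.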
[jira] [Commented] (PROTON-2748) Raw connections do not always complete close operations
[ https://issues.apache.org/jira/browse/PROTON-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740290#comment-17740290 ] Clifford Jansen commented on PROTON-2748: - test case in pn2748.patch > Raw connections do not always complete close operations > --- > > Key: PROTON-2748 > URL: https://issues.apache.org/jira/browse/PROTON-2748 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.39.0 > Environment: linux epoll >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Attachments: pn2748.patch > > > The Proton raw_connection_t currently requires cooperation from the > application layer to complete a close. There is a baked-in assumption that > the application will always eventually provide a read buffer. A second > assumption is that the peer (not necessarily a Proton raw connection) will > detect a read close on its side, and do a graceful close of its write side > "soon". > These incorrect assumptions can leave the raw connection in a hung state > waiting for non-existent wind-up activity by the application or peer, > respectively.
[jira] [Updated] (PROTON-2748) Raw connections do not always complete close operations
[ https://issues.apache.org/jira/browse/PROTON-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen updated PROTON-2748: Attachment: pn2748.patch > Raw connections do not always complete close operations > --- > > Key: PROTON-2748 > URL: https://issues.apache.org/jira/browse/PROTON-2748 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.39.0 > Environment: linux epoll >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Attachments: pn2748.patch > > > The Proton raw_connection_t currently requires cooperation from the > application layer to complete a close. There is a baked-in assumption that > the application will always eventually provide a read buffer. A second > assumption is that the peer (not necessarily a Proton raw connection) will > detect a read close on its side, and do a graceful close of its write side > "soon". > These incorrect assumptions can leave the raw connection in a hung state > waiting for non-existent wind-up activity by the application or peer, > respectively.
[jira] [Created] (PROTON-2748) Raw connections do not always complete close operations
Clifford Jansen created PROTON-2748: --- Summary: Raw connections do not always complete close operations Key: PROTON-2748 URL: https://issues.apache.org/jira/browse/PROTON-2748 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.39.0 Environment: linux epoll Reporter: Clifford Jansen Assignee: Clifford Jansen The Proton raw_connection_t currently requires cooperation from the application layer to complete a close. There is a baked-in assumption that the application will always eventually provide a read buffer. A second assumption is that the peer (not necessarily a Proton raw connection) will detect a read close on its side, and do a graceful close of its write side "soon". These incorrect assumptions can leave the raw connection in a hung state waiting for non-existent wind-up activity by the application or peer, respectively.
[jira] [Created] (PROTON-2747) Switch to OpenSSL for TLS support on Windows
Clifford Jansen created PROTON-2747: --- Summary: Switch to OpenSSL for TLS support on Windows Key: PROTON-2747 URL: https://issues.apache.org/jira/browse/PROTON-2747 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-future Environment: Windows Reporter: Clifford Jansen Assignee: Clifford Jansen Proton-C performance has received considerable attention in the last few years, resulting in significant performance boosts. Further improvements are planned and some of these are expected to require non-trivial plumbing changes to the IO subsystem, including TLS support. Currently a lot of this plumbing has twinned implementations for Windows and Posix. Given that today the use and adoption of open source software is actively supported by Microsoft, including integrated build tool chains, it makes sense to simplify the Proton code for future enhancements and long-term maintenance. This JIRA tracks ongoing implementation work for the switch from Schannel libraries (native Windows) to OpenSSL.
[jira] [Updated] (PROTON-2736) TLS OpenSSL library: hang with large application data frames
[ https://issues.apache.org/jira/browse/PROTON-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen updated PROTON-2736: Fix Version/s: proton-c-0.39.0 > TLS OpenSSL library: hang with large application data frames > > > Key: PROTON-2736 > URL: https://issues.apache.org/jira/browse/PROTON-2736 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.38.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.39.0 > > > OpenSSL maintains a buffer large enough for the largest possible TLS protocol > record + 1K. The Proton TLS decrypt loop is unaware of record boundaries and > repeatedly adds encrypted bytes at one end and takes out decrypted bytes at > the other, stopping when there is no more to decrypt or no more application > buffer space to move decrypted content into. > It also tests if there are remaining decrypted bytes available should the > application provide additional buffers. This test can fail in the case that > the OpenSSL buffer is completely filled with: > handshake record > 1K followed by > partial max sized application data record > The SSL_peek operation will not see any application data and Proton > "remembers" the full buffer without allowing that the handshake record has > been processed and the buffer is no longer full. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2736) TLS OpenSSL library: hang with large application data frames
[ https://issues.apache.org/jira/browse/PROTON-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2736. - Resolution: Fixed > TLS OpenSSL library: hang with large application data frames > > > Key: PROTON-2736 > URL: https://issues.apache.org/jira/browse/PROTON-2736 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.38.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.39.0 > > > OpenSSL maintains a buffer large enough for the largest possible TLS protocol > record + 1K. The Proton TLS decrypt loop is unaware of record boundaries and > repeatedly adds encrypted bytes at one end and takes out decrypted bytes at > the other, stopping when there is no more to decrypt or no more application > buffer space to move decrypted content into. > It also tests if there are remaining decrypted bytes available should the > application provide additional buffers. This test can fail in the case that > the OpenSSL buffer is completely filled with: > handshake record > 1K followed by > partial max sized application data record > The SSL_peek operation will not see any application data and Proton > "remembers" the full buffer without allowing that the handshake record has > been processed and the buffer is no longer full. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2736) TLS OpenSSL library: hang with large application data frames
Clifford Jansen created PROTON-2736: --- Summary: TLS OpenSSL library: hang with large application data frames Key: PROTON-2736 URL: https://issues.apache.org/jira/browse/PROTON-2736 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.38.0 Reporter: Clifford Jansen Assignee: Clifford Jansen OpenSSL maintains a buffer large enough for the largest possible TLS protocol record + 1K. The Proton TLS decrypt loop is unaware of record boundaries and repeatedly adds encrypted bytes at one end and takes out decrypted bytes at the other, stopping when there is no more to decrypt or no more application buffer space to move decrypted content into. It also tests if there are remaining decrypted bytes available should the application provide additional buffers. This test can fail in the case that the OpenSSL buffer is completely filled with: handshake record > 1K followed by partial max sized application data record The SSL_peek operation will not see any application data and Proton "remembers" the full buffer without allowing that the handshake record has been processed and the buffer is no longer full. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2725) epoll spin locks disabled
Clifford Jansen created PROTON-2725: --- Summary: epoll spin locks disabled Key: PROTON-2725 URL: https://issues.apache.org/jira/browse/PROTON-2725 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.38.0 Reporter: Clifford Jansen Assignee: Clifford Jansen PROTON-2346 has the unfortunate effect of never enabling adaptive spin locks, even on platforms that support them. PTHREAD_MUTEX_ADAPTIVE_NP is an enumeration constant rather than a preprocessor macro, so the #ifdef test for it fails even when it exists as a platform enumerated option.
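The preprocessor behaviour behind this bug can be reproduced without pthreads. The sketch below uses hypothetical names throughout (`TOY_MUTEX_ADAPTIVE`, `HAVE_TOY_MUTEX_ADAPTIVE` and both functions are made up for illustration). The feature-macro workaround shown is one common approach and is an assumption, not necessarily the fix applied in Proton.

```c
#include <stdbool.h>

/* Stand-in for a platform header that exposes an option as an enumerator. */
enum { TOY_MUTEX_NORMAL = 0, TOY_MUTEX_ADAPTIVE = 3 };

/* Enumerators are unknown to the preprocessor, so the #ifdef branch
 * is never compiled even though TOY_MUTEX_ADAPTIVE exists. */
static bool detected_by_ifdef(void)
{
#ifdef TOY_MUTEX_ADAPTIVE
    return true;
#else
    return false;
#endif
}

/* Workaround sketch: a macro the preprocessor can see, normally set by a
 * build-system compile probe; hard-coded here to keep the example standalone. */
#define HAVE_TOY_MUTEX_ADAPTIVE 1

static bool detected_by_feature_macro(void)
{
#if defined(HAVE_TOY_MUTEX_ADAPTIVE) && HAVE_TOY_MUTEX_ADAPTIVE
    return true;
#else
    return false;
#endif
}
```

Here `detected_by_ifdef()` returns false while `detected_by_feature_macro()` returns true, which mirrors how the adaptive mutex type ends up never being selected.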
[jira] [Resolved] (PROTON-2673) Proactor hangs if pn_raw_connection_wake() is called with outstanding connection attempt
[ https://issues.apache.org/jira/browse/PROTON-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2673. - Fix Version/s: proton-c-0.39.0 Resolution: Fixed > Proactor hangs if pn_raw_connection_wake() is called with outstanding > connection attempt > > > Key: PROTON-2673 > URL: https://issues.apache.org/jira/browse/PROTON-2673 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.38.0, proton-c-0.39.0 >Reporter: Ken Giusti >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.39.0 > > Attachments: raw_wake.c > > > If pn_raw_connection_wake() is called on a raw connection that is attempting > to connect no further proactor events are generated and the proactor hangs. > Important observations: > * This only occurs {_}if there is no server available at the target address > for the connection{_}. If a server is present then the PN_RAW_CONNECTION_WAKE > and PN_RAW_CONNECTION_CONNECTED events arrive properly (in that order). > * The host address is "localhost" - using "127.0.0.1" or "::1" instead > works. localhost on my machine maps to both "127.0.0.1" and "::1" > * Extra bonus: if you move the call to pn_raw_connection_wake() to _before_ > the call to pn_proactor_raw_connect() a crash occurs > See attached reproducer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2699) Turn off proactor fdlimit test by default
Clifford Jansen created PROTON-2699: --- Summary: Turn off proactor fdlimit test by default Key: PROTON-2699 URL: https://issues.apache.org/jira/browse/PROTON-2699 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.38.0 Reporter: Clifford Jansen Assignee: Clifford Jansen Fix For: proton-c-0.39.0 It has had many tweaks over the years yet remains sensitive to changes in OS versions, Python versions, parallelism of the test, system resources... i.e. it is flaky. Keep it around but off by default. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-2673) Proactor hangs if pn_raw_connection_wake() is called with outstanding connection attempt
[ https://issues.apache.org/jira/browse/PROTON-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704485#comment-17704485 ] Clifford Jansen commented on PROTON-2673: - PN_RAW_CONNECTION_WAKE can now be the first event ahead of a successful connected event. The doc for pn_raw_connection_wake() has also been updated to clarify when it has defined results. The restrictions could be looser, along the lines of AMQP connections, but that would require extra locking unhelpful for the normal use case. If this is too restrictive for the application, this could be revisited. > Proactor hangs if pn_raw_connection_wake() is called with outstanding > connection attempt > > > Key: PROTON-2673 > URL: https://issues.apache.org/jira/browse/PROTON-2673 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.38.0, proton-c-0.39.0 >Reporter: Ken Giusti >Assignee: Clifford Jansen >Priority: Major > Attachments: raw_wake.c > > > If pn_raw_connection_wake() is called on a raw connection that is attempting > to connect no further proactor events are generated and the proactor hangs. > Important observations: > * This only occurs {_}if there is no server available at the target address > for the connection{_}. If a server is present then the PN_RAW_CONNECTION_WAKE > and PN_RAW_CONNECTION_CONNECTED events arrive properly (in that order). > * The host address is "localhost" - using "127.0.0.1" or "::1" instead > works. localhost on my machine maps to both "127.0.0.1" and "::1" > * Extra bonus: if you move the call to pn_raw_connection_wake() to _before_ > the call to pn_proactor_raw_connect() a crash occurs > See attached reproducer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2695) Epoll proactor raw connections hang on incomplete batches
Clifford Jansen created PROTON-2695: --- Summary: Epoll proactor raw connections hang on incomplete batches Key: PROTON-2695 URL: https://issues.apache.org/jira/browse/PROTON-2695 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.38.0 Reporter: Clifford Jansen Assignee: Clifford Jansen If an application returns a batch before draining all available events from it, the internal state machine may not have completed the steps needed to determine the correct polling events of interest, leaving the associated task in a hung state. This is particularly relevant for the Catch2 test harness using the proactor. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2658) Proton TLS library - buffer leak on cleanup
Clifford Jansen created PROTON-2658: --- Summary: Proton TLS library - buffer leak on cleanup Key: PROTON-2658 URL: https://issues.apache.org/jira/browse/PROTON-2658 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.38.0 Reporter: Clifford Jansen Assignee: Clifford Jansen Fix For: proton-c-0.39.0 pn_tls_stop() should make all staged buffers retrievable on subsequent buffer get operations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2643) SSL connection hanging
[ https://issues.apache.org/jira/browse/PROTON-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2643. - Fix Version/s: proton-c-0.39.0 Assignee: Clifford Jansen Resolution: Fixed > SSL connection hanging > -- > > Key: PROTON-2643 > URL: https://issues.apache.org/jira/browse/PROTON-2643 > Project: Qpid Proton > Issue Type: Bug >Affects Versions: proton-c-0.37.0 > Environment: Qpid-proton 0.37 with epoll proactor and openssl 1.0.2k > running on centos7 >Reporter: Fredrik Hallenberg >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.39.0 > > Attachments: ssl-issue-3.zip > > > With a CA bundle of a certain size the SSL/TLS connection process hangs. This > is 100% repeatable. The process stops before reaching verification callback, > it seems there is an issue with reading from the BIO sockets. I can only > repeat it with certain CA bundles, it seems they have to contain >100 > certificates but I have not found an obvious pattern. It does happen with my > current system bundle (/etc/ssl/certs/ca-bundle.crt). > I enclose an example with appropriate keys and bundles, the code is based on > the cpp ssl example in the proton release. See the readme file on how to run > it. Basically it will build a proton server from the example code and connect > to it using openssl s_client. There is a good and a bad bundle included. The > good one has a few less certificates than the big one but is otherwise the > same. If using the bad bundle the connection process will stop after a few > ssl read/writes. With the good bundle it proceeds as expected. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-2643) SSL connection hanging
[ https://issues.apache.org/jira/browse/PROTON-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638383#comment-17638383 ] Clifford Jansen commented on PROTON-2643: - This looks to me like an OpenSSL bug. The server CertificateRequest (constructed from the ca-bad.pem example) plus the rest of the server's first response is just a bit larger than 17K, which happens to be the buffer size of the BIO. There have been several bugs fixed over the years relating to hangs on the BIO, but I could not find an exact match to this case. It appears fixed in OpenSSL 1.1 and above, so perhaps it was fixed accidentally as part of some other BIO hang bug. One workaround is to trim the CA list to get the server's overall response below 17K (by removing unnecessary certs from the CA database). It is also possible that padding the CA list with dummy entries would work (since the CertificateRequest size can be up to 64K and there are presumably tests for that edge case). Another workaround is to have the Proton code poke the OpenSSL session instance during the handshake phase to get it to "notice" opportunities to replenish the BIO buffer. I would normally be reluctant to add code like this but it has tiny overhead and, purely by coincidence, makes the operation slightly more similar to the new Proton TLS library for raw connections. This may also reduce other bug variations between the two. > SSL connection hanging > -- > > Key: PROTON-2643 > URL: https://issues.apache.org/jira/browse/PROTON-2643 > Project: Qpid Proton > Issue Type: Bug >Affects Versions: proton-c-0.37.0 > Environment: Qpid-proton 0.37 with epoll proactor and openssl 1.0.2k > running on centos7 >Reporter: Fredrik Hallenberg >Priority: Major > Attachments: ssl-issue-3.zip > > > With a CA bundle of a certain size the SSL/TLS connection process hangs. This > is 100% repeatable. 
The process stops before reaching verification callback, > it seems there is an issue with reading from the BIO sockets. I can only > repeat it with certain CA bundles, it seems they have to contain >100 > certificates but I have not found an obvious pattern. It does happen with my > current system bundle (/etc/ssl/certs/ca-bundle.crt). > I enclose an example with appropriate keys and bundles, the code is based on > the cpp ssl example in the proton release. See the readme file on how to run > it. Basically it will build a proton server from the example code and connect > to it using openssl s_client. There is a good and a bad bundle included. The > good one has a few less certificates than the big one but is otherwise the > same. If using the bad bundle the connection process will stop after a few > ssl read/writes. With the good bundle it proceeds as expected. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2647) Fix FLOW event processing in send-abort example.
[ https://issues.apache.org/jira/browse/PROTON-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2647. - Resolution: Fixed > Fix FLOW event processing in send-abort example. > > > Key: PROTON-2647 > URL: https://issues.apache.org/jira/browse/PROTON-2647 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.38.0 > > > The current send-abort example program relies on a cadence of FLOW events, > some self generated and some originating from the peer. This cadence can be > disrupted by the timing of frames at each peer. They can also be disrupted > by additional self generated FLOW frames in the case of smaller > max-frame-size configurations which may be chunked between event batches. > The program can be made deterministic by not counting FLOW events but by > checking the actual state change that may be expected with a FLOW event. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
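The idea in the last sentence of the PROTON-2647 description, reacting to the state a FLOW event exposes rather than to how many FLOW events have arrived, can be illustrated with a minimal sketch. It is self-contained and does not use the Proton API: flow_event, sender_state, on_flow and first_send_index are hypothetical names introduced only for this illustration, and the real send-abort example may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a FLOW event: the link credit visible after it. */
typedef struct {
  int credit;
} flow_event;

/* Hypothetical sender state for a program that sends a single message. */
typedef struct {
  bool sent;      /* true once the message has been sent */
  int flows_seen; /* tracked only to show the count plays no part in decisions */
} sender_state;

/* React to the state the event exposes, not to its position in the sequence. */
static void on_flow(sender_state *s, const flow_event *ev) {
  s->flows_seen++;
  if (!s->sent && ev->credit > 0) {
    s->sent = true;
  }
}

/* Feed a sequence of FLOW events and return the index of the event on which
 * the message was sent, or -1 if credit never arrived. */
static int first_send_index(const flow_event *evs, size_t n) {
  assert(evs != NULL || n == 0);
  sender_state s = { false, 0 };
  for (size_t i = 0; i < n; i++) {
    on_flow(&s, &evs[i]);
    if (s.sent) return (int)i;
  }
  return -1;
}
```

Extra FLOW events that carry no credit, whatever their origin, shift the index but never change the decision, which is the determinism the description asks for.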
[jira] [Commented] (PROTON-2647) Fix FLOW event processing in send-abort example.
[ https://issues.apache.org/jira/browse/PROTON-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629629#comment-17629629 ] Clifford Jansen commented on PROTON-2647: - This bug went unnoticed until the default max frame size was recently changed. > Fix FLOW event processing in send-abort example. > > > Key: PROTON-2647 > URL: https://issues.apache.org/jira/browse/PROTON-2647 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.38.0 > > > The current send-abort example program relies on a cadence of FLOW events, > some self generated and some originating from the peer. This cadence can be > disrupted by the timing of frames at each peer. They can also be disrupted > by additional self generated FLOW frames in the case of smaller > max-frame-size configurations which may be chunked between event batches. > The program can be made deterministic by not counting FLOW events but by > checking the actual state change that may be expected with a FLOW event. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2647) Fix FLOW event processing in send-abort example.
Clifford Jansen created PROTON-2647: --- Summary: Fix FLOW event processing in send-abort example. Key: PROTON-2647 URL: https://issues.apache.org/jira/browse/PROTON-2647 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.37.0 Reporter: Clifford Jansen Assignee: Clifford Jansen Fix For: proton-c-0.38.0 The current send-abort example program relies on a cadence of FLOW events, some self generated and some originating from the peer. This cadence can be disrupted by the timing of frames at each peer. They can also be disrupted by additional self generated FLOW frames in the case of smaller max-frame-size configurations which may be chunked between event batches. The program can be made deterministic by not counting FLOW events but by checking the actual state change that may be expected with a FLOW event. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2586) TLS OpenSSL library: incomplete decryption/encryption of staged buffers
[ https://issues.apache.org/jira/browse/PROTON-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2586. - Resolution: Fixed > TLS OpenSSL library: incomplete decryption/encryption of staged buffers > --- > > Key: PROTON-2586 > URL: https://issues.apache.org/jira/browse/PROTON-2586 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > > OpenSSL processes TLS records one at a time. It does its conversion work in > buffers just larger than a maximum sized TLS record (16K). When processing > large input and output buffers in a single pn_tls_process() call, the > Proton TLS library has to loop, inserting unprocessed data into the small > OpenSSL buffer, extracting the encrypted/decrypted data into the output > buffer, and freeing space for the next iteration. The code currently can exit > the loop prematurely. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
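The staging loop described in PROTON-2586 can be sketched in a few lines. This is a toy model under stated assumptions, not the Proton or OpenSSL code: STAGE_SIZE stands in for the record-sized internal buffer, convert() replaces real TLS work with an XOR, and process_all is a hypothetical name. The point it shows is the loop condition: keep going until the input is exhausted or the output is full, rather than stopping after one pass through the small buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAGE_SIZE 16 /* toy stand-in for the record-sized internal buffer */

/* Toy "conversion": XOR each byte with a constant, in place of real TLS work. */
static void convert(unsigned char *buf, size_t n) {
  for (size_t i = 0; i < n; i++) buf[i] ^= 0x5A;
}

/* Convert as much of `in` as fits into `out`, one stage-sized chunk at a time.
 * Returns the number of bytes produced. */
static size_t process_all(const unsigned char *in, size_t in_len,
                          unsigned char *out, size_t out_cap) {
  unsigned char stage[STAGE_SIZE];
  size_t consumed = 0, produced = 0;
  while (consumed < in_len && produced < out_cap) {
    size_t chunk = in_len - consumed;
    if (chunk > STAGE_SIZE) chunk = STAGE_SIZE;
    if (chunk > out_cap - produced) chunk = out_cap - produced;
    memcpy(stage, in + consumed, chunk);  /* insert unprocessed data */
    convert(stage, chunk);                /* do the conversion work */
    memcpy(out + produced, stage, chunk); /* extract, freeing the stage */
    consumed += chunk;
    produced += chunk;
  }
  /* The loop only ends when one side is exhausted, never part-way through. */
  assert(consumed == in_len || produced == out_cap);
  return produced;
}
```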
[jira] [Resolved] (PROTON-2535) TLS library - false indication of user data in OpenSSL
[ https://issues.apache.org/jira/browse/PROTON-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2535. - Resolution: Fixed > TLS library - false indication of user data in OpenSSL > -- > > Key: PROTON-2535 > URL: https://issues.apache.org/jira/browse/PROTON-2535 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.37.0 > Environment: OpenSSL >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.38.0 > > > pn_tls_need_decrypt_output_buffers can falsely indicate the availability of > user data. For example, if there is a handshake failure, BIO_pending can > indicate the presence of bytes but BIO_read will return -1 and the > appropriate error. > An application may be fooled into providing a decrypt output buffer that > won't immediately be returned after the next pn_tls_process() step, since > no bytes will be read into it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-2471) Run raw connection examples during proton-c examples test
[ https://issues.apache.org/jira/browse/PROTON-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626967#comment-17626967 ] Clifford Jansen commented on PROTON-2471: - My best suggestion is to see if pn_raw_connection() returns a non NULL value. If this is insufficient, we may have to add an equivalent to pn_ssl_present( void ); > Run raw connection examples during proton-c examples test > - > > Key: PROTON-2471 > URL: https://issues.apache.org/jira/browse/PROTON-2471 > Project: Qpid Proton > Issue Type: Test > Components: examples, proton-c >Affects Versions: proton-c-0.36.0, proton-c-0.37.0 >Reporter: Jiri Daněk >Assignee: Jiri Daněk >Priority: Major > Fix For: proton-c-future > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2622) TLS OpenSSL library: ensure capacity values match given capacity
[ https://issues.apache.org/jira/browse/PROTON-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2622. - Resolution: Fixed > TLS OpenSSL library: ensure capacity values match given capacity > > > Key: PROTON-2622 > URL: https://issues.apache.org/jira/browse/PROTON-2622 > Project: Qpid Proton > Issue Type: Wish > Components: proton-c >Affects Versions: proton-c-0.38.0 >Reporter: Ken Giusti >Assignee: Clifford Jansen >Priority: Major > > pn_tls_get_encrypt/decrypt_input_buffer_capacity() unconditionally return > the number of empty buffer slots. > However pn_tls_give_encrypt/decrypt_input_buffers() checks the state of the > tls session and can take zero buffers even though get capacity returned > 0. > In this case the application will have to "unwind" any buffer > allocation/setup work it did expecting there was capacity available. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
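The contract requested in PROTON-2622 is that the capacity query and the give call agree, so the application never has to unwind prepared buffers. A minimal sketch of that contract follows. It is a toy model, not the Proton TLS API: toy_session, get_capacity and give_buffers are hypothetical names, and the single `accepting` flag is an assumed simplification of whatever session state the library actually checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 4 /* assumed fixed number of buffer slots */

/* Hypothetical session model: slot usage plus a state flag. */
typedef struct {
  size_t used;    /* slots currently holding buffers */
  bool accepting; /* false once the session can no longer take input */
} toy_session;

/* Capacity the caller may rely on: zero whenever give_buffers() would
 * take nothing, not merely the number of empty slots. */
static size_t get_capacity(const toy_session *s) {
  if (!s->accepting) return 0;
  return SLOTS - s->used;
}

/* Take up to `n` buffers, derived from the same capacity calculation so the
 * two calls cannot disagree. Returns the number of buffers taken. */
static size_t give_buffers(toy_session *s, size_t n) {
  size_t cap = get_capacity(s);
  size_t taken = n < cap ? n : cap;
  s->used += taken;
  assert(s->used <= SLOTS);
  return taken;
}
```

Routing both functions through one capacity calculation is the design choice being illustrated; the state check lives in exactly one place.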
[jira] [Created] (PROTON-2642) add tests for buffer capacity
Clifford Jansen created PROTON-2642: --- Summary: add tests for buffer capacity Key: PROTON-2642 URL: https://issues.apache.org/jira/browse/PROTON-2642 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.37.0 Reporter: Clifford Jansen Assignee: Clifford Jansen Add test for correct buffer capacity, specifically for https://issues.apache.org/jira/browse/PROTON-2622 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2641) use consistent socket io cals in epoll proactor
[ https://issues.apache.org/jira/browse/PROTON-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2641. - Resolution: Fixed actual fix is in 93960f1e2129cf98200bdb2ab31e9ad868f71f61 > use consistent socket io cals in epoll proactor > --- > > Key: PROTON-2641 > URL: https://issues.apache.org/jira/browse/PROTON-2641 > Project: Qpid Proton > Issue Type: Improvement > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Minor > Fix For: proton-c-0.38.0 > > > Epoll proactor currently uses send/read for IO. For consistency it should > use write/read or send/recv. The latter allows the kernel to skip code > handling the generic to specific transition and is the more performant option > (even if rarely measurable). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
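For readers unfamiliar with the distinction in PROTON-2641, the sketch below uses the socket-specific send()/recv() pair consistently on a local socket pair. It only illustrates the pairing; it is not the epoll proactor code. The roundtrip helper is a hypothetical name, and the use of socketpair() and MSG_NOSIGNAL assumes a Linux-like platform.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `len` bytes over a connected local socket pair using the
 * socket-specific send()/recv() calls. Returns bytes received, or -1. */
static ssize_t roundtrip(const char *msg, size_t len, char *out, size_t out_cap) {
  assert(msg != NULL && out != NULL);
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
  /* MSG_NOSIGNAL: report a closed peer as an error instead of raising SIGPIPE. */
  ssize_t sent = send(sv[0], msg, len, MSG_NOSIGNAL);
  ssize_t got = -1;
  if (sent > 0) {
    got = recv(sv[1], out, out_cap, 0);
  }
  close(sv[0]);
  close(sv[1]);
  return got;
}
```

With write()/read() the same code would work on any file descriptor; send()/recv() only work on sockets, which is what lets the kernel take the more direct path mentioned in the description.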
[jira] [Created] (PROTON-2641) use consistent socket io cals in epoll proactor
Clifford Jansen created PROTON-2641: --- Summary: use consistent socket io cals in epoll proactor Key: PROTON-2641 URL: https://issues.apache.org/jira/browse/PROTON-2641 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.37.0 Reporter: Clifford Jansen Assignee: Clifford Jansen Fix For: proton-c-0.38.0 Epoll proactor currently uses send/read for IO. For consistency it should use write/read or send/recv. The latter allows the kernel to skip code handling the generic to specific transition and is the more performant option (even if rarely measurable). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2640) Set a reasonable default maximum frame size
[ https://issues.apache.org/jira/browse/PROTON-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2640. - Resolution: Fixed > Set a reasonable default maximum frame size > --- > > Key: PROTON-2640 > URL: https://issues.apache.org/jira/browse/PROTON-2640 > Project: Qpid Proton > Issue Type: Improvement > Components: cpp-binding, proton-c, python-binding >Affects Versions: proton-c-0.38.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.38.0 > > > The default is currently MAXINT. > > Instrumenting using quiver shows 32k is a reasonable tradeoff of reduced > latency between transmissions and additional byte overhead for large messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
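The "additional byte overhead" side of the PROTON-2640 tradeoff can be estimated with simple arithmetic. The sketch below counts only the fixed 8-byte AMQP 1.0 frame header; the per-frame transfer performative is deliberately ignored, so the result is a lower bound, not a measurement. frames_needed and header_overhead are hypothetical helper names, not Proton functions.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_HEADER 8u /* fixed AMQP 1.0 frame header size in bytes */

/* Frames needed to carry `payload` bytes when each frame is at most
 * `max_frame` bytes. Performative overhead is ignored (an assumption). */
static uint64_t frames_needed(uint64_t payload, uint32_t max_frame) {
  assert(max_frame > FRAME_HEADER);
  uint64_t per_frame = max_frame - FRAME_HEADER;
  return (payload + per_frame - 1) / per_frame; /* ceiling division */
}

/* Header bytes added on top of the payload under the same assumption. */
static uint64_t header_overhead(uint64_t payload, uint32_t max_frame) {
  return frames_needed(payload, max_frame) * FRAME_HEADER;
}
```

Under these assumptions a 1 MiB payload with a 32768-byte maximum frame size needs 33 frames and 264 header bytes, which is negligible next to the payload.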
[jira] [Created] (PROTON-2640) Set a reasonable default maximum frame size
Clifford Jansen created PROTON-2640: --- Summary: Set a reasonable default maximum frame size Key: PROTON-2640 URL: https://issues.apache.org/jira/browse/PROTON-2640 Project: Qpid Proton Issue Type: Improvement Components: cpp-binding, proton-c, python-binding Affects Versions: proton-c-0.38.0 Reporter: Clifford Jansen Assignee: Clifford Jansen Fix For: proton-c-0.38.0 The default is currently MAXINT. Instrumenting using quiver shows 32k is a reasonable tradeoff of reduced latency between transmissions and additional byte overhead for large messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-2639) write flush capability for libuv and Windows
[ https://issues.apache.org/jira/browse/PROTON-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626701#comment-17626701 ] Clifford Jansen commented on PROTON-2639: - See https://issues.apache.org/jira/browse/PROTON-2633 > write flush capability for libuv and Windows > > > Key: PROTON-2639 > URL: https://issues.apache.org/jira/browse/PROTON-2639 > Project: Qpid Proton > Issue Type: Improvement > Components: proton-c >Affects Versions: proton-c-0.38.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Minor > > There is a version implemented for the epoll proactor. Track here the pending > work for the other proactors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2633) Proactor: allow early writes to reduce latency
[ https://issues.apache.org/jira/browse/PROTON-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2633. - Resolution: Fixed > Proactor: allow early writes to reduce latency > -- > > Key: PROTON-2633 > URL: https://issues.apache.org/jira/browse/PROTON-2633 > Project: Qpid Proton > Issue Type: Improvement > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > > A new API call to instruct the proactor implementation to extract pending > output from the Proton engine and immediately deliver what it can to the > operating system for transmission to the peer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2639) write flush capability for libuv and Windows
Clifford Jansen created PROTON-2639: --- Summary: write flush capability for libuv and Windows Key: PROTON-2639 URL: https://issues.apache.org/jira/browse/PROTON-2639 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.38.0 Reporter: Clifford Jansen Assignee: Clifford Jansen There is a version implemented for the epoll proactor. Track here the pending work for the other proactors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2633) Proactor: allow early writes to reduce latency
Clifford Jansen created PROTON-2633: --- Summary: Proactor: allow early writes to reduce latency Key: PROTON-2633 URL: https://issues.apache.org/jira/browse/PROTON-2633 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.37.0 Reporter: Clifford Jansen Assignee: Clifford Jansen A new API call to instruct the proactor implementation to extract pending output from the Proton engine and immediately deliver what it can to the operating system for transmission to the peer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2613) TLS OpenSSL library: write channel not fully configured.
[ https://issues.apache.org/jira/browse/PROTON-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2613. - Fix Version/s: proton-c-0.38.0 Resolution: Fixed > TLS OpenSSL library: write channel not fully configured. > > > Key: PROTON-2613 > URL: https://issues.apache.org/jira/browse/PROTON-2613 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.38.0 > > > The library code assumes that write operations provide more detail on partial > writes than just "try again later". There is a configuration option that > makes the low level SSL write operations more like BIO and Posix write > semantics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2612) TLS OpenSSL library: uninitialized raw buffer size for output buffers
[ https://issues.apache.org/jira/browse/PROTON-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2612. - Fix Version/s: proton-c-0.38.0 Resolution: Fixed > TLS OpenSSL library: uninitialized raw buffer size for output buffers > - > > Key: PROTON-2612 > URL: https://issues.apache.org/jira/browse/PROTON-2612 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.38.0 > > > For TLS library output buffers (used for reading into), the size must be set > to zero regardless of its value when provided by the application... but is > not. This prevents the full capacity of the buffers to be used. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
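The effect described in PROTON-2612 is easy to see with a small model. The struct below is a hypothetical descriptor with capacity, size and offset fields, used here only for illustration and not the library's actual type; accept_output_buffer is likewise an invented name. The sketch shows why a stale size left in place by the application would shrink the usable space, and how resetting it restores the full capacity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer descriptor for illustration only. */
typedef struct {
  char *bytes;
  uint32_t capacity; /* total bytes available at `bytes` */
  uint32_t size;     /* bytes of valid content */
  uint32_t offset;   /* start of content within `bytes` */
} toy_buffer;

/* Accept an application-provided output buffer. Whatever size it arrived
 * with is stale, so reset it before computing the writable space. */
static uint32_t accept_output_buffer(toy_buffer *b) {
  assert(b->offset <= b->capacity);
  b->size = 0;
  return b->capacity - b->offset - b->size;
}
```

Without the reset, a buffer handed over with size 40 and capacity 64 would appear to have only 24 writable bytes.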
[jira] [Created] (PROTON-2613) TLS OpenSSL library: write channel not fully configured.
Clifford Jansen created PROTON-2613: --- Summary: TLS OpenSSL library: write channel not fully configured. Key: PROTON-2613 URL: https://issues.apache.org/jira/browse/PROTON-2613 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.37.0 Reporter: Clifford Jansen Assignee: Clifford Jansen The library code assumes that write operations provide more detail on partial writes than just "try again later". There is a configuration option that makes the low level SSL write operations more like BIO and Posix write semantics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2612) TLS OpenSSL library: uninitialized raw buffer size for output buffers
Clifford Jansen created PROTON-2612: --- Summary: TLS OpenSSL library: uninitialized raw buffer size for output buffers Key: PROTON-2612 URL: https://issues.apache.org/jira/browse/PROTON-2612 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.37.0 Reporter: Clifford Jansen Assignee: Clifford Jansen For TLS library output buffers (used for reading into), the size must be set to zero regardless of its value when provided by the application... but is not. This prevents the full capacity of the buffers to be used. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2586) TLS OpenSSL library: incomplete decryption/encryption of staged buffers
Clifford Jansen created PROTON-2586: --- Summary: TLS OpenSSL library: incomplete decryption/encryption of staged buffers Key: PROTON-2586 URL: https://issues.apache.org/jira/browse/PROTON-2586 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.37.0 Reporter: Clifford Jansen Assignee: Clifford Jansen OpenSSL processes TLS records one at a time. It does its conversion work in buffers just larger than a maximum sized TLS record (16K). When processing large input and output buffers in a single pn_tls_process() call, the Proton TLS library has to loop, inserting unprocessed data into the small OpenSSL buffer, extracting the encrypted/decrypted data into the output buffer, and freeing space for the next iteration. The code currently can exit the loop prematurely. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-2543) Crash in epoll.c resched_pop_front
[ https://issues.apache.org/jira/browse/PROTON-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545037#comment-17545037 ] Clifford Jansen commented on PROTON-2543: - Thank you for the update. I will keep this open a bit longer and see if I can't get lucky on reproducing it myself with a few tweaks to my existing soak tests. If you can answer a subset of the questions I asked earlier, whatever is quick and easy, that may help me zero in on the bug. Thanks. > Crash in epoll.c resched_pop_front > -- > > Key: PROTON-2543 > URL: https://issues.apache.org/jira/browse/PROTON-2543 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: Fredrik Hallenberg >Assignee: Clifford Jansen >Priority: Major > Attachments: qpid-epoll-crash.patch > > > During stress testing it is fairly easy to reproduce a segfault in > resched_pop_front. Using gdb it is easy to see that polled_resched_front can > be zero when entering this function which causes the value to wrap and then a > crash in later calls. > polled_resched_front is not checked when calling this function in one > instance, the trivial fix to check this value is seen in the attached patch > seems to work. > Tested with Qpid Proton C++ 0.37. > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-2543) Crash in epoll.c resched_pop_front
[ https://issues.apache.org/jira/browse/PROTON-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544161#comment-17544161 ] Clifford Jansen commented on PROTON-2543: - I don't know if you have had any time to try to gather further information about the crash that you are seeing. It would certainly help me to be of greater assistance if you could provide more details about the environment where you see the crash: * cpu hardware type and model * OS and version * compiler (gcc/clang/other) * Number of concurrent threads servicing proactor event batches * Number of active proactors in failing process (usually 1) * Running on bare hardware, VM, container * crash occurs during main operation or on shutdown (or both) * Types of connections and listeners ** All outgoing connections ** All incoming connections and listeners ** Mix of both (describe) ** Mainly/only pn_raw_connection_t or pn_connection_t connections. ** connections are over a network/virtual network/loopback If you are having difficulty reproducing the crash in debug mode, perhaps I could provide an instrumented version of epoll.c that could give us recent proactor history and help debug the problem. Also, if you could provide a debugger dump of the failing pn_proactor_t at time of one of your crashes, that might help me think of other things to explore. Thank you for any information you can provide. > Crash in epoll.c resched_pop_front > -- > > Key: PROTON-2543 > URL: https://issues.apache.org/jira/browse/PROTON-2543 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: Fredrik Hallenberg >Assignee: Clifford Jansen >Priority: Major > Attachments: qpid-epoll-crash.patch > > > During stress testing it is fairly easy to reproduce a segfault in > resched_pop_front. Using gdb it is easy to see that polled_resched_front can > be zero when entering this function which causes the value to wrap and then a > crash in later calls. 
> polled_resched_front is not checked when calling this function in one > instance, the trivial fix to check this value is seen in the attached patch > seems to work. > Tested with Qpid Proton C++ 0.37. > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-2543) Crash in epoll.c resched_pop_front
[ https://issues.apache.org/jira/browse/PROTON-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541594#comment-17541594 ] Clifford Jansen commented on PROTON-2543: - https://rr-project.org/ The related package name is "rr" on Fedora and Ubuntu. If you can catch the failure in rr, you can reproduce exactly the run that failed and multi threaded bugs can be debugged more easily (you can move backwards and forwards in time in the debugger). However, you may find that your reproducer fails easily outside of rr but stubbornly refuses to do so with rr in the mix. > Crash in epoll.c resched_pop_front > -- > > Key: PROTON-2543 > URL: https://issues.apache.org/jira/browse/PROTON-2543 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: Fredrik Hallenberg >Assignee: Clifford Jansen >Priority: Major > Attachments: qpid-epoll-crash.patch > > > During stress testing it is fairly easy to reproduce a segfault in > resched_pop_front. Using gdb it is easy to see that polled_resched_front can > be zero when entering this function which causes the value to wrap and then a > crash in later calls. > polled_resched_front is not checked when calling this function in one > instance, the trivial fix to check this value is seen in the attached patch > seems to work. > Tested with Qpid Proton C++ 0.37. > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-2543) Crash in epoll.c resched_pop_front
[ https://issues.apache.org/jira/browse/PROTON-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541496#comment-17541496 ] Clifford Jansen commented on PROTON-2543: - Thank you for the bug report and suggested patch. Unfortunately your suggested fix targets the symptom you are seeing but not the underlying problem. It should never be possible that p->resched_cutoff is non-null while p->polled_resched_count is zero, so your code should have no effect. Yet we know it does. The patch allows the proactor to keep running even though one of its critical scheduling lists is in an undefined state. This could lead to crashes or hangs even further removed from the actual problem. Have you tried running your reproducer with a "Debug" CMake build? There are several asserts in the code that might catch the broken list earlier or point us closer to a good place to look. Alternatively, can your reproducer be pared down and shared in this JIRA? Otherwise, is it possible for you to trigger the bug using rr? In the crash analysis it should be possible to find the point at which the list loses its integrity between the most recent poller_do_epoll() and a subsequent resched_pop_front(). > Crash in epoll.c resched_pop_front > -- > > Key: PROTON-2543 > URL: https://issues.apache.org/jira/browse/PROTON-2543 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: Fredrik Hallenberg >Assignee: Clifford Jansen >Priority: Major > Attachments: qpid-epoll-crash.patch > > > During stress testing it is fairly easy to reproduce a segfault in > resched_pop_front. Using gdb it is easy to see that polled_resched_front can > be zero when entering this function which causes the value to wrap and then a > crash in later calls. > polled_resched_front is not checked when calling this function in one > instance, the trivial fix to check this value is seen in the attached patch > seems to work. > Tested with Qpid Proton C++ 0.37. 
> -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
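The reasoning in the comment above, that a guard on the count hides a broken list while a debug-build assert exposes it, can be shown with a toy list. This is not the epoll.c code: toy_list, list_consistent and pop_front are hypothetical names, and the invariant is a simplified analogue of the pointer/count relationship described in the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node { struct node *next; } node;

/* Hypothetical list: a front pointer plus a separately maintained count. */
typedef struct {
  node *front;
  size_t count;
} toy_list;

/* The two fields must agree: empty means NULL front and zero count. */
static bool list_consistent(const toy_list *l) {
  return (l->front != NULL) == (l->count > 0);
}

/* Pop the front node. In a debug build the assert fires on the first call
 * that sees an inconsistent list, instead of letting `count` wrap from zero. */
static node *pop_front(toy_list *l) {
  assert(list_consistent(l));
  node *n = l->front;
  if (!n) return NULL;
  l->front = n->next;
  l->count--;
  return n;
}
```

Silently skipping the decrement when the count is zero would keep this toy running too, but with a list whose fields no longer describe each other, which is the undefined state the comment warns about.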
[jira] [Commented] (PROTON-1870) better logging for ssl
[ https://issues.apache.org/jira/browse/PROTON-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536142#comment-17536142 ] Clifford Jansen commented on PROTON-1870: - Commit dfebbe8c provides TLS error alert feedback to peer as provided by default in the OpenSSL library. It doesn't necessarily address the lack of detail of error messages on either side. > better logging for ssl > -- > > Key: PROTON-1870 > URL: https://issues.apache.org/jira/browse/PROTON-1870 > Project: Qpid Proton > Issue Type: Improvement > Components: python-binding >Affects Versions: proton-0.9.1, proton-c-0.31.0 >Reporter: Gordon Sim >Priority: Major > Labels: logging, tls, usability > > Would be nice to have better logging for ssl connections, particularly where > they fail, e.g. the sni used, the ca the peer cert is signed with etc -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2535) TLS library - false indication of user data in OpenSSL
Clifford Jansen created PROTON-2535:
---

Summary: TLS library - false indication of user data in OpenSSL
Key: PROTON-2535
URL: https://issues.apache.org/jira/browse/PROTON-2535
Project: Qpid Proton
Issue Type: Bug
Components: proton-c
Affects Versions: proton-c-0.37.0
Environment: OpenSSL
Reporter: Clifford Jansen
Assignee: Clifford Jansen

pn_tls_need_decrypt_output_buffers can falsely indicate the availability of user data. For example, if there is a handshake failure, BIO_pending can indicate the presence of bytes but BIO_read will return -1 and the appropriate error. An application may be fooled into providing a decrypt output buffer that won't be immediately returned after the next pn_tls_process() step, since no bytes will be read into it.
[jira] [Commented] (PROTON-2522) Intermittent C fdlimit test failures
[ https://issues.apache.org/jira/browse/PROTON-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509626#comment-17509626 ] Clifford Jansen commented on PROTON-2522: - Preliminary investigation indicates that increasing the sleep time between steps in the test makes the error go away. A more robust test mechanism is obviously preferable to just increasing pause times and making the tests run slower. > Intermittent C fdlimit test failures > > > Key: PROTON-2522 > URL: https://issues.apache.org/jira/browse/PROTON-2522 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.36.0, proton-c-0.37.0 > Environment: Specifics unknown. > On some hardware, fails with Python 3.10 but not 3.9. > Also seen on other hardware with Python 3.6. > But also seen >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > > The CTest: c-fdlimit-tests fails in some environments with output containing: > > /usr/lib64/python3.10/subprocess.py:1067: ResourceWarning: subprocess 27520 > is still running > > and > > self.assertNotEqual(sender.poll(), 0) > AssertionError: 0 == 0 > > First reported by Roddie Kieley and Gordon Sim.
[jira] [Created] (PROTON-2522) Intermittent C fdlimit test failures
Clifford Jansen created PROTON-2522:
---

Summary: Intermittent C fdlimit test failures
Key: PROTON-2522
URL: https://issues.apache.org/jira/browse/PROTON-2522
Project: Qpid Proton
Issue Type: Bug
Components: proton-c
Affects Versions: proton-c-0.36.0, proton-c-0.37.0
Environment: Specifics unknown. On some hardware, fails with Python 3.10 but not 3.9. Also seen on other hardware with Python 3.6. But also seen
Reporter: Clifford Jansen
Assignee: Clifford Jansen

The CTest: c-fdlimit-tests fails in some environments with output containing:

/usr/lib64/python3.10/subprocess.py:1067: ResourceWarning: subprocess 27520 is still running

and

self.assertNotEqual(sender.poll(), 0)
AssertionError: 0 == 0

First reported by Roddie Kieley and Gordon Sim.
[jira] [Resolved] (PROTON-2519) TLSlibrary: null pointer reference
[ https://issues.apache.org/jira/browse/PROTON-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2519. - Resolution: Fixed > TLSlibrary: null pointer reference > -- > > Key: PROTON-2519 > URL: https://issues.apache.org/jira/browse/PROTON-2519 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > > Thanks to Coverity: > *** CID 376597: Null pointer dereferences (FORWARD_NULL) > /qpid-proton/c/src/tls/openssl.c: 2283 in pn_tls_config_set_alpn_protocols() > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2519) TLSlibrary: null pointer reference
Clifford Jansen created PROTON-2519: --- Summary: TLSlibrary: null pointer reference Key: PROTON-2519 URL: https://issues.apache.org/jira/browse/PROTON-2519 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.37.0 Reporter: Clifford Jansen Assignee: Clifford Jansen Thanks to Coverity: *** CID 376597: Null pointer dereferences (FORWARD_NULL) /qpid-proton/c/src/tls/openssl.c: 2283 in pn_tls_config_set_alpn_protocols() -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2512) Proton raw TLS library does not build on aarch64 Ubuntu in Travis CI
[ https://issues.apache.org/jira/browse/PROTON-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2512. - Resolution: Fixed > Proton raw TLS library does not build on aarch64 Ubuntu in Travis CI > > > Key: PROTON-2512 > URL: https://issues.apache.org/jira/browse/PROTON-2512 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.37.0 >Reporter: Jiri Daněk >Priority: Major > > https://app.travis-ci.com/github/jiridanek/skupper-router/jobs/562112202#L605 > {noformat} > cmake .. > -DCMAKE_INSTALL_PREFIX=/home/travis/build/jiridanek/skupper-router/install > -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_BINDINGS=python -DBUILD_TLS=ON > {noformat} > [...] > {noformat} > [ 8%] Building C object > c/CMakeFiles/qpid-proton-proactor-objects.dir/src/proactor/epoll_raw_connection.c.o > /home/travis/build/jiridanek/skupper-router/qpid-proton/c/src/tls/openssl.c:1465:22: > error: unused function 'size_min' [-Werror,-Wunused-function] > static inline size_t size_min(uint32_t a, uint32_t b) { > ^ > 1 error generated. > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2517) The new C codec can misinterpret pn_data_t values resulting in unintended wire data.
Clifford Jansen created PROTON-2517:
---

Summary: The new C codec can misinterpret pn_data_t values resulting in unintended wire data.
Key: PROTON-2517
URL: https://issues.apache.org/jira/browse/PROTON-2517
Project: Qpid Proton
Issue Type: Bug
Components: proton-c
Affects Versions: proton-c-0.37.0
Reporter: Clifford Jansen
Assignee: Clifford Jansen

See the C++ frame trace from https://issues.redhat.com/browse/ENTMQCL-3278

The zero length array is printed instead of a null because the test in emit_multiple() from emitters.h fails to set the current node of the pn_data_t to the first node. The test

  if (pn_data_type(data) == PN_ARRAY) {
    //...

fails and the array processing logic is bypassed, including the lines

    switch (pn_data_get_array(data)) {
      case 0:
        pni_emitter_writef8(emitter, PNE_NULL);
[jira] [Created] (PROTON-2509) python-integration-test errors with tsan and asan runtime checks
Clifford Jansen created PROTON-2509: --- Summary: python-integration-test errors with tsan and asan runtime checks Key: PROTON-2509 URL: https://issues.apache.org/jira/browse/PROTON-2509 Project: Qpid Proton Issue Type: Bug Components: python-binding Affects Versions: proton-c-0.36.0 Environment: Fedora release 34. Reporter: Clifford Jansen Fix For: proton-c-0.38.0 build with -DRUNTIME_CHECK=asan (or tsan) and test with ctest -V -R python-integration-test -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2484) epoll proactor memory use after free
[ https://issues.apache.org/jira/browse/PROTON-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2484. - Resolution: Fixed > epoll proactor memory use after free > > > Key: PROTON-2484 > URL: https://issues.apache.org/jira/browse/PROTON-2484 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.36.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.37.0 > > > ASAN correctly notes use of task memory after task deletion. Notably using > the task's pointer value for the proactor. This value can be saved at a time > the task is known to still exist. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
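The remedy described in PROTON-2484 (copy the proactor pointer out of the task while the task is known to exist, then use only the copy) might look roughly like the sketch below. The types and functions here (demo_proactor_t, demo_task_t, demo_task_finish) are made up for illustration and are not the epoll proactor's own structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not Proton's task or proactor structs. */
typedef struct { int pending_wakes; } demo_proactor_t;
typedef struct { demo_proactor_t *proactor; } demo_task_t;

static demo_task_t *demo_task_new(demo_proactor_t *p) {
  demo_task_t *t = calloc(1, sizeof(*t));
  if (t) t->proactor = p;
  return t;
}

/* Final step for a task: release it, then notify the proactor it belonged to. */
static void demo_task_finish(demo_task_t *t) {
  demo_proactor_t *p = t->proactor;  /* saved while the task still exists */
  free(t);
  /* Reading t->proactor at this point would be a use after free; the saved copy is safe. */
  p->pending_wakes++;
}
```

The proactor outlives its tasks in this sketch, so the saved pointer stays valid after the task memory is gone.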
[jira] [Resolved] (PROTON-2483) TSAN reported potential deadlock in epoll proactor when run via Qpid Dispatch router.
[ https://issues.apache.org/jira/browse/PROTON-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2483. - Resolution: Fixed > TSAN reported potential deadlock in epoll proactor when run via Qpid Dispatch > router. > - > > Key: PROTON-2483 > URL: https://issues.apache.org/jira/browse/PROTON-2483 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.36.0 > Environment: linux epoll >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.37.0 > > Attachments: tsan_out.txt > > > The traces are incomplete but the 4 way thread tangle can be inferred as > follows: > A: pn_proactor_set_timeout() (p->task.mutex + tm->task.mutex) > B: pni_timer_manager_process() (tm->task.mutex + tm->deletion_mutex) > C: pni_connection_timeout() (tm->deletion_mutex + pc1->task.mutex) > D: proactor_remove() (pc1->task.mutex + p->task.mutex) > While this particular trace is a false positive (D occurs after all other > threads have been joined and there are no competing threads to complete the > circle), the lock ordering is clearly asking for eventual trouble. > The proactor set_timeout and cancel_timeout API calls do not need to hold the > proactor task lock while interacting with the timer manager, but do so as a > convenience to prevent collisions between simultaneous sets/cancels. A > separate lock can achieve that purpose, stopping A from participating in the > potential deadlock. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
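The "separate lock" idea from PROTON-2483 can be sketched with plain pthreads as below. This is a simplified illustration under assumptions, not the actual change: the struct layout, the names (lk_proactor_t, timeout_mutex, lk_set_timeout) and the convention that a deadline of 0 means "no timeout" are all hypothetical. The point it shows is that set/cancel calls are serialized by a dedicated mutex, so the proactor task mutex is never held while the timer manager mutex is taken and path A drops out of the lock cycle.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the timer manager and proactor. */
typedef struct {
  pthread_mutex_t mutex;
  uint64_t deadline;              /* 0 means no timeout is set (assumed convention) */
} lk_timer_manager_t;

typedef struct {
  pthread_mutex_t task_mutex;     /* general proactor state; not taken below */
  pthread_mutex_t timeout_mutex;  /* only serializes set/cancel against each other */
  lk_timer_manager_t tm;
} lk_proactor_t;

static void lk_proactor_init(lk_proactor_t *p) {
  pthread_mutex_init(&p->task_mutex, NULL);
  pthread_mutex_init(&p->timeout_mutex, NULL);
  pthread_mutex_init(&p->tm.mutex, NULL);
  p->tm.deadline = 0;
}

/* Lock order here is timeout_mutex then tm.mutex; task_mutex is not involved. */
static void lk_set_timeout(lk_proactor_t *p, uint64_t deadline) {
  pthread_mutex_lock(&p->timeout_mutex);
  pthread_mutex_lock(&p->tm.mutex);
  p->tm.deadline = deadline;
  pthread_mutex_unlock(&p->tm.mutex);
  pthread_mutex_unlock(&p->timeout_mutex);
}

static void lk_cancel_timeout(lk_proactor_t *p) {
  lk_set_timeout(p, 0);
}
```

Since no other path acquires timeout_mutex, it cannot take part in a cycle with the task, timer manager, or deletion mutexes listed in the report.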
[jira] [Resolved] (PROTON-2362) c-threaderciser timed out on 32-core machine.
[ https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2362. - Fix Version/s: proton-c-0.37.0 Resolution: Fixed > c-threaderciser timed out on 32-core machine. > - > > Key: PROTON-2362 > URL: https://issues.apache.org/jira/browse/PROTON-2362 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.33.0, proton-c-0.35.0, proton-c-0.34.0, > proton-c-0.36.0, proton-c-0.37.0 >Reporter: michael goulish >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.37.0 > > Attachments: tsan_tr1.txt, tsan_tr2.txt, tsan_tr3.txt > > > Using recent master – maybe 3 days old or so – I just ran Proton's ctest, > after turning on THREADERCISER. I ran it on a box with 32 physical cores, 64 > threads. > > Test number 6 – c-threaderciser – failed with timeout after 1500 seconds. > ( 1.5e18 femtoseconds. ) > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (PROTON-2497) General TLS library for Proton C
[ https://issues.apache.org/jira/browse/PROTON-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2497. - Resolution: Implemented Initial implementation. API is unsettled. Does not build by default. CMake flag -DBUILD_TLS=ON is required to build it. > General TLS library for Proton C > > > Key: PROTON-2497 > URL: https://issues.apache.org/jira/browse/PROTON-2497 > Project: Qpid Proton > Issue Type: New Feature > Components: proton-c >Affects Versions: proton-c-0.36.0 >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.37.0 > > > The current TLS functionality for Proton (see "c/include/proton/ssl.h") is > tightly coupled to AMQP connections and does not allow TLS sessions for > arbitrary content including Proton raw connections. > A more generalized API is proposed that works with arrays of pn_raw_buffer_t > content. As it matures it could serve as the TLS engine for AMQP connections > as well. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2500) Proactor memory leak on aborted shutdown
Clifford Jansen created PROTON-2500: --- Summary: Proactor memory leak on aborted shutdown Key: PROTON-2500 URL: https://issues.apache.org/jira/browse/PROTON-2500 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.36.0 Environment: Linux epoll: yes. libuv: no. Windows IOCP: TBD. Reporter: Clifford Jansen Assignee: Clifford Jansen Attachments: ptest.diff If pn_proactor_free is called while pending closes from a pn_proactor_disconnect are pending, some reference counts remain positive and memory leaks occur. See test case. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2497) General TLS library for Proton C
Clifford Jansen created PROTON-2497: --- Summary: General TLS library for Proton C Key: PROTON-2497 URL: https://issues.apache.org/jira/browse/PROTON-2497 Project: Qpid Proton Issue Type: New Feature Components: proton-c Affects Versions: proton-c-0.36.0 Reporter: Clifford Jansen Assignee: Clifford Jansen The current TLS functionality for Proton (see "c/include/proton/ssl.h") is tightly coupled to AMQP connections and does not allow TLS sessions for arbitrary content including Proton raw connections. A more generalized API is proposed that works with arrays of pn_raw_buffer_t content. As it matures it could serve as the TLS engine for AMQP connections as well. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2484) epoll proactor memory use after free
Clifford Jansen created PROTON-2484: --- Summary: epoll proactor memory use after free Key: PROTON-2484 URL: https://issues.apache.org/jira/browse/PROTON-2484 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: proton-c-0.36.0 Reporter: Clifford Jansen Assignee: Clifford Jansen ASAN correctly notes use of task memory after task deletion. Notably using the task's pointer value for the proactor. This value can be saved at a time the task is known to still exist. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (PROTON-2483) TSAN reported potential deadlock in epoll proactor when run via Qpid Dispatch router.
[ https://issues.apache.org/jira/browse/PROTON-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen updated PROTON-2483: Attachment: tsan_out.txt > TSAN reported potential deadlock in epoll proactor when run via Qpid Dispatch > router. > - > > Key: PROTON-2483 > URL: https://issues.apache.org/jira/browse/PROTON-2483 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.36.0 > Environment: linux epoll >Reporter: Clifford Jansen >Assignee: Clifford Jansen >Priority: Major > Attachments: tsan_out.txt > > > The traces are incomplete but the 4 way thread tangle can be inferred as > follows: > A: pn_proactor_set_timeout() (p->task.mutex + tm->task.mutex) > B: pni_timer_manager_process() (tm->task.mutex + tm->deletion_mutex) > C: pni_connection_timeout() (tm->deletion_mutex + pc1->task.mutex) > D: proactor_remove() (pc1->task.mutex + p->task.mutex) > While this particular trace is a false positive (D occurs after all other > threads have been joined and there are no competing threads to complete the > circle), the lock ordering is clearly asking for eventual trouble. > The proactor set_timeout and cancel_timeout API calls do not need to hold the > proactor task lock while interacting with the timer manager, but do so as a > convenience to prevent collisions between simultaneous sets/cancels. A > separate lock can achieve that purpose, stopping A from participating in the > potential deadlock. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2483) TSAN reported potential deadlock in epoll proactor when run via Qpid Dispatch router.
Clifford Jansen created PROTON-2483:
---

Summary: TSAN reported potential deadlock in epoll proactor when run via Qpid Dispatch router.
Key: PROTON-2483
URL: https://issues.apache.org/jira/browse/PROTON-2483
Project: Qpid Proton
Issue Type: Bug
Components: proton-c
Affects Versions: proton-c-0.36.0
Environment: linux epoll
Reporter: Clifford Jansen
Assignee: Clifford Jansen

The traces are incomplete but the 4 way thread tangle can be inferred as follows:

A: pn_proactor_set_timeout() (p->task.mutex + tm->task.mutex)
B: pni_timer_manager_process() (tm->task.mutex + tm->deletion_mutex)
C: pni_connection_timeout() (tm->deletion_mutex + pc1->task.mutex)
D: proactor_remove() (pc1->task.mutex + p->task.mutex)

While this particular trace is a false positive (D occurs after all other threads have been joined and there are no competing threads to complete the circle), the lock ordering is clearly asking for eventual trouble.

The proactor set_timeout and cancel_timeout API calls do not need to hold the proactor task lock while interacting with the timer manager, but do so as a convenience to prevent collisions between simultaneous sets/cancels. A separate lock can achieve that purpose, stopping A from participating in the potential deadlock.
[jira] [Resolved] (PROTON-2436) TSAN race in epoll.c post_event with raw connection
[ https://issues.apache.org/jira/browse/PROTON-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen resolved PROTON-2436. - Fix Version/s: proton-c-0.37.0 Resolution: Fixed Make ownership of scheduled io events compared to task-processed io events consistent between AMQP connections, listeners, and raw connections. > TSAN race in epoll.c post_event with raw connection > --- > > Key: PROTON-2436 > URL: https://issues.apache.org/jira/browse/PROTON-2436 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.36.0 >Reporter: Ken Giusti >Assignee: Clifford Jansen >Priority: Major > Fix For: proton-c-0.37.0 > > > today's github CI run of dispatch+proton main kicked up a tsan error in > proton I've never seen before: > https://github.com/apache/qpid-dispatch/runs/3700836319?check_suite_focus=true#step:27:2142 > > {noformat} > 70: WARNING: ThreadSanitizer: data race (pid=3075) > 70: Write of size 4 at 0x7b68dd38 by main thread (mutexes: write M257): > 70: #0 post_event > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-proton/c/src/proactor/epoll.c:2304 > (libqpid-proton-proactor.so.1+0x14108) > 70: #1 poller_do_epoll > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-proton/c/src/proactor/epoll.c:2534 > (libqpid-proton-proactor.so.1+0x14108) > 70: #2 next_event_batch > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-proton/c/src/proactor/epoll.c:2438 > (libqpid-proton-proactor.so.1+0x14108) > 70: #3 pn_proactor_wait > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-proton/c/src/proactor/epoll.c:2650 > (libqpid-proton-proactor.so.1+0x14622) > 70: #4 thread_run > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/server.c:1118 > (qdrouterd+0x4d83a9) > 70: #5 qd_server_run > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/server.c:1527 > (qdrouterd+0x4d904c) > 70: #6 main_process > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/router/src/main.c:115 > 
(qdrouterd+0x426cdc) > 70: #7 main > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/router/src/main.c:369 > (qdrouterd+0x42623c) > 70: > 70: Previous read of size 4 at 0x7b68dd38 by thread T3 (mutexes: write > M499): > 70: #0 pni_raw_connection_process > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-proton/c/src/proactor/epoll_raw_connection.c:355 > (libqpid-proton-proactor.so.1+0x108ec) > 70: #1 process > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-proton/c/src/proactor/epoll.c:2230 > (libqpid-proton-proactor.so.1+0x108ec) > 70: #2 next_event_batch > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-proton/c/src/proactor/epoll.c:2419 > (libqpid-proton-proactor.so.1+0x108ec) > 70: #3 pn_proactor_wait > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-proton/c/src/proactor/epoll.c:2650 > (libqpid-proton-proactor.so.1+0x14622) > 70: #4 thread_run > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/server.c:1118 > (qdrouterd+0x4d83a9) > 70: #5 _thread_init > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/posix/threading.c:172 > (qdrouterd+0x47fe2d) > 70: > 70: Location is heap block of size 1536 at 0x7b68d800 allocated by main > thread: > 70: #0 calloc (libtsan.so.0+0x32b3e) > 70: #1 pn_raw_connection > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-proton/c/src/proactor/epoll_raw_connection.c:168 > (libqpid-proton-proactor.so.1+0xdf82) > 70: #2 _do_reconnect > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/adaptors/http1/http1_server.c:451 > (qdrouterd+0x43da47) > 70: #3 qd_timer_visit > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/timer.c:316 > (qdrouterd+0x4daddf) > 70: #4 handle > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/server.c:1018 > (qdrouterd+0x4d60d6) > 70: #5 thread_run > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/server.c:1133 > (qdrouterd+0x4d84e7) > 70: #6 qd_server_run > 
/home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/server.c:1527 > (qdrouterd+0x4d904c) > 70: #7 main_process > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/router/src/main.c:115 > (qdrouterd+0x426cdc) > 70: #8 main > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/router/src/main.c:369 > (qdrouterd+0x42623c) > 70: > 70: Mutex M257 (0x7b640003aa20) created at: > 70: #0 pthread_mutex_init (libtsan.so.0+0x49603) > 70: #1 pmutex_init > /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-proton/c/src/proactor/epoll-internal.h:323 > (libqpid-proton-proactor.so.1+0xd52c) > 70: #2 pn_proactor > /home
[jira] [Commented] (PROTON-2362) c-threaderciser timed out on 32-core machine.
[ https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447116#comment-17447116 ] Clifford Jansen commented on PROTON-2362: - I have a less thunderous 8 core (16 thread) machine. If I run the threaderciser under tsan with ambitious pthread counts (> 100), I can provoke three separate thread traces with helpful debugging. tsan_trX.txt files attached. > c-threaderciser timed out on 32-core machine. > - > > Key: PROTON-2362 > URL: https://issues.apache.org/jira/browse/PROTON-2362 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.33.0, proton-c-0.34.0 >Reporter: michael goulish >Priority: Major > Attachments: tsan_tr1.txt, tsan_tr2.txt, tsan_tr3.txt > > > Using recent master – maybe 3 days old or so – I just ran Proton's ctest, > after turning on THREADERCISER. I ran it on a box with 32 physical cores, 64 > threads. > > Test number 6 – c-threaderciser – failed with timeout after 1500 seconds. > ( 1.5e18 femtoseconds. ) > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (PROTON-2362) c-threaderciser timed out on 32-core machine.
[ https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen updated PROTON-2362: Attachment: tsan_tr3.txt tsan_tr2.txt tsan_tr1.txt > c-threaderciser timed out on 32-core machine. > - > > Key: PROTON-2362 > URL: https://issues.apache.org/jira/browse/PROTON-2362 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.33.0, proton-c-0.34.0 >Reporter: michael goulish >Priority: Major > Attachments: tsan_tr1.txt, tsan_tr2.txt, tsan_tr3.txt > > > Using recent master – maybe 3 days old or so – I just ran Proton's ctest, > after turning on THREADERCISER. I ran it on a box with 32 physical cores, 64 > threads. > > Test number 6 – c-threaderciser – failed with timeout after 1500 seconds. > ( 1.5e18 femtoseconds. ) > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Closed] (PROTON-2432) Proton crashes because of a concurrency failure in collector->pool
[ https://issues.apache.org/jira/browse/PROTON-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clifford Jansen closed PROTON-2432. --- Resolution: Not A Bug > Proton crashes because of a concurrency failure in collector->pool > -- > > Key: PROTON-2432 > URL: https://issues.apache.org/jira/browse/PROTON-2432 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.32.0 > Environment: RHEL 7 >Reporter: Jesse Hulsizer >Priority: Major > Attachments: proton-2432.patch > > > While running our application tests, our application crashes with many > different backtraces that look similar to this... > {noformat} > #0 0x in ?? () > #1 0x7fc777579198 in pn_class_incref () from > /usr/lib64/libqpid-proton.so.11 > #2 0x7fc777587d8a in pn_collector_put () from > /usr/lib64/libqpid-proton.so.11 > #3 0x7fc7775887ea in ?? () from /usr/lib64/libqpid-proton.so.11 > #4 0x7fc777588c7b in pn_transport_pending () from > /usr/lib64/libqpid-proton.so.11 > #5 0x7fc777588d9e in pn_transport_pop () from > /usr/lib64/libqpid-proton.so.11 > #6 0x7fc777599298 in ?? () from /usr/lib64/libqpid-proton.so.11 > #7 0x7fc77759a784 in ?? () from /usr/lib64/libqpid-proton.so.11 > #8 0x7fc7773236f0 in proton::container::impl::thread() () from > /usr/lib64/libqpid-proton-cpp.so.12 > #9 0x7fc7760b2470 in ?? () from /usr/lib64/libstdc++.so.6 > #10 0x7fc776309aa1 in start_thread () from /lib64/libpthread.so.0 > #11 0x7fc7758b6bdd in clone () from /lib64/libc.so.6{noformat} > Using gdb to probe one of the backtraces show that the collector->pool size > is -1... 
(seen here as 18446744073709551615) > {noformat} > (gdb) p *collector $1 = \{pool = 0x7fa7182de180, head = 0x7fa7182de250, tail > = 0x7fa7182b8b90, prev = 0x7fa7182ea010, freed = false} > (gdb) p collector->pool $2 = (pn_list_t *) 0x7fa7182de180 (gdb) p > *collector->pool $3 = \{clazz = 0x7fa74eb7c000, capacity = 16, size = > 18446744073709551615, elements = 0x7fa7182de1b0}{noformat} > The proton code was marked up with print statements which show that two > threads were accessing the collector->pool data structure at the same time... > {noformat} > 7b070700: pn_list_pop index 0 list->0x7fec401e0b70 value->0x7fec3c728a10 > 4700:pn_list_add index 1 size 2list->0x7fec401e0b70 value->0x7fec402095b0 > 7b070700: pn_list_pop size 1 list->0x7fec401e0b70 > 4700: pn_list_pop size 1 list->0x7fec401e0b70 > 7b070700: pn_list_pop index 0 list->0x7fec401e0b70 value->0x7fec3c728a10 > 4700: pn_list_pop index 0 list->0x7fec401e0b70 > value->0x7fec3c728a10{noformat} > The hex number on the far left is the thread id. As can be seen in the last > two lines, two threads are popping from the collector->pool simultaneously. > This produces the -1 size as seen up above -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-2422) Proton will sometimes fail to send empty frame if the idle timeout ratio between peers is greater than 2.
[ https://issues.apache.org/jira/browse/PROTON-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435827#comment-17435827 ] Clifford Jansen commented on PROTON-2422: - Thank you for the reproducer. This is caused by substitution of the wrong sort compare function. The existing code assumes that the Proton class of the timer_deadline is used to determine the ordering between objects. The compare function actually used is from a separate class specified at list creation. The fix is to decouple the class definition from the items in the list. The timer_deadline class now has no instantiated Proton objects but does provide a "static" compare function called by the pn_list on the list items to be sorted. > Proton will sometimes fail to send empty frame if the idle timeout ratio > between peers is greater than 2. > -- > > Key: PROTON-2422 > URL: https://issues.apache.org/jira/browse/PROTON-2422 > Project: Qpid Proton > Issue Type: Bug > Components: cpp-binding, proton-c >Affects Versions: proton-c-0.33.0 > Environment: RHEL 8 >Reporter: Jesse Hulsizer >Assignee: Clifford Jansen >Priority: Minor > Attachments: instrument.patch, reproducer.cpp > > > When a connection is made to a proton listener with both sides having > different idle timeout intervals, the epoll_timer can fail to trigger, > resulting in no empty frames being sent, and the connection dropped with a > 'amqp:resource-limit-exceeded: local-idle-timeout expired' exception. > Instrumentation of the proton library showed that when an epoll timer > deadline was rolled back and the timer resequenced due to the peer idle > timeout being shorter than the local, the new timer is pushed on the timer > manager heap incorrectly. The timer deadline object should be pushed on the > timer heap in order by deadline, but in fact the timer is pushed on the head > by timer deadline object address. 
> This causes the invalidated timer to be first on the list, and the proactor timer set incorrectly. When enough time has elapsed, the remote peer will close the connection due to inactivity.
> Note that if the newly created resequenced timer deadline object has an address lower than the old invalidated timer deadline object, proton will work correctly.
> I've attached a reproducer as well as a patch for the instrumentation.
> Annotated proton logging from the reproducer is below.
> This issue does not occur prior to 0.33.0.
> {code:java}
> [builder@SE-RHEL8-ITCM-TEST-01 qpid-proton-idle-timeout-repo $ ] PN_LOG='frame info+' ./a.out
> listening on 9030
> # The initial connection
> [0x7fdb3c001be0]: SASL:FRAME: -> SASL
> [0x7fdb44002620]: SASL:FRAME: <- SASL
> [0x7fdb44002620]: SASL:FRAME: -> SASL
> [0x7fdb44002620]: AMQP:FRAME:0 -> @sasl-mechanisms(64) [sasl-server-mechanisms=@PN_SYMBOL[:ANONYMOUS]]
> [0x7fdb3c001be0]: SASL:FRAME: <- SASL
> [0x7fdb3c001be0]: AMQP:FRAME:0 <- @sasl-mechanisms(64) [sasl-server-mechanisms=@PN_SYMBOL[:ANONYMOUS]]
> [0x7fdb4ca21e20]:EVENT: INFO:In pni_timer_set - timer* 0x7fdb3c008570, deadline 5189836829, proactor_timer* 0x10B9660
> [0x7fdb4ca21e20]:EVENT: INFO:Start of timer heap dump
> [0x7fdb4ca21e20]:EVENT: INFO:Stop of timer heap dump
> [0x7fdb4ca21e20]:EVENT: INFO:Start of timer heap dump post
> [0x7fdb4ca21e20]:EVENT: INFO:Heap position 0: td=0x0167fdb3c0085b0, td->list_deadline=5189836829, td->timer=0x7fdb3c008570, td->resequenced=false
> [0x7fdb4ca21e20]:EVENT: INFO:Stop of timer heap dump
> [0x7fdb3c001be0]: AMQP:FRAME:0 -> @sasl-init(65) [mechanism=:ANONYMOUS, initial-response=b"anonymous@SE-RHEL8-ITCM-TEST-01"]
> [0x7fdb44002620]: AMQP:FRAME:0 <- @sasl-init(65) [mechanism=:ANONYMOUS, initial-response=b"anonymous@SE-RHEL8-ITCM-TEST-01"]
> [0x7fdb44002620]: SASL: INFO:Authenticated user: anonymous for anonymous with mechanism ANONYMOUS
> [0x7fdb44002620]: AMQP:FRAME:0 -> @sasl-outcome(68) [code=0]
> [0x7fdb3c001be0]: AMQP:FRAME:0 <- @sasl-outcome(68) [code=0]
> [0x7fdb4ca21e20]:EVENT: INFO:In pni_timer_set - timer* 0x7fdb3c008570, deadline 5189836829, proactor_timer* 0x10B9660
> [0x7fdb4ca21e20]:EVENT: INFO:Start of timer heap dump
> [0x7fdb4ca21e20]:EVENT: INFO:Heap position 0: td=0x0167fdb3c0085b0, td->list_deadline=5189836829, td->timer=0x7fdb3c008570, td->resequenced=false
> [0x7fdb4ca21e20]:EVENT: INFO:Stop of timer heap dump
> [0x7fdb3c001be0]: AMQP:FRAME: -> AMQP
> [0x7fdb3c001be0]: AMQP:FRAME:0 -> @open(16) [container-id="cf87e911-f46b-471a-a664-e34de8a57b6b", hostname="127.0.0.1", channel-max=32767, idle-time-out=2]
> [0x7fdb44002620]: AMQP:FRAME: <- AMQP
> [0x7fdb44002620]: AMQP:FRAME:0 <- @open(16) [container-id="cf87e911-f46b-
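
The ordering problem discussed in the PROTON-2422 comment (entries ordered by object address rather than by deadline) can be illustrated with a small standalone sketch. This is not Proton's code and does not use the pn_list API; the struct `timer_deadline_sketch`, the two comparators, and `heap_top` are hypothetical names chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a timer deadline entry; not Proton's real struct.
struct timer_deadline_sketch {
    std::uint64_t deadline;  // absolute expiry time
    bool resequenced;        // true if created when a timer was moved earlier
};

// Address ordering: what effectively happens when the wrong compare
// function is picked up. The result depends on where objects were allocated.
inline bool earlier_by_address(const timer_deadline_sketch* a,
                               const timer_deadline_sketch* b) {
    return std::less<const timer_deadline_sketch*>()(a, b);
}

// Deadline ordering: the intended behaviour for a timer heap.
inline bool earlier_by_deadline(const timer_deadline_sketch* a,
                                const timer_deadline_sketch* b) {
    return a->deadline < b->deadline;
}

// Returns the entry a min-heap would place first under the given ordering.
template <typename Compare>
const timer_deadline_sketch* heap_top(
        std::vector<const timer_deadline_sketch*> entries, Compare earlier) {
    assert(!entries.empty());
    // std::make_heap builds a max-heap, so invert the comparison to get
    // the "earliest" entry at the front.
    auto later = [&](const timer_deadline_sketch* a,
                     const timer_deadline_sketch* b) { return earlier(b, a); };
    std::make_heap(entries.begin(), entries.end(), later);
    return entries.front();
}
```

With two entries where the one at the lower address has the later deadline, `heap_top(..., earlier_by_address)` selects the later deadline, while `heap_top(..., earlier_by_deadline)` selects the earlier one. This matches the reporter's observation that behaviour depends on the relative addresses of the old and new timer deadline objects.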
[jira] [Commented] (PROTON-2432) Proton crashes because of a concurrency failure in collector->pool
[ https://issues.apache.org/jira/browse/PROTON-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418723#comment-17418723 ]

Clifford Jansen commented on PROTON-2432:
-----------------------------------------

Further to Robbie's excellent response:

See also the "Thread-safety" note in messaging_handler.hpp. Useful examples working with work queues can be found in cpp/examples, including broker.cpp and the multithreaded clients.

An alternative to proton::work_queue for achieving thread safety in Proton is to use connection::wake() paired with on_connection_wake(), together with your own locking mechanism and your own work queue concept, to ensure that active use of the connection only happens in the dedicated thread that receives the connection callbacks.

One frequent "gotcha" is inadvertent use of the connection or its sub-objects (senders/receivers/deliveries) from another thread. Destructors and copy constructors are the usual problem. A good strategy is to get a smart pointer to the Proton object while in the callback and stash it until a future safe callback where the application is ready to release it, and do so via smart_ptr::reset(). That way the destructor is called exactly when you want it, and any unnoticed copies of the shared ptr in another thread will have no surprise calls into the Proton engine.

> Proton crashes because of a concurrency failure in collector->pool
> ------------------------------------------------------------------
>
>                 Key: PROTON-2432
>                 URL: https://issues.apache.org/jira/browse/PROTON-2432
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.32.0
>         Environment: RHEL 7
>            Reporter: Jesse Hulsizer
>            Priority: Major
>         Attachments: proton-2432.patch
>
>
> While running our application tests, our application crashes with many different backtraces that look similar to this...
> {noformat}
> #0  0x in ?? ()
> #1  0x7fc777579198 in pn_class_incref () from /usr/lib64/libqpid-proton.so.11
> #2  0x7fc777587d8a in pn_collector_put () from /usr/lib64/libqpid-proton.so.11
> #3  0x7fc7775887ea in ?? () from /usr/lib64/libqpid-proton.so.11
> #4  0x7fc777588c7b in pn_transport_pending () from /usr/lib64/libqpid-proton.so.11
> #5  0x7fc777588d9e in pn_transport_pop () from /usr/lib64/libqpid-proton.so.11
> #6  0x7fc777599298 in ?? () from /usr/lib64/libqpid-proton.so.11
> #7  0x7fc77759a784 in ?? () from /usr/lib64/libqpid-proton.so.11
> #8  0x7fc7773236f0 in proton::container::impl::thread() () from /usr/lib64/libqpid-proton-cpp.so.12
> #9  0x7fc7760b2470 in ?? () from /usr/lib64/libstdc++.so.6
> #10 0x7fc776309aa1 in start_thread () from /lib64/libpthread.so.0
> #11 0x7fc7758b6bdd in clone () from /lib64/libc.so.6
> {noformat}
> Using gdb to probe one of the backtraces shows that the collector->pool size is -1... (seen here as 18446744073709551615)
> {noformat}
> (gdb) p *collector
> $1 = \{pool = 0x7fa7182de180, head = 0x7fa7182de250, tail = 0x7fa7182b8b90, prev = 0x7fa7182ea010, freed = false}
> (gdb) p collector->pool
> $2 = (pn_list_t *) 0x7fa7182de180
> (gdb) p *collector->pool
> $3 = \{clazz = 0x7fa74eb7c000, capacity = 16, size = 18446744073709551615, elements = 0x7fa7182de1b0}
> {noformat}
> The proton code was marked up with print statements which show that two threads were accessing the collector->pool data structure at the same time...
> {noformat}
> 7b070700: pn_list_pop index 0 list->0x7fec401e0b70 value->0x7fec3c728a10
> 4700:pn_list_add index 1 size 2list->0x7fec401e0b70 value->0x7fec402095b0
> 7b070700: pn_list_pop size 1 list->0x7fec401e0b70
> 4700: pn_list_pop size 1 list->0x7fec401e0b70
> 7b070700: pn_list_pop index 0 list->0x7fec401e0b70 value->0x7fec3c728a10
> 4700: pn_list_pop index 0 list->0x7fec401e0b70 value->0x7fec3c728a10
> {noformat}
> The hex number on the far left is the thread id. As can be seen in the last two lines, two threads are popping from the collector->pool simultaneously. This produces the -1 size as seen up above.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org
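
The "own locking mechanism plus own work queue" approach mentioned in the PROTON-2432 comment might look roughly like the sketch below. It is deliberately independent of the Proton headers: the class `pending_work` and its `add()` and `drain()` members are hypothetical names, and the points where an application would call connection::wake() and on_connection_wake() are indicated only in comments.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical application-side queue; not part of the Proton API.
class pending_work {
  public:
    // Safe to call from any thread. In a real application the caller would
    // follow this with connection::wake() so that on_connection_wake() is
    // delivered in the thread that owns the connection.
    void add(std::function<void()> task) {
        assert(task && "task must be callable");
        std::lock_guard<std::mutex> guard(lock_);
        tasks_.push_back(std::move(task));
    }

    // Intended to be called only from on_connection_wake(), i.e. from the
    // thread that receives the connection callbacks. Returns the number of
    // tasks that were run.
    std::size_t drain() {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> guard(lock_);
            batch.swap(tasks_);  // take everything queued so far
        }
        for (auto& task : batch) {
            task();  // runs outside the lock, in the callback thread
        }
        return batch.size();
    }

  private:
    std::mutex lock_;
    std::vector<std::function<void()>> tasks_;
};
```

The design choice here is that other threads only ever touch the mutex-protected vector, never the connection or its sub-objects. Tasks that use senders, receivers, or deliveries run solely inside `drain()`.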
[jira] [Commented] (PROTON-2411) Simultaneous idle timeout sequencing errors
[ https://issues.apache.org/jira/browse/PROTON-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392573#comment-17392573 ]

Clifford Jansen commented on PROTON-2411:
-----------------------------------------

p2411_0.diff is a patch that can be applied to Proton 0.34 to help debug this issue. Instead of aborting if an AMQP connection is seen to set an earlier heartbeat timeout more than once, it prints a detailed diagnostic and continues to run.

The problem is supposed to be very rare, and this change could introduce some new runaway problem. So if there are more than a handful of such sequencing errors on a single connection, the connection is terminated and the process can continue to run, perhaps to reconnect as for any other temporary network failure (or to continue listening in the case of the router).

To collect the error information, Proton clients should be started with PN_LOG=ERROR+ in their process environment, or any other setting that includes ERROR level logging. Similarly, the router configuration should allow "error+" logging levels.

The log messages will contain either "timer sequence error" or "timer multi sequence errors".

If you use the patch and find examples of these errors in the logs, please add a representative sample to the JIRA.

> Simultaneous idle timeout sequencing errors
> -------------------------------------------
>
>                 Key: PROTON-2411
>                 URL: https://issues.apache.org/jira/browse/PROTON-2411
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.34.0
>            Reporter: Jaap Wiggelinkhuizen
>            Priority: Critical
>         Attachments: p2411_0.diff
>
>
> In our mission critical software we use Qpid proton 0.34.0 in our C++-client software together with the Qpid dispatch router 1.16.0. We updated to these versions not so long ago; before that we used proton 0.25.0 and dispatch 1.3.0.
> Our application runs on several VM’s with a router on each VM. All clients connect to the local router only and the routers connect to each other in a hub-and-spoke pattern. In both the client configuration and the router configuration we have configured an idle timeout of 30 seconds.
> On July 4th we were confronted with an incident in production where a lot of our client processes reported problems regarding the idle timeouts. These client processes had already been running stably for more than 3 weeks. The problem appeared in two flavors:
> # Transport error “error: amqp:resource-limit-exceeded: local-idle-timeout expired”
> # epoll proactor failure in epoll_timer.c:263: “idle timeout sequencing error”
> On each VM at least 3 processes showed one of these problems in a total time window of less than a minute. We haven’t found any cause in the underlying hardware, hypervisor, network or operating system until now.
> Although we don’t know the root cause of the problems, we can solve the first situation by using the proper reconnect settings (by mistake we handled on_transport_error() as a fatal situation and will correct that so that only on_transport_close() will be handled as fatal). However, the second situation is more odd because it results in an abort within proton itself. The comments in epoll_timer.c explain that this error occurs when a connection timer is moved backwards a second time. We don’t understand how this can happen suddenly.
>
> Last Sunday the problem occurred again on two more production sites where our software had been operational for just over 3 weeks. And again it happened on all VM's within a short timeframe. It's interesting that it has only occurred on Sunday mornings until now. Maybe it has something to do with how long the software is running and the fact that on Sunday mornings there is less messaging traffic, i.e. more heartbeats?...
>
> Unfortunately we haven't been able to reproduce the issue at our test facilities and hence cannot provide a reproducer.
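
For anyone collecting samples as requested in the PROTON-2411 comment, a log file could be scanned for the two quoted message fragments with something like the sketch below. Only the two marker strings come from the comment; the type `sequence_error_counts`, the function names, and the assumption that each diagnostic occupies one log line are illustrative and not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type: how many log lines matched each marker.
struct sequence_error_counts {
    std::size_t single = 0;  // lines containing "timer sequence error"
    std::size_t multi = 0;   // lines containing "timer multi sequence errors"
};

// Counts lines containing either marker string quoted in the JIRA comment.
// Assumes one diagnostic per line, which the comment does not state.
inline sequence_error_counts count_sequence_errors(std::istream& log) {
    static const std::string single_marker = "timer sequence error";
    static const std::string multi_marker = "timer multi sequence errors";
    // Neither marker contains the other, so a line is never miscounted
    // because of the order of the checks below.
    assert(multi_marker.find(single_marker) == std::string::npos);

    sequence_error_counts counts;
    std::string line;
    while (std::getline(log, line)) {
        if (line.find(multi_marker) != std::string::npos) {
            ++counts.multi;
        } else if (line.find(single_marker) != std::string::npos) {
            ++counts.single;
        }
    }
    return counts;
}

// Convenience overload for log text already held in memory.
inline sequence_error_counts count_sequence_errors_in(const std::string& text) {
    std::istringstream log(text);
    return count_sequence_errors(log);
}
```

A caller would typically open the captured output of a client started with PN_LOG=ERROR+ as a `std::ifstream` and pass it to `count_sequence_errors`.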
[jira] [Updated] (PROTON-2411) Simultaneous idle timeout sequencing errors
[ https://issues.apache.org/jira/browse/PROTON-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clifford Jansen updated PROTON-2411:
------------------------------------
    Attachment: p2411_0.diff

> Simultaneous idle timeout sequencing errors
> -------------------------------------------
>
>                 Key: PROTON-2411
>                 URL: https://issues.apache.org/jira/browse/PROTON-2411
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.34.0
>            Reporter: Jaap Wiggelinkhuizen
>            Priority: Critical
>         Attachments: p2411_0.diff
[jira] [Commented] (PROTON-2403) libuv based proactor test errors
[ https://issues.apache.org/jira/browse/PROTON-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370125#comment-17370125 ]

Clifford Jansen commented on PROTON-2403:
-----------------------------------------

pn2403_0.diff works on Fedora 34 and libuv-1.41.0-1.

Doc for read_start() indicates a change in behaviour in V1.38, but the specified change does not explain why the proactor code was working with versions earlier than 1.38.

TBD if this is a general fix or if version dependent code needs to be created.

> libuv based proactor test errors
> --------------------------------
>
>                 Key: PROTON-2403
>                 URL: https://issues.apache.org/jira/browse/PROTON-2403
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.34.0
>         Environment: Builds using the libuv proactor.
>            Reporter: Clifford Jansen
>            Assignee: Clifford Jansen
>            Priority: Major
>         Attachments: pn2403_0.diff
>
>
> New test failures are seen with recent versions of libuv, at least starting with version 1.41 and perhaps earlier.
[jira] [Updated] (PROTON-2403) libuv based proactor test errors
[ https://issues.apache.org/jira/browse/PROTON-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clifford Jansen updated PROTON-2403:
------------------------------------
    Attachment: pn2403_0.diff

> libuv based proactor test errors
> --------------------------------
>
>                 Key: PROTON-2403
>                 URL: https://issues.apache.org/jira/browse/PROTON-2403
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.34.0
>         Environment: Builds using the libuv proactor.
>            Reporter: Clifford Jansen
>            Assignee: Clifford Jansen
>            Priority: Major
>         Attachments: pn2403_0.diff
>
>
> New test failures are seen with recent versions of libuv, at least starting with version 1.41 and perhaps earlier.
[jira] [Created] (PROTON-2403) libuv based proactor test errors
Clifford Jansen created PROTON-2403:
---------------------------------------

             Summary: libuv based proactor test errors
                 Key: PROTON-2403
                 URL: https://issues.apache.org/jira/browse/PROTON-2403
             Project: Qpid Proton
          Issue Type: Bug
          Components: proton-c
    Affects Versions: proton-c-0.34.0
         Environment: Builds using the libuv proactor.
            Reporter: Clifford Jansen
            Assignee: Clifford Jansen

New test failures are seen with recent versions of libuv, at least starting with version 1.41 and perhaps earlier.
[jira] [Updated] (PROTON-2397) Update default client TLS defaults for verifying outbound connections to AMQP servers.
[ https://issues.apache.org/jira/browse/PROTON-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clifford Jansen updated PROTON-2397:
------------------------------------
    Description:
Proton C and its associated bindings do not have consistent default client side TLS configuration. Proton libraries will be changed on a per-language/binding basis so that all clients verify the server's certificate and identifying name by default, i.e. to use PN_SSL_VERIFY_PEER_NAME unless the application takes steps to change the desired level of authentication.

This default behaviour is required for the Proton libraries to be compliant with the TLS 1.3 specification (RFC 8446). Such compliance is obviously highly desirable now and will become mandatory in the future.

C++ applications will not be affected (this is the existing default).

C, Python, Ruby and Go applications that fully configure their client connections are also unaffected.

Python programs that use MESSAGING_CONNECT_FILE (or the connect.json equivalent) are unaffected.

Proton applications that do not make outbound connections are unaffected.

All other applications may run into stricter verification policies that cause previously successful TLS negotiations to now fail. These applications will need to either:
 - explicitly downgrade the verification mechanism of outgoing connections to the old default (PN_SSL_ANONYMOUS_PEER)
 - update server certificates and/or client trusted root CAs as required to work in the full PN_SSL_VERIFY_PEER_NAME verification mode.

> Update default client TLS defaults for verifying outbound connections to AMQP servers.
> --------------------------------------------------------------------------------------
>
>                 Key: PROTON-2397
>                 URL: https://issues.apache.org/jira/browse/PROTON-2397
>             Project: Qpid Proton
>          Issue Type: Improvement
>          Components: cpp-binding, go-binding, proton-c, python-binding, ruby-binding
>    Affects Versions: proton-c-0.34.0
>            Reporter: Clifford Jansen
>            Assignee: Clifford Jansen
>            Priority: Major
>             Fix For: proton-c-0.35.0
[jira] [Updated] (PROTON-2397) Update default client TLS defaults for verifying outbound connections to AMQP servers.
[ https://issues.apache.org/jira/browse/PROTON-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clifford Jansen updated PROTON-2397:
------------------------------------
    Environment:     (was: Proton C and its associated bindings do not have consistent default client side TLS configuration. Proton libraries will be changed on a per-language/binding basis so that all clients verify the server's certificate and identifying name by default, i.e. to use PN_SSL_VERIFY_PEER_NAME unless the application takes steps to change the desired level of authentication.

This default behaviour is required for the Proton libraries to be compliant with the TLS specification 1.3 (RFC 8446). Such compliance is obviously highly desirable now and will become mandatory in the future.

C++ applications will not be affected (this is the existing default).

C, Python, Ruby and Go applications that fully configure their client connections are also unaffected.

Python programs that use MESSAGING_CONNECT_FILE (or the connect.json equivalent) are unaffected.

Proton applications that do not make outbound connections are unaffected.

All other applications may run into stricter verification policies that cause previously successful TLS negotiations to now fail. These applications will need to either:
 - explicitly downgrade the verification mechanism of outgoing connections to the old default (PN_SSL_ANONYMOUS_PEER)
 - update server certificates and/or client trusted root CA's as required to work in the full PN_SSL_VERIFY_PEER_NAME verification mode.
)

> Update default client TLS defaults for verifying outbound connections to AMQP servers.
> --------------------------------------------------------------------------------------
>
>                 Key: PROTON-2397
>                 URL: https://issues.apache.org/jira/browse/PROTON-2397
>             Project: Qpid Proton
>          Issue Type: Improvement
>          Components: cpp-binding, go-binding, proton-c, python-binding, ruby-binding
>    Affects Versions: proton-c-0.34.0
>            Reporter: Clifford Jansen
>            Assignee: Clifford Jansen
>            Priority: Major
>             Fix For: proton-c-0.35.0