[jira] [Created] (PROTON-1170) closed links are never deleted
michael goulish created PROTON-1170: --- Summary: closed links are never deleted Key: PROTON-1170 URL: https://issues.apache.org/jira/browse/PROTON-1170 Project: Qpid Proton Issue Type: Bug Components: proton-c Environment: miserable Reporter: michael goulish I wrote a reactor-based application that makes a single connection, and then repeatedly makes-and-closes links (receivers) on that connection. It makes and closes the links as fast as possible: as soon as it gets the on_receiver_close event, it makes a new one. As soon as it gets the on_receiver_open event -- it closes that receiver. This application talks to a dispatch router. Problem: Both the router and my application grow their memory (RSS) rapidly -- and the router's ability to respond to new link creations slows down rapidly. Looking at the router with Valgrind/Callgrind, after about 15,000 links have been created and closed I see that 45% of all CPU time on the router is being consumed by pn_find_link(). Instrumenting that code, I see that the list it is looking at never decreases in size. I tried creating my links with the "lifetime_policy" set to DELETE_ON_CLOSE, but that had no effect. Grepping for that symbol, I see that it does not occur in the proton C code except in its definition, and in a printing convenience function. Major scalability bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
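The growth the report describes follows from a lookup that scans a list which never shrinks. Below is a minimal, self-contained sketch under that assumption. `toy_link`, `toy_conn`, and the `toy_*` functions are hypothetical. They only model the behavior reported for pn_find_link(), not Proton's actual internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model, not Proton's real data structures: links are
   kept in a singly linked list and looked up by name with a linear scan. */
typedef struct toy_link {
    char name[32];
    bool closed;
    struct toy_link *next;
} toy_link;

typedef struct {
    toy_link *head;
    size_t count;                 /* entries currently in the list */
} toy_conn;

/* Linear search for an open link; *steps reports how many list entries
   were visited, i.e. the cost of the lookup. */
static toy_link *toy_find_link(const toy_conn *c, const char *name, size_t *steps) {
    *steps = 0;
    for (toy_link *l = c->head; l != NULL; l = l->next) {
        (*steps)++;
        if (!l->closed && strcmp(l->name, name) == 0)
            return l;
    }
    return NULL;
}

static toy_link *toy_open_link(toy_conn *c, const char *name) {
    toy_link *l = calloc(1, sizeof *l);
    if (l == NULL)
        return NULL;
    strncpy(l->name, name, sizeof l->name - 1);   /* calloc zeroed the rest */
    l->next = c->head;
    c->head = l;
    c->count++;
    return l;
}

/* prune == false mimics the reported bug: the link is only marked closed
   and stays in the list forever. prune == true unlinks and frees it. */
static void toy_close_link(toy_conn *c, toy_link *target, bool prune) {
    assert(target != NULL);
    target->closed = true;
    if (!prune)
        return;
    for (toy_link **pp = &c->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == target) {
            *pp = target->next;
            free(target);
            c->count--;
            return;
        }
    }
}
```

With `prune` set to false, every create/close cycle leaves one entry behind. A lookup for a name that is not currently open therefore visits every link ever created on the connection, which is about 15,000 entries in the scenario above. Unlinking on close keeps the scan proportional to the number of live links.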
[jira] [Created] (PROTON-1009) message.h does not have a set method for annotations
michael goulish created PROTON-1009: --- Summary: message.h does not have a set method for annotations Key: PROTON-1009 URL: https://issues.apache.org/jira/browse/PROTON-1009 Project: Qpid Proton Issue Type: Bug Components: proton-c Reporter: michael goulish Comments above the method pn_message_annotations() indicate that it can both set and get annotations -- but in fact it has no way to set. And it looks like there is no other way in the C API, either.
[jira] [Resolved] (PROTON-1009) message.h does not have a set method for annotations
[ https://issues.apache.org/jira/browse/PROTON-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish resolved PROTON-1009. - Resolution: Not A Problem Oops. I didn't realize that the function is returning a pointer that can be used to change the annotations. *That's* how you set them. Sorry for the noise.
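The resolution relies on a simple pattern: the accessor returns a pointer into the message, so the same call works for both reading and writing. The sketch below shows that pattern with hypothetical stand-in types. `toy_message`, `toy_data`, and the `toy_*` functions are not Proton's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ANNOTATIONS 8

/* Hypothetical stand-ins for pn_data_t and pn_message_t, reduced to a
   small key/value table. Only the ownership pattern mirrors Proton: the
   accessor returns a pointer into the message, not a copy. */
typedef struct {
    const char *keys[TOY_MAX_ANNOTATIONS];
    const char *vals[TOY_MAX_ANNOTATIONS];
    size_t n;
} toy_data;

typedef struct {
    toy_data annotations;
} toy_message;

/* Like pn_message_annotations(): one accessor serves as getter and
   setter, because callers modify the object it points to. */
static toy_data *toy_message_annotations(toy_message *m) {
    return &m->annotations;
}

static int toy_data_put(toy_data *d, const char *key, const char *val) {
    assert(d != NULL && key != NULL && val != NULL);
    if (d->n == TOY_MAX_ANNOTATIONS)
        return -1;                /* table full */
    d->keys[d->n] = key;
    d->vals[d->n] = val;
    d->n++;
    return 0;
}

static const char *toy_data_get(const toy_data *d, const char *key) {
    for (size_t i = 0; i < d->n; i++)
        if (strcmp(d->keys[i], key) == 0)
            return d->vals[i];
    return NULL;
}
```

The real library works the same way. Annotations are written into the `pn_data_t *` that `pn_message_annotations()` returns, using the encoding calls declared in `proton/codec.h`, such as `pn_data_put_map()`, `pn_data_enter()`, `pn_data_put_symbol()`, and `pn_data_exit()`.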
[jira] [Closed] (PROTON-992) Proton's use of Cyrus SASL is not thread-safe.
[ https://issues.apache.org/jira/browse/PROTON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish closed PROTON-992. -- Resolution: Duplicate this is a duplicate of PROTON-862 > Proton's use of Cyrus SASL is not thread-safe. > -- > > Key: PROTON-992 > URL: https://issues.apache.org/jira/browse/PROTON-992 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: 0.10 >Reporter: michael goulish >Assignee: michael goulish >Priority: Critical
[jira] [Commented] (PROTON-992) Proton's use of Cyrus SASL is not thread-safe.
[ https://issues.apache.org/jira/browse/PROTON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902731#comment-14902731 ] michael goulish commented on PROTON-992: oops. this is a duplicate of PROTON-862
[jira] [Created] (PROTON-992) Proton's use of Cyrus SASL is not thread-safe.
michael goulish created PROTON-992:
Summary: Proton's use of Cyrus SASL is not thread-safe.
Key: PROTON-992
URL: https://issues.apache.org/jira/browse/PROTON-992
Project: Qpid Proton
Issue Type: Bug
Components: proton-c
Affects Versions: 0.10
Reporter: michael goulish
Priority: Critical

Documentation for the Cyrus SASL library says that the library is believed to be thread-safe only if the code that uses it meets several requirements. The requirements are:

* you supply mutex functions (see sasl_set_mutex())
* you make no libsasl calls until sasl_client/server_init() completes
* no libsasl calls are made after sasl_done() is begun
* when using GSSAPI, you use a thread-safe GSS / Kerberos 5 library.

It says explicitly that sasl_set* calls are not thread-safe, since they set global state.

The proton library makes calls to sasl_set* functions in pni_init_client(), pni_init_server(), and pni_process_init(). Since those are internal functions, there is no way for code that uses Proton to lock around those calls.

I think proton needs a new API call to let applications call sasl_set_mutex(). Or something. We probably also need other protections to meet the other requirements specified in the Cyrus documentation (and quoted above).
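Proton would still need to fix this internally. However, the second requirement above (no libsasl calls until initialization completes) is the kind of guarantee `pthread_once()` provides. Below is a hedged sketch of that pattern. `toy_global_init()` is a hypothetical stand-in for process-wide setup like the sasl_set* calls, not Proton or Cyrus code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for process-wide setup such as the sasl_set*
   calls: it mutates global state, so it must not run concurrently or
   more than once. This is not Proton or Cyrus SASL code. */
static int toy_global_init_runs = 0;

static void toy_global_init(void) {
    toy_global_init_runs++;
}

static pthread_once_t toy_init_once = PTHREAD_ONCE_INIT;

/* Every thread calls this before touching the library. pthread_once()
   runs toy_global_init() exactly once, and no caller returns until
   that run has completed. */
static void toy_ensure_init(void) {
    int rc = pthread_once(&toy_init_once, toy_global_init);
    assert(rc == 0);
    (void)rc;                     /* avoid unused warning under NDEBUG */
}

/* Thread entry point used to exercise the initializer from many threads. */
static void *toy_worker(void *arg) {
    (void)arg;
    toy_ensure_init();
    return NULL;
}
```

`pthread_once()` only covers one-time setup. The separate requirement to supply mutex functions through `sasl_set_mutex()` would still need its own hook in Proton's API, as the report suggests.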
[jira] [Closed] (PROTON-919) make C impl behave like java wrt channel_max error
[ https://issues.apache.org/jira/browse/PROTON-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish closed PROTON-919. -- Resolution: Fixed Fix Version/s: 0.10 commit 4ee726002804d7286a8c76b42e0a0717e0798822 please NOTE that this change also adds #define PN_OK (0) to the list of errors in error.h
[jira] [Closed] (PROTON-864) don't crash when channel number goes high
[ https://issues.apache.org/jira/browse/PROTON-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish closed PROTON-864. -- Resolution: Fixed Fix Version/s: 0.10 This is a duplicate of PROTON-842 don't crash when channel number goes high - Key: PROTON-864 URL: https://issues.apache.org/jira/browse/PROTON-864 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.9 Reporter: michael goulish Assignee: michael goulish Fix For: 0.10 Code in transport.c, and a little in engine.c, looks at the topmost bit in channel numbers to decide if the channels are in use. This causes crashes when the number of channels in a single connection goes beyond 32767.
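The limit lands at 32767 because AMQP channel numbers are 16-bit. Reserving the top bit as an in-use marker leaves 15 bits, and every channel number from 32768 upward already has that bit set. Below is a hypothetical illustration of the failure mode (not the actual transport.c code), along with an alternative that keeps the state in a separate field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the top bit of a 16-bit channel field doubles as an
   "in use" flag, as the report describes. */
#define TOY_IN_USE_BIT 0x8000u

static bool toy_flagged_in_use(uint16_t channel_field) {
    /* For channel numbers >= 32768 this returns true even when nothing
       has marked the channel as used, so fresh channels look busy. */
    return (channel_field & TOY_IN_USE_BIT) != 0;
}

/* Safer layout: keep the state beside the number, so all 65536 channel
   numbers remain usable and none is misread as "in use". */
typedef struct {
    uint16_t channel;
    bool     in_use;
} toy_channel_slot;
```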
[jira] [Created] (PROTON-949) proton doesn't build with ccache swig
michael goulish created PROTON-949: -- Summary: proton doesn't build with ccache swig Key: PROTON-949 URL: https://issues.apache.org/jira/browse/PROTON-949 Project: Qpid Proton Issue Type: Bug Components: proton-c Reporter: michael goulish Thanks to aconway for finding this and saving me a day of madness and horror. On freshly-downloaded proton tree, if I use this swig: /usr/lib64/ccache/swig the build fails this way: qpid-proton/build/proton-c/bindings/python/cprotonPYTHON_wrap.c:4993:25: error: 'PN_HANDLE' undeclared (first use in this function) PNI_PYTRACER = *((PN_HANDLE *)(argp)); -- but if I delete that swig executable, and use the one in /bin/swig , then everything works. yikes. aconway believes the bug is in ccache-swig, not in proton, but I want to put this here in case this bites someone else in Proton Land.
[jira] [Updated] (PROTON-946) remove generated data structure definitions from protocol.h
[ https://issues.apache.org/jira/browse/PROTON-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated PROTON-946: --- Description: Currently protocol.h.py reads the AMQP 1.0 spec xml files and generates all of its output into protocol.h -- even the data structure definitions. Those definitions are currently protected by #ifdef DEFINE_FIELDS , which is defined only in codec.c -- so the definitions only show up in that file, while other .c files only see the declarations. If DEFINE_FIELDS is #defined in any other file, compilation will fail with multiple definition errors. The structure declarations should remain in the .h file , but the actual definitions should be moved into a generated .c file. was: Currently protocol.h.py reads the AMQP 1.0 spec xml files and generates all of its output into protocol.h -- evel the data structure definitions. Those definitions are currently protected by #ifdef DEFINE_FIELDS , which is defined only in codec.c -- so the definitions only show up in that file, while other .c files only see the declarations. If DEFINE_FIELDS is #defined in any other file, compilation will fail with multiple definition errors. The structure declarations should remain in the .h file , but the actual definitions should be moved into a generated .c file. Summary: remove generated data structure definitions from protocol.h (was: remove generated data structure definitions from .protocol.h)
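Below is a minimal single-file sketch of the split the issue proposes. It uses hypothetical names (`toy_field_t`, `toy_fields`, `toy_field_count`) rather than the real generated tables. In a real build, the first part would live in the generated header and the second in a generated .c file, and the comments mark where that boundary would fall. The codes reuse the AMQP descriptor codes for open, begin, and attach (0x10 to 0x12).

```c
#include <assert.h>
#include <stddef.h>

/* ---- would stay in the generated header ----
   Declarations only: any number of .c files may include these. */
typedef struct {
    const char *name;
    int code;
} toy_field_t;

extern const toy_field_t toy_fields[];
extern const size_t toy_field_count;

/* ---- would move to the generated .c file ----
   The one and only definition. If this lived in the header, every .c
   file that compiled it would emit its own copy, and the link would
   fail with multiple-definition errors. */
const toy_field_t toy_fields[] = {
    { "open",   0x10 },
    { "begin",  0x11 },
    { "attach", 0x12 },
};
const size_t toy_field_count = sizeof toy_fields / sizeof toy_fields[0];
```

Every file that includes the header sees only the `extern` declarations, and exactly one object file supplies the definitions. As a result, no `DEFINE_FIELDS`-style guard is needed.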
[jira] [Created] (PROTON-946) remove generated data structure definitions from .protocol.h
michael goulish created PROTON-946: -- Summary: remove generated data structure definitions from .protocol.h Key: PROTON-946 URL: https://issues.apache.org/jira/browse/PROTON-946 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: 0.10 Reporter: michael goulish Assignee: michael goulish Currently protocol.h.py reads the AMQP 1.0 spec xml files and generates all of its output into protocol.h -- evel the data structure definitions. Those definitions are currently protected by #ifdef DEFINE_FIELDS , which is defined only in codec.c -- so the definitions only show up in that file, while other .c files only see the declarations. If DEFINE_FIELDS is #defined in any other file, compilation will fail with multiple definition errors. The structure declarations should remain in the .h file , but the actual definitions should be moved into a generated .c file.
[jira] [Resolved] (PROTON-826) recent checkin causes frequent double-free or corruption crash
[ https://issues.apache.org/jira/browse/PROTON-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish resolved PROTON-826. Resolution: Fixed

I recreated my test from February, and cannot reproduce the bug using latest dispatch + proton code.

recent checkin causes frequent double-free or corruption crash
Key: PROTON-826
URL: https://issues.apache.org/jira/browse/PROTON-826
Project: Qpid Proton
Issue Type: Bug
Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish
Priority: Blocker

In my dispatch testing I am seeing frequent crashes in proton library that began with proton checkin 01cb00c on 2015-02-15, "report read and write errors through the transport".

The output at crash-time says this:

    *** Error in `/home/mick/dispatch/install/sbin/qdrouterd': double free or corruption (fasttop): 0x020ee880 ***
    ======= Backtrace: =========
    /lib64/libc.so.6[0x3e3d875a4f]
    /lib64/libc.so.6[0x3e3d87cd78]
    /lib64/libqpid-proton.so.2(pn_error_clear+0x18)[0x7f4f4f4e1f18]
    /lib64/libqpid-proton.so.2(pn_error_set+0x11)[0x7f4f4f4e1f41]
    /lib64/libqpid-proton.so.2(pn_error_vformat+0x3e)[0x7f4f4f4e1f9e]
    /lib64/libqpid-proton.so.2(pn_error_format+0x82)[0x7f4f4f4e2032]
    /lib64/libqpid-proton.so.2(pn_i_error_from_errno+0x67)[0x7f4f4f4fd737]
    /lib64/libqpid-proton.so.2(pn_recv+0x5a)[0x7f4f4f4fd16a]
    /home/mick/dispatch/install/lib64/libqpid-dispatch.so.0(qdpn_connector_process+0xd7)[0x7f4f4f759430]

The backtrace from the core file looks like this:

    #0  0x003e3d835877 in raise () from /lib64/libc.so.6
    #1  0x003e3d836f68 in abort () from /lib64/libc.so.6
    #2  0x003e3d875a54 in __libc_message () from /lib64/libc.so.6
    #3  0x003e3d87cd78 in _int_free () from /lib64/libc.so.6
    #4  0x7fbf8a59b2e8 in pn_error_clear (error=error@entry=0x1501140) at /home/mick/rh-qpid-proton/proton-c/src/error.c:56
    #5  0x7fbf8a59b311 in pn_error_set (error=error@entry=0x1501140, code=code@entry=-2, text=text@entry=0x7fbf801a69c0 "recv: Resource temporarily unavailable") at /home/mick/rh-qpid-proton/proton-c/src/error.c:65
    #6  0x7fbf8a59b36e in pn_error_vformat (error=0x1501140, code=-2, fmt=<optimized out>, ap=ap@entry=0x7fbf801a6de8) at /home/mick/rh-qpid-proton/proton-c/src/error.c:81
    #7  0x7fbf8a59b402 in pn_error_format (error=error@entry=0x1501140, code=<optimized out>, fmt=fmt@entry=0x7fbf8a5bb21e "%s: %s") at /home/mick/rh-qpid-proton/proton-c/src/error.c:89
    #8  0x7fbf8a5b6797 in pn_i_error_from_errno (error=0x1501140, msg=msg@entry=0x7fbf8a5bbe1a "recv") at /home/mick/rh-qpid-proton/proton-c/src/platform.c:119
    #9  0x7fbf8a5b61ca in pn_recv (io=0x14e77b0, socket=<optimized out>, buf=<optimized out>, size=<optimized out>) at /home/mick/rh-qpid-proton/proton-c/src/posix/io.c:271
    #10 0x7fbf8a812430 in qdpn_connector_process (c=0x7fbf7801c7f0)

And I can prevent the crash from happening, apparently forever, by commenting out this line:

    free(error->text);

in the function pn_error_clear in the file proton-c/src/error.c

The error text that is being freed which causes the crash looks like this:

    $2 = {text = 0x7f66e8104e30 "recv: Resource temporarily unavailable", root = 0x0, code = -2}

My dispatch test creates a router network and then repeatedly kills and restarts a randomly-selected router. After this proton checkin it almost never gets through 5 iterations without this crash. After I commented out that line, it got through more than 500 iterations before I stopped it.
[jira] [Assigned] (PROTON-826) recent checkin causes frequent double-free or corruption crash
[ https://issues.apache.org/jira/browse/PROTON-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish reassigned PROTON-826: -- Assignee: michael goulish
[jira] [Commented] (PROTON-826) recent checkin causes frequent double-free or corruption crash
[ https://issues.apache.org/jira/browse/PROTON-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616749#comment-14616749 ] michael goulish commented on PROTON-826: I see why I didn't follow this up earlier. Current dispatch will not compile against latest proton because of some SASL issues. But I need to test against latest proton. SO ... now attempting to hack up dispatch so that it doesn't have SASL but will still build and run against latest proton
[jira] [Created] (PROTON-930) add explicit AMQP 1.0 constants
michael goulish created PROTON-930: -- Summary: add explicit AMQP 1.0 constants Key: PROTON-930 URL: https://issues.apache.org/jira/browse/PROTON-930 Project: Qpid Proton Issue Type: Improvement Components: proton-c Reporter: michael goulish Assignee: michael goulish Priority: Minor Fix For: 0.10 Add an include file that has explicit defined constants for every numeric default value that is mandated by the AMQP 1.0 spec.
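Below is a sketch of what such a header might contain. The macro names are hypothetical. The values are the port numbers, minimum frame size, and field defaults that the AMQP 1.0 specification defines in its transport section. The default channel-max of 65535 is the value at issue in PROTON-925.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro names; values are from the AMQP 1.0 specification. */
#define TOY_AMQP_PORT                    5672u        /* amqp port */
#define TOY_AMQP_SECURE_PORT             5671u        /* amqps (TLS) port */
#define TOY_AMQP_MIN_MAX_FRAME_SIZE      512u         /* MIN-MAX-FRAME-SIZE */
#define TOY_AMQP_DEFAULT_MAX_FRAME_SIZE  4294967295u  /* open.max-frame-size */
#define TOY_AMQP_DEFAULT_CHANNEL_MAX     65535u       /* open.channel-max */
#define TOY_AMQP_DEFAULT_HANDLE_MAX      4294967295u  /* begin.handle-max */

/* Compile-time sanity checks (C11 static_assert from <assert.h>). */
static_assert(TOY_AMQP_DEFAULT_MAX_FRAME_SIZE == UINT32_MAX,
              "default max-frame-size is the largest uint");
static_assert(TOY_AMQP_DEFAULT_CHANNEL_MAX == UINT16_MAX,
              "default channel-max is the largest ushort");
```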
[jira] [Resolved] (PROTON-925) proton-c seems to treat unspecified channel-max as implying 0
[ https://issues.apache.org/jira/browse/PROTON-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish resolved PROTON-925. Resolution: Fixed

commit fc38e86a6f5a1b265552708e674d3c8040c1985b

proton-c seems to treat unspecified channel-max as implying 0
Key: PROTON-925
URL: https://issues.apache.org/jira/browse/PROTON-925
Project: Qpid Proton
Issue Type: Bug
Components: proton-c
Affects Versions: 0.10
Reporter: Gordon Sim
Assignee: michael goulish
Priority: Blocker
Fix For: 0.10

If channel-max is not specified in the open, it appears the latest proton-c treats that as implying the maximum is 0, though the spec states the default is 65535. This breaks compatibility with previous proton releases. E.g. the following is the interaction between a sender using the latest 0.10 and a receiver using proton 0.9.

{noformat}
[0x151c710]: - AMQP
[0x151c710]:0 - @open(16) [container-id=65A6602D-5D24-4D39-9C6F-7403D98F5E15, hostname=localhost, channel-max=32767]
[0x151c710]:0 - @begin(17) [next-outgoing-id=0, incoming-window=2147483647, outgoing-window=1]
[0x151c710]:1 - @begin(17) [next-outgoing-id=0, incoming-window=2147483647, outgoing-window=1]
[0x151c710]:2 - @begin(17) [next-outgoing-id=0, incoming-window=2147483647, outgoing-window=1]
[0x151c710]:0 - @attach(18) [name=sender-xxx, handle=0, role=false, snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_a, durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_a, durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
[0x151c710]:1 - @attach(18) [name=sender-xxx, handle=0, role=false, snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_b, durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_b, durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
[0x151c710]:2 - @attach(18) [name=sender-xxx, handle=0, role=false, snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_c, durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_c, durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
[0x151c710]: - AMQP
[0x151c710]:0 - @open(16) [container-id=abab56b0-c25e-427b-9f4f-d63da48d1973]
[0x151c710]:0 - @begin(17) [remote-channel=0, next-outgoing-id=0, incoming-window=2147483647, outgoing-window=0]
[0x151c710]:1 - @begin(17) [remote-channel=1, next-outgoing-id=0, incoming-window=2147483647, outgoing-window=0]
[0x151c710]:2 - @begin(17) [remote-channel=2, next-outgoing-id=0, incoming-window=2147483647, outgoing-window=0]
[0x151c710]:0 - @attach(18) [name=sender-xxx, handle=0, role=true, snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_a, durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_a, durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
[0x151c710]:1 - @attach(18) [name=sender-xxx, handle=0, role=true, snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_b, durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_b, durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
[0x151c710]:2 - @attach(18) [name=sender-xxx, handle=0, role=true, snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_c, durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_c, durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
[0x151c710]:0 - @flow(19) [next-incoming-id=0, incoming-window=2147483647, next-outgoing-id=0, outgoing-window=0, handle=0, delivery-count=0, link-credit=341, drain=false]
[0x151c710]:1 - @flow(19) [next-incoming-id=0, incoming-window=2147483647, next-outgoing-id=0, outgoing-window=0, handle=0, delivery-count=0, link-credit=341, drain=false]
[0x151c710]:2 - @flow(19) [next-incoming-id=0, incoming-window=2147483647, next-outgoing-id=0, outgoing-window=0, handle=0, delivery-count=0, link-credit=341, drain=false]
[0x151c710]:0 - @close(24) [error=@error(29) [condition=:amqp:connection:framing-error, description=remote channel 1 is above negotiated channel_max 0.]]
[0x151c710]: - EOS
[0x151c710]:0 - @close(24) []
[0x151c710]: - EOS
{noformat}
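The fix amounts to two steps. First, treat an absent channel-max as the spec default rather than 0. Second, honor the smaller of the two advertised limits, which is the rule proposed in PROTON-842. Below is a minimal sketch with hypothetical helper names. It is not the code in the commit above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per the AMQP 1.0 spec, an OPEN frame without channel-max means 65535. */
#define TOY_DEFAULT_CHANNEL_MAX 65535u

/* Hypothetical helper: remote_present is false when the peer's OPEN
   carried no channel-max field. */
static uint16_t toy_remote_channel_max(bool remote_present, uint16_t remote_value) {
    return remote_present ? remote_value : (uint16_t)TOY_DEFAULT_CHANNEL_MAX;
}

/* Hypothetical helper: both peers must stay within the smaller limit. */
static uint16_t toy_negotiated_channel_max(uint16_t local, uint16_t remote) {
    return local < remote ? local : remote;
}
```

Consider how this applies to the trace above. The sender advertised 32767, and the 0.9 receiver sent no channel-max. The negotiated limit should therefore have been 32767, and channel 1 would have been accepted.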
[jira] [Resolved] (PROTON-842) proton-c should honor channel_max
[ https://issues.apache.org/jira/browse/PROTON-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish resolved PROTON-842. Resolution: Fixed Last checkin fixed java tests. proton-c should honor channel_max - Key: PROTON-842 URL: https://issues.apache.org/jira/browse/PROTON-842 Project: Qpid Proton Issue Type: Bug Components: proton-j Affects Versions: 0.9, 0.10 Reporter: michael goulish Assignee: michael goulish proton-c code should use transport->channel_max and transport->remote_channel_max to enforce a limit on the maximum number of simultaneously active sessions on a connection. I guess the limit should be the minimum of those two numbers, or, if neither side sets a limit, then 2^16.
[jira] [Created] (PROTON-919) make C impl behave like java wrt channel_max error
michael goulish created PROTON-919: -- Summary: make C impl behave like java wrt channel_max error Key: PROTON-919 URL: https://issues.apache.org/jira/browse/PROTON-919 Project: Qpid Proton Issue Type: Improvement Components: proton-c, python-binding Reporter: michael goulish Assignee: michael goulish Priority: Minor In the Java impl, I made TransportImpl throw an exception if the application tries to change the local channel_max setting after we have already sent the OPEN frame to the remote peer. (Because at that point we have communicated our channel_max limit to the peer -- no fair changing it afterwards.) One reviewer suggested that it would be nice if the C impl worked the same way. That would mean that pn_transport_set_channel_max() would have to return a result code, which the Python binding would detect: the Python binding throws an exception, the Python tests detect it, and so it would work the same way as Java.
[jira] [Commented] (PROTON-919) make C impl behave like java wrt channel_max error
[ https://issues.apache.org/jira/browse/PROTON-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598095#comment-14598095 ] michael goulish commented on PROTON-919: ~~~ NOTE ~~~ The proposed change alters the public API in that it changes pn_transport_set_channel_max() to return an int, rather than void.
[jira] [Resolved] (PROTON-842) proton-c should honor channel_max
[ https://issues.apache.org/jira/browse/PROTON-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish resolved PROTON-842. Resolution: Fixed commit e38957ae5115ec023993672ca5b7d5e3df414f7e proton-c should honor channel_max - Key: PROTON-842 URL: https://issues.apache.org/jira/browse/PROTON-842 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.9 Reporter: michael goulish Assignee: michael goulish proton-c code should use transport->channel_max and transport->remote_channel_max to enforce a limit on the maximum number of simultaneously active sessions on a connection. I guess the limit should be the minimum of those two numbers, or, if neither side sets a limit, then 2^16.
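Here is a sketch of the limit described above. The helper names are hypothetical. The sketch assumes an unset channel_max is represented by the AMQP 1.0 default of 65535, which is the highest channel number. That gives the 2^16 channels (numbers 0 through 65535) mentioned above when neither side sets a limit.

```c
#include <assert.h>
#include <stdint.h>

/* AMQP 1.0 default for channel-max when a peer does not set one. */
#define SKETCH_DEFAULT_CHANNEL_MAX UINT16_MAX

/* Hypothetical helper: the highest usable channel number is the smaller of
   the local and remote limits. */
uint16_t effective_channel_max(uint16_t local_max, uint16_t remote_max)
{
    return local_max < remote_max ? local_max : remote_max;
}

/* Sessions that can be active at once: channel numbers 0..max inclusive. */
uint32_t max_active_sessions(uint16_t local_max, uint16_t remote_max)
{
    return (uint32_t)effective_channel_max(local_max, remote_max) + 1u;
}
```

Because channel_max is the highest channel number rather than a count, the session limit is one more than the effective value.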
[jira] [Commented] (PROTON-842) proton-c should honor channel_max
[ https://issues.apache.org/jira/browse/PROTON-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591877#comment-14591877 ] michael goulish commented on PROTON-842: -- please note -- This fix changes API behavior in one way: pn_session() can now return NULL if an attempt is made to create more sessions than the value of channel_max allows. Previously, the limit on the number of sessions was enforced by SEGV.
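Callers of pn_session() now need to check for NULL. The sketch below shows that contract using a hypothetical connection model (`conn_sketch_t`, `session_sketch_new()`), not Proton's internals. The linear scan is only for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical connection model: one in-use flag per channel number. */
typedef struct {
    uint16_t channel_max; /* effective limit: min of local and remote */
    bool *in_use;         /* array of channel_max + 1 flags */
} conn_sketch_t;

typedef struct {
    uint16_t channel;
} session_sketch_t;

/* Returns a new session on the first free channel, or NULL when every
   channel from 0 through channel_max is taken (or allocation fails).
   The counter is 32-bit so that channel_max == 65535 still terminates. */
session_sketch_t *session_sketch_new(conn_sketch_t *c)
{
    for (uint32_t ch = 0; ch <= c->channel_max; ch++) {
        if (!c->in_use[ch]) {
            session_sketch_t *s = malloc(sizeof *s);
            if (s == NULL) {
                return NULL;
            }
            c->in_use[ch] = true;
            s->channel = (uint16_t)ch;
            return s;
        }
    }
    return NULL;
}
```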
[jira] [Reopened] (PROTON-842) proton-c should honor channel_max
[ https://issues.apache.org/jira/browse/PROTON-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish reopened PROTON-842: My fix for proton-c is making trouble for proton-j
[jira] [Assigned] (PROTON-896) change all static function names to begin with pni_
[ https://issues.apache.org/jira/browse/PROTON-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish reassigned PROTON-896: -- Assignee: michael goulish
[jira] [Created] (PROTON-896) change all statis function names to begin with pni_
michael goulish created PROTON-896: -- Summary: change all statis function names to begin with pni_ Key: PROTON-896 URL: https://issues.apache.org/jira/browse/PROTON-896 Project: Qpid Proton Issue Type: Improvement Reporter: michael goulish Priority: Minor Change all the static function names to start with pni_, and declare as static all functions that ought to be.
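The following illustrates the proposed convention with made-up functions; neither exists in Proton. The internal helper is declared static and carries the pni_ prefix. The public entry point keeps the pn_ prefix and external linkage.

```c
#include <assert.h>
#include <stddef.h>

/* Internal helper: static gives it file scope, and the pni_ prefix marks it
   as internal under the proposed naming convention. */
static size_t pni_count_spaces(const char *text)
{
    size_t n = 0;
    for (; *text != '\0'; text++) {
        if (*text == ' ') {
            n++;
        }
    }
    return n;
}

/* Hypothetical public function: external linkage, pn_ prefix. */
size_t pn_example_count_spaces(const char *text)
{
    return text != NULL ? pni_count_spaces(text) : 0;
}
```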
[jira] [Updated] (PROTON-896) change all static function names to begin with pni_
[ https://issues.apache.org/jira/browse/PROTON-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated PROTON-896: --- Summary: change all static function names to begin with pni_ (was: change all statis function names to begin with pni_)
[jira] [Updated] (PROTON-864) don't crash when channel number goes high
[ https://issues.apache.org/jira/browse/PROTON-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated PROTON-864: --- Summary: don't crash when channel number goes high (was: avoid crashes when channel number goes high.)
[jira] [Updated] (PROTON-864) avoid crashes when channel number goes high.
[ https://issues.apache.org/jira/browse/PROTON-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated PROTON-864: --- Summary: avoid crashes when channel number goes high. (was: don't overload top bit of channel numbers )
[jira] [Created] (PROTON-888) allocate_alias linear search becomes slow at scale
michael goulish created PROTON-888: -- Summary: allocate_alias linear search becomes slow at scale Key: PROTON-888 URL: https://issues.apache.org/jira/browse/PROTON-888 Project: Qpid Proton Issue Type: Improvement Reporter: michael goulish My recent testing goes to a large number of sessions per connection. I noticed that the test slowed down rapidly over time, in terms of how many sessions were established per unit time. The function allocate_alias() in transport.c uses a linear search through an array to find the next available channel number for a session (or the next available handle number for a link). In a usage scenario like mine, in which many sessions are established, this becomes very slow as the array fills up. At the beginning of my test, this function is too fast to measure. By the end, it takes more than 82 milliseconds per call. Overall, this function alone contributes more than 20 seconds to my 3-minute test. This is not an unrealistic scenario: we already have one potential customer who is interested in going to this kind of scale (which is why I was doing this test). Maybe we can find an implementation that does not slow down the common case, and yet behaves better at the high end.
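As one possible alternative to the linear search, here is a free-list allocator. It is purely an illustration: this thread does not say what Proton adopted, and `alias_pool_t` and its functions are hypothetical. Fresh numbers come from a high-water mark, and released numbers are pushed on a stack and reused first. Both allocate and release are O(1).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator, not Proton's allocate_alias(). Intended for limits
   well below UINT32_MAX, such as channel_max (at most 65535). */
typedef struct {
    uint32_t max;         /* highest number that may be handed out */
    uint32_t next_fresh;  /* numbers >= next_fresh have never been used */
    uint32_t *free_stack; /* released numbers, reused before fresh ones */
    uint32_t free_count;
} alias_pool_t;

int alias_pool_init(alias_pool_t *p, uint32_t max)
{
    p->max = max;
    p->next_fresh = 0;
    p->free_count = 0;
    p->free_stack = malloc(((size_t)max + 1) * sizeof *p->free_stack);
    return p->free_stack != NULL ? 0 : -1;
}

/* Returns an unused number, or -1 when all of 0..max are in use. */
int64_t alias_pool_alloc(alias_pool_t *p)
{
    if (p->free_count > 0) {
        return p->free_stack[--p->free_count];
    }
    if (p->next_fresh <= p->max) {
        return p->next_fresh++;
    }
    return -1;
}

/* Caller must only release numbers that are currently allocated. */
void alias_pool_release(alias_pool_t *p, uint32_t alias)
{
    p->free_stack[p->free_count++] = alias;
}

void alias_pool_destroy(alias_pool_t *p)
{
    free(p->free_stack);
    p->free_stack = NULL;
}
```

This design trades memory allocated up front (one slot per possible number) for constant-time operations. It also does not try to hand out the lowest free number.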
[jira] [Created] (PROTON-886) make proton enforce handle-max
michael goulish created PROTON-886: -- Summary: make proton enforce handle-max Key: PROTON-886 URL: https://issues.apache.org/jira/browse/PROTON-886 Project: Qpid Proton Issue Type: Bug Reporter: michael goulish Make the code enforce limits on handles (and links) from section 2.7.2 of the AMQP 1.0 spec. The handle-max value is the highest handle value that can be used on the session. A peer MUST NOT attempt to attach a link using a handle value outside the range that its partner can handle. A peer that receives a handle outside the supported range MUST close the connection with the framing-error error-code.
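The receiving side of the rule quoted above can be sketched as a range check. `check_attach_handle()` and `handle_check_t` are hypothetical names. The caller is assumed to close the connection with the `amqp:connection:framing-error` condition when the check fails. The sending side would apply the same comparison against the partner's advertised handle-max before attaching.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    HANDLE_OK,
    HANDLE_FRAMING_ERROR /* caller should close the connection with
                            amqp:connection:framing-error */
} handle_check_t;

/* handle_max is the value this peer advertised in its BEGIN frame
   (AMQP 1.0 section 2.7.2; the default is 4294967295). A handle above it
   is outside the supported range. */
handle_check_t check_attach_handle(uint32_t handle, uint32_t handle_max)
{
    return handle <= handle_max ? HANDLE_OK : HANDLE_FRAMING_ERROR;
}
```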
[jira] [Created] (PROTON-864) don't overload top bit of channel numbers
michael goulish created PROTON-864: -- Summary: don't overload top bit of channel numbers Key: PROTON-864 URL: https://issues.apache.org/jira/browse/PROTON-864 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.9 Reporter: michael goulish Assignee: michael goulish Code in transport.c, and a little in engine.c, looks at the topmost bit in channel numbers to decide if the channels are in use. This causes crashes when the number of channels in a single connection goes beyond 32767.
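The following is a hypothetical illustration of why 32767 is the breaking point. The issue does not show exactly how transport.c uses the bit, so `marker_bit_set()` and `channel_slot_t` are made up. If bit 15 of a 16-bit channel number doubles as a marker, every channel from 32768 upward already carries that marker. Keeping the state in a separate field avoids the clash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker bit: the top bit of a 16-bit channel number. */
#define SKETCH_MARKER_BIT 0x8000u

/* True for every channel >= 32768, whether or not it was "marked". */
bool marker_bit_set(uint16_t channel)
{
    return (channel & SKETCH_MARKER_BIT) != 0;
}

/* One alternative: keep the state in its own field so all 16 bits of the
   channel number stay available (channels 0 through 65535). */
typedef struct {
    uint16_t channel;
    bool in_use;
} channel_slot_t;
```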
[jira] [Commented] (PROTON-826) recent checkin causes frequent double-free or corruption crash
[ https://issues.apache.org/jira/browse/PROTON-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336824#comment-14336824 ] michael goulish commented on PROTON-826: It looks like the problem here is just that the error struct used in proton-c/src/error.c is not thread safe, so I am opening a new Jira for Dispatch. I am leaving this one open for now, however, because other applications using proton will encounter this. Either something could be changed in proton to make this less thread-hostile, or ... it could be publicized better? Please feel free to close when appropriate.
[jira] [Created] (PROTON-826) recent checkin causes frequent double-free or corruption crash
michael goulish created PROTON-826: -- Summary: recent checkin causes frequent double-free or corruption crash Key: PROTON-826 URL: https://issues.apache.org/jira/browse/PROTON-826 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.9 Reporter: michael goulish Priority: Blocker In my dispatch testing I am seeing frequent crashes in the proton library that began with proton checkin 01cb00c on 2015-02-15, "report read and write errors through the transport". The output at crash-time says this:
*** Error in `/home/mick/dispatch/install/sbin/qdrouterd': double free or corruption (fasttop): 0x020ee880 ***
=== Backtrace: =
/lib64/libc.so.6[0x3e3d875a4f]
/lib64/libc.so.6[0x3e3d87cd78]
/lib64/libqpid-proton.so.2(pn_error_clear+0x18)[0x7f4f4f4e1f18]
/lib64/libqpid-proton.so.2(pn_error_set+0x11)[0x7f4f4f4e1f41]
/lib64/libqpid-proton.so.2(pn_error_vformat+0x3e)[0x7f4f4f4e1f9e]
/lib64/libqpid-proton.so.2(pn_error_format+0x82)[0x7f4f4f4e2032]
/lib64/libqpid-proton.so.2(pn_i_error_from_errno+0x67)[0x7f4f4f4fd737]
/lib64/libqpid-proton.so.2(pn_recv+0x5a)[0x7f4f4f4fd16a]
/home/mick/dispatch/install/lib64/libqpid-dispatch.so.0(qdpn_connector_process+0xd7)[0x7f4f4f759430]
The backtrace from the core file looks like this:
#0 0x003e3d835877 in raise () from /lib64/libc.so.6
#1 0x003e3d836f68 in abort () from /lib64/libc.so.6
#2 0x003e3d875a54 in __libc_message () from /lib64/libc.so.6
#3 0x003e3d87cd78 in _int_free () from /lib64/libc.so.6
#4 0x7fbf8a59b2e8 in pn_error_clear (error=error@entry=0x1501140) at /home/mick/rh-qpid-proton/proton-c/src/error.c:56
#5 0x7fbf8a59b311 in pn_error_set (error=error@entry=0x1501140, code=code@entry=-2, text=text@entry=0x7fbf801a69c0 "recv: Resource temporarily unavailable") at /home/mick/rh-qpid-proton/proton-c/src/error.c:65
#6 0x7fbf8a59b36e in pn_error_vformat (error=0x1501140, code=-2, fmt=<optimized out>, ap=ap@entry=0x7fbf801a6de8) at /home/mick/rh-qpid-proton/proton-c/src/error.c:81
#7 0x7fbf8a59b402 in pn_error_format (error=error@entry=0x1501140, code=<optimized out>, fmt=fmt@entry=0x7fbf8a5bb21e "%s: %s") at /home/mick/rh-qpid-proton/proton-c/src/error.c:89
#8 0x7fbf8a5b6797 in pn_i_error_from_errno (error=0x1501140, msg=msg@entry=0x7fbf8a5bbe1a "recv") at /home/mick/rh-qpid-proton/proton-c/src/platform.c:119
#9 0x7fbf8a5b61ca in pn_recv (io=0x14e77b0, socket=<optimized out>, buf=<optimized out>, size=<optimized out>) at /home/mick/rh-qpid-proton/proton-c/src/posix/io.c:271
#10 0x7fbf8a812430 in qdpn_connector_process (c=0x7fbf7801c7f0)
And I can prevent the crash from happening, apparently forever, by commenting out this line: free(error->text); in the function pn_error_clear() in the file proton-c/src/error.c. The error text that is being freed, which causes the crash, looks like this: $2 = {text = 0x7f66e8104e30 "recv: Resource temporarily unavailable", root = 0x0, code = -2} My dispatch test creates a router network and then repeatedly kills and restarts a randomly-selected router. After this proton checkin it almost never gets through 5 iterations without this crash. After I commented out that line, it got through more than 500 iterations before I stopped it.
[jira] [Created] (PROTON-703) inlining performance improvements
michael goulish created PROTON-703: -- Summary: inlining performance improvements Key: PROTON-703 URL: https://issues.apache.org/jira/browse/PROTON-703 Project: Qpid Proton Issue Type: Improvement Components: proton-c Reporter: michael goulish Assignee: michael goulish Priority: Minor Omnibus Jira for any other inlining performance improvements I may find. Notes to self:
* don't affect public APIs.
* don't forget to test the Debug build.
[jira] [Created] (PROTON-700) small performance improvement from inlining one fn.
michael goulish created PROTON-700: -- Summary: small performance improvement from inlining one fn. Key: PROTON-700 URL: https://issues.apache.org/jira/browse/PROTON-700 Project: Qpid Proton Issue Type: Improvement Components: proton-c Reporter: michael goulish Assignee: michael goulish Priority: Minor Inlining the internal function pn_data_node() improves speed by somewhere between 2.6% and 6%, depending on architecture. This is based on testing I did with two C-based clients written at the engine interface level. The higher 6% figure was seen on a more modern machine with recent Intel processors; the lower figure was seen on an older box with AMD processors. But the effect is real: with 50 repetitions before the change and 50 after, a t-test indicates that the odds of this happening by chance are 2.0e-18.
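The following shows what inlining a small internal accessor looks like. The types and `data_sketch_node()` are hypothetical, shaped only loosely like a node array indexed by 1-based ids where 0 means "no node". The real pn_data_node() may differ. `static inline` is a request, not a guarantee: compilers typically do not inline at -O0, which is one reason to check the Debug build as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node and container types for this sketch. */
typedef struct {
    int value;
} node_sketch_t;

typedef struct {
    node_sketch_t *nodes;
    size_t size;
} data_sketch_t;

/* A tiny, hot accessor declared static inline so the compiler can expand it
   at each call site instead of emitting a function call. Id 0 means "none". */
static inline node_sketch_t *data_sketch_node(data_sketch_t *data, size_t id)
{
    return id != 0 ? &data->nodes[id - 1] : NULL;
}
```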
[jira] [Commented] (PROTON-625) Biggest Backtrace Ever!
[ https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051826#comment-14051826 ] michael goulish commented on PROTON-625: Here's what happens, and a fix.
1. pni_map_entry() calls pni_map_ensure() to make sure the map has enough capacity.
2. The capacity-increasing loop in pni_map_ensure() has two conditions on it: increase the capacity if map->capacity is too small, or if the map 'load' is greater than map->load_factor. (Map load is ... meaning not obvious to me.)
3. If pni_map_ensure() returns true, then pni_map_entry() will call itself recursively, and keep doing that until pni_map_ensure() returns false. 'False' means 'I made no change.'
4. But it is possible for pni_map_ensure() to make no change, and yet return true. Here is how it happened in my most recent test: map->capacity = 512, capacity = 331, pni_map_load(map) = 0.75, map->load_factor = 0.75.
5. Those values made *both* conditions on the capacity-increasing loop in pni_map_ensure() false. So it didn't do anything to change the map. But it returned true. So pni_map_entry() called itself. But nothing had changed. And away we go.
FIX: Make the test on the if at the top of pni_map_ensure() say this: if (capacity <= map->capacity && load <= map->load_factor) { (Added '=' to the load test.) After that, I ran twenty tests with no failure. Previously, the failure probability on my system was 0.3, so the odds of 20 passes in a row happening by chance are a little less than 1 in 1000.
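Below is a stripped-down model of the control flow described above. It is not Proton's map code: `map_sketch_t` and `map_sketch_ensure()` are hypothetical, only capacity and size are tracked, and the load is assumed to be size / capacity (the thread leaves its exact meaning open). With the fixed `<=` test, the early return covers exactly the cases where the loop would not run. "Nothing to do" can therefore no longer fall through to a `true` return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: capacity must be > 0. No entries are stored. */
typedef struct {
    size_t capacity;
    size_t size;
    float load_factor;
} map_sketch_t;

/* Assumption for this sketch: load = size / capacity. */
static float map_sketch_load(const map_sketch_t *m)
{
    return (float)m->size / (float)m->capacity;
}

/* Returns true only if the map actually grew. The early-return test is the
   exact negation of the loop condition (<= versus > on the load), so when
   we get past it the loop always runs at least once. */
bool map_sketch_ensure(map_sketch_t *m, size_t capacity)
{
    if (capacity <= m->capacity && map_sketch_load(m) <= m->load_factor) {
        return false;
    }
    size_t old_capacity = m->capacity;
    while (m->capacity < capacity || map_sketch_load(m) > m->load_factor) {
        m->capacity *= 2;
    }
    return m->capacity != old_capacity;
}
```

With the values from the report (capacity 512, requested 331, load exactly 0.75 = load_factor), this version returns false. The buggy `<` test would have skipped the early return, skipped the loop, and returned true.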
[jira] [Commented] (PROTON-625) Biggest Backtrace Ever!
[ https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049897#comment-14049897 ] michael goulish commented on PROTON-625: I had some confusion about what libraries were being picked up. Sorry! This bug is *not* present in 0.7! I was able to run a 0.7-based dispatch router 10 times with no failure. Then, after switching to the latest proton trunk code as of today, 2 of the first 3 tests resulted in this failure.
[jira] [Commented] (PROTON-625) Biggest Backtrace Ever!
[ https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051007#comment-14051007 ] michael goulish commented on PROTON-625: Here is a hack that fixes it: a little new code in pni_map_ensure(). Tested this on latest protonics, version 1607485. Without hack: 3 failures out of 10 tests (similar to what I have been seeing on other versions). With hack: 0 failures out of 13 tests (probability this happened by chance: less than 1%). So, now I'm trying to see how it should *really* be fixed...
--- code --- code --- code --- code --- code --- code --- code --- code --- code ---
// Saved before the loop so we can tell whether the loop changed anything.
size_t oldcap = map->capacity;

// This loop is what is already there, in pni_map_ensure. No change.
while (map->capacity < capacity || pni_map_load(map) > map->load_factor) {
  map->capacity *= 2;
  map->addressable = (size_t) (0.86 * map->capacity);
}

/*--- If ever we get past the above while-loop without actually
      having changed map->capacity, we are doomed to eternal torment.
      So, force it. ---*/
if ( oldcap == map->capacity ) {
  fprintf ( stderr, "Fiery the angels fell; deep thunder rolled around their shores, burning with the fires of Orc!\n" );
  map->capacity *= 2;
  map->addressable = (size_t) (0.86 * map->capacity);
}
[jira] [Commented] (PROTON-625) Biggest Backtrace Ever!
[ https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048569#comment-14048569 ] michael goulish commented on PROTON-625: BTW -- I kill and restart the router after each test. Biggest Backtrace Ever! --- Key: PROTON-625 URL: https://issues.apache.org/jira/browse/PROTON-625 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.7 Reporter: michael goulish I am saving all my stuff so I can repro on demand. It doesn't happen every time, but it's about 50%. -- On one box, I have a dispatch router. On the other box, I have 10 clients: 5 Messenger-based receivers, and 5 qpid-messaging-based senders. Each client will handle 100 addresses, of the form mick/0 ... mick/1 ... etc. 100 messages will be sent to each address. I start the 5 receivers first. They start OK. Dispatch router happy and stable. Wait a few seconds. I start the 5 senders, from a bash script. The first sender is already sending when the 2nd, 3rd, 4th start. After a few of them start, but before all have finished starting, a few seconds into the script, the crash occurs. (If they all start up successfully, no crash.) The crash occurs in the dispatch router. Here is the biggest backtrace ever:
#0 0x003cf9879ad1 in _int_malloc (av=0x7f101c20, bytes=16384) at malloc.c:4383
#1 0x003cf987a911 in __libc_malloc (bytes=16384) at malloc.c:3664
#2 0x0039c6c1650a in pni_map_allocate () from /usr/lib64/libqpid-proton.so.2
#3 0x0039c6c16a3a in pni_map_ensure () from /usr/lib64/libqpid-proton.so.2
#4 0x0039c6c16c45 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#5 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#6 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#7 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#8 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#9 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#10 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#11 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#12 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#13 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#14 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
. . . .
#93549 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#93550 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#93551 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#93552 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#93553 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#93554 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#93555 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#93556 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#93557 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#93558 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#93559 0x0039c6c16dc0 in pn_map_put () from /usr/lib64/libqpid-proton.so.2
#93560 0x0039c6c17226 in pn_hash_put () from /usr/lib64/libqpid-proton.so.2
#93561 0x0039c6c2a643 in pn_delivery_map_push () from /usr/lib64/libqpid-proton.so.2
#93562 0x0039c6c2c44b in pn_do_transfer () from /usr/lib64/libqpid-proton.so.2
#93563 0x0039c6c24385 in pn_dispatch_frame () from /usr/lib64/libqpid-proton.so.2
#93564 0x0039c6c2448f in pn_dispatcher_input () from /usr/lib64/libqpid-proton.so.2
#93565 0x0039c6c2d68b in pn_input_read_amqp () from /usr/lib64/libqpid-proton.so.2
#93566 0x0039c6c3011a in pn_io_layer_input_passthru () from /usr/lib64/libqpid-proton.so.2
#93567 0x0039c6c3011a in pn_io_layer_input_passthru () from /usr/lib64/libqpid-proton.so.2
#93568 0x0039c6c2d275 in transport_consume () from /usr/lib64/libqpid-proton.so.2
#93569 0x0039c6c304cd in pn_transport_process () from /usr/lib64/libqpid-proton.so.2
#93570 0x0039c6c3e40c in pn_connector_process () from /usr/lib64/libqpid-proton.so.2
#93571 0x7f1060c60460 in process_connector () from /home/mick/dispatch/build/libqpid-dispatch.so.0
#93572 0x7f1060c61017 in thread_run () from /home/mick/dispatch/build/libqpid-dispatch.so.0
#93573 0x003cf9c07851 in 
start_thread (arg=0x7f1052bfd700) at pthread_create.c:301 #93574 0x003cf98e890d in clone () at
[jira] [Commented] (PROTON-625) Biggest Backtrace Ever!
[ https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048573#comment-14048573 ] michael goulish commented on PROTON-625: When I put usleep(1000) after each message sent, I have zero failures in 10 tries. Biggest Backtrace Ever! --- Key: PROTON-625 URL: https://issues.apache.org/jira/browse/PROTON-625 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.7 Reporter: michael goulish
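The observation that pausing 1 ms after each send (usleep(1000)) gave zero failures in 10 tries can be sketched as a paced sender loop. This is a minimal sketch, not the actual test client. `send_fn`, `paced_send`, and `count_only_send` are hypothetical names that stand in for the real qpid-messaging send call. The pause only throttles the sender. It is a workaround observation and does not fix the router-side crash.

```c
#define _DEFAULT_SOURCE /* expose usleep() from <unistd.h> under strict -std= modes */
#include <unistd.h>

/* Hypothetical send hook standing in for the real client call
 * (e.g. a qpid-messaging sender). Returns 0 on success. */
typedef int (*send_fn)(const char *address, int seq, void *ctx);

/* Send `count` messages to `address`, sleeping `pause_us` microseconds
 * after each one (the report used 1000). Returns how many were sent
 * before the first failure, or `count` if all succeeded. */
static int paced_send(send_fn send_one, void *ctx, const char *address,
                      int count, unsigned int pause_us)
{
    for (int seq = 0; seq < count; seq++) {
        if (send_one(address, seq, ctx) != 0)
            return seq;
        if (pause_us > 0)
            usleep(pause_us);
    }
    return count;
}

/* Dry-run hook for trying the loop without a broker: just counts calls. */
static int count_only_send(const char *address, int seq, void *ctx)
{
    (void)address;
    (void)seq;
    ++*(int *)ctx;
    return 0;
}
```

In the reported setup, each sender would call this once per address (mick/0, mick/1, ...) with a count of 100 and a pause of 1000.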
[jira] [Created] (PROTON-625) Biggest Backtrace Ever!
michael goulish created PROTON-625: -- Summary: Biggest Backtrace Ever! Key: PROTON-625 URL: https://issues.apache.org/jira/browse/PROTON-625 Project: Qpid Proton Issue Type: Bug Components: proton-c Reporter: michael goulish I am saving all my stuff so I can repro on demand. It doesn't happen every time, but it's about 50%. -- On one box, I have a dispatch router. On the other box, I have 10 clients: 5 Messenger-based receivers, and 5 qpid-messaging-based senders. Each client will handle 100 addresses, of the form mick/0 ... mick/1 ... etc. 100 messages will be sent to each address. I start the 5 receivers first. They start OK. Dispatch router happy and stable. Wait a few seconds. I start the 5 senders, from a bash script. The first sender is already sending when the 2nd, 3rd, 4th start. After a few of them start, but before all have finished starting, a few seconds into the script, the crash occurs. (If they all start up successfully, no crash.) Here is the biggest backtrace ever: #0 0x003cf9879ad1 in _int_malloc (av=0x7f101c20, bytes=16384) at malloc.c:4383 #1 0x003cf987a911 in __libc_malloc (bytes=16384) at malloc.c:3664 #2 0x0039c6c1650a in pni_map_allocate () from /usr/lib64/libqpid-proton.so.2 #3 0x0039c6c16a3a in pni_map_ensure () from /usr/lib64/libqpid-proton.so.2 #4 0x0039c6c16c45 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #5 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #6 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #7 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #8 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #9 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #10 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #11 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #12 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #13 0x0039c6c16c64 in pni_map_entry ()
from /usr/lib64/libqpid-proton.so.2 #14 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 . . . . #93549 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93550 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93551 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93552 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93553 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93554 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93555 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93556 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93557 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93558 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93559 0x0039c6c16dc0 in pn_map_put () from /usr/lib64/libqpid-proton.so.2 #93560 0x0039c6c17226 in pn_hash_put () from /usr/lib64/libqpid-proton.so.2 #93561 0x0039c6c2a643 in pn_delivery_map_push () from /usr/lib64/libqpid-proton.so.2 #93562 0x0039c6c2c44b in pn_do_transfer () from /usr/lib64/libqpid-proton.so.2 #93563 0x0039c6c24385 in pn_dispatch_frame () from /usr/lib64/libqpid-proton.so.2 #93564 0x0039c6c2448f in pn_dispatcher_input () from /usr/lib64/libqpid-proton.so.2 #93565 0x0039c6c2d68b in pn_input_read_amqp () from /usr/lib64/libqpid-proton.so.2 #93566 0x0039c6c3011a in pn_io_layer_input_passthru () from /usr/lib64/libqpid-proton.so.2 #93567 0x0039c6c3011a in pn_io_layer_input_passthru () from /usr/lib64/libqpid-proton.so.2 #93568 0x0039c6c2d275 in transport_consume () from /usr/lib64/libqpid-proton.so.2 #93569 0x0039c6c304cd in pn_transport_process () from /usr/lib64/libqpid-proton.so.2 #93570 0x0039c6c3e40c in pn_connector_process () from /usr/lib64/libqpid-proton.so.2 #93571 0x7f1060c60460 in process_connector () from /home/mick/dispatch/build/libqpid-dispatch.so.0 
#93572 0x7f1060c61017 in thread_run () from /home/mick/dispatch/build/libqpid-dispatch.so.0 #93573 0x003cf9c07851 in start_thread (arg=0x7f1052bfd700) at pthread_create.c:301 #93574 0x003cf98e890d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PROTON-625) Biggest Backtrace Ever!
[ https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated PROTON-625: --- Description: I am saving all my stuff so I can repro on demand. It doesn't happen every time, but it's about 50%. -- On one box, I have a dispatch router. On the other box, I have 10 clients: 5 Messenger-based receivers, and 5 qpid-messaging-based senders. Each client will handle 100 addresses, of the form mick/0 ... mick/1 ... etc. 100 messages will be sent to each address. I start the 5 receivers first. They start OK. Dispatch router happy and stable. Wait a few seconds. I start the 5 senders, from a bash script. The first sender is already sending when the 2nd, 3rd, 4th start. After a few of them start, but before all have finished starting, a few seconds into the script, the crash occurs. (If they all start up successfully, no crash.) The crash occurs in the dispatch router. Here is the biggest backtrace ever: #0 0x003cf9879ad1 in _int_malloc (av=0x7f101c20, bytes=16384) at malloc.c:4383 #1 0x003cf987a911 in __libc_malloc (bytes=16384) at malloc.c:3664 #2 0x0039c6c1650a in pni_map_allocate () from /usr/lib64/libqpid-proton.so.2 #3 0x0039c6c16a3a in pni_map_ensure () from /usr/lib64/libqpid-proton.so.2 #4 0x0039c6c16c45 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #5 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #6 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #7 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #8 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #9 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #10 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #11 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #12 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #13 0x0039c6c16c64 in pni_map_entry () from
/usr/lib64/libqpid-proton.so.2 #14 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 . . . . #93549 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93550 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93551 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93552 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93553 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93554 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93555 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93556 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93557 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93558 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2 #93559 0x0039c6c16dc0 in pn_map_put () from /usr/lib64/libqpid-proton.so.2 #93560 0x0039c6c17226 in pn_hash_put () from /usr/lib64/libqpid-proton.so.2 #93561 0x0039c6c2a643 in pn_delivery_map_push () from /usr/lib64/libqpid-proton.so.2 #93562 0x0039c6c2c44b in pn_do_transfer () from /usr/lib64/libqpid-proton.so.2 #93563 0x0039c6c24385 in pn_dispatch_frame () from /usr/lib64/libqpid-proton.so.2 #93564 0x0039c6c2448f in pn_dispatcher_input () from /usr/lib64/libqpid-proton.so.2 #93565 0x0039c6c2d68b in pn_input_read_amqp () from /usr/lib64/libqpid-proton.so.2 #93566 0x0039c6c3011a in pn_io_layer_input_passthru () from /usr/lib64/libqpid-proton.so.2 #93567 0x0039c6c3011a in pn_io_layer_input_passthru () from /usr/lib64/libqpid-proton.so.2 #93568 0x0039c6c2d275 in transport_consume () from /usr/lib64/libqpid-proton.so.2 #93569 0x0039c6c304cd in pn_transport_process () from /usr/lib64/libqpid-proton.so.2 #93570 0x0039c6c3e40c in pn_connector_process () from /usr/lib64/libqpid-proton.so.2 #93571 0x7f1060c60460 in process_connector () from /home/mick/dispatch/build/libqpid-dispatch.so.0 #93572 
0x7f1060c61017 in thread_run () from /home/mick/dispatch/build/libqpid-dispatch.so.0 #93573 0x003cf9c07851 in start_thread (arg=0x7f1052bfd700) at pthread_create.c:301 #93574 0x003cf98e890d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 was: I am saving all my stuff so I can repro on demand. It doesn't happen every time, but it's about 50%. -- On one box, I have a dispatch router. On the other box, I have 10 clients: 5 Messenger-based receivers, and 5 qpid-messaging-based senders. Each client will handle 100 addresses, of the form mick/0 ... mick/1 ... etc. 100 messages will be sent to each address. I start the 5
[jira] [Commented] (PROTON-577) CollectorImpl creates a lot of unnecessary garbage
[ https://issues.apache.org/jira/browse/PROTON-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989829#comment-13989829 ] michael goulish commented on PROTON-577: What the engineer *means* to say is superfluous paraphernalia. CollectorImpl creates a lot of unnecessary garbage -- Key: PROTON-577 URL: https://issues.apache.org/jira/browse/PROTON-577 Project: Qpid Proton Issue Type: Improvement Components: proton-j Affects Versions: 0.7 Reporter: Rafael H. Schloming Assignee: Rafael H. Schloming -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PROTON-566) crash in pn_transport_set_max_frame
michael goulish created PROTON-566: -- Summary: crash in pn_transport_set_max_frame Key: PROTON-566 URL: https://issues.apache.org/jira/browse/PROTON-566 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.7 Environment: 3 boxes. 1 with senders, 1 with receivers, and 1 in the middle with a single router. Reporter: michael goulish Here's what I do: (I have saved all relevant software so I can repro this.) 1. On the router box, start 1 router. 2. On the receiver box, start 1000 receivers, with delays between each group of 50 to avoid a backlog problem. 3. After the receivers are all started, start 1000 senders, also with delays. The senders start up but do not begin sending until I manually signal them by touching a file. 4. A short time after the senders start sending, qdrouter crashes in proton code, with this traceback: Core was generated by `/home/mick/dispatch/build/router/qdrouterd --config ./config_1/X.conf'. Program terminated with signal 11, Segmentation fault. #0 0x7f29d3c0f3c0 in pn_transport_set_max_frame (transport=0x0, size=65536) at /home/mick/proton/proton-c/src/transport/transport.c:1915 1915 transport->local_max_frame = size; #0 0x7f8ad5a613c0 in pn_transport_set_max_frame (transport=0x0, size=65536) at /home/mick/proton/proton-c/src/transport/transport.c:1915 #1 0x7f8ad5cdd4bd in thread_process_listeners (qd_server=0x14f8e10) at /home/mick/dispatch/src/server.c:100 #2 0x7f8ad5cddedb in thread_run (arg=0x1490bf0) at /home/mick/dispatch/src/server.c:416 #3 0x003638c07de3 in start_thread () from /lib64/libpthread.so.0 #4 0x0036388f616d in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
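Frame #0 shows `pn_transport_set_max_frame()` receiving `transport=0x0`, so the store at transport.c:1915 dereferences a null pointer. The trace does not show why the listener ended up with a null transport. The minimal sketch below only shows where a caller-side check would turn the segfault into a handled error. It uses hypothetical mock types (`mock_transport_t`, `mock_set_max_frame`, `configure_transport`), not the real proton-c or dispatch code.

```c
#include <stddef.h>

/* Hypothetical mock of the transport: the real type is proton-c's opaque
 * pn_transport_t. Only the field written on the faulting line is modeled. */
typedef struct {
    unsigned int local_max_frame;
} mock_transport_t;

/* Mirrors the unchecked store at transport.c:1915: a NULL transport
 * would fault here just as the trace shows. */
static void mock_set_max_frame(mock_transport_t *transport, unsigned int size)
{
    transport->local_max_frame = size;
}

/* Caller-side guard (hypothetical helper): only configure a transport that
 * actually exists. Returns 0 on success, or -1 if it is NULL so the caller
 * can drop the connection instead of crashing. */
static int configure_transport(mock_transport_t *transport, unsigned int max_frame)
{
    if (transport == NULL)
        return -1;
    mock_set_max_frame(transport, max_frame);
    return 0;
}
```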
[jira] [Updated] (PROTON-566) crash in pn_transport_set_max_frame
[ https://issues.apache.org/jira/browse/PROTON-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated PROTON-566: --- Description: Here's what I do: (I have saved all relevant software so I can repro this.) 1. On the router box, start 1 router. 2. On the receiver box, start 1000 receivers, with delays between each group of 50 to avoid a backlog problem. 3. After the receivers are all started, start 1000 senders, also with delays. The senders start up but do not begin sending until I manually signal them by touching a file. 4. A short time after the senders start sending, qdrouter crashes in proton code, with this traceback: Core was generated by `/home/mick/dispatch/build/router/qdrouterd --config ./config_1/X.conf'. Program terminated with signal 11, Segmentation fault. #0 0x7f29d3c0f3c0 in pn_transport_set_max_frame (transport=0x0, size=65536) at /home/mick/proton/proton-c/src/transport/transport.c:1915 1915 transport->local_max_frame = size; #0 0x7f8ad5a613c0 in pn_transport_set_max_frame (transport=0x0, size=65536) at /home/mick/proton/proton-c/src/transport/transport.c:1915 #1 0x7f8ad5cdd4bd in thread_process_listeners (qd_server=0x14f8e10) at /home/mick/dispatch/src/server.c:100 #2 0x7f8ad5cddedb in thread_run (arg=0x1490bf0) at /home/mick/dispatch/src/server.c:416 #3 0x003638c07de3 in start_thread () from /lib64/libpthread.so.0 #4 0x0036388f616d in clone () from /lib64/libc.so.6 Looks like this is not a proton problem, but something in dispatch. I'm closing this and moving it. was: Here's what I do: (I have saved all relevant software so I can repro this.) 1. On the router box, start 1 router. 2. On the receiver box, start 1000 receivers, with delays between each group of 50 to avoid a backlog problem. 3. After the receivers are all started, start 1000 senders, also with delays. The senders start up but do not begin sending until I manually signal them by touching a file. 4. 
A short time after the senders start sending, qdrouter crashes in proton code, with this traceback: Core was generated by `/home/mick/dispatch/build/router/qdrouterd --config ./config_1/X.conf'. Program terminated with signal 11, Segmentation fault. #0 0x7f29d3c0f3c0 in pn_transport_set_max_frame (transport=0x0, size=65536) at /home/mick/proton/proton-c/src/transport/transport.c:1915 1915 transport->local_max_frame = size; #0 0x7f8ad5a613c0 in pn_transport_set_max_frame (transport=0x0, size=65536) at /home/mick/proton/proton-c/src/transport/transport.c:1915 #1 0x7f8ad5cdd4bd in thread_process_listeners (qd_server=0x14f8e10) at /home/mick/dispatch/src/server.c:100 #2 0x7f8ad5cddedb in thread_run (arg=0x1490bf0) at /home/mick/dispatch/src/server.c:416 #3 0x003638c07de3 in start_thread () from /lib64/libpthread.so.0 #4 0x0036388f616d in clone () from /lib64/libc.so.6 crash in pn_transport_set_max_frame --- Key: PROTON-566 URL: https://issues.apache.org/jira/browse/PROTON-566 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.7 Environment: 3 boxes. 1 with senders, 1 with receivers, and 1 in the middle with a single router. Reporter: michael goulish
[jira] [Closed] (PROTON-566) crash in pn_transport_set_max_frame
[ https://issues.apache.org/jira/browse/PROTON-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish closed PROTON-566. -- Resolution: Fixed It looks like this is not a proton issue, but a dispatch issue. I'm closing this and moving it. crash in pn_transport_set_max_frame --- Key: PROTON-566 URL: https://issues.apache.org/jira/browse/PROTON-566 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.7 Environment: 3 boxes. 1 with senders, 1 with receivers, and 1 in the middle with a single router. Reporter: michael goulish
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PROTON-452) Ruby API doesn't have pn_messenger_interrupt()
michael goulish created PROTON-452: -- Summary: Ruby API doesn't have pn_messenger_interrupt() Key: PROTON-452 URL: https://issues.apache.org/jira/browse/PROTON-452 Project: Qpid Proton Issue Type: Bug Affects Versions: 0.5 Reporter: michael goulish It looks like the Ruby binding doesn't cover the new-ish C function pn_messenger_interrupt(). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (PROTON-260) Messenger Documentation
[ https://issues.apache.org/jira/browse/PROTON-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797027#comment-13797027 ] michael goulish edited comment on PROTON-260 at 10/16/13 5:30 PM: -- rev 152 -- checked in new C API doxygen comments in messenger.h was (Author: mgoulish): rev r152 -- checked in new C API doxygen comments in messenger.h Messenger Documentation --- Key: PROTON-260 URL: https://issues.apache.org/jira/browse/PROTON-260 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: 0.5 Reporter: michael goulish Assignee: michael goulish Write documentation for the Proton Messenger interface, to include: introduction, API explanations, theory of operation, example programs, programming idioms, tutorials, quickstarts, troubleshooting. Documents should use Markdown markup language. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PROTON-260) Messenger Documentation
[ https://issues.apache.org/jira/browse/PROTON-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797027#comment-13797027 ] michael goulish commented on PROTON-260: rev r152 -- checked in new C API doxygen comments in messenger.h Messenger Documentation --- Key: PROTON-260 URL: https://issues.apache.org/jira/browse/PROTON-260 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: 0.5 Reporter: michael goulish Assignee: michael goulish Write documentation for the Proton Messenger interface, to include: introduction, API explanations, theory of operation, example programs, programming idioms, tutorials, quickstarts, troubleshooting. Documents should use Markdown markup language. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (PROTON-300) qpidd --help should show sasl config path default
michael goulish created PROTON-300: -- Summary: qpidd --help should show sasl config path default Key: PROTON-300 URL: https://issues.apache.org/jira/browse/PROTON-300 Project: Qpid Proton Issue Type: Bug Reporter: michael goulish Priority: Minor qpidd --help does not show the sasl config path default, which is /etc/sasl2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PROTON-300) qpidd --help should show sasl config path default
[ https://issues.apache.org/jira/browse/PROTON-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated PROTON-300: --- Assignee: michael goulish qpidd --help should show sasl config path default - Key: PROTON-300 URL: https://issues.apache.org/jira/browse/PROTON-300 Project: Qpid Proton Issue Type: Bug Reporter: michael goulish Assignee: michael goulish Priority: Minor qpidd --help does not show the sasl config path default, which is /etc/sasl2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PROTON-295) recv(-1) + incoming_window == bad
michael goulish created PROTON-295: -- Summary: recv(-1) + incoming_window == bad Key: PROTON-295 URL: https://issues.apache.org/jira/browse/PROTON-295 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.4 Reporter: michael goulish Use of recv(-1) could receive enough messages that some would exceed the incoming window size and be automatically accepted -- with app logic never getting a say in the matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
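The hazard described above can be illustrated with a simplified model. The model assumes, following the report, that deliveries pushed past the incoming window are accepted automatically, oldest first. This is a sketch of the report's claim, not proton's tracker implementation. `receive_batch` and the `disposition` enum are hypothetical names.

```c
#include <stddef.h>

/* Simplified model of the PROTON-295 hazard (an assumption based on the
 * report, not proton's tracker code): deliveries are tracked in arrival
 * order, and when a new one arrives while `window` are already
 * outstanding, the oldest is settled as accepted automatically. */
enum disposition { DISP_PENDING, DISP_AUTO_ACCEPTED };

/* Model recv(-1) pulling `n` messages in one batch with an incoming window
 * of `window` (assumed > 0). Fills disp[0..n-1] and returns how many were
 * auto-accepted before application logic could decide. */
static size_t receive_batch(enum disposition *disp, size_t n, size_t window)
{
    size_t auto_accepted = 0;
    for (size_t i = 0; i < n; i++) {
        disp[i] = DISP_PENDING;
        if (i >= window) {
            /* The new arrival pushes the oldest tracked delivery out. */
            disp[i - window] = DISP_AUTO_ACCEPTED;
            auto_accepted++;
        }
    }
    return auto_accepted;
}
```

With a window of 4 and a batch of 10, this model auto-accepts the first 6 deliveries and leaves only the last 4 for the application. Capping each receive at the window size, instead of -1, keeps every delivery pending for the application in this model.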
[jira] [Commented] (PROTON-260) Messenger Documentation
[ https://issues.apache.org/jira/browse/PROTON-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621232#comment-13621232 ] michael goulish commented on PROTON-260: rev 1464126 -- new version of message_disposition.md based on Rafi's and Alan's comments. Messenger Documentation --- Key: PROTON-260 URL: https://issues.apache.org/jira/browse/PROTON-260 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: 0.5 Reporter: michael goulish Assignee: michael goulish Write documentation for the Proton Messenger interface, to include: introduction, API explanations, theory of operation, example programs, programming idioms, tutorials, quickstarts, troubleshooting. Documents should use Markdown markup language. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PROTON-260) Messenger Documentation
[ https://issues.apache.org/jira/browse/PROTON-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602578#comment-13602578 ] michael goulish commented on PROTON-260: Checked in trunk/docs/messenger message_disposition.md -- rev 1456600. Messenger Documentation --- Key: PROTON-260 URL: https://issues.apache.org/jira/browse/PROTON-260 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: 0.5 Reporter: michael goulish Assignee: michael goulish Write documentation for the Proton Messenger interface, to include: introduction, API explanations, theory of operation, example programs, programming idioms, tutorials, quickstarts, troubleshooting. Documents should use Markdown markup language. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PROTON-260) Messenger Documentation
michael goulish created PROTON-260: -- Summary: Messenger Documentation Key: PROTON-260 URL: https://issues.apache.org/jira/browse/PROTON-260 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: 0.5 Reporter: michael goulish Write documentation for the Proton Messenger interface, to include: introduction API explanations theory of operation example programs programming idioms tutorials quickstarts troubleshooting
[jira] [Updated] (PROTON-260) Messenger Documentation
[ https://issues.apache.org/jira/browse/PROTON-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated PROTON-260: --- Description: Write documentation for the Proton Messenger interface, to include: introduction API explanations theory of operation example programs programming idioms tutorials quickstarts troubleshooting Documents should use MarkDown markup language. was: Write documentation for the Proton Messenger interface, to include: introduction API explanations theory of operation example programs programming idioms tutorials quickstarts troubleshooting Messenger Documentation --- Key: PROTON-260 URL: https://issues.apache.org/jira/browse/PROTON-260 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: 0.5 Reporter: michael goulish Write documentation for the Proton Messenger interface, to include: introduction API explanations theory of operation example programs programming idioms tutorials quickstarts troubleshooting Documents should use MarkDown markup language.
[jira] [Created] (PROTON-243) 0.4 RC1 libqpid-proton not found
michael goulish created PROTON-243: -- Summary: 0.4 RC1 libqpid-proton not found Key: PROTON-243 URL: https://issues.apache.org/jira/browse/PROTON-243 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.4 Environment: Fedora 18 Reporter: michael goulish

All build steps went well, following the README directions, until I got to building the C examples. Here is what happened then:

( executive summary: the examples built and linked, but I had to set LD_LIBRARY_PATH before the dynamic loader could find libqpid-proton at run time. )

  cd ../examples/messenger/c
  cmake .
  make
  ./recv
  ./recv: error while loading shared libraries: libqpid-proton.so.1: cannot open shared object file: No such file or directory

  # Uh-oh.

  ldd recv
    linux-vdso.so.1 =>  (0x7fff0396)
    libqpid-proton.so.1 => not found
    libc.so.6 => /lib64/libc.so.6 (0x7f5dfc48f000)
    /lib64/ld-linux-x86-64.so.2 (0x7f5dfc851000)

  export LD_LIBRARY_PATH=/usr/lib
  ./recv
  # It's Happy !

  ./send
  Address: amqp://0.0.0.0
  Subject: (no subject)
  Content: Hello World!
  # Hooray !
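A small C check can confirm what the ldd output shows: whether the dynamic loader can resolve a library by its soname in the current environment. This is a sketch, not part of Proton; the helper name library_is_loadable is hypothetical, and it assumes a glibc-based Linux system, where dlopen() on a bare soname searches LD_LIBRARY_PATH, the ld.so cache and the default library directories. On older glibc releases the program may need to be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the dynamic loader can find and load
 * `soname` using its normal search rules (including LD_LIBRARY_PATH as
 * set when this process started), 0 otherwise. */
int library_is_loadable(const char *soname)
{
    assert(soname != NULL);
    void *handle = dlopen(soname, RTLD_LAZY);
    if (handle == NULL) {
        /* dlerror() explains the failure, e.g. "cannot open shared object file" */
        fprintf(stderr, "%s\n", dlerror());
        return 0;
    }
    dlclose(handle);
    return 1;
}
```

Called with "libqpid-proton.so.1" from a process started without the export above, this would be expected to return 0, and 1 once LD_LIBRARY_PATH includes /usr/lib. A likely explanation for the failure is that on 64-bit Fedora the loader's default directories are /lib64 and /usr/lib64 rather than /usr/lib; besides LD_LIBRARY_PATH, a file under /etc/ld.so.conf.d followed by ldconfig, or an RPATH in the executable, would also let the loader find the library.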
[jira] [Commented] (PROTON-200) [Proton-c] Credit distribution by messenger is not balanced across all links
[ https://issues.apache.org/jira/browse/PROTON-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13580317#comment-13580317 ] michael goulish commented on PROTON-200: I think this bug is a release blocker. An implication that is not immediately obvious is that if you have one receiver with two senders, one of the senders will hang until the other sender calls 'stop'. This makes it impossible to set up any topology except the simplest possible -- one sender, one receiver. [Proton-c] Credit distribution by messenger is not balanced across all links Key: PROTON-200 URL: https://issues.apache.org/jira/browse/PROTON-200 Project: Qpid Proton Issue Type: Bug Components: proton-c Affects Versions: 0.3 Reporter: Ken Giusti Assignee: Ken Giusti Fix For: 0.4 The method used to distribute credit to receiving links may lead to starvation when the number of receiving links is greater than the available credit. The distribution algorithm always starts with the same link -- see messenger.c::pn_messenger_flow().
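The starvation described in PROTON-200 can be illustrated with a toy model in C. This is a sketch under stated assumptions, not Proton's pn_messenger_flow(): each receiving link is reduced to an integer credit counter, credit is handed out one unit at a time, and both function names are hypothetical. The two policies differ only in where distribution starts: always at link 0, as the issue reports, or at the link after the one served last (round-robin).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: link_credit[i] is the credit granted to receiving
 * link i. Hands out `credit` units one at a time, starting at *next_link
 * and wrapping around; afterwards *next_link names the link to serve
 * first on the following call. */
void distribute_round_robin(int *link_credit, size_t n_links, int credit,
                            size_t *next_link)
{
    assert(link_credit != NULL && next_link != NULL);
    if (n_links == 0)
        return;                      /* no links, nothing to grant */
    size_t i = *next_link % n_links; /* tolerate a stale index */
    while (credit > 0) {
        link_credit[i] += 1;
        i = (i + 1) % n_links;
        --credit;
    }
    *next_link = i;                  /* remember where to resume */
}

/* Biased policy: the same hand-out loop, but every call restarts at
 * link 0, so when credit < n_links the later links never get any. */
void distribute_from_first(int *link_credit, size_t n_links, int credit)
{
    size_t start = 0;
    distribute_round_robin(link_credit, n_links, credit, &start);
}
```

With three links and one unit of credit per call, three calls to distribute_from_first() leave the counts at {3, 0, 0}, while three calls to distribute_round_robin() sharing one next_link variable leave {1, 1, 1}. The model ignores credit being consumed as messages arrive; it only shows where each round of new credit goes.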
[jira] [Commented] (PROTON-222) pn_messenger_send returns before message data has been written to the wire
[ https://issues.apache.org/jira/browse/PROTON-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576893#comment-13576893 ] michael goulish commented on PROTON-222: I am able to get my example working the way I want by using a tracker on the sender, with an outgoing window size of 1, and calling pn_messenger_status() after every message is sent. New code:

sender:

#include <proton/message.h>
#include <proton/messenger.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* sleep() */

int main(int argc, char** argv)
{
  char addr [ 1000 ];
  char content [ 1000 ];
  char subject [ 1000 ];

  if ( argc < 2 ) {
    fprintf ( stderr, "usage: %s PORT\n", argv[0] );
    return 1;
  }
  snprintf ( addr, sizeof(addr), "amqp://0.0.0.0:%s", argv[1] );

  pn_message_t * message;
  pn_messenger_t * messenger;

  message = pn_message();
  messenger = pn_messenger(NULL);

  /* Keep tracking state for only the most recent outgoing message. */
  pn_messenger_set_outgoing_window ( messenger, 1 );
  pn_messenger_start(messenger);

  int n_messages = 10;
  int sent_count;

  /*-- Put and send a message every 1 second. --*/
  for ( sent_count = 0 ; sent_count < n_messages; ++ sent_count )
  {
    sleep ( 1 );
    /* Clear the reused message so its body holds only this iteration's string. */
    pn_message_clear ( message );
    sprintf ( subject, "This is message %d.", sent_count + 1 );
    pn_message_set_address ( message, addr );
    pn_message_set_subject ( message, subject );
    pn_data_t *body = pn_message_body(message);
    sprintf ( content, "Hello, Proton!" );
    pn_data_put_string(body, pn_bytes(strlen(content), content));
    pn_messenger_put(messenger, message);

    /* Tracker for the message just put; query its status after sending. */
    pn_tracker_t tracker;
    tracker = pn_messenger_outgoing_tracker ( messenger );
    pn_messenger_send(messenger);
    pn_messenger_status ( messenger, tracker );
    fprintf ( stderr, "sent %d messages.\n", sent_count + 1 );
  }

  // Countdown to stop, to give me time to see it
  fprintf ( stderr, "Calling stop in ...\n" );
  for ( int i = 5; i > 0; -- i )
  {
    fprintf ( stderr, "%d\n", i );
    sleep ( 1 );
  }
  fprintf ( stderr, "stop.\n");

  pn_messenger_stop(messenger);
  pn_messenger_free(messenger);
  pn_message_free(message);

  return 0;
}

receiver:

#include <proton/message.h>
#include <proton/messenger.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define BUFSIZE 1024

void consume_messages ( pn_messenger_t * messenger, int n, pn_message_t * message )
{
  for ( int consume_count = 0; consume_count < n; ++ consume_count )
  {
    pn_messenger_get ( messenger, message );

    size_t bufsize = BUFSIZE;
    char buffer [ bufsize ];
    pn_data_t * body = pn_message_body ( message );
    /* pn_data_format() takes a pointer to the buffer size. */
    pn_data_format ( body, buffer, & bufsize );

    printf ( "\n\nMessage \n");
    printf ( "Address: %s\n", pn_message_get_address ( message ) );
    char const * subject = pn_message_get_subject(message);
    printf ( "Subject: %s\n", subject ? subject : "(no subject)" );
    printf ( "Content: %s\n\n", buffer);
  }
}

int main(int argc, char** argv)
{
  char addr [ 1000 ];

  if ( argc < 2 ) {
    fprintf ( stderr, "usage: %s PORT\n", argv[0] );
    return 1;
  }
  snprintf ( addr, sizeof(addr), "amqp://~0.0.0.0:%s", argv[1] );

  pn_message_t * message;
  pn_messenger_t * messenger;

  message = pn_message();
  messenger = pn_messenger ( NULL );

  pn_messenger_start(messenger);
  pn_messenger_subscribe ( messenger, addr );

  int messages_wanted = 10;
  int total_received = 0;
  int received_this_time;

  pn_messenger_set_timeout ( messenger, 700 );   /* milliseconds */

  int tries = 0;

  while ( total_received < messages_wanted )
  {
    ++ tries;
    pn_messenger_recv ( messenger, BUFSIZE );
    received_this_time = pn_messenger_incoming ( messenger );
    fprintf ( stderr, "try: %d received: %d total: %d\n", tries, received_this_time, total_received );
    consume_messages ( messenger, received_this_time, message );
    total_received += received_this_time;
  }

  pn_messenger_stop(messenger);
  pn_messenger_free(messenger);
  pn_message_free(message);

  return 0;
}

pn_messenger_send returns before message data has been written to the wire -- Key: PROTON-222 URL: https://issues.apache.org/jira/browse/PROTON-222 Project: Qpid Proton Issue Type: Bug Components: proton-c, proton-j Affects Versions: 0.3 Reporter: Rafael H. Schloming Assignee: Ken Giusti Fix For: 0.4 Attachments: transport.patch Currently, pn_messenger_send will block until the engine reports there are no queued messages being held. The problem arises because the queued message count only reports message data that is being held by the engine due to insufficient credit to send the messages. Messages may also be sitting in the transport's encoded frame buffer waiting to be