[ 
https://issues.apache.org/jira/browse/DISPATCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287804#comment-17287804
 ] 

ASF GitHub Bot commented on DISPATCH-1962:
------------------------------------------

jiridanek opened a new pull request #1051:
URL: https://github.com/apache/qpid-dispatch/pull/1051


   This PR is simply to show that "a solution exists". I have no idea whether I 
am on the right track here, or not.
   
   The original problem is
   
   ```
   14: 2021-02-20 20:54:08.332325 +0100 SERVER (info) [C2] Closing connection 
on shutdown (../src/server.c:1377)
   14: 
   14: <<<<
   14: 
   14: Router QDR.A output file:
   14: >>>>
   14: 
   14: =================================================================
   14: ==28365==ERROR: LeakSanitizer: detected memory leaks
   14: 
   14: Direct leak of 56 byte(s) in 1 object(s) allocated from:
   14:     #0 0x7f8503a7ce8f in __interceptor_malloc 
(/nix/store/g40sl3zh3nv52vj0mrl4iki5iphh5ika-gcc-10.2.0-lib/lib/libasan.so.6+0xace8f)
   14:     #1 0x7f85034afd42 in qd_malloc ../include/qpid/dispatch/ctools.h:229
   14:     #2 0x7f85034afd42 in qdr_core_subscribe 
../src/router_core/route_tables.c:149
   14:     #3 0x7f85034f4962 in qcm_mobile_sync_init_CT 
../src/router_core/modules/mobile_sync/mobile.c:919
   14:     #4 0x7f85034a79d1 in qdr_modules_init 
../src/router_core/router_core_thread.c:120
   14:     #5 0x7f850348e889 in qdr_core_setup_init 
../src/router_core/router_core.c:60
   14:     #6 0x7f8503491971 in qdr_core ../src/router_core/router_core.c:116
   14:     #7 0x7f8503505949 in qd_router_setup_late ../src/router_node.c:2071
   14:     #8 0x7f84fee08abc in ffi_call_unix64 
(/nix/store/m8y5mz1f0al3rg3b56rq5bza49jjxnc0-libffi-3.3/lib/libffi.so.7+0x7abc)
   14:     #9 0x7fff297a166f  ([stack]+0x1e66f)
   14: 
   14: Direct leak of 56 byte(s) in 1 object(s) allocated from:
   14:     #0 0x7f8503a7ce8f in __interceptor_malloc 
(/nix/store/g40sl3zh3nv52vj0mrl4iki5iphh5ika-gcc-10.2.0-lib/lib/libasan.so.6+0xace8f)
   14:     #1 0x7f85034afd42 in qd_malloc ../include/qpid/dispatch/ctools.h:229
   14:     #2 0x7f85034afd42 in qdr_core_subscribe 
../src/router_core/route_tables.c:149
   14:     #3 0x7f85034f49d3 in qcm_mobile_sync_init_CT 
../src/router_core/modules/mobile_sync/mobile.c:921
   14:     #4 0x7f85034a79d1 in qdr_modules_init 
../src/router_core/router_core_thread.c:120
   14:     #5 0x7f850348e889 in qdr_core_setup_init 
../src/router_core/router_core.c:60
   14:     #6 0x7f8503491971 in qdr_core ../src/router_core/router_core.c:116
   14:     #7 0x7f8503505949 in qd_router_setup_late ../src/router_node.c:2071
   14:     #8 0x7f84fee08abc in ffi_call_unix64 
(/nix/store/m8y5mz1f0al3rg3b56rq5bza49jjxnc0-libffi-3.3/lib/libffi.so.7+0x7abc)
   14:     #9 0x7fff297a166f  ([stack]+0x1e66f)
   14: 
   14: -----------------------------------------------------
   14: Suppressions used:
   14:   count      bytes template
   14:       4        224 ^IoAdapter_init$
   14:       4       2560 ^_PyObject_Realloc
   14:     558     883628 ^PyObject_Malloc$
   14:       1         32 ^PyThread_allocate_lock$
   14:       4       9897 ^PyMem_Malloc$
   14:       1        840 ^_PyObject_GC_Resize$
   14:       1       1184 ^list_append$
   14: -----------------------------------------------------
   14: 
   14: SUMMARY: AddressSanitizer: 112 byte(s) leaked in 2 allocation(s).
   ```
   
   If I want to free this, I have to reorder things in `qdr_core_free` so that 
the finalizers don't try to use something that was already freed.
   
   If I don't move `qdr_modules_finalize` before the `discard any left over 
actions` block, I get the following, because the lock has already been freed:
   
   ```
   16: qdrouterd: ../src/posix/threading.c:58: sys_mutex_lock: Assertion 
`result == 0' failed.
   ```
   
   And if I run `qcm_edge_router_final_CT` 
(`src/router_core/modules/edge_router/module.c:59`) before 
`qdrc_endpoint_do_cleanup_CT` (`src/router_core/core_link_endpoint.c:241`), I 
get:
   
   ```
   16: ==16076==ERROR: AddressSanitizer: heap-use-after-free on address 
0x60c0000035f0 at pc 0x7ff5117d60bc bp 0x7fffd457e1f0 sp 0x7fffd457e1e8
   16: READ of size 8 at 0x60c0000035f0 thread T0
   16:     #0 0x7ff5117d60bb in qdrc_endpoint_do_cleanup_CT 
../src/router_core/core_link_endpoint.c:241
   16:     #1 0x7ff51182e3ea in qdr_core_free 
../src/router_core/router_core.c:230
   16:     #2 0x7ff511891ecb in qd_router_free ../src/router_node.c:2108
   16:     #3 0x7ff511732229 in qd_dispatch_free ../src/dispatch.c:368
   16:     #4 0x402625 in main_process ../router/src/main.c:117
   16:     #5 0x403f4b in main ../router/src/main.c:367
   16:     #6 0x7ff510382cbc in __libc_start_main 
(/nix/store/q53f5birhik4dxg3q3r2g5f324n7r5mc-glibc-2.31-74/lib/libc.so.6+0x23cbc)
   16:     #7 0x402419 in _start 
(/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/router/qdrouterd+0x402419)
   16: 
   16: 0x60c0000035f0 is located 112 bytes inside of 120-byte region 
[0x60c000003580,0x60c0000035f8)
   16: freed by thread T0 here:
   16:     #0 0x7ff511e08b6f in __interceptor_free 
(/nix/store/g40sl3zh3nv52vj0mrl4iki5iphh5ika-gcc-10.2.0-lib/lib/libasan.so.6+0xacb6f)
   16:     #1 0x7ff51185abe9 in qcm_edge_addr_proxy_final 
../src/router_core/modules/edge_router/addr_proxy.c:593
   16:     #2 0x7ff511856fcd in qcm_edge_router_final_CT 
../src/router_core/modules/edge_router/module.c:59
   16:     #3 0x7ff511833d61 in qdr_modules_finalize 
../src/router_core/router_core_thread.c:139
   16:     #4 0x7ff51182c3ff in qdr_core_free 
../src/router_core/router_core.c:146
   16:     #5 0x7ff511891ecb in qd_router_free ../src/router_node.c:2108
   16:     #6 0x7ff511732229 in qd_dispatch_free ../src/dispatch.c:368
   16:     #7 0x402625 in main_process ../router/src/main.c:117
   16:     #8 0x403f4b in main ../router/src/main.c:367
   16:     #9 0x7ff510382cbc in __libc_start_main 
(/nix/store/q53f5birhik4dxg3q3r2g5f324n7r5mc-glibc-2.31-74/lib/libc.so.6+0x23cbc)
   16: 
   16: previously allocated by thread T0 here:
   16:     #0 0x7ff511e08e8f in __interceptor_malloc 
(/nix/store/g40sl3zh3nv52vj0mrl4iki5iphh5ika-gcc-10.2.0-lib/lib/libasan.so.6+0xace8f)
   16:     #1 0x7ff51185a75b in qd_malloc ../include/qpid/dispatch/ctools.h:229
   16:     #2 0x7ff51185a75b in qcm_edge_addr_proxy 
../src/router_core/modules/edge_router/addr_proxy.c:545
   16:     #3 0x7ff511857164 in qcm_edge_router_init_CT 
../src/router_core/modules/edge_router/module.c:46
   16:     #4 0x7ff5118339d1 in qdr_modules_init 
../src/router_core/router_core_thread.c:120
   16:     #5 0x7ff51181a889 in qdr_core_setup_init 
../src/router_core/router_core.c:60
   16:     #6 0x7ff51181d971 in qdr_core ../src/router_core/router_core.c:116
   16:     #7 0x7ff511891a29 in qd_router_setup_late ../src/router_node.c:2071
   16:     #8 0x7ff50d10eabc in ffi_call_unix64 
(/nix/store/m8y5mz1f0al3rg3b56rq5bza49jjxnc0-libffi-3.3/lib/libffi.so.7+0x7abc)
   16:     #9 0x7fffd457dd1f  ([stack]+0x1ed1f)
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make LSan suppressions more targeted and specific (within reason)
> -----------------------------------------------------------------
>
>                 Key: DISPATCH-1962
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1962
>             Project: Qpid Dispatch
>          Issue Type: Test
>    Affects Versions: 1.15.0
>            Reporter: Jiri Daněk
>            Priority: Major
>         Attachments: dispatch_leaking_agent.png
>
>
> There are some unfortunate suppressions in {{tests/lsan.supp}} at this moment:
> {code}
> leak:*libwebsockets*
> {code}
> This is way too broad. It suppresses leaks in Dispatch files, since it 
> matches e.g. {{src/http-libwebsockets.c}}.
> The stars at the beginning and end are actually assumed implicitly. If you do 
> not want a substring match, you have to write {{^foo$}}. The LSan suppression 
> format is simplistic, very unlike Valgrind's.
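
To make the substring behaviour concrete, here is a minimal Python sketch that approximates the matching rule described above. It is not the sanitizer's own code: `template_to_regex` and `suppresses` are hypothetical helper names, the only special characters assumed are `*`, a leading `^`, and a trailing `$`, and the frame strings are illustrative.

```python
import re


def template_to_regex(template):
    """Approximate an LSan suppression template as a compiled regex.

    Assumed rules: a leading '^' anchors the start, a trailing '$' anchors
    the end, '*' matches any run of characters, and everything else is
    literal. Without anchors the pattern matches anywhere in the string.
    """
    anchored_start = template.startswith("^")
    anchored_end = template.endswith("$")
    core = template[1:] if anchored_start else template
    core = core[:-1] if anchored_end else core
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(
        ("^" if anchored_start else "") + body + ("$" if anchored_end else "")
    )


def suppresses(template, frame_text):
    """Return True if the template would match this function/file/module name."""
    return template_to_regex(template).search(frame_text) is not None


# The explicit stars add nothing: the bare word already matches as a substring.
broad_with_stars = suppresses("*libwebsockets*", "../src/http-libwebsockets.c")
broad_without_stars = suppresses("libwebsockets", "../src/http-libwebsockets.c")

# Anchoring both ends restricts the match to the exact name.
exact_hit = suppresses("^IoAdapter_init$", "IoAdapter_init")
exact_miss = suppresses("^IoAdapter_init$", "IoAdapter_init_helper")
```

Under these assumptions both broad forms match the Dispatch source file, while the anchored form matches only the exact function name.
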
> {code}
> leak:run_unit_tests.c
> {code}
> Same thing, any leaks revealed by running unit_tests get suppressed. This 
> suppression suppresses all leak traces that include run_unit_tests.c anywhere 
> in the stack. What's the point of running such tests under a leak detector, 
> then?
> {code}
> leak:run_unit_tests.c
> leak:^libqpid-proton.so$
> {code}
> Same thing. The patterns suppress all leaks that include Python (or Proton) 
> anywhere in the stacktrace. That means there are huge blind spots where 
> dispatch leaks can hide. This is a weakness of the lsan.supp syntax (Valgrind 
> suppressions can be much more targeted and discerning).
> h3. Python leaks
> Leaks are known and there is ongoing effort to fight them: 
> https://bugs.python.org/issue1635741 (https://bugs.python.org/issue25302) and 
> https://www.python.org/dev/peps/pep-3121
> Here's a Valgrind suppression file from somebody who actually investigated the 
> Python leaks and identified the harmless ones: 
> https://github.com/libgit2/pygit2/blob/master/misc/valgrind-python.supp
> One example of a hidden leak in Dispatch, which is revealed by making Python 
> suppressions more targeted:
> {code}
> 9: Direct leak of 56 byte(s) in 1 object(s) allocated from:
> 9:     #0 0x7f78a3606e8f in __interceptor_malloc 
> (/nix/store/g40sl3zh3nv52vj0mrl4iki5iphh5ika-gcc-10.2.0-lib/lib/libasan.so.6+0xace8f)
> 9:     #1 0x7f78a2d64afb in qd_malloc ../include/qpid/dispatch/ctools.h:229
> 9:     #2 0x7f78a2d657da in qdr_core_subscribe 
> ../src/router_core/route_tables.c:149
> 9:     #3 0x7f78a2c83072 in IoAdapter_init ../src/python_embedded.c:711
> 9:     #4 0x7f78a2353a6c in type_call 
> (/nix/store/r85nxfnwiv45nbmf5yb60jj8ajim4m7w-python3-3.8.5/lib/libpython3.8.so.1.0+0x165a6c)
> {code}
> The problem is in
> {code}
> class Agent:
>     ...
>     def activate(self, address):
>         ...
>         self.io = IoAdapter(self.receive, address, 'L', '0',
>                             TREATMENT_ANYCAST_CLOSEST)
> {code}
> IoAdapter refers to Agent (through the bound method reference self.receive) 
> and Agent refers to IoAdapter (through property self.io). Since IoAdapter is 
> implemented in C and does not implement support for Python's cyclic GC, there 
> is no way to break the cycle.
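
The shape of that cycle can be reproduced in pure Python. This is only a sketch under stated assumptions: the `IoAdapter` below is a Python stand-in for the real C type with a simplified constructor, `deactivate` is a hypothetical teardown method, the address string is illustrative, and `gc.disable()` is used to imitate a graph that the cycle collector cannot traverse.

```python
import gc
import weakref


class IoAdapter:
    """Pure-Python stand-in for the C-implemented IoAdapter (assumption)."""

    def __init__(self, handler, address):
        self.handler = handler  # bound method, so it references its Agent
        self.address = address


class Agent:
    def activate(self, address):
        # Agent -> IoAdapter -> bound method self.receive -> Agent
        self.io = IoAdapter(self.receive, address)

    def receive(self, message):
        pass

    def deactivate(self):
        # Hypothetical explicit teardown: dropping the adapter breaks the cycle.
        self.io = None


def demonstrate_cycle():
    """Return (kept_alive_by_cycle, freed_after_break)."""
    was_enabled = gc.isenabled()
    gc.disable()  # imitate a cycle the collector cannot see
    try:
        agent = Agent()
        agent.activate("example/address")
        probe = weakref.ref(agent)
        del agent  # last external reference is gone
        kept_alive_by_cycle = probe() is not None
        probe().deactivate()  # break the cycle by hand
        freed_after_break = probe() is None
        return kept_alive_by_cycle, freed_after_break
    finally:
        if was_enabled:
            gc.enable()
```

With reference counting alone the Agent survives `del agent`; once the adapter reference is dropped explicitly, CPython frees it. An explicit teardown step of this kind is one possible way out, not necessarily the fix Dispatch should adopt.
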
> Heap dump in attachment. The bound method is at the top of the picture. 
> (Ignore the Mock objects, I was trying to simplify the picture while not 
> getting crashes due to too much meddling).
> h3. Random observations
> It is possible to build a special debug build of Python, which has tools to 
> detect leaks, asserts to prevent negative refcounts, etc. 
> https://pythonextensionpatterns.readthedocs.io/en/latest/debugging/debug_python.html#debug-version-of-python-memory-alloc-label
> Use the following to detect Python leaks (instead of Valgrind):
> https://docs.python.org/3/library/tracemalloc.html
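
A minimal sketch of the tracemalloc approach, comparing two snapshots to see where memory grew. `measure_growth` and `make_garbage` are hypothetical names, and the "leak" is just a list of buffers kept alive for illustration. Note that tracemalloc only sees allocations made through Python's allocators.

```python
import tracemalloc


def measure_growth(allocate):
    """Return (bytes_gained, top_diffs) for the allocations made by allocate()."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        retained = allocate()  # keep the result alive so it counts as growth
        after = tracemalloc.take_snapshot()
        diffs = after.compare_to(before, "lineno")  # largest changes first
        gained = sum(d.size_diff for d in diffs)
        del retained
        return gained, diffs[:3]
    finally:
        tracemalloc.stop()


def make_garbage():
    # Illustrative "leak": 100 buffers of 1 KiB that the caller keeps referenced.
    return [bytearray(1024) for _ in range(100)]
```

Calling `measure_growth(make_garbage)` and printing the returned entries shows the source lines responsible for the largest growth.
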
> Use https://pypi.org/project/objgraph (with graphviz) to view heap object 
> trees. The following renders the picture as a png under /tmp and prints the 
> path to stdout.
> {code}
> int ret = PyRun_SimpleString("import objgraph; "
>     "objgraph.show_backrefs(config.global_agent, max_depth=10)\n\n");
> PyErr_PrintEx(0);
> assert(ret == 0);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org
