[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
[ https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313 ] Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 8:48 AM: The first of these warnings could be probably addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question. {code:java} ==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2 ==1990== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x198A23: main (store.cc:937) {code} With changes as in https://github.com/lbartnik/arrow/commit/089153d518c081d7b9c1b3fb839463bca9ac1a35 I can reduce warnings to the one below. Looking at the code it's not clear if CreateObject() should be paired with a delete operation of if there is an internal pool/tracking mechanism. {code} pyarrow/tests/test_plasma.py::TestPlasmaClient::test_put_and_get command: valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all --leak-check-heuristics=stdstring --error-exitcode=1 /io/arrow/python/pyarrow/plasma_store_server -s /tmp/test_plasma-k6wtcvi4/plasma.sock -m 1 ==575== Memcheck, a memory error detector ==575== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==575== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info ==575== Command: /io/arrow/python/pyarrow/plasma_store_server -s /tmp/test_plasma-k6wtcvi4/plasma.sock -m 1 ==575== Allowing the Plasma store to use up to 0.1GB of memory. Starting object store with directory /dev/shm and huge page support disabled PASSED==575== ==575== HEAP SUMMARY: ==575== in use at exit: 552 bytes in 1 blocks ==575== total heap usage: 178 allocs, 177 frees, 143,037 bytes allocated ==575== ==575== 552 bytes in 1 blocks are still reachable in loss record 1 of 1 ==575==at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==575==by 0x567F5F7: fdopen@@GLIBC_2.2.5 (iofdopen.c:122) ==575==by 0x1BD47F: create_buffer(long) (malloc.cc:105) ==575==by 0x1BFF17: fake_mmap (malloc.cc:135) ==575==by 0x1C077B: sys_alloc (dlmalloc.c:4155) ==575==by 0x1C077B: dlmalloc (dlmalloc.c:4680) ==575==by 0x1C2850: internal_memalign.constprop.98 (dlmalloc.c:4917) ==575==by 0x19391A: plasma::PlasmaStore::CreateObject(plasma::UniqueID const&, long, long, int, plasma::Client*, plasma::PlasmaObject*) (store.cc:178) ==575==by 0x197337: plasma::PlasmaStore::ProcessMessage(plasma::Client*) (store.cc:740) ==575==by 0x195E02: plasma::PlasmaStore::ConnectClient(int)::{lambda(int)#1}::operator()(int) const (store.cc:544) ==575==by 0x19927B: std::_Function_handler::_M_invoke(std::_Any_data const&, int&&) (std_function.h:297) ==575==by 0x1B75FD: std::function::operator()(int) const (std_function.h:687) ==575==by 0x1B6F4E: plasma::EventLoop::FileEventCallback(aeEventLoop*, int, void*, int) (events.cc:28) ==575== ==575== LEAK SUMMARY: ==575==definitely lost: 0 bytes in 0 blocks ==575==indirectly lost: 0 bytes in 0 blocks ==575== possibly lost: 0 bytes in 0 blocks ==575==still reachable: 552 bytes in 1 blocks ==575== suppressed: 0 bytes in 0 blocks ==575== ==575== For counts of detected and suppressed errors, rerun with: -v ==575== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) {code} was (Author: lbartnik): The first of these warnings could be probably addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question. {code:java} ==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2 ==1990== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5131321: std::string::
[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
[ https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313 ] Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 8:45 AM: The first of these warnings could be probably addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question. {code:java} ==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2 ==1990== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x198A23: main (store.cc:937) {code} With changes as in I can reduce warnings to the one below. Looking at the code it's not clear if CreateObject() should be paired with a delete operation of if there is an internal pool/tracking mechanism. {code} pyarrow/tests/test_plasma.py::TestPlasmaClient::test_put_and_get command: valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all --leak-check-heuristics=stdstring --error-exitcode=1 /io/arrow/python/pyarrow/plasma_store_server -s /tmp/test_plasma-k6wtcvi4/plasma.sock -m 1 ==575== Memcheck, a memory error detector ==575== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==575== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info ==575== Command: /io/arrow/python/pyarrow/plasma_store_server -s /tmp/test_plasma-k6wtcvi4/plasma.sock -m 1 ==575== Allowing the Plasma store to use up to 0.1GB of memory. Starting object store with directory /dev/shm and huge page support disabled PASSED==575== ==575== HEAP SUMMARY: ==575== in use at exit: 552 bytes in 1 blocks ==575== total heap usage: 178 allocs, 177 frees, 143,037 bytes allocated ==575== ==575== 552 bytes in 1 blocks are still reachable in loss record 1 of 1 ==575==at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==575==by 0x567F5F7: fdopen@@GLIBC_2.2.5 (iofdopen.c:122) ==575==by 0x1BD47F: create_buffer(long) (malloc.cc:105) ==575==by 0x1BFF17: fake_mmap (malloc.cc:135) ==575==by 0x1C077B: sys_alloc (dlmalloc.c:4155) ==575==by 0x1C077B: dlmalloc (dlmalloc.c:4680) ==575==by 0x1C2850: internal_memalign.constprop.98 (dlmalloc.c:4917) ==575==by 0x19391A: plasma::PlasmaStore::CreateObject(plasma::UniqueID const&, long, long, int, plasma::Client*, plasma::PlasmaObject*) (store.cc:178) ==575==by 0x197337: plasma::PlasmaStore::ProcessMessage(plasma::Client*) (store.cc:740) ==575==by 0x195E02: plasma::PlasmaStore::ConnectClient(int)::{lambda(int)#1}::operator()(int) const (store.cc:544) ==575==by 0x19927B: std::_Function_handler::_M_invoke(std::_Any_data const&, int&&) (std_function.h:297) ==575==by 0x1B75FD: std::function::operator()(int) const (std_function.h:687) ==575==by 0x1B6F4E: plasma::EventLoop::FileEventCallback(aeEventLoop*, int, void*, int) (events.cc:28) ==575== ==575== LEAK SUMMARY: ==575==definitely lost: 0 bytes in 0 blocks ==575==indirectly lost: 0 bytes in 0 blocks ==575== possibly lost: 0 bytes in 0 blocks ==575==still reachable: 552 bytes in 1 blocks ==575== suppressed: 0 bytes in 0 blocks ==575== ==575== For counts of detected and suppressed errors, rerun with: -v ==575== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) {code} was (Author: lbartnik): The first of these warnings could be probably addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question. {code:java} ==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2 ==1990== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/
[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
[ https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313 ] Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 7:52 AM: The first of these warnings could be probably addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question. {code:java} ==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2 ==1990== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x198A23: main (store.cc:937) {code} I see that SIGTERM comes from Python: "Ensure Valgrind and/or coverage have a clean exit". Does it make sense to set an exit flag in the signal handler and then let the event loop exit on its own in the main call stack? was (Author: lbartnik): The first of these warnings could be probably addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question. {code:java} ==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2 ==1990== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x198A23: main (store.cc:937) {code} I tried simply commenting exit() out but that leads to other errors and I assume is not intended. I don't see much other signal handling in plasma and my current guess is that it is ae that gets interrupted and then drops the event loop. Why is there even a SIGTERM in the first place? Where does it come from? I'd be grateful for comments and/or pointers to relevant areas in the code. > [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1 > > > Key: ARROW-1380 > URL: https://issues.apache.org/jira/browse/ARROW-1380 > Project: Apache Arrow > Issue Type: Bug > Components: Plasma (C++) >Reporter: Wes McKinney >Priority: Major > Fix For: 0.11.0 > > Attachments: LastTest.log, valgrind.supp_ > > > I thought I fixed this, but they seem to have recurred: > https://travis-ci.org/apache/arrow/jobs/266421430#L5220 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
[ https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313 ] Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 3:03 AM: The first of these warnings could be probably addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question. {code:java} ==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2 ==1990== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x198A23: main (store.cc:937) {code} I tried simply commenting exit() out but that leads to other errors and I assume is not intended. I don't see much other signal handling in plasma and my current guess is that it is ae that gets interrupted and then drops the event loop. Why is there even a SIGTERM in the first place? Where does it come from? I'd be grateful for comments and/or pointers to relevant areas in the code. was (Author: lbartnik): The first of these warnings could be probably addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question. {code} ==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2 ==1990== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x198A23: main (store.cc:937) {code} I tried simply commenting exit() out but that leads to other errors and I assume is not intended. I don't see much other signal handling in plasma and my current guess is that it is ae that gets interrupted and then drops the event loop. I'd be grateful for comments and/or pointers to relevant areas in the code. > [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1 > > > Key: ARROW-1380 > URL: https://issues.apache.org/jira/browse/ARROW-1380 > Project: Apache Arrow > Issue Type: Bug > Components: Plasma (C++) >Reporter: Wes McKinney >Priority: Major > Fix For: 0.11.0 > > Attachments: LastTest.log, valgrind.supp_ > > > I thought I fixed this, but they seem to have recurred: > https://travis-ci.org/apache/arrow/jobs/266421430#L5220 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
[ https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313 ] Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 3:01 AM: The first of these warnings could be probably addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question. {code} ==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2 ==1990== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x198A23: main (store.cc:937) {code} I tried simply commenting exit() out but that leads to other errors and I assume is not intended. I don't see much other signal handling in plasma and my current guess is that it is ae that gets interrupted and then drops the event loop. I'd be grateful for comments and/or pointers to relevant areas in the code. was (Author: lbartnik): The first of these warnings could be probably addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question. ==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2 ==1990== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25) ==1990== by 0x198A23: main (store.cc:937) I tried simply commenting exit() out but that leads to other errors and I assume is not intended. I don't see much other signal handling in plasma and my current guess is that it is ae that gets interrupted and then drops the event loop. I'd be grateful for comments and/or pointers to relevant areas in the code. > [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1 > > > Key: ARROW-1380 > URL: https://issues.apache.org/jira/browse/ARROW-1380 > Project: Apache Arrow > Issue Type: Bug > Components: Plasma (C++) >Reporter: Wes McKinney >Priority: Major > Fix For: 0.11.0 > > Attachments: LastTest.log, valgrind.supp_ > > > I thought I fixed this, but they seem to have recurred: > https://travis-ci.org/apache/arrow/jobs/266421430#L5220 -- This message was sent by Atlassian JIRA (v7.6.3#76005)