[ https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313 ]
Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 8:45 AM:
----------------------------------------------------------------

The first of these warnings could probably be addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question.

{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990==    by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990==    by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990==    by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990==    by 0x198A23: main (store.cc:937)
{code}

With changes as in I can reduce warnings to the one below. Looking at the code it's not clear if CreateObject() should be paired with a delete operation or if there is an internal pool/tracking mechanism.

{code}
pyarrow/tests/test_plasma.py::TestPlasmaClient::test_put_and_get command: valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all --leak-check-heuristics=stdstring --error-exitcode=1 /io/arrow/python/pyarrow/plasma_store_server -s /tmp/test_plasma-k6wtcvi4/plasma.sock -m 100000000
==575== Memcheck, a memory error detector
==575== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==575== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==575== Command: /io/arrow/python/pyarrow/plasma_store_server -s /tmp/test_plasma-k6wtcvi4/plasma.sock -m 100000000
==575==
Allowing the Plasma store to use up to 0.1GB of memory.
Starting object store with directory /dev/shm and huge page support disabled
PASSED==575==
==575== HEAP SUMMARY:
==575==     in use at exit: 552 bytes in 1 blocks
==575==   total heap usage: 178 allocs, 177 frees, 143,037 bytes allocated
==575==
==575== 552 bytes in 1 blocks are still reachable in loss record 1 of 1
==575==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==575==    by 0x567F5F7: fdopen@@GLIBC_2.2.5 (iofdopen.c:122)
==575==    by 0x1BD47F: create_buffer(long) (malloc.cc:105)
==575==    by 0x1BFF17: fake_mmap (malloc.cc:135)
==575==    by 0x1C077B: sys_alloc (dlmalloc.c:4155)
==575==    by 0x1C077B: dlmalloc (dlmalloc.c:4680)
==575==    by 0x1C2850: internal_memalign.constprop.98 (dlmalloc.c:4917)
==575==    by 0x19391A: plasma::PlasmaStore::CreateObject(plasma::UniqueID const&, long, long, int, plasma::Client*, plasma::PlasmaObject*) (store.cc:178)
==575==    by 0x197337: plasma::PlasmaStore::ProcessMessage(plasma::Client*) (store.cc:740)
==575==    by 0x195E02: plasma::PlasmaStore::ConnectClient(int)::{lambda(int)#1}::operator()(int) const (store.cc:544)
==575==    by 0x19927B: std::_Function_handler<void (int), plasma::PlasmaStore::ConnectClient(int)::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&) (std_function.h:297)
==575==    by 0x1B75FD: std::function<void (int)>::operator()(int) const (std_function.h:687)
==575==    by 0x1B6F4E: plasma::EventLoop::FileEventCallback(aeEventLoop*, int, void*, int) (events.cc:28)
==575==
==575== LEAK SUMMARY:
==575==    definitely lost: 0 bytes in 0 blocks
==575==    indirectly lost: 0 bytes in 0 blocks
==575==      possibly lost: 0 bytes in 0 blocks
==575==    still reachable: 552 bytes in 1 blocks
==575==         suppressed: 0 bytes in 0 blocks
==575==
==575== For counts of detected and suppressed errors, rerun with: -v
==575== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
{code}


was (Author: lbartnik):
The first of these warnings could probably be addressed by not calling exit(0) from the signal handler. My impression is that after a signal is caught and exit() is called, main() never returns, and thus destructors for its local objects are not called. Below is the valgrind warning in question.

{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990==    by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990==    by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990==    by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990==    by 0x198A23: main (store.cc:937)
{code}

I see that SIGTERM comes from Python: "Ensure Valgrind and/or coverage have a clean exit". Does it make sense to set an exit flag in the signal handler and then let the event loop exit on its own in the main call stack?

> [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
> --------------------------------------------------------------------
>
>                 Key: ARROW-1380
>                 URL: https://issues.apache.org/jira/browse/ARROW-1380
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Plasma (C++)
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.11.0
>
>         Attachments: LastTest.log, valgrind.supp_
>
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
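
For reference, the exit-flag idea raised in the comment can be sketched roughly as follows. This is a minimal illustration and not the actual store.cc code: the names g_stop_requested, HandleTermination, LocalResource and RunStore are hypothetical, the socket path is a placeholder that is never opened, and a polling loop stands in for the real event loop. The point it shows is that std::exit() does not unwind the stack, so objects local to main() are never destroyed, whereas a handler that only sets a flag lets the loop finish and the function return normally.

```cpp
#include <cassert>
#include <csignal>
#include <string>

// Hypothetical flag, not a name from store.cc. volatile sig_atomic_t is the
// type a signal handler can portably write to.
static volatile std::sig_atomic_t g_stop_requested = 0;

// The handler only records the request and returns; it does not call exit().
extern "C" void HandleTermination(int /*signum*/) { g_stop_requested = 1; }

// Stand-in for an object local to main(), such as the std::string reported
// at store.cc:937. The counter makes its destruction observable.
static int g_destroyed = 0;
struct LocalResource {
  std::string socket_name;
  ~LocalResource() { ++g_destroyed; }
};

// Stand-in for main(): polls the flag instead of running a real event loop.
int RunStore() {
  LocalResource resource{"/tmp/plasma.sock"};  // placeholder path, never opened
  auto previous = std::signal(SIGTERM, HandleTermination);
  assert(previous != SIG_ERR);  // handler installation is expected to succeed
  (void)previous;               // silence unused warning when NDEBUG is set

  int iterations = 0;
  while (!g_stop_requested) {
    // A real loop would process file events here. To keep the sketch
    // self-terminating, SIGTERM is raised in-process on the third pass.
    if (++iterations == 3) std::raise(SIGTERM);
  }
  return iterations;  // normal return, so ~LocalResource() runs
}
```

Calling RunStore() returns once the flag is set, and the destructor counter shows that the local object was released on the way out. How the real plasma::EventLoop would be asked to stop from the main call stack is outside the scope of this sketch.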