[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1

2018-08-22 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313
 ] 

Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 8:48 AM:


The first of these warnings could be probably addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.

{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}

With changes as in 
https://github.com/lbartnik/arrow/commit/089153d518c081d7b9c1b3fb839463bca9ac1a35
 I can reduce warnings to the one below. Looking at the code it's not clear if 
CreateObject() should be paired with a delete operation of if there is an 
internal pool/tracking mechanism.

{code}
pyarrow/tests/test_plasma.py::TestPlasmaClient::test_put_and_get command:  
valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all 
--leak-check-heuristics=stdstring --error-exitcode=1 
/io/arrow/python/pyarrow/plasma_store_server -s 
/tmp/test_plasma-k6wtcvi4/plasma.sock -m 1
==575== Memcheck, a memory error detector
==575== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==575== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==575== Command: /io/arrow/python/pyarrow/plasma_store_server -s 
/tmp/test_plasma-k6wtcvi4/plasma.sock -m 1
==575== 
Allowing the Plasma store to use up to 0.1GB of memory.
Starting object store with directory /dev/shm and huge page support disabled
PASSED==575== 
==575== HEAP SUMMARY:
==575== in use at exit: 552 bytes in 1 blocks
==575==   total heap usage: 178 allocs, 177 frees, 143,037 bytes allocated
==575== 
==575== 552 bytes in 1 blocks are still reachable in loss record 1 of 1
==575==at 0x4C2FB0F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==575==by 0x567F5F7: fdopen@@GLIBC_2.2.5 (iofdopen.c:122)
==575==by 0x1BD47F: create_buffer(long) (malloc.cc:105)
==575==by 0x1BFF17: fake_mmap (malloc.cc:135)
==575==by 0x1C077B: sys_alloc (dlmalloc.c:4155)
==575==by 0x1C077B: dlmalloc (dlmalloc.c:4680)
==575==by 0x1C2850: internal_memalign.constprop.98 (dlmalloc.c:4917)
==575==by 0x19391A: plasma::PlasmaStore::CreateObject(plasma::UniqueID 
const&, long, long, int, plasma::Client*, plasma::PlasmaObject*) (store.cc:178)
==575==by 0x197337: plasma::PlasmaStore::ProcessMessage(plasma::Client*) 
(store.cc:740)
==575==by 0x195E02: 
plasma::PlasmaStore::ConnectClient(int)::{lambda(int)#1}::operator()(int) const 
(store.cc:544)
==575==by 0x19927B: std::_Function_handler::_M_invoke(std::_Any_data
 const&, int&&) (std_function.h:297)
==575==by 0x1B75FD: std::function::operator()(int) const 
(std_function.h:687)
==575==by 0x1B6F4E: plasma::EventLoop::FileEventCallback(aeEventLoop*, int, 
void*, int) (events.cc:28)
==575== 
==575== LEAK SUMMARY:
==575==definitely lost: 0 bytes in 0 blocks
==575==indirectly lost: 0 bytes in 0 blocks
==575==  possibly lost: 0 bytes in 0 blocks
==575==still reachable: 552 bytes in 1 blocks
==575== suppressed: 0 bytes in 0 blocks
==575== 
==575== For counts of detected and suppressed errors, rerun with: -v
==575== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
{code}


was (Author: lbartnik):
The first of these warnings could be probably addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.

{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::

[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1

2018-08-22 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313
 ] 

Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 8:45 AM:


The first of these warnings could be probably addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.

{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}

With changes as in I can reduce warnings to the one below. Looking at the code 
it's not clear if CreateObject() should be paired with a delete operation of if 
there is an internal pool/tracking mechanism.

{code}
pyarrow/tests/test_plasma.py::TestPlasmaClient::test_put_and_get command:  
valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all 
--leak-check-heuristics=stdstring --error-exitcode=1 
/io/arrow/python/pyarrow/plasma_store_server -s 
/tmp/test_plasma-k6wtcvi4/plasma.sock -m 1
==575== Memcheck, a memory error detector
==575== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==575== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==575== Command: /io/arrow/python/pyarrow/plasma_store_server -s 
/tmp/test_plasma-k6wtcvi4/plasma.sock -m 1
==575== 
Allowing the Plasma store to use up to 0.1GB of memory.
Starting object store with directory /dev/shm and huge page support disabled
PASSED==575== 
==575== HEAP SUMMARY:
==575== in use at exit: 552 bytes in 1 blocks
==575==   total heap usage: 178 allocs, 177 frees, 143,037 bytes allocated
==575== 
==575== 552 bytes in 1 blocks are still reachable in loss record 1 of 1
==575==at 0x4C2FB0F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==575==by 0x567F5F7: fdopen@@GLIBC_2.2.5 (iofdopen.c:122)
==575==by 0x1BD47F: create_buffer(long) (malloc.cc:105)
==575==by 0x1BFF17: fake_mmap (malloc.cc:135)
==575==by 0x1C077B: sys_alloc (dlmalloc.c:4155)
==575==by 0x1C077B: dlmalloc (dlmalloc.c:4680)
==575==by 0x1C2850: internal_memalign.constprop.98 (dlmalloc.c:4917)
==575==by 0x19391A: plasma::PlasmaStore::CreateObject(plasma::UniqueID 
const&, long, long, int, plasma::Client*, plasma::PlasmaObject*) (store.cc:178)
==575==by 0x197337: plasma::PlasmaStore::ProcessMessage(plasma::Client*) 
(store.cc:740)
==575==by 0x195E02: 
plasma::PlasmaStore::ConnectClient(int)::{lambda(int)#1}::operator()(int) const 
(store.cc:544)
==575==by 0x19927B: std::_Function_handler::_M_invoke(std::_Any_data
 const&, int&&) (std_function.h:297)
==575==by 0x1B75FD: std::function::operator()(int) const 
(std_function.h:687)
==575==by 0x1B6F4E: plasma::EventLoop::FileEventCallback(aeEventLoop*, int, 
void*, int) (events.cc:28)
==575== 
==575== LEAK SUMMARY:
==575==definitely lost: 0 bytes in 0 blocks
==575==indirectly lost: 0 bytes in 0 blocks
==575==  possibly lost: 0 bytes in 0 blocks
==575==still reachable: 552 bytes in 1 blocks
==575== suppressed: 0 bytes in 0 blocks
==575== 
==575== For counts of detected and suppressed errors, rerun with: -v
==575== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
{code}


was (Author: lbartnik):
The first of these warnings could be probably addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.
{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/

[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1

2018-08-22 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313
 ] 

Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 7:52 AM:


The first of these warnings could be probably addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.
{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}
 

I see that SIGTERM comes from Python: "Ensure Valgrind and/or coverage have a 
clean exit". Does it make sense to set an exit flag in the signal handler and 
then let the event loop exit on its own in the main call stack?


was (Author: lbartnik):
The first of these warnings could be probably addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.
{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}
 

I tried simply commenting exit() out but that leads to other errors and I 
assume is not intended. I don't see much other signal handling in plasma and my 
current guess is that it is ae that gets interrupted and then drops the event 
loop.

Why is there even a SIGTERM in the first place? Where does it come from?

I'd be grateful for comments and/or pointers to relevant areas in the code.

> [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
> 
>
> Key: ARROW-1380
> URL: https://issues.apache.org/jira/browse/ARROW-1380
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
> Attachments: LastTest.log, valgrind.supp_
>
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1

2018-08-21 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313
 ] 

Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 3:03 AM:


The first of these warnings could be probably addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.
{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}
 

I tried simply commenting exit() out but that leads to other errors and I 
assume is not intended. I don't see much other signal handling in plasma and my 
current guess is that it is ae that gets interrupted and then drops the event 
loop.

Why is there even a SIGTERM in the first place? Where does it come from?

I'd be grateful for comments and/or pointers to relevant areas in the code.


was (Author: lbartnik):
The first of these warnings could be probably addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.

{code}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}
 

I tried simply commenting exit() out but that leads to other errors and I 
assume is not intended. I don't see much other signal handling in plasma and my 
current guess is that it is ae that gets interrupted and then drops the event 
loop.

I'd be grateful for comments and/or pointers to relevant areas in the code.

> [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
> 
>
> Key: ARROW-1380
> URL: https://issues.apache.org/jira/browse/ARROW-1380
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
> Attachments: LastTest.log, valgrind.supp_
>
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1

2018-08-21 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588313#comment-16588313
 ] 

Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 3:01 AM:


The first of these warnings could be probably addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.

{code}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}
 

I tried simply commenting exit() out but that leads to other errors and I 
assume is not intended. I don't see much other signal handling in plasma and my 
current guess is that it is ae that gets interrupted and then drops the event 
loop.

I'd be grateful for comments and/or pointers to relevant areas in the code.


was (Author: lbartnik):
The first of these warnings could be probably addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.

==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)

 

I tried simply commenting exit() out but that leads to other errors and I 
assume is not intended. I don't see much other signal handling in plasma and my 
current guess is that it is ae that gets interrupted and then drops the event 
loop.

I'd be grateful for comments and/or pointers to relevant areas in the code.

> [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
> 
>
> Key: ARROW-1380
> URL: https://issues.apache.org/jira/browse/ARROW-1380
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
> Attachments: LastTest.log, valgrind.supp_
>
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)