[jira] [Commented] (QPID-8134) qpid::client::Message::send multiple memory leaks

Alan Conway (JIRA) Tue, 07 Aug 2018 07:22:44 -0700


    [ 
https://issues.apache.org/jira/browse/QPID-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571743#comment-16571743
 ]


Alan Conway commented on QPID-8134:
-----------------------------------

Hi Dan,

I'll keep trying to reproduce this based on all the info you've given me. If 
you come up with anything that makes it easier, that will be much appreciated.

> the 'options.vexit' path must be used for analysis.

Yes, gospout.sh sets that option and I do see "still reachable" blocks like you 
do, but they're not growing when I send more messages. If I run without --vexit 
no memory is leaked. I'm convinced I'm missing something from your scenario, 
just not sure what yet.

I recommend 'valgrind --tool=massif' as well. It tracks memory use while the 
process is running, not just leaks on exit. You can get reports dumped on exit 
or periodically from a long-running process. Install the separate package 
"massif-visualizer" to view the results, because the built-in text reports are 
hard to read.

> Is it possible to turn OFF the policy which attempts to out-wit the underlying 
> malloc implementation by caching unused memory in the QPID C++ library?

Sender/Receiver data structures can grow up to the message-count limit 
established by setCapacity(), so you can reduce memory use by lowering the 
capacity. Theoretically the max useful buffer size is based on the 
latency-bandwidth product for the total send-ack round-trip - i.e. the number 
of messages you can send before the first one gets an ack. In practice it's 
usually best to experiment and measure the latency/throughput/memory trade-offs.

However, you are seeing unbounded growth over time which is definitely a bug, 
not the expected memory caching. That's what I need to track down. Some 
questions:

I believe you still use the 0-10 protocol - please confirm. This will be a 
different code path under the 1.0 protocol.

What value do you use for setCapacity() on your Senders and Receivers, or do 
you use the default capacity (50)?

Do you know whether the memory leaks you are seeing are related to the Sender 
only, the Receiver only, or both?

How often do you acknowledge() received messages - approximately what is the 
delay between receiving and acknowledging a message? Do you use sync=true on 
acknowledge()?

If you can, run your application for a short period - 100 messages or so - with 
env QPID_TRACE=1 and send me the trace. It may give me some idea of what's 
happening differently in your system.

I'll keep digging on my end.

 

> qpid::client::Message::send multiple memory leaks
> -------------------------------------------------
>
>                 Key: QPID-8134
>                 URL: https://issues.apache.org/jira/browse/QPID-8134
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Client
>    Affects Versions: qpid-cpp-1.37.0, qpid-cpp-1.38.0
>         Environment: CentOS Linux release 7.4.1708 (Core)
> Linux localhost.novalocal 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> qpid-qmf-1.37.0-1.el7.x86_64
> qpid-dispatch-debuginfo-1.0.0-1.el7.x86_64
> python-qpid-1.37.0-1.el7.noarch
> qpid-proton-c-0.18.1-1.el7.x86_64
> python-qpid-qmf-1.37.0-1.el7.x86_64
> qpid-proton-debuginfo-0.18.1-1.el7.x86_64
> qpid-cpp-debuginfo-1.37.0-1.el7.x86_64
> qpid-cpp-client-devel-1.37.0-1.el7.x86_64
> qpid-cpp-server-1.37.0-1.el7.x86_64
> qpid-cpp-client-1.37.0-1.el7.x86_64
>  
>            Reporter: dan clark
>            Assignee: Alan Conway
>            Priority: Blocker
>              Labels: leak, maven
>             Fix For: qpid-cpp-1.39.0
>
>         Attachments: drain.cpp, godrain.sh, gospout.sh, qpid-8134.tgz, 
> qpid-stat.out, spout.cpp, spout.log
>
>   Original Estimate: 40h
>  Remaining Estimate: 40h
>
> There may be multiple leaks of the outgoing message structure and associated 
> fields when using the qpid::client::amqp0_10::SenderImpl::send function to 
> publish messages under certain setups. I will concede that there may be 
> options beyond my ken that would ameliorate the leak of message structures, 
> especially since under prolonged runs (a daemonized version of an application 
> like spout) the statistics for qpidd indicate increased acquires with zero 
> releases.
> The basic notion is illustrated with the test application spout (and drain).  
> Consider a long running daemon reducing the overhead of open/send/close by 
> keeping the message connection open for long periods of time.  Then the logic 
> would be: start application/open connection.  In a loop send data (and never 
> reach a close).  Thus the drain application illustrates the behavior and 
> demonstrates the leak using valgrind by sending the data followed by an 
> exit(0).  
> Note also the lack of 'releases' associated with the 'acquires' in the stats 
> output.
> Capturing the leaks using the test applications spout/drain required adding 
> an 'exit()' prior to the close, because during normal operation of a daemon 
> the connection remains open for a sustained period of time. The leaked 
> structures within the C++ client library therefore show up as structures 
> still tracked by the library and cleaned up on 'connection.close()', but they 
> should be cleaned up on completion of the send/receive ack or when the 
> message's TTL expires, whichever comes first.  I have witnessed growth of the 
> leaked structures into the millions of messages in runs lasting more than 
> 24 hours with short (300 sec) message TTLs, based on the attached scenarios 
> using spout/drain as the test vehicle.
> The attached spout.log comes from a short 10-message test and contains 5 sets 
> of different leaked structures (found via the 'bytes in 10 blocks are still 
> reachable' lines), which are in line with the much more sustained leaks seen 
> when running the application for multiple days with millions of messages.
> The leaks seem to be associated with structures allocating std::strings to 
> save the "subject" and the "payload" for string-based messages using send for 
> amq.topic output.
> Suggested workarounds are welcome, based either on application-level changes 
> to spout/drain (if they are missing key components) or on changes to the 
> address/setup of the queues for amq.topic messages (see the 'gospout.sh' and 
> 'godrain.sh' test drivers for the specific address structures being used).
> For example, the following is one of the 5 different categories of leaked 
> data from 'spout.log', from a valgrind analysis of the output after the send 
> and session.sync but prior to connection.close():
>  
> ==3388== 3,680 bytes in 10 blocks are still reachable in loss record 233 of 234
> ==3388==    at 0x4C2A203: operator new(unsigned long) (vg_replace_malloc.c:334)
> ==3388==    by 0x4EB046C: qpid::client::Message::Message(std::string const&, std::string const&) (Message.cpp:31)
> ==3388==    by 0x51742C1: qpid::client::amqp0_10::OutgoingMessage::OutgoingMessage() (OutgoingMessage.cpp:167)
> ==3388==    by 0x5186200: qpid::client::amqp0_10::SenderImpl::sendImpl(qpid::messaging::Message const&) (SenderImpl.cpp:140)
> ==3388==    by 0x5186485: operator() (SenderImpl.h:114)
> ==3388==    by 0x5186485: execute<qpid::client::amqp0_10::SenderImpl::Send> (SessionImpl.h:102)
> ==3388==    by 0x5186485: qpid::client::amqp0_10::SenderImpl::send(qpid::messaging::Message const&, bool) (SenderImpl.cpp:49)
> ==3388==    by 0x40438D: main (spout.cpp:185)
>  
>  



