Hi Lars,

Thank you for comments.

> > 
> > And when it passes more than a full day....
> > 
> > * node1
> > 32126 ?        SLs   79:52      0   182 71189 24328  0.1 heartbeat: master 
> > control process                        
> > 
> > * node2
> > 31928 ?        SLs   77:01      0   182 70869 24008  0.1 heartbeat: master 
> > control process
> 
> Oh, I see.
> 
> This is a "design choice" (maybe not even intentional) of the Gmain_*
> wrappers used throughout the heartbeat code.
> 
> The "real" glib g_timeout_add_full(), and most other similar functions,
> internally do
>  id = g_source_attach(source, ...);
>  g_source_unref(source);
>  return id;
> 
> Thus in g_main_dispatch, the
>  need_destroy = ! dispatch (...)
>  if (need_destroy)
>      g_source_destroy_internal()
> 
> in fact ends up destroying it,
> if dispatch() returns FALSE,
> as documented: 
>     The function is called repeatedly until it returns FALSE, at
>     which point the timeout is automatically destroyed and the
>     function will not be called again.
> 
> Not so with the heartbeat/glue/libplumbing Gmain_timeout_add_full.
> It does not g_source_unref(), so we keep the extra reference around
> until someone explicitly calls Gmain_timeout_remove().
> 
> Talk about principle of least surprise :(
> 
> Changing this behaviour to match glib's, i.e. unref'ing after
> g_source_attach, would seem like the "correct" thing to do,
> but is likely to break other pieces of code in subtle ways,
> so it may not be the "right" thing to do at this point.

Thank you for detailed explanation.
If you found the method that is appropriate than the correction that I 
suggested, I approve of it.

> I'm going to take your patch more or less as is.
> If it does not show up soon, prod me again.
> 

All right.

Many Thanks!
Hideo Yamauchi.  


> Thank you for tracking this down.
> 
> > Best Regards,
> > Hideo Yamauchi.
> > 
> > 
> > --- On Tue, 2012/5/1, renayama19661...@ybb.ne.jp 
> > <renayama19661...@ybb.ne.jp> wrote:
> > 
> > > Hi Lars,
> > > 
> > > We confirmed that this problem occurred with v1 mode of Heartbeat.
> > >  * The problem happens with the v2 mode in the same way.
> > > 
> > > We confirmed a problem in the next procedure.
> > > 
> > > Step 1) Put a special device extinguishing a communication packet of 
> > > Heartbeat in the network.
> > > 
> > > Step 2) Between nodes, the retransmission of the message is carried out 
> > > repeatedly.
> > > 
> > > Step 3) Then the memory of the master process increases little by little.
> > > 
> > > 
> > > -------- As a result of the ps command of the master process ----------
> > > * node1
> > > (start)
> > > 32126 ?        SLs    0:00      0   182 53989  7128  0.0 heartbeat: 
> > > master control process
> > > (One hour later)
> > > 32126 ?        SLs    0:03      0   182 54729  7868  0.0 heartbeat: 
> > > master control process
> > > (Two hour later)
> > > 32126 ?        SLs    0:08      0   182 55317  8456  0.0 heartbeat: 
> > > master control process
> > > (Four hours later)
> > > 32126 ?        SLs    0:24      0   182 56673  9812  0.0 heartbeat: 
> > > master control process 
> > > 
> > > * node2
> > > (start)
> > > 31928 ?        SLs    0:00      0   182 53989  7128  0.0 heartbeat: 
> > > master control process
> > > (One hour later)
> > > 31928 ?        SLs    0:02      0   182 54481  7620  0.0 heartbeat: 
> > > master control process
> > > (Two hour later)
> > > 31928 ?        SLs    0:08      0   182 55353  8492  0.0 heartbeat: 
> > > master control process
> > > (Four hours later)
> > > 31928 ?        SLs    0:23      0   182 56689  9828  0.0 heartbeat: 
> > > master control process
> > > 
> > > 
> > > The state of the memory leak seems to vary according to a node with the 
> > > quantity of the retransmission.
> > > 
> > > The increase of this memory disappears by applying my patch.
> > > 
> > > And the similar correspondence seems to be necessary in 
> > > send_reqnodes_msg(), but this is like little leak.
> > > 
> > > Best Regards,
> > > Hideo Yamauchi.
> > > 
> > > 
> > > --- On Sat, 2012/4/28, renayama19661...@ybb.ne.jp 
> > > <renayama19661...@ybb.ne.jp> wrote:
> > > 
> > > > Hi Lars,
> > > > 
> > > > Thank you for comments.
> > > > 
> > > > > Have you actually been able to measure that memory leak you observed,
> > > > > and you can confirm this patch will fix it?
> > > > > 
> > > > > Because I don't think this patch has any effect.
> > > > 
> > > > Yes.
> > > > I really measured leak.
> > > > I can show a result next week.
> > > > #Japan is a holiday until Tuesday.
> > > > 
> > > > > 
> > > > > send_rexmit_request() is only used as paramter to
> > > > > Gmain_timeout_add_full, and it returns FALSE always,
> > > > > which should cause the respective sourceid to be auto-removed.
> > > > 
> > > > It seems to be necessary to release gsource somehow or other.
> > > > The similar liberation seems to be carried out in lrmd.
> > > > 
> > > > Best Regards,
> > > > Hideo Yamauchi.
> > > > 
> > > > 
> > > > --- On Fri, 2012/4/27, Lars Ellenberg <lars.ellenb...@linbit.com> wrote:
> > > > 
> > > > > On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661...@ybb.ne.jp 
> > > > > wrote:
> > > > > > Hi All,
> > > > > > 
> > > > > > We gave test that assumed remote cluster environment.
> > > > > > And we tested packet lost.
> > > > > > 
> > > > > > The retransmission timer of Heartbeat causes memory leak.
> > > > > > 
> > > > > > I donate a patch.
> > > > > > Please confirm the contents of the patch.
> > > > > > And please reflect a patch in a repository of Heartbeat.
> > > > > 
> > > > > Have you actually been able to measure that memory leak you observed,
> > > > > and you can confirm this patch will fix it?
> > > > > 
> > > > > Because I don't think this patch has any effect.
> > > > > 
> > > > > send_rexmit_request() is only used as paramter to
> > > > > Gmain_timeout_add_full, and it returns FALSE always,
> > > > > which should cause the respective sourceid to be auto-removed.
> > > > > 
> > > > > 
> > > > > > diff -r 106ca984041b heartbeat/hb_rexmit.c
> > > > > > --- a/heartbeat/hb_rexmit.c    Thu Apr 26 19:28:26 2012 +0900
> > > > > > +++ b/heartbeat/hb_rexmit.c    Thu Apr 26 19:31:44 2012 +0900
> > > > > > @@ -164,6 +164,8 @@
> > > > > >      seqno_t seq = (seqno_t) ri->seq;
> > > > > >      struct node_info* node = ri->node;
> > > > > >      struct ha_msg*    hmsg;
> > > > > > +    unsigned long           sourceid;
> > > > > > +    gpointer value;
> > > > > >  
> > > > > >      if (STRNCMP_CONST(node->status, UPSTATUS) != 0 &&
> > > > > >          STRNCMP_CONST(node->status, ACTIVESTATUS) !=0) {
> > > > > > @@ -196,11 +198,17 @@
> > > > > >      
> > > > > >      node->track.last_rexmit_req = time_longclock();    
> > > > > >      
> > > > > > -    if (!g_hash_table_remove(rexmit_hash_table, ri)){
> > > > > > -        cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table"
> > > > > > -               "for seq/node(%ld %s)",                
> > > > > > -               __FUNCTION__, ri->seq, ri->node->nodename);
> > > > > > -        return FALSE;
> > > > > > +    value = g_hash_table_lookup(rexmit_hash_table, ri);
> > > > > > +    if ( value != NULL) {
> > > > > > +        sourceid = (unsigned long) value;
> > > > > > +        Gmain_timeout_remove(sourceid);
> > > > > > +
> > > > > > +        if (!g_hash_table_remove(rexmit_hash_table, ri)){
> > > > > > +            cl_log(LOG_ERR, "%s: entry not found in 
> > > > > > rexmit_hash_table"
> > > > > > +                       "for seq/node(%ld %s)",                
> > > > > > +                       __FUNCTION__, ri->seq, ri->node->nodename);
> > > > > > +            return FALSE;
> > > > > > +        }
> > > > > >      }
> > > > > >      
> > > > > >      schedule_rexmit_request(node, seq, max_rexmit_delay);
> 
> -- 
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> 
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
> 
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

Reply via email to