Hi Lars,

And when it passes more than a full day....

* node1
32126 ?        SLs   79:52      0   182 71189 24328  0.1 heartbeat: master 
control process                        

* node2
31928 ?        SLs   77:01      0   182 70869 24008  0.1 heartbeat: master 
control process


Best Regards,
Hideo Yamauchi.


--- On Tue, 2012/5/1, [email protected] <[email protected]> 
wrote:

> Hi Lars,
> 
> We confirmed that this problem occurred with v1 mode of Heartbeat.
>  * The problem happens with the v2 mode in the same way.
> 
> We confirmed a problem in the next procedure.
> 
> Step 1) Put a special device extinguishing a communication packet of 
> Heartbeat in the network.
> 
> Step 2) Between nodes, the retransmission of the message is carried out 
> repeatedly.
> 
> Step 3) Then the memory of the master process increases little by little.
> 
> 
> -------- As a result of the ps command of the master process ----------
> * node1
> (start)
> 32126 ?        SLs    0:00      0   182 53989  7128  0.0 heartbeat: master 
> control process
> (One hour later)
> 32126 ?        SLs    0:03      0   182 54729  7868  0.0 heartbeat: master 
> control process
> (Two hour later)
> 32126 ?        SLs    0:08      0   182 55317  8456  0.0 heartbeat: master 
> control process
> (Four hours later)
> 32126 ?        SLs    0:24      0   182 56673  9812  0.0 heartbeat: master 
> control process 
> 
> * node2
> (start)
> 31928 ?        SLs    0:00      0   182 53989  7128  0.0 heartbeat: master 
> control process
> (One hour later)
> 31928 ?        SLs    0:02      0   182 54481  7620  0.0 heartbeat: master 
> control process
> (Two hour later)
> 31928 ?        SLs    0:08      0   182 55353  8492  0.0 heartbeat: master 
> control process
> (Four hours later)
> 31928 ?        SLs    0:23      0   182 56689  9828  0.0 heartbeat: master 
> control process
> 
> 
> The state of the memory leak seems to vary according to a node with the 
> quantity of the retransmission.
> 
> The increase of this memory disappears by applying my patch.
> 
> And the similar correspondence seems to be necessary in send_reqnodes_msg(), 
> but this is like little leak.
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> --- On Sat, 2012/4/28, [email protected] 
> <[email protected]> wrote:
> 
> > Hi Lars,
> > 
> > Thank you for comments.
> > 
> > > Have you actually been able to measure that memory leak you observed,
> > > and you can confirm this patch will fix it?
> > > 
> > > Because I don't think this patch has any effect.
> > 
> > Yes.
> > I really measured leak.
> > I can show a result next week.
> > #Japan is a holiday until Tuesday.
> > 
> > > 
> > > send_rexmit_request() is only used as paramter to
> > > Gmain_timeout_add_full, and it returns FALSE always,
> > > which should cause the respective sourceid to be auto-removed.
> > 
> > It seems to be necessary to release gsource somehow or other.
> > The similar liberation seems to be carried out in lrmd.
> > 
> > Best Regards,
> > Hideo Yamauchi.
> > 
> > 
> > --- On Fri, 2012/4/27, Lars Ellenberg <[email protected]> wrote:
> > 
> > > On Thu, Apr 26, 2012 at 10:56:30AM +0900, [email protected] 
> > > wrote:
> > > > Hi All,
> > > > 
> > > > We gave test that assumed remote cluster environment.
> > > > And we tested packet lost.
> > > > 
> > > > The retransmission timer of Heartbeat causes memory leak.
> > > > 
> > > > I donate a patch.
> > > > Please confirm the contents of the patch.
> > > > And please reflect a patch in a repository of Heartbeat.
> > > 
> > > Have you actually been able to measure that memory leak you observed,
> > > and you can confirm this patch will fix it?
> > > 
> > > Because I don't think this patch has any effect.
> > > 
> > > send_rexmit_request() is only used as paramter to
> > > Gmain_timeout_add_full, and it returns FALSE always,
> > > which should cause the respective sourceid to be auto-removed.
> > > 
> > > 
> > > > diff -r 106ca984041b heartbeat/hb_rexmit.c
> > > > --- a/heartbeat/hb_rexmit.c    Thu Apr 26 19:28:26 2012 +0900
> > > > +++ b/heartbeat/hb_rexmit.c    Thu Apr 26 19:31:44 2012 +0900
> > > > @@ -164,6 +164,8 @@
> > > >      seqno_t seq = (seqno_t) ri->seq;
> > > >      struct node_info* node = ri->node;
> > > >      struct ha_msg*    hmsg;
> > > > +    unsigned long           sourceid;
> > > > +    gpointer value;
> > > >  
> > > >      if (STRNCMP_CONST(node->status, UPSTATUS) != 0 &&
> > > >          STRNCMP_CONST(node->status, ACTIVESTATUS) !=0) {
> > > > @@ -196,11 +198,17 @@
> > > >      
> > > >      node->track.last_rexmit_req = time_longclock();    
> > > >      
> > > > -    if (!g_hash_table_remove(rexmit_hash_table, ri)){
> > > > -        cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table"
> > > > -               "for seq/node(%ld %s)",                
> > > > -               __FUNCTION__, ri->seq, ri->node->nodename);
> > > > -        return FALSE;
> > > > +    value = g_hash_table_lookup(rexmit_hash_table, ri);
> > > > +    if ( value != NULL) {
> > > > +        sourceid = (unsigned long) value;
> > > > +        Gmain_timeout_remove(sourceid);
> > > > +
> > > > +        if (!g_hash_table_remove(rexmit_hash_table, ri)){
> > > > +            cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table"
> > > > +                       "for seq/node(%ld %s)",                
> > > > +                       __FUNCTION__, ri->seq, ri->node->nodename);
> > > > +            return FALSE;
> > > > +        }
> > > >      }
> > > >      
> > > >      schedule_rexmit_request(node, seq, max_rexmit_delay);
> > > 
> > > 
> > > -- 
> > > : Lars Ellenberg
> > > : LINBIT | Your Way to High Availability
> > > : DRBD/HA support and consulting http://www.linbit.com
> > > 
> > > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> > > _______________________________________________________
> > > Linux-HA-Dev: [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > > Home Page: http://linux-ha.org/
> > > 
> > _______________________________________________________
> > Linux-HA-Dev: [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > Home Page: http://linux-ha.org/
> > 
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
> 
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

Reply via email to