Hi Alan,

Thank you for your comments.

> FYI: there is code in the heartbeat communication layer which is quite happy 
> to simulate lost packets.
> 
> I made it difficult to turn on accidentally.  Read the code for details if 
> you're interested.

All right.

Many Thanks,
Hideo Yamauchi.

--- On Tue, 2012/5/8, Alan Robertson <al...@unix.sh> wrote:

> FYI: there is code in the heartbeat communication layer which is quite happy 
> to simulate lost packets.
> 
> I made it difficult to turn on accidentally.  Read the code for details if 
> you're interested.
> 
> 
> 
> On 04/30/2012 10:21 PM, renayama19661...@ybb.ne.jp wrote:
> > Hi Lars,
> > 
> > We confirmed that this problem occurs with the v1 mode of Heartbeat.
> >   * The problem happens in the same way with the v2 mode.
> > 
> > We reproduced the problem with the following procedure.
> > 
> > Step 1) Place a special device in the network that drops the 
> > communication packets of Heartbeat.
> > 
> > Step 2) The nodes then retransmit messages to each other 
> > repeatedly.
> > 
> > Step 3) The memory of the master process then grows little by little.
> > 
> > 
> > -------- Result of the ps command for the master process ----------
> > * node1
> > (start)
> > 32126 ?        SLs    0:00      0   182 53989  7128  0.0 heartbeat: master control process
> > (One hour later)
> > 32126 ?        SLs    0:03      0   182 54729  7868  0.0 heartbeat: master control process
> > (Two hours later)
> > 32126 ?        SLs    0:08      0   182 55317  8456  0.0 heartbeat: master control process
> > (Four hours later)
> > 32126 ?        SLs    0:24      0   182 56673  9812  0.0 heartbeat: master control process
> > 
> > * node2
> > (start)
> > 31928 ?        SLs    0:00      0   182 53989  7128  0.0 heartbeat: master control process
> > (One hour later)
> > 31928 ?        SLs    0:02      0   182 54481  7620  0.0 heartbeat: master control process
> > (Two hours later)
> > 31928 ?        SLs    0:08      0   182 55353  8492  0.0 heartbeat: master control process
> > (Four hours later)
> > 31928 ?        SLs    0:23      0   182 56689  9828  0.0 heartbeat: master control process
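
To put a number on the growth in the ps samples above, here is a rough sketch in C. It assumes the eighth column of the output is the resident set size in KiB, as in the `ps v` layout; the thread does not say which ps options were used, so treat that as an assumption. The names `struct sample` and `growth_per_hour` are made up for this illustration.

```c
#include <assert.h>

/* One ps sample: hours since start and resident set size.
 * Assumption: column 8 of the ps output is RSS in KiB (as in `ps v`). */
struct sample {
    double hours;
    long   rss_kib;
};

/* Figures copied from the ps output in the report. */
static const struct sample node1[] = { {0, 7128}, {1, 7868}, {2, 8456}, {4, 9812} };
static const struct sample node2[] = { {0, 7128}, {1, 7620}, {2, 8492}, {4, 9828} };

/* Average growth in KiB per hour between the first and the last sample. */
static double growth_per_hour(const struct sample *s, int n)
{
    assert(n >= 2);
    return (double)(s[n - 1].rss_kib - s[0].rss_kib)
         / (s[n - 1].hours - s[0].hours);
}
```

Calling `growth_per_hour(node1, 4)` and `growth_per_hour(node2, 4)` gives 671 and 675 KiB per hour respectively under that assumption, which is only an average over the four-hour window.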
> > 
> > 
> > The size of the memory leak seems to vary from node to node with the 
> > amount of retransmission.
> > 
> > The memory growth disappears when my patch is applied.
> > 
> > A similar fix seems to be necessary in send_reqnodes_msg(), but that 
> > leak appears to be small.
> > 
> > Best Regards,
> > Hideo Yamauchi.
> > 
> > 
> > --- On Sat, 2012/4/28, 
> > renayama19661...@ybb.ne.jp<renayama19661...@ybb.ne.jp>  wrote:
> > 
> >> Hi Lars,
> >> 
> >> Thank you for your comments.
> >> 
> >>> Have you actually been able to measure the memory leak you observed,
> >>> and can you confirm this patch will fix it?
> >>> 
> >>> Because I don't think this patch has any effect.
> >> Yes.
> >> I actually measured the leak.
> >> I can show the results next week.
> >> # Japan is on holiday until Tuesday.
> >> 
> >>> send_rexmit_request() is only used as parameter to
> >>> Gmain_timeout_add_full, and it returns FALSE always,
> >>> which should cause the respective sourceid to be auto-removed.
> >> It seems necessary to release the gsource explicitly in some way.
> >> A similar release seems to be done in lrmd.
> >> 
> >> Best Regards,
> >> Hideo Yamauchi.
> >> 
> >> 
> >> --- On Fri, 2012/4/27, Lars Ellenberg<lars.ellenb...@linbit.com>  wrote:
> >> 
> >>> On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661...@ybb.ne.jp 
> >>> wrote:
> >>>> Hi All,
> >>>> 
> >>>> We ran a test that assumed a remote cluster environment,
> >>>> and we tested packet loss.
> >>>> 
> >>>> The retransmission timer of Heartbeat causes a memory leak.
> >>>> 
> >>>> I am contributing a patch.
> >>>> Please review the contents of the patch,
> >>>> and please apply it to the Heartbeat repository.
> >>> Have you actually been able to measure the memory leak you observed,
> >>> and can you confirm this patch will fix it?
> >>> 
> >>> Because I don't think this patch has any effect.
> >>> 
> >>> send_rexmit_request() is only used as parameter to
> >>> Gmain_timeout_add_full, and it returns FALSE always,
> >>> which should cause the respective sourceid to be auto-removed.
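
The behaviour described above (a timeout callback that returns FALSE has its source removed automatically) can be sketched with a toy model in plain C. This is not GLib and not Heartbeat code: `timer_add`, `timer_remove`, `timer_dispatch`, `timers_active` and `one_shot` are hypothetical names invented for the illustration, and the model says nothing about whether the real Gmain_timeout_add_full frees the memory behind the source, which is the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TIMERS 8

/* Hypothetical stand-in for a timeout source (not GLib, not Heartbeat). */
struct timer {
    int   in_use;              /* 1 while the source is registered */
    int (*callback)(void *);   /* returns 0 (FALSE) to request removal */
    void *data;
};

static struct timer timers[MAX_TIMERS];

/* Register a callback; returns a source id >= 1, or 0 if the table is full. */
static unsigned long timer_add(int (*cb)(void *), void *data)
{
    for (size_t i = 0; i < MAX_TIMERS; i++) {
        if (!timers[i].in_use) {
            timers[i].in_use = 1;
            timers[i].callback = cb;
            timers[i].data = data;
            return (unsigned long)i + 1;
        }
    }
    return 0;
}

/* Explicitly remove a source; returns 1 only if it was still registered. */
static int timer_remove(unsigned long id)
{
    if (id == 0 || id > MAX_TIMERS || !timers[id - 1].in_use)
        return 0;
    timers[id - 1].in_use = 0;
    return 1;
}

/* Fire one source; a callback that returns 0 is removed automatically. */
static void timer_dispatch(unsigned long id)
{
    assert(id >= 1 && id <= MAX_TIMERS);
    struct timer *t = &timers[id - 1];
    if (t->in_use && t->callback(t->data) == 0)
        t->in_use = 0;
}

/* Count the sources that are still registered. */
static size_t timers_active(void)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_TIMERS; i++)
        n += (size_t)timers[i].in_use;
    return n;
}

/* One-shot callback: counts its invocations and asks to be removed. */
static int one_shot(void *data)
{
    ++*(int *)data;
    return 0;
}
```

In this model, once `timer_dispatch()` has run a `one_shot` callback, `timers_active()` drops back to zero and a later `timer_remove()` on the same id finds nothing to remove. Whether the real source also releases its memory at that point is exactly what the measurement is meant to settle.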
> >>> 
> >>> 
> >>>> diff -r 106ca984041b heartbeat/hb_rexmit.c
> >>>> --- a/heartbeat/hb_rexmit.c    Thu Apr 26 19:28:26 2012 +0900
> >>>> +++ b/heartbeat/hb_rexmit.c    Thu Apr 26 19:31:44 2012 +0900
> >>>> @@ -164,6 +164,8 @@
> >>>>      seqno_t seq = (seqno_t) ri->seq;
> >>>>      struct node_info* node = ri->node;
> >>>>      struct ha_msg*    hmsg;
> >>>> +    unsigned long           sourceid;
> >>>> +    gpointer value;
> >>>>  
> >>>>      if (STRNCMP_CONST(node->status, UPSTATUS) != 0 &&
> >>>>          STRNCMP_CONST(node->status, ACTIVESTATUS) !=0) {
> >>>> @@ -196,11 +198,17 @@
> >>>>  
> >>>>      node->track.last_rexmit_req = time_longclock();
> >>>>  
> >>>> -    if (!g_hash_table_remove(rexmit_hash_table, ri)){
> >>>> -        cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table"
> >>>> -               "for seq/node(%ld %s)",
> >>>> -               __FUNCTION__, ri->seq, ri->node->nodename);
> >>>> -        return FALSE;
> >>>> +    value = g_hash_table_lookup(rexmit_hash_table, ri);
> >>>> +    if ( value != NULL) {
> >>>> +        sourceid = (unsigned long) value;
> >>>> +        Gmain_timeout_remove(sourceid);
> >>>> +
> >>>> +        if (!g_hash_table_remove(rexmit_hash_table, ri)){
> >>>> +            cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table"
> >>>> +                       "for seq/node(%ld %s)",
> >>>> +                       __FUNCTION__, ri->seq, ri->node->nodename);
> >>>> +            return FALSE;
> >>>> +        }
> >>>>      }
> >>>>  
> >>>>      schedule_rexmit_request(node, seq, max_rexmit_delay);
> >>> 
> >>> -- 
> >>> : Lars Ellenberg
> >>> : LINBIT | Your Way to High Availability
> >>> : DRBD/HA support and consulting http://www.linbit.com
> >>> 
> >>> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> >>> _______________________________________________________
> >>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> >>> Home Page: http://linux-ha.org/
> >>> 
> 
> 
> --     Alan Robertson<al...@unix.sh>  - @OSSAlanR
> 
> "Openness is the foundation and preservative of friendship...  Let me claim 
> from you at all times your undisguised opinions." - William Wilberforce
> 