Hi Alan, Thank you for comments.
> FYI: there is code in the heartbeat communication layer which is quite happy > to simulate lost packets. > > I made it difficult to turn on accidentally. Read the code for details if > you're interested. All right. Many Thanks, Hideo Yamauchi. --- On Tue, 2012/5/8, Alan Robertson <al...@unix.sh> wrote: > FYI: there is code in the heartbeat communication layer which is quite happy > to simulate lost packets. > > I made it difficult to turn on accidentally. Read the code for details if > you're interested. > > > > On 04/30/2012 10:21 PM, renayama19661...@ybb.ne.jp wrote: > > Hi Lars, > > > > We confirmed that this problem occurred with v1 mode of Heartbeat. > > * The problem happens with the v2 mode in the same way. > > > > We confirmed a problem in the next procedure. > > > > Step 1) Put a special device extinguishing a communication packet of > > Heartbeat in the network. > > > > Step 2) Between nodes, the retransmission of the message is carried out > > repeatedly. > > > > Step 3) Then the memory of the master process increases little by little. > > > > > > -------- As a result of the ps command of the master process ---------- > > * node1 > > (start) > > 32126 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master > > control process > > (One hour later) > > 32126 ? SLs 0:03 0 182 54729 7868 0.0 heartbeat: master > > control process > > (Two hour later) > > 32126 ? SLs 0:08 0 182 55317 8456 0.0 heartbeat: master > > control process > > (Four hours later) > > 32126 ? SLs 0:24 0 182 56673 9812 0.0 heartbeat: master > > control process > > > > * node2 > > (start) > > 31928 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master > > control process > > (One hour later) > > 31928 ? SLs 0:02 0 182 54481 7620 0.0 heartbeat: master > > control process > > (Two hour later) > > 31928 ? SLs 0:08 0 182 55353 8492 0.0 heartbeat: master > > control process > > (Four hours later) > > 31928 ? SLs 0:23 0 182 56689 9828 0.0 heartbeat: master > > control process > > > > > > The state of the memory leak seems to vary according to a node with the > > quantity of the retransmission. > > > > The increase of this memory disappears by applying my patch. > > > > And the similar correspondence seems to be necessary in > > send_reqnodes_msg(), but this is like little leak. > > > > Best Regards, > > Hideo Yamauchi. > > > > > > --- On Sat, 2012/4/28, > > renayama19661...@ybb.ne.jp<renayama19661...@ybb.ne.jp> wrote: > > > >> Hi Lars, > >> > >> Thank you for comments. > >> > >>> Have you actually been able to measure that memory leak you observed, > >>> and you can confirm this patch will fix it? > >>> > >>> Because I don't think this patch has any effect. > >> Yes. > >> I really measured leak. > >> I can show a result next week. > >> #Japan is a holiday until Tuesday. > >> > >>> send_rexmit_request() is only used as paramter to > >>> Gmain_timeout_add_full, and it returns FALSE always, > >>> which should cause the respective sourceid to be auto-removed. > >> It seems to be necessary to release gsource somehow or other. > >> The similar liberation seems to be carried out in lrmd. > >> > >> Best Regards, > >> Hideo Yamauchi. > >> > >> > >> --- On Fri, 2012/4/27, Lars Ellenberg<lars.ellenb...@linbit.com> wrote: > >> > >>> On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661...@ybb.ne.jp > >>> wrote: > >>>> Hi All, > >>>> > >>>> We gave test that assumed remote cluster environment. > >>>> And we tested packet lost. > >>>> > >>>> The retransmission timer of Heartbeat causes memory leak. > >>>> > >>>> I donate a patch. > >>>> Please confirm the contents of the patch. > >>>> And please reflect a patch in a repository of Heartbeat. > >>> Have you actually been able to measure that memory leak you observed, > >>> and you can confirm this patch will fix it? > >>> > >>> Because I don't think this patch has any effect. > >>> > >>> send_rexmit_request() is only used as paramter to > >>> Gmain_timeout_add_full, and it returns FALSE always, > >>> which should cause the respective sourceid to be auto-removed. > >>> > >>> > >>>> diff -r 106ca984041b heartbeat/hb_rexmit.c > >>>> --- a/heartbeat/hb_rexmit.c Thu Apr 26 19:28:26 2012 +0900 > >>>> +++ b/heartbeat/hb_rexmit.c Thu Apr 26 19:31:44 2012 +0900 > >>>> @@ -164,6 +164,8 @@ > >>>> seqno_t seq = (seqno_t) ri->seq; > >>>> struct node_info* node = ri->node; > >>>> struct ha_msg* hmsg; > >>>> + unsigned long sourceid; > >>>> + gpointer value; > >>>> if (STRNCMP_CONST(node->status, UPSTATUS) != 0&& > >>>> STRNCMP_CONST(node->status, ACTIVESTATUS) !=0) { > >>>> @@ -196,11 +198,17 @@ > >>>> node->track.last_rexmit_req = time_longclock(); - > >>>> if (!g_hash_table_remove(rexmit_hash_table, ri)){ > >>>> - cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table" > >>>> - "for seq/node(%ld %s)", - > >>>> __FUNCTION__, ri->seq, ri->node->nodename); > >>>> - return FALSE; > >>>> + value = g_hash_table_lookup(rexmit_hash_table, ri); > >>>> + if ( value != NULL) { > >>>> + sourceid = (unsigned long) value; > >>>> + Gmain_timeout_remove(sourceid); > >>>> + > >>>> + if (!g_hash_table_remove(rexmit_hash_table, ri)){ > >>>> + cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table" > >>>> + "for seq/node(%ld %s)", + > >>>> __FUNCTION__, ri->seq, ri->node->nodename); > >>>> + return FALSE; > >>>> + } > >>>> } > >>>> schedule_rexmit_request(node, seq, max_rexmit_delay); > >>> > >>> -- : Lars Ellenberg > >>> : LINBIT | Your Way to High Availability > >>> : DRBD/HA support and consulting http://www.linbit.com > >>> > >>> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. > >>> _______________________________________________________ > >>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > >>> Home Page: http://linux-ha.org/ > >>> > >> _______________________________________________________ > >> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > >> Home Page: http://linux-ha.org/ > >> > > _______________________________________________________ > > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > > Home Page: http://linux-ha.org/ > > > -- Alan Robertson<al...@unix.sh> - @OSSAlanR > > "Openness is the foundation and preservative of friendship... Let me claim > from you at all times your undisguised opinions." - William Wilberforce > _______________________________________________________ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/