Re: [Openais] stale CPG members in confchg callback

Jan Friesse Thu, 25 Feb 2010 00:52:43 -0800

Dietmar,
thanks for the test. With it I was ABLE to reproduce the problem very quickly.
This is definitely something for BZ. I will try to work on the issue and
let you/others know.


Regards,
  Honza

Dietmar Maurer wrote:
>> The best application for such a test is testcpg.c. If there really is a
>> bug, can you please create a BZ (ideally with a way to reproduce it,
>> because I'm really not able to reproduce such behavior)?
> 
> I am still waiting for a BZ account, so I am posting it here. The attached
> program 'cpgtest' reproduces the problem. Compile with:
> 
> # gcc -Wall -o cpgtest cpgtest.c $(pkg-config --cflags --libs libcpg libcoroipcc)
> 
> It executes a simple loop:
> 
> start:
>   cpg_initialize
>   cpg_join
>   cpg_dispatch
>   send one message in confchg_callback
>   cpg_finalize after receiving that message
>   goto start
>  
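
The loop depends on one property the thread takes for granted: once
cpg_finalize closes the IPC connection, the daemon drops this process from
TESTGROUP. Because every iteration runs in the same process, each cpg_join
arrives with the same node ID and PID, so an entry left over from the
previous iteration would collide with the new join. The following is a
minimal toy model of that bookkeeping, not corosync code: toy_group,
toy_join and toy_disconnect are hypothetical names, and the return values
are assumed to mirror CS_OK (1), CS_ERR_EXIST (14) and CS_ERR_NO_SPACE (15)
from corosync's corotypes.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return codes assumed to mirror CS_OK, CS_ERR_EXIST and
 * CS_ERR_NO_SPACE from corosync's corotypes.h. */
enum { TOY_OK = 1, TOY_ERR_EXIST = 14, TOY_ERR_NO_SPACE = 15 };

#define TOY_MAX_MEMBERS 32

/* Hypothetical model of one group's member list, keyed by (nodeid, pid). */
struct toy_member {
    uint32_t nodeid;
    uint32_t pid;
};

struct toy_group {
    struct toy_member members[TOY_MAX_MEMBERS];
    size_t count;
};

/* Index of the (nodeid, pid) entry, or -1 if it is not a member. */
static int toy_find(const struct toy_group *g, uint32_t nodeid, uint32_t pid)
{
    for (size_t i = 0; i < g->count; i++)
        if (g->members[i].nodeid == nodeid && g->members[i].pid == pid)
            return (int)i;
    return -1;
}

/* Join: a (nodeid, pid) pair that is already listed is refused. */
static int toy_join(struct toy_group *g, uint32_t nodeid, uint32_t pid)
{
    assert(g != NULL);
    if (toy_find(g, nodeid, pid) >= 0)
        return TOY_ERR_EXIST;
    if (g->count == TOY_MAX_MEMBERS)
        return TOY_ERR_NO_SPACE;
    g->members[g->count].nodeid = nodeid;
    g->members[g->count].pid = pid;
    g->count++;
    return TOY_OK;
}

/* IPC connection closed (cpg_finalize or process exit): drop the entry
 * by moving the last member into its slot. */
static void toy_disconnect(struct toy_group *g, uint32_t nodeid, uint32_t pid)
{
    int i = toy_find(g, nodeid, pid);
    if (i >= 0)
        g->members[i] = g->members[--g->count];
}
```

In this model a second join with the same (nodeid, pid) returns 14 until the
disconnect has been processed, which is one plausible reading of a cpg_join
that fails with 14 in a single-process join/finalize loop.
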
> When I run it, it completes several iterations successfully, but sometimes
> the join fails:
> 
> # cpgtest
> ...
> starting cpgtest
> calling cpg_initialize
> calling cpg_join
> cpg_join failed: 14
> 
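
The 14 in "cpg_join failed: 14" is a raw cs_error_t value. Assuming the
numbering in corosync's corotypes.h, which follows the SA Forum AIS error
codes, 14 is CS_ERR_EXIST, which would mean the daemon still had an entry
for this PID in TESTGROUP when the new join arrived. A small lookup makes
such messages self-explanatory. The table is copied by hand (values 1 to 15
only) and cs_error_name is a hypothetical helper, so check both against the
installed header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-copied from the cs_error_t enum in corosync's corotypes.h,
 * values 1..15 only. An assumption: verify against the installed header. */
static const char *const cs_error_names[] = {
    [1]  = "CS_OK",
    [2]  = "CS_ERR_LIBRARY",
    [3]  = "CS_ERR_VERSION",
    [4]  = "CS_ERR_INIT",
    [5]  = "CS_ERR_TIMEOUT",
    [6]  = "CS_ERR_TRY_AGAIN",
    [7]  = "CS_ERR_INVALID_PARAM",
    [8]  = "CS_ERR_NO_MEMORY",
    [9]  = "CS_ERR_BAD_HANDLE",
    [10] = "CS_ERR_BUSY",
    [11] = "CS_ERR_ACCESS",
    [12] = "CS_ERR_NOT_EXIST",
    [13] = "CS_ERR_NAME_TOO_LONG",
    [14] = "CS_ERR_EXIST",
    [15] = "CS_ERR_NO_SPACE",
};

/* Hypothetical helper: symbolic name for a raw error value,
 * or "unknown" for anything outside the copied table. */
static const char *cs_error_name(int err)
{
    size_t n = sizeof cs_error_names / sizeof cs_error_names[0];

    if (err < 0 || (size_t)err >= n || cs_error_names[err] == NULL)
        return "unknown";
    return cs_error_names[err];
}
```

In cpgtest the failure message could then be printed with, for example,
fprintf(stderr, "cpg_join failed: %d (%s)\n", result, cs_error_name(result)),
where result is whatever variable holds the cpg_join return value.
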
> And worse, sometimes it hangs in the main loop:
> 
> # cpgtest
> ...
> starting cpgtest
> calling cpg_initialize
> calling cpg_join
> starting main loop (hangs here)
> 
> When that happens, I abort with Ctrl-C. Afterwards a stale CPG member is
> left behind. After several runs I get:
> 
> # corosync-cpgtool
> TESTGROUP\x00
>                     4610               3 (192.168.2.8)
>                    27678               3 (192.168.2.8)
>                    21828               3 (192.168.2.8)
>                    16841               3 (192.168.2.8)
>                    10901               3 (192.168.2.8)
>                    10773               3 (192.168.2.8)
>                    10496               3 (192.168.2.8)
>                     9866               3 (192.168.2.8)
>                     8552               3 (192.168.2.8)
>                     7439               3 (192.168.2.8)
>                     6782               3 (192.168.2.8)
> 
> Not a single one of those PIDs exists! I am currently running Debian
> squeeze, kernel 2.6.32 and corosync 1.2.0.
> 
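
To confirm on the local node that listed members are really gone, each PID
from the corosync-cpgtool output can be probed with kill(pid, 0), which only
performs the existence and permission checks without delivering a signal.
ESRCH means there is no such process; EPERM means the process exists but
belongs to another user. This sketch assumes the member-line layout shown
above (PID, node ID, address) and a caller-supplied local node ID;
pid_exists and member_line_is_stale are hypothetical helper names. Only
PIDs on the local node can be checked this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a process with this PID exists, 0 if it does not,
 * and -1 if the check itself failed for some other reason. */
static int pid_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;              /* 0 and negatives address process groups */
    if (kill(pid, 0) == 0)
        return 1;               /* signal 0: existence/permission check only */
    if (errno == EPERM)
        return 1;               /* exists, but owned by another user */
    if (errno == ESRCH)
        return 0;               /* no such process */
    return -1;
}

/* Parses a corosync-cpgtool member line such as
 * "   4610               3 (192.168.2.8)" (layout assumed from the
 * output above). Returns 1 if the member is on local_nodeid and its PID
 * no longer exists, 0 if it is alive, and -1 if the line is not a member
 * line or the member is on another node. */
static int member_line_is_stale(const char *line, unsigned local_nodeid)
{
    unsigned pid, nodeid;

    assert(line != NULL);
    if (sscanf(line, "%u %u", &pid, &nodeid) != 2)
        return -1;
    if (nodeid != local_nodeid)
        return -1;              /* remote PIDs cannot be probed locally */

    int alive = pid_exists((pid_t)pid);
    if (alive < 0)
        return -1;
    return alive ? 0 : 1;
}
```

Run on 192.168.2.8 (node 3) and fed each member line of the TESTGROUP
listing with local_nodeid set to 3, this would flag every entry whose PID
has disappeared, while header and group-name lines are skipped.
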
> Is anybody else able to reproduce this issue?
> 
> - Dietmar
> 
>> Regards,
>>   Honza
>>
>> Dietmar Maurer wrote:
>>> Just found the following commit:
>>>
>>> http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=bcc5fdef8473d99399c624a7bc15423a2af645c1
>>>
>>> The problematic test case looks very similar to my tests - maybe that
>>> problem still exists?
>>>> It's strange, but the problem only occurs when fencing is involved
>>>> and cman kills a node. I will try to write a minimal CPG application
>>>> which triggers that bug.
>>>>
>>>> BTW, can memory corruption inside my application cause such behavior?
>>>> - Dietmar
>>>>
>>>>> Dietmar,
>>>>> the process *should* be removed once its IPC connection is finished.
>>>>>
>>>>> Maybe it is a bug. Do you have a reproducer?
>>>>>
>>>>> Thanks,
>>>>>   Honza
>>>>>
>>>>> Dietmar Maurer wrote:
>>>>>>> Inside my CPG application, the confchg callback is called with
>>>>>>> 'dead' members:
>>>>>>>
>>>>>>> [debug] cpg member node 3 pid 1132
>>>>>>> [debug] cpg member node 3 pid 14640
>>>>>>>
>>>>>>> For example, process 1132 no longer exists on node 3. Any idea
>>>>>>> what can cause such 'ghost' entries?
>>>>>> If I run corosync-cpgtool on the node I get:
>>>>>>
>>>>>>> # corosync-cpgtool
>>>>>>> Group Name             PID         Node ID
>>>>>>> mygroup
>>>>>>>                       1132               3 (192.168.2.8)
>>>>>>>                      14887               3 (192.168.2.8)
>>>>>> But process 1132 does not exist. How can that happen? I thought a
>>>>>> process is automatically removed from the CPG member list if it
>>>>>> exits (or crashes)?
>>>>>> - Dietmar
>>>>>>
>>>>>> _______________________________________________
>>>>>> Openais mailing list
>>>>>> Openais@lists.linux-foundation.org
>>>>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>
> 
