Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM
Seeing the high drop rate... (just compare this to the other NIC) - have you tried a new cable? Maybe it's a cheap hardware problem...

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lars Marowsky-Bree
Sent: Thursday, 11 July 2013 11:20
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM

On 2013-07-11T08:41:33, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:

> > For a really silly idea, but can you swap the network cards for a test? Say, with Intel NICs, or even another Broadcom model?
> Unfortunately no: The 4-way NIC is onboard, and all slots are full.

Too bad. But then you could really try raising a support request about the network driver; perhaps one of the kernel/networking gurus has an idea.

> RX packet drops. Maybe the bug is in the bonding code...
> bond0: RX packets:211727910 errors:0 dropped:18996906 overruns:0 frame:0
> eth1:  RX packets:192885954 errors:0 dropped:21 overruns:0 frame:0
> eth4:  RX packets:18841956 errors:0 dropped:18841956 overruns:0 frame:0
> Both cards are identical. I wonder: if the bonding mode is fault-tolerance (active-backup), is it normal to see such statistics? ethtool -S reports a high number for rx_filtered_packets...

Possibly. It'd be interesting to know which packets get dropped; this means you have approx. 10% of your traffic on the backup link.

I wonder if all the nodes/switches/etc. agree on which port is the backup and which isn't...? If 10% of the communication ends up on the wrong NIC, that surely would mess up a number of recovery protocols.

An alternative test case would be to see how the system behaves if you disable bonding - or, if the names should stay the same, with only one NIC in the bond.

Regards,
    Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
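One way to check which slave the bonding driver currently considers active, and what actually arrives on the supposedly passive port (a sketch only - the interface names are taken from the statistics above, and ethtool counter names vary by driver):

# cat /proc/net/bonding/bond0
  (shows the bonding mode, MII status, and the currently active slave)
# ethtool -S eth4 | grep -i -E 'drop|filter'
  (driver-specific drop/filter counters on the backup port)
# tcpdump -ni eth4 -c 100
  (capture a sample of whatever reaches the backup port)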
[Linux-HA] crm resource restart is broken (d4de3af6dd33)
Hi Dejan,

It seems like resource restart does not work any longer.

# crm resource restart test01-vm
INFO: ordering test01-vm to stop
Traceback (most recent call last):
  File "/usr/sbin/crm", line 44, in <module>
    main.run()
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 442, in run
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 349, in do_work
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 150, in parse_line
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 149, in <lambda>
  File "/usr/lib64/python2.6/site-packages/crmsh/ui.py", line 894, in restart
  File "/usr/lib64/python2.6/site-packages/crmsh/utils.py", line 429, in wait4dc
  File "/usr/lib64/python2.6/site-packages/crmsh/utils.py", line 544, in crm_msec
  File "/usr/lib64/python2.6/re.py", line 137, in match
TypeError: expected string or buffer

Vladislav
Re: [Linux-HA] crm resource restart is broken (d4de3af6dd33)
12.07.2013 12:06, Vladislav Bogdanov wrote:
> Hi Dejan,
>
> It seems like resource restart does not work any longer.

Ah, this seems to be fixed by bb39cce17f20. Sorry for the noise.
Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM
On 2013-07-12T11:05:32, Wengatz Herbert herbert.weng...@baaderbank.de wrote:

> Seeing the high drop rate... (just compare this to the other NIC) - have you tried a new cable? Maybe it's a cheap hardware problem...

The drop rate is normal. A slave NIC in a bonded active/passive configuration will drop all packets.

I do wonder why there's so much traffic on a supposedly passive NIC, though.

Regards,
    Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM
Hmmm. Please correct me if I'm wrong: As I understand it, you have a number of packets that go to BOTH NICs. Depending on which one is the active and which the passive one, the sum of all dropped packets should be equal to the number of received packets (plus or minus some drops for other reasons). So if one card drops 10% of the packets, the other should drop 90% of the packets. - This is not the case here.

Regards,
Herbert

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lars Marowsky-Bree
Sent: Friday, 12 July 2013 11:09
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM

On 2013-07-12T11:05:32, Wengatz Herbert herbert.weng...@baaderbank.de wrote:

> Seeing the high drop rate... (just compare this to the other NIC) - have you tried a new cable? Maybe it's a cheap hardware problem...

The drop rate is normal. A slave NIC in a bonded active/passive configuration will drop all packets.

I do wonder why there's so much traffic on a supposedly passive NIC, though.

Regards,
    Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
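Plugging in the counters from Ulrich's earlier mail (a rough back-of-the-envelope check only, ignoring timing and counter resets):

eth4 drops / bond0 packets = 18841956 / 211727910 ≈ 0.09   (about 9% of all packets arrived on the backup slave)
eth4 drops / eth4 packets  = 18841956 / 18841956  = 1.00   (the backup slave dropped everything it received)
eth1 drops / eth1 packets  = 21 / 192885954       ≈ 0.00   (the active slave dropped essentially nothing)

So the two slaves are not splitting one stream 10%/90%: the backup slave drops all of the roughly 9% of traffic that reaches it, which is the figure Lars's "approx. 10% on the backup link" refers to.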
Re: [Linux-HA] crm resource restart is broken (d4de3af6dd33)
Hi,

On Fri, Jul 12, 2013 at 12:08:12PM +0300, Vladislav Bogdanov wrote:
> 12.07.2013 12:06, Vladislav Bogdanov wrote:
> > Hi Dejan,
> >
> > It seems like resource restart does not work any longer.

Yes, I still wonder how that one slipped through.

> Ah, this seems to be fixed by bb39cce17f20. Sorry for the noise.

No need to be sorry, I appreciate every bug report.

Cheers,

Dejan
[Linux-HA] Antw: crm resource restart is broken (d4de3af6dd33)
Vladislav Bogdanov bub...@hoster-ok.com wrote on 12.07.2013 at 11:06 in message 51dfc715.2030...@hoster-ok.com:
> Hi Dejan,
>
> It seems like resource restart does not work any longer.

BTW: The way resource restart is implemented (i.e. stop, wait, then start) has a major problem: if the stop causes the node where the crm command is running to be fenced, the resource will remain stopped even after the node has restarted.

> # crm resource restart test01-vm
> INFO: ordering test01-vm to stop
> Traceback (most recent call last):
>   File "/usr/sbin/crm", line 44, in <module>
>     main.run()
>   File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 442, in run
>   File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 349, in do_work
>   File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 150, in parse_line
>   File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 149, in <lambda>
>   File "/usr/lib64/python2.6/site-packages/crmsh/ui.py", line 894, in restart
>   File "/usr/lib64/python2.6/site-packages/crmsh/utils.py", line 429, in wait4dc
>   File "/usr/lib64/python2.6/site-packages/crmsh/utils.py", line 544, in crm_msec
>   File "/usr/lib64/python2.6/re.py", line 137, in match
> TypeError: expected string or buffer
>
> Vladislav
Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM
Lars Marowsky-Bree l...@suse.com wrote on 12.07.2013 at 11:08 in message 20130712090853.gm19...@suse.de:
> On 2013-07-12T11:05:32, Wengatz Herbert herbert.weng...@baaderbank.de wrote:
>
> > Seeing the high drop rate... (just compare this to the other NIC) - have you tried a new cable? Maybe it's a cheap hardware problem...
>
> The drop rate is normal. A slave NIC in a bonded active/passive configuration will drop all packets.
>
> I do wonder why there's so much traffic on a supposedly passive NIC, though.

Lars, that depends on the uptime: I think our network guys had updated the firmware of some switches, causing a switch reboot and, I guess, a failover to a different bonding slave.

Regards,
Ulrich

> Regards,
>     Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Linux-HA] crm node delete
01.07.2013 17:29, Vladislav Bogdanov wrote:
> Hi,
>
> I'm trying to check whether it is now safe to delete non-running nodes (corosync 2.3, pacemaker HEAD, crmsh tip).
>
> # crm node delete v02-d
> WARNING: 2: crm_node bad format: 7 v02-c
> WARNING: 2: crm_node bad format: 8 v02-d
> WARNING: 2: crm_node bad format: 5 v02-a
> WARNING: 2: crm_node bad format: 6 v02-b
> INFO: 2: node v02-d not found by crm_node
> INFO: 2: node v02-d deleted
> #
>
> So I expect that crmsh still doesn't follow the latest changes to 'crm_node -l', although the node seems to be deleted correctly.
>
> For reference, the output of crm_node -l is:
> 7 v02-c
> 8 v02-d
> 5 v02-a
> 6 v02-b

With the latest merge of Andrew's public and private trees and crmsh tip, everything works as expected. The only (minor but confusing) issue is:

[root@vd01-a ~]# crm_node -l
3 vd01-c
4 vd01-d
1 vd01-a
2 vd01-b
[root@vd01-a ~]# crm_node -p
vd01-c vd01-a vd01-b
[root@vd01-a ~]# crm node delete vd01-d
WARNING: crm_node --force -R vd01-d failed, rc=1

Looks like a missing crm_exit(pcmk_ok) for -R in try_corosync().
Re: [Linux-HA] Antw: crm resource restart is broken (d4de3af6dd33)
On 2013-07-12T12:19:40, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:

> BTW: The way resource restart is implemented (i.e. stop, wait, then start) has a major problem: if the stop causes the node where the crm command is running to be fenced, the resource will remain stopped even after the node has restarted.

Yes. That's a limitation that's difficult to overcome. restart is a multi-phase command, so if something happens to the node where it runs, you have a problem.

But the need for a manual restart should be rare. If the resource is running and healthy according to monitor, why should it be necessary? ;-)

(Another way to trigger a restart is to modify the instance parameters. Set __manual_restart=1 and it'll restart.)

Regards,
    Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
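A possible crmsh incantation for that trick (a sketch only - the resource name is reused from the earlier restart example, and __manual_restart is just a dummy parameter the resource agent will ignore; the point is that changing the instance parameters makes the cluster restart the resource):

# crm resource param test01-vm set __manual_restart 1

Bump the value (2, 3, ...) whenever you need another restart.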
Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM (dropped packets on bonding device)
Wengatz Herbert herbert.weng...@baaderbank.de wrote on 12.07.2013 at 11:19 in message e0a8d3556d452c42977202b2d60934660431fae...@msx2.baag:
> Hmmm. Please correct me if I'm wrong: As I understand it, you have a number of packets that go to BOTH NICs. Depending on which one is the active and which the passive one, the sum of all dropped packets should be equal to the number of received packets (plus or minus some drops for other reasons). So if one card drops 10% of the packets, the other should drop 90% of the packets. - This is not the case here.

I haven't added up all the numbers, but it's also quite confusing that the dropped packets are pushed up to the bonding master: if dropping packets is part of the bonding implementation, the number of dropped packets should be hidden at the bonding level. If you have a bonding device with four slaves in active/passive (being paranoid), you should see three times as many dropped packets as received packets, right?

(I adjusted the subject for this discussion)

> Regards,
> Herbert
>
> -----Original Message-----
> From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lars Marowsky-Bree
> Sent: Friday, 12 July 2013 11:09
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM
>
> On 2013-07-12T11:05:32, Wengatz Herbert herbert.weng...@baaderbank.de wrote:
>
> > Seeing the high drop rate... (just compare this to the other NIC) - have you tried a new cable? Maybe it's a cheap hardware problem...
>
> The drop rate is normal. A slave NIC in a bonded active/passive configuration will drop all packets.
>
> I do wonder why there's so much traffic on a supposedly passive NIC, though.
>
> Regards,
>     Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
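To see where the kernel accounts those drops, the per-interface counters can be compared directly (a sketch; interface names as above). With the numbers posted earlier, bond0's drop counter is roughly the sum of its slaves' counters, i.e. the bonding master appears to report aggregated slave statistics rather than hiding them:

# grep -E 'bond0|eth1|eth4' /proc/net/dev
# cat /sys/class/net/bond0/statistics/rx_dropped
# cat /sys/class/net/eth1/statistics/rx_dropped /sys/class/net/eth4/statistics/rx_dropped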
Re: [Linux-HA] Antw: crm resource restart is broken (d4de3af6dd33)
Lars Marowsky-Bree l...@suse.com wrote on 12.07.2013 at 12:23 in message 20130712102340.go19...@suse.de:
> On 2013-07-12T12:19:40, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
>
> > BTW: The way resource restart is implemented (i.e. stop, wait, then start) has a major problem: if the stop causes the node where the crm command is running to be fenced, the resource will remain stopped even after the node has restarted.
>
> Yes. That's a limitation that's difficult to overcome. restart is a multi-phase command, so if something happens to the node where it runs, you have a problem.
>
> But the need for a manual restart should be rare. If the resource is running and healthy according to monitor, why should it be necessary? ;-)
>
> (Another way to trigger a restart is to modify the instance parameters. Set __manual_restart=1 and it'll restart.)

Once? ;-)
Re: [Linux-HA] Antw: crm resource restart is broken (d4de3af6dd33)
On 2013-07-12T12:26:18, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:

> > (Another way to trigger a restart is to modify the instance parameters. Set __manual_restart=1 and it'll restart.)
>
> Once? ;-)

Keep increasing it. ;-)

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Linux-HA] crm node delete
On 12/07/2013, at 8:23 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:

> 01.07.2013 17:29, Vladislav Bogdanov wrote:
> > Hi,
> >
> > I'm trying to check whether it is now safe to delete non-running nodes (corosync 2.3, pacemaker HEAD, crmsh tip).
> >
> > # crm node delete v02-d
> > WARNING: 2: crm_node bad format: 7 v02-c
> > WARNING: 2: crm_node bad format: 8 v02-d
> > WARNING: 2: crm_node bad format: 5 v02-a
> > WARNING: 2: crm_node bad format: 6 v02-b
> > INFO: 2: node v02-d not found by crm_node
> > INFO: 2: node v02-d deleted
> > #
> >
> > So I expect that crmsh still doesn't follow the latest changes to 'crm_node -l', although the node seems to be deleted correctly.
> >
> > For reference, the output of crm_node -l is:
> > 7 v02-c
> > 8 v02-d
> > 5 v02-a
> > 6 v02-b
>
> With the latest merge of Andrew's public and private trees and crmsh tip, everything works as expected. The only (minor but confusing) issue is:
>
> [root@vd01-a ~]# crm_node -l
> 3 vd01-c
> 4 vd01-d
> 1 vd01-a
> 2 vd01-b
> [root@vd01-a ~]# crm_node -p
> vd01-c vd01-a vd01-b
> [root@vd01-a ~]# crm node delete vd01-d
> WARNING: crm_node --force -R vd01-d failed, rc=1
>
> Looks like a missing crm_exit(pcmk_ok) for -R in try_corosync().

Done. Thanks for testing.
[Linux-HA] Adding node in advance
Hi,

I wanted to add a new node to the CIB in advance, before it is powered on (to power it on in standby mode while cl#5169 is not yet implemented).

So I did:
==
[root@vd01-a tmp]# cat u
node $id=4 vd01-d \
        attributes standby=on virtualization=true
[root@vd01-a tmp]# crm configure load update u
ERROR: 4: invalid object id
==

Exactly the same syntax is accepted for an already-known node.

This is corosync-2.3.1 with nodelist/udpu, pacemaker master and crmsh tip.

Vladislav
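A possible workaround (a sketch only, not something suggested in the thread - the node id and uname are taken from the example above, and it assumes the CIB will accept a node entry whose uname is not yet known to corosync):

# cibadmin -o nodes -C -X '<node id="4" uname="vd01-d"/>'
# crm_attribute --type nodes --node vd01-d --name standby --update on

Whether crm_attribute can resolve a node that has never joined may depend on the pacemaker version; if it can't, the standby attribute can be injected as raw XML the same way.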
Re: [Linux-HA] Antw: crm resource restart is broken (d4de3af6dd33)
On 12/07/2013, at 8:23 PM, Lars Marowsky-Bree l...@suse.com wrote:

> On 2013-07-12T12:19:40, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
>
> > BTW: The way resource restart is implemented (i.e. stop, wait, then start) has a major problem: if the stop causes the node where the crm command is running to be fenced, the resource will remain stopped even after the node has restarted.
>
> Yes. That's a limitation that's difficult to overcome. restart is a multi-phase command, so if something happens to the node where it runs, you have a problem.

crm_resource --force-stop -r resource_name and watch for the recovery?

You can even add -V if you want to go blind.

> But the need for a manual restart should be rare. If the resource is running and healthy according to monitor, why should it be necessary? ;-)
>
> (Another way to trigger a restart is to modify the instance parameters. Set __manual_restart=1 and it'll restart.)
>
> Regards,
>     Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
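A sketch of that approach (the resource name is reused from the restart example; crm_resource --force-stop bypasses the cluster and runs the agent's stop action on the local node, so it has to be executed on the node where the resource is currently active):

# crm_resource --force-stop -r test01-vm
# crm_mon -1
  (the next recurring monitor should report the resource as stopped, and the cluster then recovers it)

Since this registers as a monitor failure, a crm_resource --cleanup -r test01-vm afterwards may be wanted to clear the failcount.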