On 09/07/2013, at 10:29 PM, Martin Gazak <martin.ga...@microstep-mis.sk> wrote:
> On 7/9/2013 12:56 PM, Andrew Beekhof wrote:
>>
>> On 09/07/2013, at 8:49 PM, Martin Gazak <martin.ga...@microstep-mis.sk>
>> wrote:
>>
>>> On 7/9/2013 12:42 PM, Andrew Beekhof wrote:
>>>>
>>>> On 09/07/2013, at 5:05 PM, Martin Gazak <martin.ga...@microstep-mis.sk>
>>>> wrote:
>>
>> It looks to be a bug in 1.1.7, you'll want to contact SUSE so they can get
>> the fix from upstream.
>
> Dear Andrew,
> thanks for your effort.
>
> May I ask 3 questions:
>
> - What version did you use to detect the bug? You labeled it just
> "current version".

1.1.10-rc6

> - We have downloaded the corosync SuSE packages 1.1.8 and 1.1.9. Could you
> please confirm whether one (or both) of these SuSE versions has this bug fixed?

I have no idea. If you install them and run:

    crm_simulate -Sx /var/lib/pengine/pe-input-2819.bz2

and it returns the same as what I got, then it's fixed.

> Or do you need the package itself as an attachment to inspect it?
> Or is there a way to check whether our package has the bug fixed?
>
> - We are going to test the 1.1.9 package anyway with the stress tests.
> As I wrote you, this situation happened extremely rarely on the testing
> cluster (however, often enough to cause trouble in the production environment).
> Do you have any idea how to reproduce this situation in a deterministic
> way?

It might be a timing issue.

> Just blindly killing the master instance of the application from cron does
> not help: the system survived 70+ correct failovers over the weekend.
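Replaying the stored policy engine inputs with crm_simulate, as suggested above, does not depend on timing, so it can serve as a repeatable check of each candidate package. A minimal sketch, assuming the fixed behaviour is the one quoted in this thread for 1.1.10-rc6 (promote ims:1 on ims1) and the buggy 1.1.7 behaviour is an in-place "Recover ims:0". The script and its `looks_fixed` helper are hypothetical. The exact spacing of crm_simulate's transition summary is also an assumption, so the patterns accept any whitespace between the action and the resource name.

```shell
#!/bin/sh
# Hypothetical helper: classify crm_simulate output for the ims resources.
# Resource/node names (ims:0, ims:1, ims0, ims1) are taken from the logs in
# this thread; adjust them for other clusters.

looks_fixed() {
    # Reads crm_simulate output on stdin.
    # Returns 0 if the plan promotes ims:1 and does NOT recover ims:0 in place.
    out=$(cat)
    printf '%s\n' "$out" | grep -Eq 'Promote[[:space:]]+ims:1' || return 1
    if printf '%s\n' "$out" | grep -Eq 'Recover[[:space:]]+ims:0'; then
        return 1
    fi
    return 0
}

# Replay every pe-input file given on the command line. With no arguments
# the loop does nothing, so the function can be sourced and tested alone.
for f in "$@"; do
    if crm_simulate -Sx "$f" | looks_fixed; then
        echo "$f: fixed behaviour (promote ims:1 on ims1)"
    else
        echo "$f: old behaviour (recover ims:0 on ims0)"
    fi
done
```

For example, on a node with the candidate package installed: `sh check-pe.sh /var/lib/pengine/pe-input-2819.bz2 /var/lib/pengine/pe-input-2820.bz2`. Checking for the absence of "Recover ims:0" in addition to the presence of "Promote ims:1" guards against output that mentions a promotion but still restarts the master in place.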
>
> Best regards
>
> Martin Gazak
>
>
>>
>> Your version:
>>
>> Jul 04 23:45:02 ims0 pengine: [3933]: WARN: unpack_rsc_op: Processing failed op ims:0_last_failure_0 on ims0: not running (7)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0 (Master ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip (Started ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip-src (Started ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4036: PEngine Input stored in: /var/lib/pengine/pe-input-2819.bz2
>>
>> vs. the current version:
>>
>> notice: LogActions: Demote ims:0 (Master -> Stopped ims0)
>> notice: LogActions: Promote ims:1 (Slave -> Master ims1)
>> notice: LogActions: Start ims-ip (ims1)
>> notice: LogActions: Start ims-ip-src (ims1)
>>
>> and
>>
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0 (Master ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip (Started ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Start ims-ip-src (ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4037: PEngine Input stored in: /var/lib/pengine/pe-input-2820.bz2
>>
>>
>> vs. the current version:
>>
>> notice: LogActions: Demote ims:0 (Master -> Stopped ims0)
>> notice: LogActions: Promote ims:1 (Slave -> Master ims1)
>> notice: LogActions: Start ims-ip (ims1)
>> notice: LogActions: Start ims-ip-src (ims1)
>>
>
>
> --
>
> Regards,
>
> Martin Gazak
> MicroStep-MIS, spol. s r.o.
> System Development Manager
> Tel.: +421 2 602 00 128
> Fax: +421 2 602 00 180
> martin.ga...@microstep-mis.sk
> http://www.microstep-mis.com

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org