hi,

On 02/21/2011 10:36 AM, Lars Ellenberg wrote:
> Fix your fence-peer helper,
> that may be the cause of trouble there.

which actually is 'your' fence-peer helper, right? :)

Feb 16 03:13:45 c02n01 kernel: [3675911.371516] block drbd0: updated
UUIDs A9AE9E56A0D5D66F:0000000000000000:3E9700A8847A37AD:3E9600A8847A37AD
Feb 16 03:13:45 c02n01 kernel: [3675911.371635] block drbd0: conn(
SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Feb 16 03:13:45 c02n01 kernel: [3675911.505550] block drbd0: bitmap
WRITE of 3050 pages took 34 jiffies
Feb 16 03:13:45 c02n01 kernel: [3675911.505615] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Feb 16 03:13:45 c02n01 cibadmin: [14957]: info: Invoked: cibadmin -Q -t 1
Feb 16 03:13:45 c02n01 crm-fence-peer.sh[14918]: WARNING peer is
Secondary, did not place the constraint!
Feb 16 03:13:45 c02n01 kernel: [3675912.019501] block drbd0: helper
command: /sbin/drbdadm fence-peer minor-0 exit code 1 (0x100)
Feb 16 03:13:45 c02n01 kernel: [3675912.019622] block drbd0: fence-peer
helper broken, returned 1
Feb 16 03:13:45 c02n01 kernel: [3675912.019687] block drbd0: pdsk(
UpToDate -> DUnknown )
Feb 16 03:13:45 c02n01 kernel: [3675912.019768] block drbd0: new current
UUID 6798C570121477F1:A9AE9E56A0D5D66F:3E9700A8847A37AD:3E9600A8847A37AD

thus, basically coming back to [1] where florian asks:
> Look at your paste. You have no node where DRBD is Secondary. What do
> you expect the agent to do? 

(i know, i talked about the agent in this email. but the agent and
crm-fence-peer.sh are closely tied, aren't they?)

looking at crm-fence-peer.sh's source, i see:
>         Secondary|Primary)
>                 # WTF? We are supposed to fence the peer,
>                 # but the replication link is just fine?
>                 echo WARNING "peer is $DRBD_peer, did not place the constraint!"
>                 rc=0
>                 return
>                 ;;
>         esac
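
for reference, a minimal sketch of how i read the fence-peer exit codes.
assumptions: the meanings of codes 3..7 are taken from the drbd.conf man
page as i understand it, and fence_rc_meaning is a made-up helper name,
not something that exists in crm-fence-peer.sh. as far as i can tell,
any code outside 3..7 ends up as the "fence-peer helper broken" message
seen in the log above.

```shell
#!/bin/sh
# sketch only: map a fence-peer handler exit code to its meaning.
# assumption: codes 3..7 as listed in drbd.conf(5); everything else is
# what the kernel reports as "fence-peer helper broken".
# fence_rc_meaning is a hypothetical helper for illustration.
fence_rc_meaning() {
    case "$1" in
        3) echo "peer disk was already Inconsistent" ;;
        4) echo "peer disk set to Outdated" ;;
        5) echo "peer unreachable" ;;
        6) echo "peer refused to be outdated (it is Primary)" ;;
        7) echo "peer was fenced (stonith)" ;;
        *) echo "helper broken, returned $1" ;;
    esac
}

# the kernel log shows the raw wait status, e.g. "exit code 1 (0x100)";
# the handler's exit code is the high byte of that status.
status=$((0x100))
fence_rc_meaning $(( (status >> 8) & 0xff ))
```

with the status from the log (0x100) this lands in the catch-all branch,
which matches the "helper broken, returned 1" line above.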

so, this should actually be obsoleted by fixing the following bug,
right?

on the other hand, what's wrong with trying to disconnect and reconnect
the resources to see what happens? (e.g. via a tiny constraint that is
only valid for PT1M?)

> Feb 16 06:25:04 c02n01 kernel: [3687390.947555] block drbd1: pdsk( UpToDate 
> -> DUnknown )
> 
> This should not have happened, either:
> We must not change the pdsk state to DUnknown while keeping conn state at 
> Connected.
> That's nonsense.
> 
> Feb 16 06:25:04 c02n01 kernel: [3687390.947633] block drbd1: new current UUID 
> 89084B22FE454C03:3C1DADF6B38C1AD7:E7E50184F3F3AC0B:E7E40184F3F3AC0B 

please let me know if you need any further input from my side.

thanks,
raoul

[1] http://www.gossamer-threads.com/lists/drbd/users/20605#20605
-- 
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.          r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG          web.          http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.            off...@ipax.at
1190 Wien                           tel.               +43 1 3670030
FN 277995t HG Wien                  fax.            +43 1 3670030 15
____________________________________________________________________