Hi Simon.

 

AFAIK, the PingAck error means your replication network links are either
down or suffering enough errors to prevent the nodes from reaching each
other in a timely manner. I have seen this behavior caused by bad optical
fibers, for instance, which generated a huge number of network errors. Your
logs also show “NetworkFailure” states, and the connection ends up in
WFConnection (waiting for connection). In your case I’d say the first thing
to do is to test this network: can both nodes ping each other’s address on
this network? Does ifconfig report errors on either interface? And so on. I
bet that once your replication network is up again, your cluster will run
fine.
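
As a rough illustration of the "does the interface report errors" check, here
is a minimal Python sketch that reads the same per-interface counters ifconfig
displays, taken from the Linux /proc/net/dev table. The function names, the
SAMPLE text and the interface name eth1 are made up for the example; they are
not taken from Simon's setup.

```python
# Sketch: find interfaces with non-zero error/drop counters.
# SAMPLE mimics the layout of /proc/net/dev; "eth1" is a hypothetical
# replication interface, not one from the original post.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1234 10 0 0 0 0 0 0 1234 10 0 0 0 0 0 0
  eth1: 5678 20 3 1 0 0 0 0 4321 15 2 0 0 0 0 0
"""


def parse_net_dev(text):
    """Return {interface: counters} from /proc/net/dev style text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:  # 8 receive columns + 8 transmit columns
            continue
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


def interfaces_with_errors(stats):
    """Names of interfaces where any error or drop counter is non-zero."""
    return sorted(name for name, c in stats.items() if any(c.values()))


if __name__ == "__main__":
    print(interfaces_with_errors(parse_net_dev(SAMPLE)))
```

On a real node you would pass the contents of /proc/net/dev instead of SAMPLE,
and run it twice a few minutes apart: counters that keep growing on the
replication interface point to a bad link.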

 

Pascal.

 

From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On behalf of simon
Sent: Saturday, 18 August 2012 03:37
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.

 

Hi all,

 

I use DRBD 8.3.7 in an HA setup. When the Master host dies and HA switches
from Master to Slave, DRBD can’t switch over because it takes 10 minutes to
mount its partition. That exceeds the HA timeout (in HA, the default timeout
is 2 minutes).

 

Why does DRBD take that long?

 

The log is:

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739458] block drbd1: peer(
Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate ->
DUnknown ) 

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739468] block drbd1: asender
terminated

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739470] block drbd1: Terminating
asender thread

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739526] block drbd1: short read
expecting header on sock: r=-512

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739666] block drbd1: Connection
closed

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739672] block drbd1: conn(
NetworkFailure -> Unconnected ) 

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739678] block drbd1: receiver
terminated

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739680] block drbd1: Restarting
receiver thread

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739683] block drbd1: receiver
(re)started

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739687] block drbd1: conn(
Unconnected -> WFConnection ) 

Jul 22 21:06:39 QD-CS-MDC-B pengine: [17776]: info: crm_log_init: Changed
active directory to /usr/var/lib/heartbeat/cores/root

Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.727331] NET: Registered protocol
family 17

Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] block drbd0: role(
Secondary -> Primary ) 

Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772742] block drbd1: role(
Secondary -> Primary ) 

Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772997] block drbd1: Creating
new current UUID

Jul 22 21:08:47 QD-CS-MDC-B su: (to hitv) root on none

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] block drbd0: PingAck did
not arrive in time.

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032493] block drbd0: peer(
Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate ->
DUnknown ) 

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032503] block drbd0: asender
terminated

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032506] block drbd0: Terminating
asender thread

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032514] block drbd0: Creating
new current UUID

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032567] block drbd0: short read
expecting header on sock: r=-512

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032868] block drbd0: Connection
closed

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032875] block drbd0: conn(
NetworkFailure -> Unconnected ) 

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032879] block drbd0: receiver
terminated

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032881] block drbd0: Restarting
receiver thread

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032884] block drbd0: receiver
(re)started

Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032888] block drbd0: conn(
Unconnected -> WFConnection )

Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600888] kjournald starting.
Commit interval 15 seconds

Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600956] EXT3-fs warning: maximal
mount count reached, running e2fsck is recommended

Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601330] EXT3 FS on drbd0,
internal journal

Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601334] EXT3-fs: recovery
complete.

Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601392] EXT3-fs: mounted
filesystem with ordered data mode.                    

                                                          

According to the log, the time is spent waiting for the PingAck timeout.
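
To make the delay concrete, here is a small Python sketch that measures the
gap between two of the drbd0 lines quoted above, using the kernel timestamps
in square brackets. The helper name kernel_seconds and the regular expression
are only illustrative, and the two log lines are re-joined onto one line each.

```python
import re

# Matches the kernel timestamp, e.g. "kernel: [325573.768912]".
KERNEL_TS = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def kernel_seconds(line):
    """Extract the bracketed kernel timestamp (in seconds) from a log line."""
    match = KERNEL_TS.search(line)
    if match is None:
        raise ValueError("no kernel timestamp in line: %r" % line)
    return float(match.group(1))


# Two lines copied from the log above (line wrapping removed).
promoted = ("Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] "
            "block drbd0: role( Secondary -> Primary )")
timed_out = ("Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] "
             "block drbd0: PingAck did not arrive in time.")

gap = kernel_seconds(timed_out) - kernel_seconds(promoted)
print("%.1f s (about %.1f min)" % (gap, gap / 60))
# prints "600.3 s (about 10.0 min)"
```

So drbd0 is promoted at 21:06:47, but its connection is only declared failed
about 600 seconds later, and the filesystem is mounted one second after that.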

 

 

Thanks for your help.

          

 
simon

 

 

 

_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
