Hi Pascal,
Thanks for your reply.
Yes, the network was bad. The Master host was dead, so the Slave host took over
its work and tried to mount the DRBD partition. The mount timed out. But the
default DRBD network timeout is 6 seconds (it can be set in drbd.conf), and it
did not take effect. Why?
Do you have a good idea for making it switch over immediately in this situation?
Thanks.
Simon
-----Original Message-----
From: "Pascal BERTON" <[email protected]>
Sent: Saturday, August 18, 2012
To: 'simon' <[email protected]>, [email protected]
Cc:
Subject: RE: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.
Hi Simon.
AFAIK, the PingAck error means your replication network links are either down
or subject to enough errors to prevent the nodes from reaching each other in a
timely manner. I have seen this behavior caused by bad optical fibers, for
instance, which generated a huge number of network errors. You also have
“network failure” messages in your logs, and the connection is “Waiting for
connection”.
In your case I’d say the first thing to do is to test this network: Can both
nodes ping each other’s address on this network? Does an ifconfig of each
address report errors? Etc. I bet that when your replication network is up
again, your cluster will run fine.
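One way to check for interface errors without parsing ifconfig output is to read the counters that Linux exposes under /sys/class/net. The following is only a minimal sketch: the helper names `read_error_counters` and `changed_counters` are made up for illustration, and the interface name you pass in is whatever your replication link happens to be called.

```python
from pathlib import Path

# Counter files that Linux exposes under /sys/class/net/<iface>/statistics/.
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")


def read_error_counters(iface, sysfs_root="/sys/class/net"):
    """Read the error/drop counters of one interface from sysfs.

    `sysfs_root` is a parameter only so the function can be pointed at a
    different directory (for example in a test).
    """
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text().strip()) for name in COUNTERS}


def changed_counters(before, after):
    """Return only the counters that increased between two readings."""
    return {k: after[k] - before[k] for k in before if after[k] > before[k]}
```

Taking one reading, waiting a while under replication load, taking a second reading and passing both to `changed_counters` shows whether errors are still accumulating on the link. An empty result means none of the four counters moved.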
Pascal.
From: [email protected]
[mailto:[email protected]] On behalf of simon
Sent: Saturday, August 18, 2012 03:37
To: [email protected]
Subject: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.
Hi all,
I use DRBD 8.3.7 in an HA setup. When the Master host dies and HA switches from
Master to Slave, DRBD can’t complete the switch because it takes 10 minutes to
mount its partition. That exceeds the HA timeout (in HA, the default timeout is
2 minutes).
Why does DRBD take so long?
The log is:
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739458] block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739468] block drbd1: asender terminated
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739470] block drbd1: Terminating asender thread
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739526] block drbd1: short read expecting header on sock: r=-512
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739666] block drbd1: Connection closed
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739672] block drbd1: conn( NetworkFailure -> Unconnected )
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739678] block drbd1: receiver terminated
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739680] block drbd1: Restarting receiver thread
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739683] block drbd1: receiver (re)started
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739687] block drbd1: conn( Unconnected -> WFConnection )
Jul 22 21:06:39 QD-CS-MDC-B pengine: [17776]: info: crm_log_init: Changed active directory to /usr/var/lib/heartbeat/cores/root
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.727331] NET: Registered protocol family 17
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] block drbd0: role( Secondary -> Primary )
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772742] block drbd1: role( Secondary -> Primary )
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772997] block drbd1: Creating new current UUID
Jul 22 21:08:47 QD-CS-MDC-B su: (to hitv) root on none
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] block drbd0: PingAck did not arrive in time.
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032493] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032503] block drbd0: asender terminated
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032506] block drbd0: Terminating asender thread
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032514] block drbd0: Creating new current UUID
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032567] block drbd0: short read expecting header on sock: r=-512
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032868] block drbd0: Connection closed
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032875] block drbd0: conn( NetworkFailure -> Unconnected )
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032879] block drbd0: receiver terminated
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032881] block drbd0: Restarting receiver thread
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032884] block drbd0: receiver (re)started
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032888] block drbd0: conn( Unconnected -> WFConnection )
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600888] kjournald starting. Commit interval 15 seconds
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600956] EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601330] EXT3 FS on drbd0, internal journal
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601334] EXT3-fs: recovery complete.
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601392] EXT3-fs: mounted filesystem with ordered data mode.
According to the log, the timeout occurs in the PingAck operation.
Thanks for your help.
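The length of the stall can be read off the kernel uptime stamps (the numbers in square brackets) in the log above. Here is a minimal sketch that finds the largest gap between consecutive kernel messages. The function name `largest_gap` and the regular expression are only illustrative, and the two sample lines are copied from the log, each written on one line.

```python
import re

# Captures the kernel uptime stamp and the message that follows it,
# e.g. "kernel: [325573.772997] block drbd1: Creating new current UUID".
STAMP = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s*(.*)")


def largest_gap(lines):
    """Return (gap_seconds, message_before, message_after) for the largest
    gap between consecutive kernel messages, or None if fewer than two match."""
    events = []
    for line in lines:
        m = STAMP.search(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    best = None
    for (t0, msg0), (t1, msg1) in zip(events, events[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, msg0, msg1)
    return best


# Two consecutive kernel messages from the log in this mail.
sample = [
    "Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772997] block drbd1: Creating new current UUID",
    "Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] block drbd0: PingAck did not arrive in time.",
]

gap, before, after = largest_gap(sample)
print(f"{gap:.1f} s between '{before}' and '{after}'")
```

For these two lines the gap is about 600 seconds, which matches the 10 minutes seen between the promotion to Primary and the PingAck message on drbd0.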
simon
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user