Re: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.

roberto . fastec Sun, 19 Aug 2012 00:07:21 -0700

Dear friends

I would like to suggest that you edit your messages before sending them to the 
list.


Nowadays emails are often read on mobile devices, such as smartphones and so on.

The editing should focus on removing (for example, in this thread) the 
kilometre-long log text: 
what is the point of keeping repetition after repetition of it in 
ALL the replies?

Thank you for understanding my criticism, which is meant to be as constructive 
as possible.

Kind regards, and thank you very much for sharing your experiences.

Robert


-----Original Message-----
From: [email protected]
Sender: [email protected]
Date: Sat, 18 Aug 2012 16:24:45 
To: <[email protected]>
Reply-To: [email protected]
Subject: drbd-user Digest, Vol 97, Issue 18


Today's Topics:

   1. Re: Drbd : PingAsk timeout, about 10 mins. (Pascal BERTON)
   2. Re: Drbd : PingAsk timeout, about 10 mins. (?? (??))


----------------------------------------------------------------------

Message: 1
Date: Sat, 18 Aug 2012 12:46:01 +0200
From: "Pascal BERTON" <[email protected]>
Subject: Re: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.
To: "'simon'" <[email protected]>,     <[email protected]>
Message-ID: <000f01cd7d2e$a9caa4d0$fd5fee70$@[email protected]>
Content-Type: text/plain; charset="iso-8859-1"

Hi Simon.

 

AFAIK, the PingAck error means your replication network links are either
down or suffering enough errors to prevent the nodes from reaching each
other in a timely manner. I have seen this behavior caused by bad optical
fibers, for instance, which generated a huge number of network errors. Your
logs also show "network failure" messages, and the connection is "Waiting
for connection". In your case I'd say the first thing to do is to test this
network: can both nodes ping each other's address on this network? Does
ifconfig report errors on each interface? Etc. I bet that once your
replication network is up again, your cluster will run fine.
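
As a rough illustration of the interface check suggested above, here is a minimal Python sketch that reads the per-interface error counters the Linux kernel exposes under /sys/class/net/<iface>/statistics/ (the same kind of numbers ifconfig prints). The interface name "eth1" and the helper names are placeholders for illustration only, not something taken from this thread.

```python
from pathlib import Path

# Counters exposed by the Linux kernel under /sys/class/net/<iface>/statistics/
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")


def read_error_counters(iface, sysfs_root="/sys/class/net"):
    """Return {counter_name: value} for one network interface."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}


def nonzero_counters(counters):
    """Keep only the counters that have recorded at least one event."""
    return {name: value for name, value in counters.items() if value > 0}


if __name__ == "__main__":
    # "eth1" is a hypothetical name for the replication interface.
    print(nonzero_counters(read_error_counters("eth1")))
```

An empty result only means these four counters are zero; it does not prove the link is healthy, so a ping test between the nodes is still worthwhile.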

 

Pascal.

 

From: [email protected]
[mailto:[email protected]] On behalf of simon
Sent: Saturday, 18 August 2012 03:37
To: [email protected]
Subject: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.

 

Hi all,

 

I use DRBD 8.3.7 in an HA setup. When the Master host dies and HA switches
from Master to Slave, DRBD can't switch over because it takes 10 minutes to
mount its partition. That exceeds the HA timeout (in HA, the default timeout
is 2 minutes).

 

Why does DRBD take such a long time?

 

The log is:

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739458] block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739468] block drbd1: asender terminated
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739470] block drbd1: Terminating asender thread
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739526] block drbd1: short read expecting header on sock: r=-512
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739666] block drbd1: Connection closed
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739672] block drbd1: conn( NetworkFailure -> Unconnected )
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739678] block drbd1: receiver terminated
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739680] block drbd1: Restarting receiver thread
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739683] block drbd1: receiver (re)started
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739687] block drbd1: conn( Unconnected -> WFConnection )
Jul 22 21:06:39 QD-CS-MDC-B pengine: [17776]: info: crm_log_init: Changed active directory to /usr/var/lib/heartbeat/cores/root
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.727331] NET: Registered protocol family 17
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] block drbd0: role( Secondary -> Primary )
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772742] block drbd1: role( Secondary -> Primary )
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772997] block drbd1: Creating new current UUID
Jul 22 21:08:47 QD-CS-MDC-B su: (to hitv) root on none
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] block drbd0: PingAck did not arrive in time.
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032493] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032503] block drbd0: asender terminated
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032506] block drbd0: Terminating asender thread
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032514] block drbd0: Creating new current UUID
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032567] block drbd0: short read expecting header on sock: r=-512
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032868] block drbd0: Connection closed
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032875] block drbd0: conn( NetworkFailure -> Unconnected )
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032879] block drbd0: receiver terminated
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032881] block drbd0: Restarting receiver thread
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032884] block drbd0: receiver (re)started
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032888] block drbd0: conn( Unconnected -> WFConnection )
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600888] kjournald starting. Commit interval 15 seconds
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600956] EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601330] EXT3 FS on drbd0, internal journal
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601334] EXT3-fs: recovery complete.
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601392] EXT3-fs: mounted filesystem with ordered data mode.

According to the log, the timeout is in the PingAck operation.
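
To put a number on the delay, the following Python sketch measures the gap between two lines copied from the log above: the drbd0 promotion to Primary and the drbd0 "PingAck did not arrive in time" message. Syslog stamps carry no year, so 2012 is assumed here, and the helper names are purely illustrative.

```python
from datetime import datetime

# Two lines copied from the log above, each joined onto a single line.
PROMOTED = ("Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] "
            "block drbd0: role( Secondary -> Primary )")
TIMED_OUT = ("Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] "
             "block drbd0: PingAck did not arrive in time.")


def syslog_time(line, year=2012):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is an assumption."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def kernel_uptime(line):
    """Extract the kernel's seconds-since-boot value, e.g. 325573.768912."""
    return float(line.split("[", 1)[1].split("]", 1)[0])


wall_gap = (syslog_time(TIMED_OUT) - syslog_time(PROMOTED)).total_seconds()
uptime_gap = kernel_uptime(TIMED_OUT) - kernel_uptime(PROMOTED)
print(f"wall clock gap: {wall_gap:.0f} s, kernel clock gap: {uptime_gap:.3f} s")
```

Both clocks give a gap of about 600 seconds, which matches the 10 minutes in the subject line. The log also shows drbd0 still reporting conn( Connected ) until that moment.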

 

 

Thanks for your help.

          

 
simon

 

 

 


------------------------------

Message: 2
Date: Sat, 18 Aug 2012 22:24:14 +0800 (CST)
From: ??(??) <[email protected]>
Subject: Re: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.
To: "Pascal BERTON" <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"

Hi Pascal,

Thanks for your reply.

Yes, the network was down. The Master host was dead, so the Slave host took 
over its work and mounted the DRBD partition. The timeout occurred while 
mounting. But DRBD's default network timeout is 6 seconds (it can be set in 
drbd.conf), and it did not seem to take effect. Why?
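
For reference, the timers in the net section of drbd.conf do not all use the same unit. The Python sketch below converts the DRBD 8.3 defaults to seconds. The option names and defaults are as I recall them from the drbd.conf(5) man page, so treat them as assumptions and check the man page for your installed version. It only illustrates the units and does not by itself explain the 10-minute delay.

```python
# Net-section timers from drbd.conf(5) for DRBD 8.3.
# Defaults are quoted from memory of the man page: verify them locally.
NET_OPTIONS = {
    # option name: (default value, units per second)
    "timeout": (60, 10),      # unit is 0.1 s
    "ping-timeout": (5, 10),  # unit is 0.1 s
    "ping-int": (10, 1),      # unit is 1 s
    "connect-int": (10, 1),   # unit is 1 s
}


def to_seconds(option, value=None):
    """Convert a configured value (or the default) to seconds."""
    default, units_per_second = NET_OPTIONS[option]
    return (default if value is None else value) / units_per_second


for name in NET_OPTIONS:
    print(f"{name}: {to_seconds(name):g} s")
```

With these assumed defaults, "timeout" works out to the 6 seconds mentioned above, while a keep-alive ping is only sent every "ping-int" seconds on an idle connection.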

Do you have any idea how to make it switch over immediately in this situation? 

Thanks.

                                       Simon 


------------------------------

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user


End of drbd-user Digest, Vol 97, Issue 18
*****************************************
