Mystery appears to be solved!

The Ethernet card used for DRBD replication in the old secondary was
flaky, and replication was apparently crawling at times. That explains
why BOTH nodes had the high iowait problem when they were primary, but
NEITHER had it when they were secondary. We're using Protocol C, so a
write on the primary does not complete until the peer has it too.
Processes on the primary kept queuing up waiting for I/O calls to
complete because DRBD could not get the writes to the other node fast
enough.
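
A rough sketch of that arithmetic in Python, for anyone who finds this
thread later. Everything here is an assumption for illustration: the
function name, the 64 KB write size, the 5 ms disk times, and the use of
the resync rates quoted in this message (80K and 30,000K per second) as
stand-ins for link throughput. It models only the idea that a
synchronous write finishes when the slower of the local path and the
network-plus-peer path finishes. It is not how DRBD is implemented.

```python
def sync_write_latency_ms(size_kb, local_disk_ms, link_kb_per_s, peer_disk_ms):
    """Rough latency of one write under fully synchronous replication.

    Assumption: the write is acknowledged only once both the local disk
    and the peer have it, and the two paths run in parallel, so the
    slower path decides the total.
    """
    transfer_ms = size_kb / link_kb_per_s * 1000.0
    return max(local_disk_ms, transfer_ms + peer_disk_ms)


# Illustrative numbers only: 64 KB write, 5 ms per disk write.
healthy = sync_write_latency_ms(64, 5.0, 30_000, 5.0)  # healthy link
flaky = sync_write_latency_ms(64, 5.0, 80, 5.0)        # flaky NIC

print(f"healthy link: {healthy:.1f} ms per write")
print(f"flaky link:   {flaky:.1f} ms per write")
```

With these made-up inputs the flaky link dominates the write time
completely, which matches the symptom: the disks are idle, but every
writer on the primary sits in iowait.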

It also explains my other question about the resync continually
stalling. Seriously, did NOBODY on the list notice when I said the
resync was topping out at about 80K? Now that the NIC issue is resolved,
resyncs run at 30,000K. :-)
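
A slow resync like this can be caught automatically. The sketch below
is a minimal Python example under stated assumptions: it assumes a
status line shaped like "speed: 80 (76) K/sec", modeled on the resync
line DRBD 8.x prints in /proc/drbd, and it reads the first number as the
recent rate and the parenthesized one as the average. The sample text,
the function names, and the 1000 K/sec threshold are all made up for
illustration, and the line format may differ on your version.

```python
import re

# Assumed line shape: "speed: 30,000 (29,800) K/sec"
SPEED_RE = re.compile(r"speed:\s*([\d,]+)\s*\(([\d,]+)\)\s*K/sec")


def resync_speeds(status_text):
    """Return a list of (recent, average) resync speeds in K/sec."""
    speeds = []
    for match in SPEED_RE.finditer(status_text):
        recent = int(match.group(1).replace(",", ""))
        average = int(match.group(2).replace(",", ""))
        speeds.append((recent, average))
    return speeds


def looks_stalled(status_text, floor_k_per_s=1000):
    """True if any resync's average speed is below the chosen floor."""
    return any(avg < floor_k_per_s for _, avg in resync_speeds(status_text))


# Hypothetical sample lines, not captured from a real system.
SLOW_SAMPLE = "\tfinish: 3:12:45 speed: 80 (76) K/sec\n"
FAST_SAMPLE = "\tfinish: 0:00:40 speed: 30,000 (29,800) K/sec\n"

print(looks_stalled(SLOW_SAMPLE), looks_stalled(FAST_SAMPLE))
```

On a live node you would pass in the contents of the status file
instead of the sample strings, and pick a floor that makes sense for
your replication link.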

I discovered the problem when I rebooted the server again: it reported
"PCIe training error, slot3" and halted. You guessed it, slot 3 held
the NIC doing the replication. I reseated the card and the server came
up fine. Replication is now fast, and I do not expect any more iowait
problems. Next task: replace that NIC.

Thanks for everyone's help and suggestions.

--
Eric Robinson







February 18, 2011