Hi, 

I'm testing a Xen DomU with DRBD-backed storage for easy failover. Most 
of the time, immediately after booting the DomU, I get an IO error: 

[ 3.153370] EXT3-fs (xvda2): using internal journal 
[ 3.277115] ip_tables: (C) 2000-2006 Netfilter Core Team 
[ 3.336014] nf_conntrack version 0.5.0 (3899 buckets, 15596 max) 
[ 3.515604] init: failsafe main process (397) killed by TERM signal 
[ 3.801589] blkfront: barrier: write xvda2 op failed 
[ 3.801597] blkfront: xvda2: barrier or flush: disabled 
[ 3.801611] end_request: I/O error, dev xvda2, sector 52171168 
[ 3.801630] end_request: I/O error, dev xvda2, sector 52171168 
[ 3.801642] Buffer I/O error on device xvda2, logical block 6521396 
[ 3.801652] lost page write due to I/O error on xvda2 
[ 3.801755] Aborting journal on device xvda2. 
[ 3.804415] EXT3-fs (xvda2): error: ext3_journal_start_sb: Detected aborted journal 
[ 3.804434] EXT3-fs (xvda2): error: remounting filesystem read-only 
[ 3.814754] journal commit I/O error 
[ 6.973831] init: udev-fallback-graphics main process (538) terminated with status 1 
[ 6.992267] init: plymouth-splash main process (546) terminated with status 1 

The drbdsetup manpage says that LVM (which I use) doesn't support barriers 
(better known as "tagged command queuing" or "native command queuing"), so I 
configured the DRBD device not to use barriers. This can be seen in /proc/drbd 
("wo:f", meaning flush, the next method DRBD chooses after barrier): 

3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r---- 
ns:2160152 nr:520204 dw:2680344 dr:2678107 al:3549 bm:9183 lo:0 pe:0 ua:0 ap:0 
ep:1 wo:f oos:0 

And on the other host: 

3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r---- 
ns:0 nr:2160152 dw:2160152 dr:0 al:0 bm:8052 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f 
oos:0 
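If it helps anyone reading along, the write-ordering method can be pulled out of a /proc/drbd status line with a quick one-liner. This is just an illustrative sketch run against a sample line from above (letters: b = barrier, f = flush, d = drain, n = none); on a live system you would read /proc/drbd instead:

```shell
# Sample /proc/drbd status line (on a real host: status=$(cat /proc/drbd))
status='ns:0 nr:2160152 dw:2160152 dr:0 al:0 bm:8052 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0'

# Extract the letter after "wo:" -- the currently active write-ordering method
wo=$(printf '%s\n' "$status" | grep -o 'wo:[bfdn]' | cut -d: -f2)
echo "$wo"   # f = flush
```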

I also enabled the option disable_sendpage, as per the DRBD docs: 

cat /sys/module/drbd/parameters/disable_sendpage 
Y 
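To make that survive a module reload, I put it in a modprobe options file (the filename is arbitrary; anything under /etc/modprobe.d/ works):

```
# /etc/modprobe.d/drbd.conf (illustrative filename)
options drbd disable_sendpage=1
```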

I also tried adding barrier=0 to fstab as a mount option (for ext3 the option 
is singular, barrier=0). Still it says: 

[ 58.603896] blkfront: barrier: write xvda2 op failed 
[ 58.603903] blkfront: xvda2: barrier or flush: disabled 

I'm not even sure ext3 supports a nobarrier option, but mounting does seem to 
work. Still, because only one of my storage systems is battery-backed, running 
without barriers would not be smart. 
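For completeness, the guest-side fstab entry I mean looks like this (device, mountpoint and the other options are from my setup and may differ in yours):

```
# /etc/fstab inside the DomU (illustrative)
/dev/xvda2  /  ext3  errors=remount-ro,barrier=0  0  1
```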

Why does it still complain about barriers when I have disabled them? 

Both hosts are: 

Debian: 6.0.4 
uname -a: Linux 2.6.32-5-xen-amd64 
drbd: 8.3.7 
Xen: 4.0.1 

Guest: 

Ubuntu 12.04 LTS 
uname -a: Linux 3.2.0-24-generic pvops 

drbd resource: 

resource drbdvm 
{ 
meta-disk internal; 
device /dev/drbd3; 

startup 
{ 
# The timeout value when the last known state of the other side was available. 
# 0 means infinite. 
wfc-timeout 0; 

# Timeout value when the last known state was disconnected. 0 means infinite. 
degr-wfc-timeout 180; 
} 

syncer 
{ 
# This is recommended only for low-bandwidth lines, to only send those 
# blocks which really have changed. 
#csums-alg md5; 

# Set to about half your net speed 
rate 60M; 

# It seems that this option moved to the 'net' section in drbd 8.4 (a later 
# release than Debian currently ships). 
verify-alg md5; 
} 

net 
{ 
# The manpage says this is recommended only in pre-production (because of its 
# performance cost), to determine 
# if your LAN card has a TCP checksum offloading bug. 
#data-integrity-alg md5; 
} 

disk 
{ 
# Detach causes the device to work over-the-network-only after the 
# underlying disk fails. Detach is not default for historical reasons, but is 
# recommended by the docs. 
# However, the Debian defaults in drbd.conf suggest the machine will reboot in 
# that event... 
on-io-error detach; 

# LVM doesn't support barriers, so disabling them; DRBD will revert to flush. 
# Check wo: in /proc/drbd. If you don't disable them, you get IO errors. 
no-disk-barrier; 
} 

on host1 
{ 
# universe is a VG 
disk /dev/universe/drbdvm-disk; 
address 10.0.0.1:7792; 
} 

on host2 
{ 
# universe is a VG 
disk /dev/universe/drbdvm-disk; 
address 10.0.0.2:7792; 
} 
} 
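(Aside, in case someone on a newer distribution tries this config: in DRBD 8.4 the no-disk-barrier keyword was replaced by a keyword-value form. I haven't tested it myself, so check drbd.conf(5) for your version, but the equivalent disk section should look roughly like:)

```
disk
{
	on-io-error   detach;
	disk-barrier  no;    # 8.4+ syntax replacing 8.3's no-disk-barrier
}
```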


In my test setup, the primary host's storage is a 9650SE SATA-II PCIe RAID 
controller with a battery. The secondary uses software RAID1. 


Isn't DRBD+Xen a widely used combination? With problems like these, I don't 
see how it could work reliably. 


Any help welcome. 

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
