Re: [CentOS] Very unresponsive, sometimes stalling domU (5.4, x86_64)

2010-03-02 Thread Pasi Kärkkäinen
On Tue, Mar 02, 2010 at 09:30:50AM +0100, Timo Schoeler wrote:
> Hi list,
> 
> please forgive the cross-posting, but I cannot narrow the problem down
> enough to say which list it fits best, so I'll ask on both.
> 
> I have some machines with the following specs (see the end of this
> email).
> 
> They run CentOS 5.4 x86_64 with the latest patches applied, Xen-enabled,
> and should host one or more domUs. I put the domUs' storage on LVM, as I
> learnt to do ages ago; that has never caused any problems and is way
> faster than using file-based 'images'.
> 
> However, there's something special about these machines: They have the
> new WD EARS series drives, which use 4K sector sizes. So, I booted a
> rescue system and used fdisk to start the first partition at sector 64
> instead of 63. Long story short: a partition starting at sector 63 is
> misaligned with the 4K sectors, so the drive has to do many more,
> inefficient writes and the performance collapses. With 'normal'
> geometry (start at sector 63) the drive achieves about 25 MiB/s on
> writes; with the partition starting at sector 64 it achieves almost
> 100 MiB/s:
> 
> [r...@server2 ~]# fdisk -ul /dev/sda
> 
> Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1   *          64     2097223     1048580   fd  Linux raid autodetect
> Partition 1 does not end on cylinder boundary.
> /dev/sda2         2097224    18876487     8389632   82  Linux swap / Solaris
> /dev/sda3        18876488  1953525167   967324340   fd  Linux raid autodetect
> 
> On top of those (two per machine) WD EARS HDs there's ``md'' providing
> two RAID1 arrays, one for /boot and one for LVM, plus one swap partition
> per HD (i.e. non-RAIDed). LVM provides the / partition as well as LVs
> for Xen domUs.
> 
> I have about 60 machines set up this way and never had any problems.
> They run like a charm. On these machines, however, domUs are *very*
> slow, have a steady (!) load of about two -- with about 50% spent in
> 'wait' -- and all operations take ages, e.g. a ``yum update'' with the
> recently released updates.
> 
> Now, can that be due to 4K issues I didn't see, now nested in LVM?
> 
> Any help is much appreciated.
> 

Maybe the default LVM alignment is wrong for these drives.
Did you check/verify that?

See:
http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/

Note especially the "--metadatasize" option to pvcreate.
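
To make that check concrete: LVM places its first physical extent at an
offset (pe_start) from the beginning of the PV, and "pvs -o +pe_start"
reports that offset. Below is a minimal sketch of the arithmetic, assuming
4096-byte physical sectors and assuming the md RAID1 superblock sits at the
end of the partition, so that the PV begins where the partition begins. The
helper name and the sample pe_start values are hypothetical; they were not
measured on the machines in this thread.

```shell
#!/bin/sh
# Assumptions (not from the thread): 4096-byte physical sectors, and the
# PV begins at the partition start (md superblock at the end of the device).

# lvm_data_aligned is a hypothetical helper.
#   $1 = partition start in 512-byte sectors
#   $2 = pe_start in 512-byte sectors (e.g. from: pvs -o +pe_start --units s)
lvm_data_aligned() {
    abs=$(( $1 + $2 ))          # absolute sector of the first extent
    if [ $(( abs % 8 )) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

# 18876488 is the start of /dev/sda3 in the fdisk listing quoted above;
# 384 and 383 are made-up pe_start values for illustration only.
lvm_data_aligned 18876488 384
lvm_data_aligned 18876488 383
```

If the real pe_start plus the partition start is not a multiple of 8
sectors, the LVs would start off a 4K boundary even though the partition
itself is aligned.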

-- Pasi

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] Very unresponsive, sometimes stalling domU (5.4, x86_64)

2010-03-02 Thread Timo Schoeler

Hi list,

please forgive the cross-posting, but I cannot narrow the problem down
enough to say which list it fits best, so I'll ask on both.

I have some machines with the following specs (see the end of this
email).

They run CentOS 5.4 x86_64 with the latest patches applied, Xen-enabled,
and should host one or more domUs. I put the domUs' storage on LVM, as I
learnt to do ages ago; that has never caused any problems and is way
faster than using file-based 'images'.

However, there's something special about these machines: They have the
new WD EARS series drives, which use 4K sector sizes. So, I booted a
rescue system and used fdisk to start the first partition at sector 64
instead of 63. Long story short: a partition starting at sector 63 is
misaligned with the 4K sectors, so the drive has to do many more,
inefficient writes and the performance collapses. With 'normal'
geometry (start at sector 63) the drive achieves about 25 MiB/s on
writes; with the partition starting at sector 64 it achieves almost
100 MiB/s:

[r...@server2 ~]# fdisk -ul /dev/sda

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          64     2097223     1048580   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2         2097224    18876487     8389632   82  Linux swap / Solaris
/dev/sda3        18876488  1953525167   967324340   fd  Linux raid autodetect
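
The alignment reasoning can be checked with a little arithmetic: fdisk -u
reports start sectors in 512-byte units, so a partition sits on a 4K
boundary when its start sector is a multiple of 8. A minimal sketch,
assuming 4096-byte physical sectors; check_start is a made-up helper name,
and the start sectors are the ones from the fdisk listing above plus the
old default of 63.

```shell
#!/bin/sh
# Assumption: 4096-byte physical sectors and 512-byte logical sectors,
# so an aligned start sector must be a multiple of 4096 / 512 = 8.
SECTORS_PER_PHYS=$((4096 / 512))

# check_start is a hypothetical helper: prints "aligned" or "misaligned"
# for a start sector given in 512-byte units.
check_start() {
    if [ $(( $1 % SECTORS_PER_PHYS )) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

# Start sectors from the fdisk -ul output, plus the old default 63.
for start in 63 64 2097224 18876488; do
    echo "sector $start: $(check_start "$start")"
done
```

Under that assumption, sector 63 comes out misaligned while all three
partitions listed above start on a 4K boundary, so any remaining
misalignment would have to come from a layer above the partition table.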

On top of those (two per machine) WD EARS HDs there's ``md'' providing
two RAID1 arrays, one for /boot and one for LVM, plus one swap partition
per HD (i.e. non-RAIDed). LVM provides the / partition as well as LVs
for Xen domUs.

I have about 60 machines set up this way and never had any problems.
They run like a charm. On these machines, however, domUs are *very*
slow, have a steady (!) load of about two -- with about 50% spent in
'wait' -- and all operations take ages, e.g. a ``yum update'' with the
recently released updates.

Now, can that be due to 4K issues I didn't see, now nested in LVM?

Any help is much appreciated.

Cheers,

Timo

---

Linux server2.blah.org 2.6.18-164.11.1.el5xen #1 SMP Wed Jan 20 08:06:04
EST 2010 x86_64 x86_64 x86_64 GNU/Linux

---

[r...@server2 ~]# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping: 10
cpu MHz : 1998.000
cache size  : 3072 KB
physical id : 0
siblings: 1
core id : 0
cpu cores   : 1
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl vmx smx est tm2 cx16 xtpr lahf_lm
bogomips: 6668.58
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping: 10
cpu MHz : 1998.000
cache size  : 3072 KB
physical id : 1
siblings: 1
core id : 0
cpu cores   : 1
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl vmx smx est tm2 cx16 xtpr lahf_lm
bogomips: 6668.58
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping: 10
cpu MHz : 1998.000
cache size  : 3072 KB
physical id : 2
siblings: 1
core id : 0
cpu cores   : 1
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl vmx smx est tm2 cx16 xtpr lahf_lm
bogomips: 6668.58
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping: 10
cpu MHz : 1998.000
cache size  : 3072 KB
physical id : 3
siblings: 1
core id : 0
cpu cores   : 1
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu tsc msr pae mce cx8 apic mtrr mca cmo