On 19/10/17 18:54, Oleg Cherkasov wrote:
Hi,

Recently I decided to try out the LVM cache feature on one of our Dell NX3100 servers running CentOS 7.4.1708 with a 110TB disk array (hardware RAID5 on Dell H710 and H830 adapters).  Two 256GB SSDs are in a hardware RAID1 on the H710 adapter, partitioned with primary and extended partitions, so I decided to create a ~240GB LVM cache to see whether system I/O could be improved.  The server runs the Bareos storage daemon and, apart from sshd and Dell OpenManage monitoring, has no other services.  Unfortunately the testing did not go as I expected; nonetheless, in the end the system is up and running with no data corrupted.
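For reference, a cache like this is typically built along the following lines; this is only a sketch, and the volume group name vg_data, the LV names, and the SSD partition /dev/sda5 are hypothetical, not taken from the actual server:

```shell
# Illustrative only -- VG, LV, and device names are hypothetical.
# Create the cache data and metadata LVs on the SSD RAID1 PV:
lvcreate -L 239G -n lv_cache      vg_data /dev/sda5
lvcreate -L 1G   -n lv_cache_meta vg_data /dev/sda5
# Combine them into a cache pool (writethrough is the default mode):
lvconvert --type cache-pool --poolmetadata vg_data/lv_cache_meta vg_data/lv_cache
# Attach the pool to the large origin LV on the RAID5 array:
lvconvert --type cache --cachepool vg_data/lv_cache vg_data/lv_data
```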

Initially I tried the default writethrough mode; after running a dd read test with a 250GB file, the system became unresponsive for roughly 15 minutes with cache allocation around 50%.  Writing to the disks did seem to speed up, although only marginally, around 10% in my tests.  I managed to pull more than 32TB via backups from different hosts, and once the system became unresponsive to ssh and ICMP requests, although only for a very short time.
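The read test was essentially a large sequential dd.  A scaled-down sketch (10MB instead of 250GB, with a hypothetical path under /tmp) looks like this:

```shell
# Scaled-down sketch of the read test (the real run used a 250GB file).
# Create a throwaway test file, read it back through dd, then clean up.
dd if=/dev/zero of=/tmp/lvmcache_test bs=1M count=10 2>/dev/null
# Dropping the page cache first (echo 3 > /proc/sys/vm/drop_caches,
# requires root) makes dd hit the block layer rather than RAM.
dd if=/tmp/lvmcache_test of=/dev/null bs=1M
rm -f /tmp/lvmcache_test
```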

I thought it might be something to do with the cache mode, so I switched to writeback via lvconvert and ran the dd read test with the 250GB file again; that time everything went completely unexpectedly.  The system started responding slowly to simple user interactions like listing files and running top, and then became completely unresponsive for about half an hour.  Switching to the main console via iLO, I saw a lot of OOM messages; the kernel was trying to survive and therefore randomly killed almost all processes.  Eventually I managed to reboot and immediately uncached the array.
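The mode switch and the later removal are both lvconvert operations.  Assuming the same hypothetical vg_data/lv_data names as above, the commands would be roughly:

```shell
# Illustrative only -- VG/LV names are hypothetical.
# Switch the existing cache from writethrough to writeback:
lvconvert --cachemode writeback vg_data/lv_data
# After the incident: detach and drop the cache, flushing any
# dirty blocks back to the origin LV first:
lvconvert --uncache vg_data/lv_data
```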

My question is about this very strange behavior of LVM cache.  I might expect no performance boost, or even I/O degradation, but I did not expect the system to run out of memory and the OOM killer to kick in.  That server has only 12GB of RAM, but it runs only sshd, the Bareos SD daemon, and the Java-based OpenManage monitoring system, so no RAM problems were noticed over the last few years of running without LVM cache.
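When chasing this kind of behavior it can help to watch the cache counters while the test runs.  A sketch, again with hypothetical names (the reporting fields are available in lvm2 2.02.x):

```shell
# Show dirty-block and hit/miss counters for the cached LV:
lvs -a -o name,cache_dirty_blocks,cache_read_hits,cache_read_misses vg_data
# The raw device-mapper status line carries the same counters:
dmsetup status vg_data-lv_data
```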

Any ideas what may be wrong?  I have a second NX3200 server with a similar hardware setup; it will be switched to FreeBSD 11.1 with ZFS very soon, but I may try installing CentOS 7.4 first to see if the problem can be reproduced.

The installed LVM2 version is lvm2-2.02.171-8.el7.x86_64.


Thank you!
Oleg

I realized that, the same day I replied, mailman disabled my subscription, so in case it did not get through, here it is again:

hi

not much of an explanation or insight into what might be going wrong with your setup/system, but I will share my own conclusions/suggestions drawn from bits of my experience...

If the bigger part of the storage subsystem resides in hardware, I would stick with the hardware: use CacheCade and let the hardware do the lot.

With LVM - similarly, stick with LVM and let it manage the whole lot (you will lose ~50% of a single average core, e.g. an Opteron 6376, with raid5).  Use the simplest HBAs (Dell has such), no raid, not even JBOD.  If disks are in the same enclosure, or simply under the same HBA (even though it's just an HBA), do *not* mix SATA & SAS (it may work, but better not, from my experience).
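Letting LVM do the RAID itself, as suggested above, can be sketched like this; the PV device names and the vg_store/lv_store names are hypothetical:

```shell
# Illustrative only -- PV devices and names are hypothetical.
# Hand LVM the raw disks from a plain HBA and build RAID5 in LVM:
pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
vgcreate vg_store /dev/sdb /dev/sdc /dev/sdd /dev/sde
# raid5 across 4 PVs = 3 data stripes plus rotating parity:
lvcreate --type raid5 -i 3 -L 1T -n lv_store vg_store
```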

Last one: keep that freaking firmware updated, everywhere possible, disks too.  (My latest experience with Seagate 2TB SAS, over a hundred of those in two enclosures: I cannot update them, the update does not work, and Seagate's website tech support is useless, so stay away from Seagate.)

I'll keep my fingers crossed for you.  On luck: you can never have too much of it.


_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

