Hi.

I have been experiencing the same issue on both nodes over the past 2 days (never on both nodes at the same time). It seems the issue occurs some time after I start copying a large number of files to CephFS from my client node (I don't use RBD yet).

These are new HP servers and the memory passes memtest without errors. I use an SSD for the OS and regular drives for the OSDs. I don't think the issue is drive-related, as it would be too much of a coincidence to have 6 drives with bad blocks across both nodes.

I will also disable the snapshots and report back after a few days.
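For reference, this is roughly what I will be adding to ceph.conf on the OSD nodes. The option name comes from the post quoted below; placing it in the [osd] section is my assumption:

```ini
[osd]
# Stop the Btrfs filestore from taking snapshots on commit
# (works around Btrfs snapshot-related kernel bugs).
filestore btrfs snap = false
```

I'll restart the OSD daemons after the change so it takes effect.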

Thx Jiri


On 5/01/2015 01:33, Dyweni - Ceph-Users wrote:


On 2015-01-04 08:21, Jiri Kanicky wrote:

More googling took me to the following post:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040279.html

Linux 3.14.1 is affected by serious Btrfs regression(s) that were fixed in
later releases.

Unfortunately, even the latest Linux kernel can crash and corrupt a Btrfs file system if the OSDs are using snapshots (which is the default). Due to kernel bugs related to Btrfs snapshots, I also lost some OSDs until I found that snapshotting can be
disabled with "filestore btrfs snap = false".


I am wondering if this can be the problem.



Very interesting... I think I was just hit with that over night. :)

Yes, I would definitely recommend turning off snapshots. I'm going to do that myself now.

Have you tested the memory in your server lately? Memtest86+ on the RAM, and badblocks on the SSD swap partition?




_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
