I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going
pretty well.

Then, around noon today, we had an mds crash. The failover mds then crashed
as well, and the failure cascaded through all four mds servers we have.

If I try to start it ('service ceph start mds' on CentOS 7.1), it appears
to be OK for a little while: ceph -w shows it going through 'replay',
'reconnect', 'rejoin', 'clientreplay', and 'active', but almost immediately
after reaching 'active' it crashes again.
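For reference, this is roughly what I'm doing to restart and watch it
('ceph mds stat' is just another way to see the same state transitions):

  service ceph start mds   # on the mds host (CentOS 7.1)
  ceph -w                  # watch the cluster log as the mds moves
                           # replay -> reconnect -> rejoin -> clientreplay -> active
  ceph mds stat            # one-shot view of the current mds state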

I have the mds log at
http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log

Some possibly (but not necessarily) relevant background info:
- Yesterday we increased both pg_num and pgp_num on our erasure-coded pool
from 2048 to 4096 (commands below). Roughly 17% of objects are still
misplaced, but they seem to be continuing to clean themselves up.
- We are in the midst of a large (300+ TB) rsync from our old (non-ceph)
filesystem to this filesystem.
- Before we noticed the mds crashes, we had just changed the size (replica
count) of our metadata pool from 2 to 4 (also shown below).
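The pg split was done like this (our pool name replaced with a placeholder):

  # double the placement group count on the EC pool, then the placement target
  ceph osd pool set <ec-pool> pg_num 4096
  ceph osd pool set <ec-pool> pgp_num 4096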
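And the metadata pool change (pool name again a placeholder):

  # raise the replica count on the CephFS metadata pool from 2 to 4
  ceph osd pool set <metadata-pool> size 4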