Re: domino-style OSD crash
On 06/07/2012 19:01, Gregory Farnum wrote:
> On Fri, Jul 6, 2012 at 12:19 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>> On 05/07/2012 23:32, Gregory Farnum wrote:
[...]
>>>> ok, so as all nodes were identical, I probably have hit a btrfs bug (like an erroneous out of space) at more or less the same time. And when 1 osd was out,
>>
>> OH, I didn't finish the sentence... When 1 osd was out, the missing data was copied to other nodes, probably hastening the btrfs problem on those nodes (I suspect erroneous out-of-space conditions).
>
> Ah. How full are/were the disks?

The OSD nodes were below 50 % (all are 5 TB volumes):

osd.0 : 31%
osd.1 : 31%
osd.2 : 39%
osd.3 : 65%
no osd.4 :)
osd.5 : 35%
osd.6 : 60%
osd.7 : 42%
osd.8 : 34%

All the volumes were using btrfs with lzo compression.

[...]
>>> Oh, interesting. Are the broken nodes all on the same set of arrays?
>>
>> No. There are 4 completely independent raid arrays, in 4 different locations. They are similar (same brand and model, but slightly different disks, and 1 different firmware), and all arrays are multipathed. I don't think the raid array is the problem. We have been using those particular models for 2 to 3 years, and in the logs I don't see any problem that could be caused by the storage itself (like scsi or multipath errors).
>
> I must have misunderstood then. What did you mean by 1 Array for 2 OSD nodes?

I have 8 osd nodes, in 4 different locations (several km apart). In each location I have 2 nodes and 1 raid array. In each location, the raid array has 16 2 TB disks and 2 controllers with 4x 8 Gb FC channels each. The 16 disks are organized in Raid 5 (8 disks for one, 7 disks for the other). Each raid set is primarily attached to 1 controller, and each osd node in the location has access to the controller through 2 distinct paths.

There was no correlation between the failed nodes and the raid arrays.
Cheers, -- Yann Dupont - Service IRTS, DSI Université de Nantes Tel : 02.53.48.49.20 - Mail/Jabber : yann.dup...@univ-nantes.fr -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
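Going by the usage figures listed in the message above, a quick filter shows which volumes were actually above a given fill level. This is only a sketch over the numbers as posted: the function name over_threshold and the 50 % cut-off are assumptions made for illustration, and the input format is simply the "osd.N : NN%" layout used in the message.

```shell
# Print the name of every OSD whose usage is above a threshold (in percent).
# Reads lines of the form "osd.N : NN%" on stdin.
# over_threshold is a hypothetical helper, not a ceph or btrfs tool.
over_threshold() {
    threshold="$1"
    awk -v t="$threshold" '{
        pct = $3                 # third field looks like "31%"
        sub(/%/, "", pct)        # drop the percent sign
        if (pct + 0 > t + 0) print $1
    }'
}

# Figures copied from the message (there is no osd.4).
usage='osd.0 : 31%
osd.1 : 31%
osd.2 : 39%
osd.3 : 65%
osd.5 : 35%
osd.6 : 60%
osd.7 : 42%
osd.8 : 34%'

printf '%s\n' "$usage" | over_threshold 50
```

With the posted figures this lists osd.3 and osd.6. On a live node, the overall fill level from df may differ from what btrfs has allocated internally, so comparing it with the output of btrfs filesystem df on the mount point is probably worthwhile when out-of-space errors are suspected.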
Re: mds fails to start on SL6
Hi Greg

On Fri, Jul 6, 2012 at 5:38 PM, Gregory Farnum g...@inktank.com wrote:
> Do you have more in the log? It looks like it's being instructed to shut down before it's fully come up (thus the error in the Objecter, http://tracker.newdream.net/issues/2740, which is not the root cause), but I can't see why.
> -Greg

That's all that appeared in the log. If you need additional information: I was following the wiki page at http://ceph.com/wiki/Installing_on_RedHat_or_CentOS to do the installation, and I also made updated notes of what I did to install it, which I can provide if you need them (they are updated installation notes for that wiki page, which others will find useful anyway). I had also followed the 5-minute quick guide to testing out ceph on a single machine.

Unfortunately I'm going to be away for the next week or two, so I won't be able to do much on this for a few weeks.

Jimmy

--
Senior Software Engineer, Digital Repository of Ireland (DRI)
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/
RE: mkcephfs failing on v0.48 argonaut
Hi Sage

Confirming that I ran the commands from a root prompt, in the sequence requested:

mkfs.btrfs /dev/sdc
btrfs device scan
mount /dev/sdc /srv/osd.0

root@dsanb1-coy:~# mkfs.btrfs /dev/sdc

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sdc
        nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB
Btrfs Btrfs v0.19
root@dsanb1-coy:~# btrfs device scan
Scanning for Btrfs filesystems
root@dsanb1-coy:~# mount /dev/sdc /srv/osd.0
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

root@dsanb1-coy:~# mount | grep btrfs
root@dsanb1-coy:~# mount -t btrfs /dev/sdc /srv/osd.0
root@dsanb1-coy:~# mount | grep btrfs
/dev/sdc on /srv/osd.0 type btrfs (rw)

So you can see I had to pass -t btrfs to mount explicitly. When mkcephfs runs the command

mount -t btrfs -o noatime /dev/sdc /srv/osd.0

it behaves as if -t btrfs were not present on the line, even though it is (the symptom is the same, at least). Crazy.

Secondly, I cannot run the command you requested,

sh -x /sbin/mkcephfs -d /tmp/mkcephfs.xgk025tjkQ --prepare-osdfs osd.0

because the temporary directory under /tmp/ is deleted each time mkcephfs finishes its run. I have, however, called the overall mkcephfs command with sh -x so you can see what is occurring.
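For the manual test above, where a plain mount failed and mount -t btrfs succeeded, one thing that could be compared is what the kernel has registered against what blkid detects on the device. The sketch below is only a starting point under stated assumptions: the helper names fs_registered and detected_fs_type are hypothetical (they are not part of mkcephfs or btrfs-progs), it assumes the util-linux blkid tool is installed, and it does not by itself explain the mkcephfs failure, where -t btrfs was already on the command line.

```shell
# Succeeds if the named filesystem driver appears in a /proc/filesystems-style
# file. The second argument is optional and defaults to /proc/filesystems.
# fs_registered is a hypothetical helper name.
fs_registered() {
    grep -qw -- "$1" "${2:-/proc/filesystems}"
}

# Print the filesystem type blkid detects on a device. Prints nothing when no
# signature is recognised. detected_fs_type is a hypothetical helper name.
detected_fs_type() {
    blkid -o value -s TYPE "$1"
}

# Example use, as root, with the device from the message:
#   fs_registered btrfs && echo "btrfs driver is registered"
#   detected_fs_type /dev/sdc
#   dmesg | tail
```

If the driver is not registered before the first mount attempt, or blkid reports a type other than btrfs, that would be worth including in a follow-up, together with the dmesg lines logged at the moment the mount fails.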
See below:

root@dsanb1-coy:~# sh -x /sbin/mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs -k /etc/ceph/keyring --crushmapsrc crushfile.txt -v
SNIP
=== osd.0 ===
+ return 0
+ [ -n ]
+ rdir=/tmp/mkcephfs.kJjIwsEnfZ
+ [ 0 -eq 0 ]
+ cp /tmp/mkcephfs.kJjIwsEnfZ/conf /etc/ceph/ceph.conf
+ [ 1 -eq 1 ]
+ [ osd = osd ]
+ do_root_cmd /sbin/mkcephfs -d /tmp/mkcephfs.kJjIwsEnfZ --prepare-osdfs osd.0
+ [ -z ]
+ [ 1 -eq 1 ]
+ echo --- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.kJjIwsEnfZ --prepare-osdfs osd.0
--- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.kJjIwsEnfZ --prepare-osdfs osd.0
+ ulimit -c unlimited
+ whoami
+ whoami=root
+ [ root = root ]
+ bash -c /sbin/mkcephfs -d /tmp/mkcephfs.kJjIwsEnfZ --prepare-osdfs osd.0
umount: /srv/osd.0: not mounted
umount: /dev/sdc: not mounted
/sbin/mkfs.btrfs
RUNNING: mkfs.btrfs /dev/sdc

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sdc
        nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
/bin/mount
RUNNING: mount -t btrfs -o noatime /dev/sdc /srv/osd.0
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

+ echo failed: '/sbin/mkcephfs -d /tmp/mkcephfs.kJjIwsEnfZ --prepare-osdfs osd.0'
failed: '/sbin/mkcephfs -d /tmp/mkcephfs.kJjIwsEnfZ --prepare-osdfs osd.0'
+ exit 1
+ rm -rf /tmp/mkcephfs.kJjIwsEnfZ
+ exit

-----Original Message-----
From: Sage Weil [mailto:s...@inktank.com]
Sent: Saturday, 7 July 2012 2:20 PM
To: Paul Pettigrew
Cc: ceph-devel@vger.kernel.org
Subject: RE: mkcephfs failing on v0.48 argonaut

On Sat, 7 Jul 2012, Paul Pettigrew wrote:
> Hi again Sage
>
> This is very perplexing. Confirming this system is a stock Ubuntu 12.04 x64, with no custom kernel or anything else, fully apt-get dist-upgrade'd up to date.
> root@dsanb1-coy:~# uname -r
> 3.2.0-26-generic
>
> I have added the suggestions you made to the script; we now have:
>
> modprobe btrfs || true
> which mkfs.btrfs
> echo RUNNING: mkfs.btrfs $btrfs_devs
> mkfs.btrfs $btrfs_devs
> btrfs device scan || btrfsctl -a
> which mount
> echo RUNNING: mount -t btrfs $btrfs_opt $first_dev $btrfs_path
> mount -t btrfs $btrfs_opt $first_dev $btrfs_path
> echo DID I GET HERE - OR CRASH OUT WITH mount ABOVE?
> chown $osd_user $btrfs_path
>
> See below that the same command that is failing within mkcephfs works fine on a standard command line. Weirdness!
>
> === osd.0 ===
> --- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.xgk025tjkQ --prepare-osdfs osd.0

Can you run this command with -x to see exactly what bash is doing?

sh -x /sbin/mkcephfs -d /tmp/mkcephfs.xgk025tjkQ --prepare-osdfs osd.0

In particular, I'm curious if you do

mkfs.btrfs /dev/sdc
btrfs device scan
mount /dev/sdc /srv/osd.0

(or whatever the exact sequence that mkcephfs does is) from the command line, does it give you the same error?

sage

> umount: /srv/osd.0: not mounted
> umount: /dev/sdc: not mounted
> /sbin/mkfs.btrfs
> RUNNING: mkfs.btrfs /dev/sdc
>
> WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
> WARNING! - see http://btrfs.wiki.kernel.org before using
>
> fs created label (null) on /dev/sdc
>         nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB
> Btrfs Btrfs v0.19
> Scanning for Btrfs filesystems
> /bin/mount