rbd map fails when the crushmap algorithm is changed to tree
Hi all: Here is the original crushmap. I changed the algorithm of the host bucket to tree and set the map back into the ceph cluster. However, when I try to map one image to a rados block device (RBD), it hangs with no response until I press Ctrl-C (`rbd map ...` then hang). Is there anything wrong in the crushmap? Thanks for the help.

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 pool

# buckets
host store-001 {
        id -2           # do not change unnecessarily
        # weight 12.000
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 1.000
        item osd.1 weight 1.000
        item osd.10 weight 1.000
        item osd.11 weight 1.000
        item osd.2 weight 1.000
        item osd.3 weight 1.000
        item osd.4 weight 1.000
        item osd.5 weight 1.000
        item osd.6 weight 1.000
        item osd.7 weight 1.000
        item osd.8 weight 1.000
        item osd.9 weight 1.000
}
rack unknownrack {
        id -3           # do not change unnecessarily
        # weight 12.000
        alg straw
        hash 0  # rjenkins1
        item store-001 weight 12.000
}
pool default {
        id -1           # do not change unnecessarily
        # weight 12.000
        alg straw
        hash 0  # rjenkins1
        item unknownrack weight 12.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd
        step emit
}

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
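For anyone hitting this: an edited crushmap can be sanity-checked offline with crushtool before it is injected into the cluster. This is a sketch of the round trip; the file names are examples, and the --test flags follow the current crushtool, so older releases may differ:

```shell
# Grab the compiled crushmap from the cluster and decompile it.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# ... edit crushmap.txt, e.g. change "alg straw" to "alg tree" ...

# Recompile and dry-run the rbd rule (ruleset 2) offline, to see
# whether mappings still resolve with the modified bucket algorithm.
crushtool -c crushmap.txt -o crushmap.new
crushtool -i crushmap.new --test --rule 2 --num-rep 2

# Only load the map back into the cluster once the dry run looks sane.
ceph osd setcrushmap -i crushmap.new
```

If the offline test already fails to produce mappings, the hang on `rbd map` is at least explained before any client touches the new map.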
Re: domino-style OSD crash
On 05/07/2012 23:32, Gregory Farnum wrote: [...]

ok, so as all nodes were identical, I probably hit a btrfs bug (like an erroneous out-of-space condition) at more or less the same time. And when 1 osd was out, OH, I didn't finish the sentence... When 1 osd was out, the missing data was copied onto other nodes, probably accelerating the btrfs problem on those nodes (I suspect erroneous out-of-space conditions).

I've reformatted the OSDs with xfs. Performance is slightly worse for the moment (well, it depends on the workload, and maybe the lack of syncfs is to blame), but at least I hope to have the storage layer rock-solid. BTW, I've managed to keep the faulty btrfs volumes. [...]

I wonder if maybe there's a confounding factor here: are all your nodes similar to each other?

Yes. I designed the cluster that way. All nodes are identical hardware (PowerEdge M610, 10G Intel ethernet + Emulex fibre channel attached to storage; 1 array for 2 OSD nodes, 1 controller dedicated to each OSD).

Oh, interesting. Are the broken nodes all on the same set of arrays?

No. There are 4 completely independent raid arrays, in 4 different locations. They are similar (same brand and model, but slightly different disks, and 1 different firmware), and all arrays are multipathed. I don't think the raid array is the problem. We have used those particular models for 2-3 years, and in the logs I don't see any problem that could be caused by the storage itself (like scsi or multipath errors).

Cheers,

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : yann.dup...@univ-nantes.fr
Re: OSD doesn't start
On 2012. July 5. 16:12:42, Székelyi Szabolcs wrote:

On 2012. July 4. 09:34:04, Gregory Farnum wrote:

Hrm, it looks like the OSD data directory got a little busted somehow. How did you perform your upgrade? (That is, how did you kill your daemons, in what order, and when did you bring them back up?)

Since it would be hard and long to describe in text, I've collected the relevant log entries, sorted by time, at http://pastebin.com/Ev3M4DQ9 . The short story is that after seeing that the OSDs won't start, I tried to bring down the whole cluster and start it up from scratch. It didn't change anything, so I rebooted the two machines (running all three daemons) to see if it changes anything. It didn't, and I gave up. My ceph config is available at http://pastebin.com/KKNjmiWM .

Since this is my test cluster, I'm not very concerned about the data on it. But the other one, with the same config, is dying I think. ceph-fuse is eating around 75% CPU on the sole monitor (cc) node, the monitor about 15%. On the other two nodes, the OSD eats around 50%, the MDS 15%, the monitor another 10%. No Ceph filesystem activity is going on at the moment. Blktrace reports about 1 kB/s disk traffic on the partition hosting the OSD data dir. The data seems to be accessible at the moment, but I'm afraid that my production cluster will end up in a similar situation after the upgrade, so I don't dare to touch it. Do you have any suggestion what I should check?

Yes, it definitely looks like it's dying. Besides the above symptoms, all clients' ceph-fuse processes burn CPU, there are unreadable files on the fs (tar blocks on them infinitely), and the FUSE clients emit messages like

ceph-fuse: 2012-07-05 23:21:41.583692 7f444dfd5700 0 -- client_ip:0/1181 send_message dropped message ping v1 because of no pipe on con 0x1034000

every 5 seconds. I tried to back up the data on it, but it got blocked in the middle. Since then I'm unable to get any data out of it, not even by killing ceph-fuse and remounting the fs.
-- 
cc
Re: speedup ceph / scaling / find the bottleneck
On 06.07.2012 at 05:50, Alexandre DERUMIER aderum...@odiso.com wrote:

Hi, Stefan is on vacation for the moment; I don't know if he can reply to you.

Thanks!

But I can reply for him for the kvm part (as we do the same tests together in parallel).

- kvm is 1.1
- rbd 0.48
- drive option rbd:pool/volume:auth_supported=cephx;none;keyring=/etc/pve/priv/ceph/ceph.keyring:mon_host=X.X.X.X;
- using writeback

writeback tuning in ceph.conf on the kvm host:

rbd_cache_size = 33554432
rbd_cache_max_age = 2.0

Correct.

benchmark used in kvm guest:

fio --filename=$DISK --direct=1 --rw=randwrite --bs=4k --size=200G --numjobs=50 --runtime=90 --group_reporting --name=file1

results show max 14000 io/s with 1 vm, 7000 io/s per vm with 2 vms, ... so it doesn't scale

Correct too (bench is with directio, so maybe the writeback cache doesn't help).

hardware for ceph is 3 nodes with 4 intel ssds each (1 drive can handle 4io/s randwrite locally)

3 but still enough

Stefan

- Alexandre

- Original message -
From: Gregory Farnum g...@inktank.com
To: Stefan Priebe s.pri...@profihost.ag
Cc: ceph-devel@vger.kernel.org, Sage Weil s...@inktank.com
Sent: Thursday, 5 July 2012 23:33:18
Subject: Re: speedup ceph / scaling / find the bottleneck

Could you send over the ceph.conf on your KVM host, as well as how you're configuring KVM to use rbd?

On Tue, Jul 3, 2012 at 11:20 AM, Stefan Priebe s.pri...@profihost.ag wrote:

I'm sorry, but this is the KVM host machine; there is no ceph running on this machine.
If I change the admin socket to admin_socket=/var/run/ceph_$name.sock I don't have any socket at all ;-(

On 03.07.2012 17:31, Sage Weil wrote:

On Tue, 3 Jul 2012, Stefan Priebe - Profihost AG wrote:

Hello,

On 02.07.2012 22:30, Josh Durgin wrote:

If you add admin_socket=/path/to/admin_socket for your client running qemu (in that client's ceph.conf section or manually on the qemu command line) you can check that caching is enabled:

ceph --admin-daemon /path/to/admin_socket show config | grep rbd_cache

And see the statistics it generates (look for cache) with:

ceph --admin-daemon /path/to/admin_socket perfcounters_dump

This doesn't work for me:

ceph --admin-daemon /var/run/ceph.sock show config
read only got 0 bytes of 4 expected for response length; invalid command?
2012-07-03 09:46:57.931821 7fa75d129700 -1 asok(0x8115a0) AdminSocket: request 'show config' not defined

Oh, it's 'config show'. Also, 'help' will list the supported commands.

Also perfcounters does not show anything:

# ceph --admin-daemon /var/run/ceph.sock perfcounters_dump
{}

There may be another daemon that tried to attach to the same socket file. You might want to set 'admin socket = /var/run/ceph/$name.sock' or something similar, or whatever else is necessary to make it a unique file.

~]# ceph -v
ceph version 0.48argonaut-2-gb576faa (commit:b576faa6f24356f4d3ec7205e298d58659e29c68)

Out of curiosity, what patches are you applying on top of the release?

sage

-- 
Alexandre DERUMIER
Ingénieur Systèmes et Réseaux
Fixe : 03 20 68 88 85
Fax : 03 20 68 90 88
45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris
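To make Sage's suggestion concrete, the per-client socket path would be set in ceph.conf on the KVM host along these lines. This is only an illustrative fragment (section name and path are examples, not taken from the poster's config):

```ini
; each process expands $name differently, so every daemon/client
; gets its own socket file instead of fighting over one path
[client]
    admin socket = /var/run/ceph/$name.sock
```

With a unique path per process, `ceph --admin-daemon <path> config show` and `perfcounters_dump` query the intended client rather than whichever daemon grabbed the shared socket first.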
Re: [PATCH] Generate URL-safe base64 strings for keys.
On 07/05/2012 04:31 PM, Sage Weil wrote:

On Thu, 5 Jul 2012, Wido den Hollander wrote:

On 04-07-12 18:18, Sage Weil wrote:

On Wed, 4 Jul 2012, Wido den Hollander wrote:

By using this we prevent scenarios where cephx keys are not accepted in various situations. By replacing + and / with - and _ we generate URL-safe base64 keys.

Signed-off-by: Wido den Hollander w...@widodh.nl

Does it already properly decode URL-safe base64 encoding?

Yes, it decodes URL-safe base64 as well. See the if statements for 62 and 63: + and - are treated equally, just like / and _.

Oh, got it. The commit description confused me... I thought this was related to the encoding only. I think we should break the encode and decode patches into separate versions, and apply the decode to a stable branch (argonaut) and the encode to master. That should avoid most problems with a rolling/staggered upgrade...

I just submitted a patch for decoding only.

Applied, thanks!

During some tests I found out that libvirt uses gnulib and won't handle URL-safe base64 encoded keys. So, as long as Ceph allows them we're good. Users can always replace the + and / in their key, knowing it will be accepted by Ceph. This works for me for now. The exact switch to base64url should be done at a later stage, I think. The RFC on this: http://tools.ietf.org/html/rfc4648#page-7

We could:
- submit a patch for gnulib; someday it'll support it

I already did, but IF they accept anything other than RFC 4648 they'll have to implement a lot of the other formats as well. That will be some work.

- kludge the secret generation code in ceph so that it rejects secrets with problematic encoding... :/ (radosgw-admin does something similar with +'s in the s3-style user keys.)

Seems the easy way out, but it will work though.
---
 src/common/armor.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/common/armor.c b/src/common/armor.c
index d1d5664..7f73da1 100644
--- a/src/common/armor.c
+++ b/src/common/armor.c
@@ -9,7 +9,7 @@
  * base64 encode/decode.
  */
 
-const char *pem_key = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+const char *pem_key = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
 
 static int encode_bits(int c)
 {
@@ -24,9 +24,9 @@ static int decode_bits(char c)
 		return c - 'a' + 26;
 	if (c >= '0' && c <= '9')
 		return c - '0' + 52;
-	if (c == '+')
+	if (c == '+' || c == '-')
 		return 62;
-	if (c == '/')
+	if (c == '/' || c == '_')
 		return 63;
 	if (c == '=')
 		return 0; /* just non-negative, please */
-- 
1.7.9.5
Re: [PATCH] librados: Bump the version to 0.48
On 07/06/2012 12:33 AM, Gregory Farnum wrote:

On Wed, Jul 4, 2012 at 9:33 AM, Sage Weil s...@inktank.com wrote:

On Wed, 4 Jul 2012, Gregory Farnum wrote:

Hmmm, we generally try to modify these versions when the API changes, not on every sprint. It looks to me like Sage added one function in 0.45 where we maybe should have bumped it, but that was a long time ago and at this point we should maybe just eat it?

Yeah, I went ahead and applied this to stable (argonaut) since it's as good a reference point as any. Moving forward, we should try to sync this up with API changes as they happen.

Hmm, like that assert ObjectOperation that just went into master... Yep, we should probably bump it to .49 then! (Since that's the version it will be part of, and nobody will get confused and try to bump it again before that release.)

On Thu, Jul 5, 2012 at 12:26 AM, Wido den Hollander w...@widodh.nl wrote:

That was my reasoning. I compiled phprados against 0.48 and saw that librados was reporting 0.44 as its version. That could confuse users; they might think they still have an old library in place. IMHO the version numbering should be totally different from Ceph's if you only want to bump the version on an API change.

Well, the problem with bumping it on every Ceph version is that it becomes a lot harder for tools to sync up to a known version of the API. Perhaps we should have divorced it from the Ceph versioning completely, but I don't know if we can still do that in a reasonable way or not. :/

You could always say: 0.48 was the stable release; from here on we are only going to bump the librados version if an API change happens. Send out an e-mail on the devel list and write a blog post? I don't think we have a lot of developers using native librados at the moment.
Wido

-Greg
oops in rbd module (con_work in libceph)
Hello. The bug happens in the rbd client, at least in kernel 3.4.4. I have a completely reproducible bug. Here is the oops:

Jul 6 10:16:52 label5.u14.univ-nantes.prive kernel: [ 329.456285] EXT4-fs (rbd1): mounted filesystem with ordered data mode. Opts: (null)
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.709145] libceph: osd1 172.20.14.131:6801 socket closed
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.715245] BUG: unable to handle kernel NULL pointer dereference at 0048
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.715430] IP: [a08488f0] con_work+0xfb0/0x20b0 [libceph]
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.715554] PGD a094cb067 PUD a0a7a7067 PMD 0
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.715758] Oops: [#1] SMP
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.715914] CPU 0
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.715963] Modules linked in: ext4 jbd2 crc16 rbd libceph drbd lru_cache cn ip6table_filter ip6_tables iptable_filt
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.720338]
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.720406] Pid: 1007, comm: kworker/0:2 Not tainted 3.4.4-dsiun-120521 #111 Dell Inc. PowerEdge M610/0V56FN
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.720637] RIP: 0010:[a08488f0] [a08488f0] con_work+0xfb0/0x20b0 [libceph]
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.720779] RSP: :880a1036dd50 EFLAGS: 00010246
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.720851] RAX: RBX: RCX: 00031000
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.720925] RDX: RSI: 880a1092c5a0 RDI: 880a1092c598
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721002] RBP: 0004f000 R08: 0020 R09:
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721100] R10: 0010 R11: 880a122e0f08 R12: 0001
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721173] R13: 880a1092c500 R14: ea001430e300 R15: 880a0990f030
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721247] FS: () GS:880a2fc0() knlGS:
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721337] CS: 0010 DS: ES: CR0: 8005003b
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721409] CR2: 0048 CR3: 000a10823000 CR4: 07f0
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721483] DR0: DR1: DR2:
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721557] DR3: DR6: 0ff0 DR7: 0400
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721632] Process kworker/0:2 (pid: 1007, threadinfo 880a1036c000, task 880a10b2f2c0)
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721721] Stack:
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.721784] 0002 880a1036ddfc 0400 880a
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.722050] 880a1036ddd8 0004f000 880a0004f000 880a
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.722315] 880a0990f420 880a1092c5a0 880a0990f308 880a0990f1a8
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.722581] Call Trace:
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.722653] [810534d2] ? process_one_work+0x122/0x3f0
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.722728] [a0847940] ? ceph_con_revoke_message+0xc0/0xc0 [libceph]
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.722819] [81054c65] ? worker_thread+0x125/0x2e0
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.722892] [81054b40] ? manage_workers.isra.25+0x1f0/0x1f0
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.722969] [81059b85] ? kthread+0x85/0x90
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.723042] [813baee4] ? kernel_thread_helper+0x4/0x10
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.723116] [81059b00] ? flush_kthread_worker+0x80/0x80
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.723189] [813baee0] ? gs_change+0x13/0x13
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.723258] Code: ea f4 ff ff 0f 1f 80 00 00 00 00 49 83 bd 90 00 00 00 00 0f 84 ca 03 00 00 49 63 85 a0 00 00 00 49
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.727478] RIP [a08488f0] con_work+0xfb0/0x20b0 [libceph]
Jul 6 10:18:38 label5.u14.univ-nantes.prive kernel: [ 434.727599] RSP
unpackaged files in rpmbuild of 0.48
Hi All, I'm not sure if this is intentional or not, but an rpm build of 0.48 gives the following error:

Installed (but unpackaged) file(s) found:
   /sbin/ceph-disk-activate
   /sbin/ceph-disk-prepare

RPM build errors:
    Installed (but unpackaged) file(s) found:
   /sbin/ceph-disk-activate
   /sbin/ceph-disk-prepare

The following will fix the spec file:

diff -u ceph.spec.in.orig ceph.spec.in
--- ceph.spec.in.orig	2012-07-06 15:38:59.298497719 +0100
+++ ceph.spec.in	2012-07-06 15:39:45.423560177 +0100
@@ -326,6 +326,8 @@
 /usr/sbin/rcceph
 %{_libdir}/rados-classes/libcls_rbd.so*
 %{_libdir}/rados-classes/libcls_rgw.so*
+/sbin/ceph-disk-activate
+/sbin/ceph-disk-prepare
 
 # %files fuse

I could of course just ignore it.

Regards,

Jimmy Tang

-- 
Senior Software Engineer, Digital Repository of Ireland (DRI)
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/
Re: oops in rbd module (con_work in libceph)
On 06/07/2012 10:31, Yann Dupont wrote:

Hello. The bug happens in the rbd client, at least in kernel 3.4.4. I have a completely reproducible bug.

Just a note: 3.2.22 doesn't seem to exhibit the problem. I repeated the process 2 times without problems on this kernel. I'll launch a realistic load on our ceph volume this weekend (bacula backups) and see if 3.2.22 is solid. Then Monday or Tuesday I'll take the git bisect route to see which patch introduced the problem.

Cheers,

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : yann.dup...@univ-nantes.fr
Re: unpackaged files in rpmbuild of 0.48
On Fri, 6 Jul 2012, Jimmy Tang wrote:

Hi All, I'm not sure if this is intentional or not, but an rpm build of 0.48 gives the following error:

Installed (but unpackaged) file(s) found:
   /sbin/ceph-disk-activate
   /sbin/ceph-disk-prepare

[...] The following will fix the spec file: [...]

I fixed this up in the stable branch, and it'll be included in the .1 release. Thanks!

sage
mds fails to start on SL6
Hi All, I was giving ceph 0.48 a try on SL6x; the OSDs start up okay, but the mds fails to start. Below is a snippet of the error:

2012-07-06 16:38:17.838055 7f2d6828d700 -1 mds.-1.0 *** got signal Terminated ***
2012-07-06 16:38:17.838139 7f2d6828d700  1 mds.-1.0 suicide. wanted down:dne, now up:boot
2012-07-06 16:38:17.839020 7f2d6828d700 -1 osdc/Objecter.cc: In function 'void Objecter::shutdown()' thread 7f2d6828d700 time 2012-07-06 16:38:17.838156
osdc/Objecter.cc: 221: FAILED assert(initialized)
ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
1: (Objecter::shutdown()+0x170) [0x6e2e20]
2: (MDS::suicide()+0xc9) [0x4ad829]
3: (MDS::handle_signal(int)+0x1bb) [0x4b447b]
4: (SignalHandler::entry()+0x283) [0x803d53]
5: /lib64/libpthread.so.0() [0x3b3ea077f1]
6: (clone()+0x6d) [0x3b3e6e5ccd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-3> 2012-07-06 16:37:57.786954 7f2d6b496760  0 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030), process ceph-mds, pid 12537
-2> 2012-07-06 16:38:17.838055 7f2d6828d700 -1 mds.-1.0 *** got signal Terminated ***
-1> 2012-07-06 16:38:17.838139 7f2d6828d700  1 mds.-1.0 suicide. wanted down:dne, now up:boot
0> 2012-07-06 16:38:17.839020 7f2d6828d700 -1 osdc/Objecter.cc: In function 'void Objecter::shutdown()' thread 7f2d6828d700 time 2012-07-06 16:38:17.838156
osdc/Objecter.cc: 221: FAILED assert(initialized)
ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
1: (Objecter::shutdown()+0x170) [0x6e2e20]
2: (MDS::suicide()+0xc9) [0x4ad829]
3: (MDS::handle_signal(int)+0x1bb) [0x4b447b]
4: (SignalHandler::entry()+0x283) [0x803d53]
5: /lib64/libpthread.so.0() [0x3b3ea077f1]
6: (clone()+0x6d) [0x3b3e6e5ccd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---
2012-07-06 16:38:17.840237 7f2d6828d700 -1 *** Caught signal (Aborted) ** in thread 7f2d6828d700
ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
1: /usr/bin/ceph-mds() [0x803309]
2: /lib64/libpthread.so.0() [0x3b3ea0f4a0]
3: (gsignal()+0x35) [0x3b3e632885]
4: (abort()+0x175) [0x3b3e634065]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3b432bea7d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2012-07-06 16:38:17.840237 7f2d6828d700 -1 *** Caught signal (Aborted) ** in thread 7f2d6828d700
ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
1: /usr/bin/ceph-mds() [0x803309]
2: /lib64/libpthread.so.0() [0x3b3ea0f4a0]
3: (gsignal()+0x35) [0x3b3e632885]
4: (abort()+0x175) [0x3b3e634065]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3b432bea7d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---

Regards,

Jimmy Tang

-- 
Senior Software Engineer, Digital Repository of Ireland (DRI)
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/
Can I try CEPH on top of the openstack Essex?
Hi, Can I try CEPH on top of the openstack/Swift Object store (Essex release)?

HB
Re: Can I try CEPH on top of the openstack Essex?
On Fri, Jul 6, 2012 at 8:29 AM, Chen, Hb hbc...@lanl.gov wrote:

Hi, Can I try CEPH on top of the openstack/Swift Object store (Essex release)?

Nope! CephFS is heavily dependent on the features provided by the RADOS object store (and so is RBD, if that's what you're interested in). If you wanted to try it out[1] with an OpenStack cluster while only handling a single pool of storage, you could use RADOS and the RADOS Gateway, which speaks the Swift APIs.
-Greg

[1]: Though you should remember that CephFS is not production-ready yet.
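To illustrate Greg's pointer: once a radosgw instance is running with a Swift subuser (created via radosgw-admin), the stock swift client can talk to it using its legacy auth flags. The hostname, account, and key below are all made up, and the exact auth path can vary with radosgw configuration:

```shell
# Query account stats through the RADOS Gateway's Swift-compatible API.
# -A: auth URL, -U: account:subuser, -K: the subuser's swift secret key.
swift -A http://radosgw.example.com/auth/v1.0 -U testacct:swiftuser -K 'madeupsecret' stat

# Create a container and upload a file, just as against a real Swift cluster.
swift -A http://radosgw.example.com/auth/v1.0 -U testacct:swiftuser -K 'madeupsecret' upload testcontainer ./somefile
```

In other words: RADOS replaces Swift's object store here; it does not layer on top of it.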
Re: mds fails to start on SL6
Do you have more in the log? It looks like it's being instructed to shut down before it's fully come up (hence the error in the Objecter, http://tracker.newdream.net/issues/2740, though that is not the root cause), but I can't see why.
-Greg

On Fri, Jul 6, 2012 at 8:42 AM, Jimmy Tang jt...@tchpc.tcd.ie wrote:

Hi All, I was giving ceph 0.48 a try on SL6x; the OSDs start up okay, but the mds fails to start, below is a snippet of the error: [...]
Re: domino-style OSD crash
On Fri, Jul 6, 2012 at 12:19 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:

On 05/07/2012 23:32, Gregory Farnum wrote: [...]

ok, so as all nodes were identical, I probably hit a btrfs bug (like an erroneous out-of-space condition) at more or less the same time. And when 1 osd was out, OH, I didn't finish the sentence... When 1 osd was out, the missing data was copied onto other nodes, probably accelerating the btrfs problem on those nodes (I suspect erroneous out-of-space conditions).

Ah. How full are/were the disks?

I've reformatted the OSDs with xfs. Performance is slightly worse for the moment (well, it depends on the workload, and maybe the lack of syncfs is to blame), but at least I hope to have the storage layer rock-solid. BTW, I've managed to keep the faulty btrfs volumes. [...]

I wonder if maybe there's a confounding factor here: are all your nodes similar to each other?

Yes. I designed the cluster that way. All nodes are identical hardware (PowerEdge M610, 10G Intel ethernet + Emulex fibre channel attached to storage; 1 array for 2 OSD nodes, 1 controller dedicated to each OSD).

Oh, interesting. Are the broken nodes all on the same set of arrays?

No. There are 4 completely independent raid arrays, in 4 different locations. They are similar (same brand and model, but slightly different disks, and 1 different firmware), and all arrays are multipathed. I don't think the raid array is the problem. We have used those particular models for 2-3 years, and in the logs I don't see any problem that could be caused by the storage itself (like scsi or multipath errors).

I must have misunderstood then. What did you mean by 1 Array for 2 OSD nodes?
Re: speedup ceph / scaling / find the bottleneck
On 06.07.2012 at 19:11, Gregory Farnum g...@inktank.com wrote:

On Thu, Jul 5, 2012 at 8:50 PM, Alexandre DERUMIER aderum...@odiso.com wrote:

Hi, Stefan is on vacation at the moment; I don't know if he can reply to you. But I can reply for him for the KVM part (as we do the same tests together in parallel).

- kvm is 1.1
- rbd 0.48
- drive option rbd:pool/volume:auth_supported=cephx;none;keyring=/etc/pve/priv/ceph/ceph.keyring:mon_host=X.X.X.X;
- using writeback

writeback tuning in ceph.conf on the kvm host:
rbd_cache_size = 33554432
rbd_cache_max_age = 2.0

benchmark used in the KVM guest:
fio --filename=$DISK --direct=1 --rw=randwrite --bs=4k --size=200G --numjobs=50 --runtime=90 --group_reporting --name=file1

Results show max 14000 io/s with 1 VM, 7000 io/s per VM with 2 VMs, ... so it doesn't scale (the bench uses direct I/O, so maybe the writeback cache doesn't help). Hardware for ceph is 3 nodes with 4 Intel SSDs each (1 drive can handle 4io/s randwrite locally).

I'm interested in figuring out why we aren't getting useful data out of the admin socket, and for that I need the actual configuration files. It wouldn't surprise me if there are several layers to this issue, but I'd like to start at the client's endpoint. :)

While I'm on holiday I can't send you my ceph.conf, but it doesn't contain anything else than the locations, journal dio false for tmpfs, and /var/run/ceph_$name.sock.

Regarding the random IO, you shouldn't overestimate your storage. Under plenty of scenarios your drives are lucky to do more than 2k IO/s, which is about what you're seeing: http://techreport.com/articles.x/22415/9 You're fine if the ceph workload is the same as the iometer file server workload.

I don't know. I've measured the raw random 4k workload. Also, I've tested adding another OSD and the speed still doesn't change, but with a size of 200 GB I should hit several OSD servers.
Stefan

-Greg
Re: speedup ceph / scaling / find the bottleneck
On Fri, Jul 6, 2012 at 11:09 AM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote:

On 06.07.2012 at 19:11, Gregory Farnum g...@inktank.com wrote:

On Thu, Jul 5, 2012 at 8:50 PM, Alexandre DERUMIER aderum...@odiso.com wrote:

Hi, Stefan is on vacation at the moment; I don't know if he can reply to you. But I can reply for him for the KVM part (as we do the same tests together in parallel).

- kvm is 1.1
- rbd 0.48
- drive option rbd:pool/volume:auth_supported=cephx;none;keyring=/etc/pve/priv/ceph/ceph.keyring:mon_host=X.X.X.X;
- using writeback

writeback tuning in ceph.conf on the kvm host:
rbd_cache_size = 33554432
rbd_cache_max_age = 2.0

benchmark used in the KVM guest:
fio --filename=$DISK --direct=1 --rw=randwrite --bs=4k --size=200G --numjobs=50 --runtime=90 --group_reporting --name=file1

Results show max 14000 io/s with 1 VM, 7000 io/s per VM with 2 VMs, ... so it doesn't scale (the bench uses direct I/O, so maybe the writeback cache doesn't help). Hardware for ceph is 3 nodes with 4 Intel SSDs each (1 drive can handle 4io/s randwrite locally).

I'm interested in figuring out why we aren't getting useful data out of the admin socket, and for that I need the actual configuration files. It wouldn't surprise me if there are several layers to this issue, but I'd like to start at the client's endpoint. :)

While I'm on holiday I can't send you my ceph.conf, but it doesn't contain anything else than the locations, journal dio false for tmpfs, and /var/run/ceph_$name.sock.

Is that socket in the global area? Does the KVM process have permission to access that directory? If you enable logging, can you get any outputs that reference errors opening that file? (I realize you're on holiday; these are just the questions we'll need answered to get it working.)

Regarding the random IO, you shouldn't overestimate your storage. Under plenty of scenarios your drives are lucky to do more than 2k IO/s, which is about what you're seeing: http://techreport.com/articles.x/22415/9 You're fine if the ceph workload is the same as the iometer file server workload.

I don't know. I've measured the raw random 4k workload. Also, I've tested adding another OSD and the speed still doesn't change, but with a size of 200 GB I should hit several OSD servers.

Okay — just wanted to point it out.
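For reference, the admin socket Greg is asking about has to be enabled on the client side in ceph.conf before qemu/librbd will create it. A hedged sketch follows; the section placement and log path are illustrative (not taken from Stefan's actual config), and the socket path matches the /var/run/ceph_$name.sock he mentions:

```ini
; Illustrative client section -- adjust paths to your environment.
; The KVM process must have permission to create and write files in
; the socket's directory, or the socket will silently not appear.
[client]
    admin socket = /var/run/ceph_$name.sock
    log file = /var/log/ceph/$name.log
```

Once the VM is running, something like `ceph --admin-daemon /var/run/ceph_client.admin.sock perf dump` should return the client's perf counters (the exact socket filename depends on how $name expands for that client).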
RE: mds fails to start on SL6
Does SL6 have the kernel level required?

Tim

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Jimmy Tang
Sent: 06 July 2012 17:43
To: ceph-devel@vger.kernel.org
Subject: mds fails to start on SL6

Hi All,

I was giving ceph 0.48 a try on SL6x. The OSDs start up okay, but the mds fails to start; below is a snippet of the error:

2012-07-06 16:38:17.838055 7f2d6828d700 -1 mds.-1.0 *** got signal Terminated ***
2012-07-06 16:38:17.838139 7f2d6828d700  1 mds.-1.0 suicide. wanted down:dne, now up:boot
2012-07-06 16:38:17.839020 7f2d6828d700 -1 osdc/Objecter.cc: In function 'void Objecter::shutdown()' thread 7f2d6828d700 time 2012-07-06 16:38:17.838156
osdc/Objecter.cc: 221: FAILED assert(initialized)
 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: (Objecter::shutdown()+0x170) [0x6e2e20]
 2: (MDS::suicide()+0xc9) [0x4ad829]
 3: (MDS::handle_signal(int)+0x1bb) [0x4b447b]
 4: (SignalHandler::entry()+0x283) [0x803d53]
 5: /lib64/libpthread.so.0() [0x3b3ea077f1]
 6: (clone()+0x6d) [0x3b3e6e5ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
  -3> 2012-07-06 16:37:57.786954 7f2d6b496760  0 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030), process ceph-mds, pid 12537
  -2> 2012-07-06 16:38:17.838055 7f2d6828d700 -1 mds.-1.0 *** got signal Terminated ***
  -1> 2012-07-06 16:38:17.838139 7f2d6828d700  1 mds.-1.0 suicide. wanted down:dne, now up:boot
   0> 2012-07-06 16:38:17.839020 7f2d6828d700 -1 osdc/Objecter.cc: In function 'void Objecter::shutdown()' thread 7f2d6828d700 time 2012-07-06 16:38:17.838156
osdc/Objecter.cc: 221: FAILED assert(initialized)
 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: (Objecter::shutdown()+0x170) [0x6e2e20]
 2: (MDS::suicide()+0xc9) [0x4ad829]
 3: (MDS::handle_signal(int)+0x1bb) [0x4b447b]
 4: (SignalHandler::entry()+0x283) [0x803d53]
 5: /lib64/libpthread.so.0() [0x3b3ea077f1]
 6: (clone()+0x6d) [0x3b3e6e5ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---

2012-07-06 16:38:17.840237 7f2d6828d700 -1 *** Caught signal (Aborted) ** in thread 7f2d6828d700
 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: /usr/bin/ceph-mds() [0x803309]
 2: /lib64/libpthread.so.0() [0x3b3ea0f4a0]
 3: (gsignal()+0x35) [0x3b3e632885]
 4: (abort()+0x175) [0x3b3e634065]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3b432bea7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   0> 2012-07-06 16:38:17.840237 7f2d6828d700 -1 *** Caught signal (Aborted) ** in thread 7f2d6828d700
 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: /usr/bin/ceph-mds() [0x803309]
 2: /lib64/libpthread.so.0() [0x3b3ea0f4a0]
 3: (gsignal()+0x35) [0x3b3e632885]
 4: (abort()+0x175) [0x3b3e634065]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3b432bea7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---

Regards,
Jimmy Tang

--
Senior Software Engineer, Digital Repository of Ireland (DRI)
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/
Re: mds fails to start on SL6
On Fri, Jul 6, 2012 at 12:07 PM, Tim Bell tim.b...@cern.ch wrote:

Does SL6 have the kernel level required?

The MDS is a userspace daemon that demands absolutely nothing unusual from the kernel. :)
-Greg

Tim

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Jimmy Tang
Sent: 06 July 2012 17:43
To: ceph-devel@vger.kernel.org
Subject: mds fails to start on SL6

Hi All,

I was giving ceph 0.48 a try on SL6x. The OSDs start up okay, but the mds fails to start; below is a snippet of the error:

2012-07-06 16:38:17.838055 7f2d6828d700 -1 mds.-1.0 *** got signal Terminated ***
2012-07-06 16:38:17.838139 7f2d6828d700  1 mds.-1.0 suicide. wanted down:dne, now up:boot
2012-07-06 16:38:17.839020 7f2d6828d700 -1 osdc/Objecter.cc: In function 'void Objecter::shutdown()' thread 7f2d6828d700 time 2012-07-06 16:38:17.838156
osdc/Objecter.cc: 221: FAILED assert(initialized)
 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: (Objecter::shutdown()+0x170) [0x6e2e20]
 2: (MDS::suicide()+0xc9) [0x4ad829]
 3: (MDS::handle_signal(int)+0x1bb) [0x4b447b]
 4: (SignalHandler::entry()+0x283) [0x803d53]
 5: /lib64/libpthread.so.0() [0x3b3ea077f1]
 6: (clone()+0x6d) [0x3b3e6e5ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
  -3> 2012-07-06 16:37:57.786954 7f2d6b496760  0 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030), process ceph-mds, pid 12537
  -2> 2012-07-06 16:38:17.838055 7f2d6828d700 -1 mds.-1.0 *** got signal Terminated ***
  -1> 2012-07-06 16:38:17.838139 7f2d6828d700  1 mds.-1.0 suicide. wanted down:dne, now up:boot
   0> 2012-07-06 16:38:17.839020 7f2d6828d700 -1 osdc/Objecter.cc: In function 'void Objecter::shutdown()' thread 7f2d6828d700 time 2012-07-06 16:38:17.838156
osdc/Objecter.cc: 221: FAILED assert(initialized)
 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: (Objecter::shutdown()+0x170) [0x6e2e20]
 2: (MDS::suicide()+0xc9) [0x4ad829]
 3: (MDS::handle_signal(int)+0x1bb) [0x4b447b]
 4: (SignalHandler::entry()+0x283) [0x803d53]
 5: /lib64/libpthread.so.0() [0x3b3ea077f1]
 6: (clone()+0x6d) [0x3b3e6e5ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---

2012-07-06 16:38:17.840237 7f2d6828d700 -1 *** Caught signal (Aborted) ** in thread 7f2d6828d700
 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: /usr/bin/ceph-mds() [0x803309]
 2: /lib64/libpthread.so.0() [0x3b3ea0f4a0]
 3: (gsignal()+0x35) [0x3b3e632885]
 4: (abort()+0x175) [0x3b3e634065]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3b432bea7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   0> 2012-07-06 16:38:17.840237 7f2d6828d700 -1 *** Caught signal (Aborted) ** in thread 7f2d6828d700
 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: /usr/bin/ceph-mds() [0x803309]
 2: /lib64/libpthread.so.0() [0x3b3ea0f4a0]
 3: (gsignal()+0x35) [0x3b3e632885]
 4: (abort()+0x175) [0x3b3e634065]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3b432bea7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---

Regards,
Jimmy Tang

--
Senior Software Engineer, Digital Repository of Ireland (DRI)
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/
[ANNOUNCE] Linux 3.4 Ceph stable release branch
There is a new branch available at the Ceph client git repository, which is located here:

    http://github.com/ceph/ceph-client.git

The branch is named linux-3.4.4-ceph, and it is based on the latest Linux 3.4.y stable release.

This Ceph stable branch contains ported bug fixes that have been added to Ceph-related code (including rbd, libceph, and the ceph file system) since the release of Linux 3.4. Right now we have no plans to port to any other kernels, but if there is enough demand for it we can arrange to do so.

Commits in a Ceph stable branch will be submitted for inclusion in future upstream stable releases, but until they are released there they will be available on the Ceph git repository.

Our general plan going forward will be to create Ceph stable branches only as required to ensure new critical bug fixes are provided for older released kernels in a timely manner. Once fixes in a Ceph stable branch have been incorporated into an upstream stable Linux release, there will be no more need to maintain the separate branch, so it will go away.

-Alex

Note: the latest upstream stable Linux releases are published here:

    http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git
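The maintenance model described above (fixes land upstream first, then get ported to the stable branch) is ordinary git branch management. Here is a toy sketch of the mechanics in a throwaway repository; the branch and file names below are made up for illustration, and for the real branch you would simply `git clone http://github.com/ceph/ceph-client.git` followed by `git checkout linux-3.4.4-ceph`:

```shell
#!/bin/sh
# Demonstrate porting a fix to a stable branch with cherry-pick,
# in a scratch repository (all names are illustrative).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
G="git -c user.name=demo -c user.email=demo@example.com"

echo v3.4 > base
git add base
$G commit -qm "3.4 release"
git branch linux-3.4.y          # stable branch forks at the release point

echo fix > rbd.c                # a critical fix lands on the dev branch first
git add rbd.c
$G commit -qm "rbd: critical fix"
FIX=$(git rev-parse HEAD)

git checkout -q linux-3.4.y     # port the fix to the stable branch
$G cherry-pick "$FIX" > /dev/null
git log --oneline               # stable branch now carries the ported fix
```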
Re: oops in rbd module (con_work in libceph)
On 07/06/2012 10:35 AM, Yann Dupont wrote:

Le 06/07/2012 10:31, Yann Dupont a écrit :

Hello. The bug happens in the rbd client, at least in kernel 3.4.4. I have a completely reproducible bug.

Just a note: 3.2.22 doesn't seem to exhibit the problem. I repeated the process 2 times without problems on this kernel.

There are a number of bugs that have been fixed since Linux 3.4, and the fixes have not made it into the 3.4.y stable releases. I just sent an announcement about the Ceph stable branch that's available in the Ceph git repository. If possible I would recommend you try using that for 3.4 testing. The branch is here:

    http://github.com/ceph/ceph-client/tree/linux-3.4.4-ceph

If you do have troubles I would very much like to hear about it. And if you don't run into the problems you've been seeing, that would be good to know as well.

-Alex

I'll launch a realistic load on our ceph volume this weekend (bacula backups). I'll see if 3.2.22 is solid. Then Monday or Tuesday I'll take the git bisect route to see which patch introduced the problem.

Cheers,
RE: mkcephfs failing on v0.48 argonaut
Hi again Sage

This is very perplexing. Confirming this system is a stock Ubuntu 12.04 x64, with no custom kernel or anything else, fully apt-get dist-upgrade'd up to date.

root@dsanb1-coy:~# uname -r
3.2.0-26-generic

I have added in the suggestions you made to the script; we now have:

modprobe btrfs || true
which mkfs.btrfs
echo RUNNING: mkfs.btrfs $btrfs_devs
mkfs.btrfs $btrfs_devs
btrfs device scan || btrfsctl -a
echo RUNNING: mount -t btrfs $btrfs_opt $first_dev $btrfs_path
which mount
mount -t btrfs $btrfs_opt $first_dev $btrfs_path
echo DID I GET HERE - OR CRASH OUT WITH mount ABOVE?
chown $osd_user $btrfs_path

See below that the same command that is failing within mkcephfs works fine on a standard command line:

=== osd.0 ===
--- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.xgk025tjkQ --prepare-osdfs osd.0
umount: /srv/osd.0: not mounted
umount: /dev/sdc: not mounted
/sbin/mkfs.btrfs
RUNNING: mkfs.btrfs /dev/sdc

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sdc
	nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
RUNNING: mount -t btrfs -o noatime /dev/sdc /srv/osd.0
/bin/mount
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try dmesg | tail or so
failed: '/sbin/mkcephfs -d /tmp/mkcephfs.xgk025tjkQ --prepare-osdfs osd.0'

root@dsanb1-coy:~# /bin/mount -t btrfs -o noatime /dev/sdc /srv/osd.0
root@dsanb1-coy:~# mount | grep btrfs
/dev/sdc on /srv/osd.0 type btrfs (rw,noatime)

Remember, this is not isolated to btrfs; as per my original post it fails when not specifying btrfs. I can only conclude that /bin/sh or /bin/bash, and the way they interact with the mkcephfs script (which does call itself, etc.), have somehow become fuddled up? Must be something wiggy, when the script output confirms it is calling the same command (/bin/mount) but somehow finds a way for that to not work, causing the mkcephfs script to terminate.

Many thanks - it will be a relief to sort this out, as all our Ceph project works are on hold until we can sort this one out.

Cheers
Paul

-----Original Message-----
From: Sage Weil [mailto:s...@inktank.com]
Sent: Friday, 6 July 2012 2:09 PM
To: Paul Pettigrew
Cc: ceph-devel@vger.kernel.org
Subject: RE: mkcephfs failing on v0.48 argonaut

On Fri, 6 Jul 2012, Paul Pettigrew wrote:

Hi Sage - thanks so much for the quick response :-)

Firstly, and it is a bit hard to see, but the command output below is run with the -v option. To help isolate which command line in the script is failing, I have added some simple echo output, and the script now looks like:

### prepare-osdfs ###
if [ -n $prepareosdfs ]; then
<SNIP>
modprobe btrfs || true
echo RUNNING: mkfs.btrfs $btrfs_devs
mkfs.btrfs $btrfs_devs
btrfs device scan || btrfsctl -a
echo RUNNING: mount -t btrfs $btrfs_opt $first_dev $btrfs_path
mount -t btrfs $btrfs_opt $first_dev $btrfs_path
echo DID I GET HERE - OR CRASH OUT WITH mount ABOVE?
chown $osd_user $btrfs_path
chmod +w $btrfs_path
exit 0
fi

Per the modified script above, here is the output displayed when running it:

root@dsanb1-coy:/srv# /sbin/mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs -k /etc/ceph/keyring --crushmapsrc crushfile.txt -v
temp dir is /tmp/mkcephfs.uelzdJ82ej
preparing monmap in /tmp/mkcephfs.uelzdJ82ej/monmap
/usr/bin/monmaptool --create --clobber --add alpha 10.32.0.10:6789 --add bravo 10.32.0.25:6789 --add charlie 10.32.0.11:6789 --print /tmp/mkcephfs.uelzdJ82ej/monmap
/usr/bin/monmaptool: monmap file /tmp/mkcephfs.uelzdJ82ej/monmap
/usr/bin/monmaptool: generated fsid b254abdd-e036-4186-b6d5-e32b14e53b45
epoch 0
fsid b254abdd-e036-4186-b6d5-e32b14e53b45
last_changed 2012-07-06 12:31:38.416848
created 2012-07-06 12:31:38.416848
0: 10.32.0.10:6789/0 mon.alpha
1: 10.32.0.11:6789/0 mon.charlie
2: 10.32.0.25:6789/0 mon.bravo
/usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.uelzdJ82ej/monmap (3 monitors)
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.0 user
=== osd.0 ===
--- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.uelzdJ82ej --prepare-osdfs osd.0
umount: /srv/osd.0: not mounted
umount: /dev/sdc: not mounted
RUNNING: mkfs.btrfs /dev/sdc

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sdc
	nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
RUNNING: mount -t btrfs -o noatime /dev/sdc /srv/osd.0
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try dmesg | tail or so
RE: mkcephfs failing on v0.48 argonaut
UPDATED code is now within the below (paste snafu, sorry - ignore my most recent post); my comments/findings are the same, however...

Paul

-----Original Message-----

Hi again Sage

This is very perplexing. Confirming this system is a stock Ubuntu 12.04 x64, with no custom kernel or anything else, fully apt-get dist-upgrade'd up to date.

root@dsanb1-coy:~# uname -r
3.2.0-26-generic

I have added in the suggestions you made to the script; we now have:

modprobe btrfs || true
which mkfs.btrfs
echo RUNNING: mkfs.btrfs $btrfs_devs
mkfs.btrfs $btrfs_devs
btrfs device scan || btrfsctl -a
which mount
echo RUNNING: mount -t btrfs $btrfs_opt $first_dev $btrfs_path
mount -t btrfs $btrfs_opt $first_dev $btrfs_path
echo DID I GET HERE - OR CRASH OUT WITH mount ABOVE?
chown $osd_user $btrfs_path
chmod +w $btrfs_path

See below that the same command that is failing within mkcephfs works fine on a standard command line:

=== osd.0 ===
--- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.ruZy4Apo23 --prepare-osdfs osd.0
umount: /dev/sdc: not mounted
/sbin/mkfs.btrfs
RUNNING: mkfs.btrfs /dev/sdc

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sdc
	nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
/bin/mount
RUNNING: mount -t btrfs -o noatime /dev/sdc /srv/osd.0
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try dmesg | tail or so
failed: '/sbin/mkcephfs -d /tmp/mkcephfs.ruZy4Apo23 --prepare-osdfs osd.0'

root@dsanb1-coy:~# /bin/mount -t btrfs -o noatime /dev/sdc /srv/osd.0
root@dsanb1-coy:~# mount | grep btrfs
/dev/sdc on /srv/osd.0 type btrfs (rw,noatime)

Remember, this is not isolated to btrfs; as per my original post it fails when not specifying btrfs. I can only conclude that /bin/sh or /bin/bash, and the way they interact with the mkcephfs script (which does call itself, etc.), have somehow become fuddled up? Must be something wiggy, when the script output confirms it is calling the same command (/bin/mount) but somehow finds a way for that to not work, causing the mkcephfs script to terminate.

Many thanks - it will be a relief to sort this out, as all our Ceph project works are on hold until we can sort this one out.

Cheers
Paul

-----Original Message-----
From: Sage Weil [mailto:s...@inktank.com]
Sent: Friday, 6 July 2012 2:09 PM
To: Paul Pettigrew
Cc: ceph-devel@vger.kernel.org
Subject: RE: mkcephfs failing on v0.48 argonaut

On Fri, 6 Jul 2012, Paul Pettigrew wrote:

Hi Sage - thanks so much for the quick response :-)

Firstly, and it is a bit hard to see, but the command output below is run with the -v option. To help isolate which command line in the script is failing, I have added some simple echo output, and the script now looks like:

### prepare-osdfs ###
if [ -n $prepareosdfs ]; then
<SNIP>
modprobe btrfs || true
echo RUNNING: mkfs.btrfs $btrfs_devs
mkfs.btrfs $btrfs_devs
btrfs device scan || btrfsctl -a
echo RUNNING: mount -t btrfs $btrfs_opt $first_dev $btrfs_path
mount -t btrfs $btrfs_opt $first_dev $btrfs_path
echo DID I GET HERE - OR CRASH OUT WITH mount ABOVE?
chown $osd_user $btrfs_path
chmod +w $btrfs_path
exit 0
fi

Per the modified script above, here is the output displayed when running it:

root@dsanb1-coy:/srv# /sbin/mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs -k /etc/ceph/keyring --crushmapsrc crushfile.txt -v
temp dir is /tmp/mkcephfs.uelzdJ82ej
preparing monmap in /tmp/mkcephfs.uelzdJ82ej/monmap
/usr/bin/monmaptool --create --clobber --add alpha 10.32.0.10:6789 --add bravo 10.32.0.25:6789 --add charlie 10.32.0.11:6789 --print /tmp/mkcephfs.uelzdJ82ej/monmap
/usr/bin/monmaptool: monmap file /tmp/mkcephfs.uelzdJ82ej/monmap
/usr/bin/monmaptool: generated fsid b254abdd-e036-4186-b6d5-e32b14e53b45
epoch 0
fsid b254abdd-e036-4186-b6d5-e32b14e53b45
last_changed 2012-07-06 12:31:38.416848
created 2012-07-06 12:31:38.416848
0: 10.32.0.10:6789/0 mon.alpha
1: 10.32.0.11:6789/0 mon.charlie
2: 10.32.0.25:6789/0 mon.bravo
/usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.uelzdJ82ej/monmap (3 monitors)
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.0 user
=== osd.0 ===
--- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.uelzdJ82ej --prepare-osdfs osd.0
umount: /srv/osd.0: not mounted
umount: /dev/sdc: not mounted
RUNNING: mkfs.btrfs /dev/sdc

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sdc
	nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
RUNNING: mount -t btrfs -o noatime /dev/sdc /srv/osd.0
mount: wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error
RE: mkcephfs failing on v0.48 argonaut
On Sat, 7 Jul 2012, Paul Pettigrew wrote:

Hi again Sage

This is very perplexing. Confirming this system is a stock Ubuntu 12.04 x64, with no custom kernel or anything else, fully apt-get dist-upgrade'd up to date.

root@dsanb1-coy:~# uname -r
3.2.0-26-generic

I have added in the suggestions you made to the script; we now have:

modprobe btrfs || true
which mkfs.btrfs
echo RUNNING: mkfs.btrfs $btrfs_devs
mkfs.btrfs $btrfs_devs
btrfs device scan || btrfsctl -a
echo RUNNING: mount -t btrfs $btrfs_opt $first_dev $btrfs_path
which mount
mount -t btrfs $btrfs_opt $first_dev $btrfs_path
echo DID I GET HERE - OR CRASH OUT WITH mount ABOVE?
chown $osd_user $btrfs_path

See below that the same command that is failing within mkcephfs works fine on a standard command line:

Weirdness!

=== osd.0 ===
--- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.xgk025tjkQ --prepare-osdfs osd.0

Can you run this command with -x to see exactly what bash is doing?

sh -x /sbin/mkcephfs -d /tmp/mkcephfs.xgk025tjkQ --prepare-osdfs osd.0

In particular, I'm curious: if you do

mkfs.btrfs /dev/sdc
btrfs device scan
mount /dev/sdc /srv/osd.0

(or whatever the exact sequence that mkcephfs does is) from the command line, does it give you the same error?

sage

umount: /srv/osd.0: not mounted
umount: /dev/sdc: not mounted
/sbin/mkfs.btrfs
RUNNING: mkfs.btrfs /dev/sdc

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sdc
	nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
RUNNING: mount -t btrfs -o noatime /dev/sdc /srv/osd.0
/bin/mount
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try dmesg | tail or so
failed: '/sbin/mkcephfs -d /tmp/mkcephfs.xgk025tjkQ --prepare-osdfs osd.0'

root@dsanb1-coy:~# /bin/mount -t btrfs -o noatime /dev/sdc /srv/osd.0
root@dsanb1-coy:~# mount | grep btrfs
/dev/sdc on /srv/osd.0 type btrfs (rw,noatime)

Remember, this is not isolated to btrfs; as per my original post it fails when not specifying btrfs. I can only conclude that /bin/sh or /bin/bash, and the way they interact with the mkcephfs script (which does call itself, etc.), have somehow become fuddled up? Must be something wiggy, when the script output confirms it is calling the same command (/bin/mount) but somehow finds a way for that to not work, causing the mkcephfs script to terminate.

Many thanks - it will be a relief to sort this out, as all our Ceph project works are on hold until we can sort this one out.

Cheers
Paul

-----Original Message-----
From: Sage Weil [mailto:s...@inktank.com]
Sent: Friday, 6 July 2012 2:09 PM
To: Paul Pettigrew
Cc: ceph-devel@vger.kernel.org
Subject: RE: mkcephfs failing on v0.48 argonaut

On Fri, 6 Jul 2012, Paul Pettigrew wrote:

Hi Sage - thanks so much for the quick response :-)

Firstly, and it is a bit hard to see, but the command output below is run with the -v option. To help isolate which command line in the script is failing, I have added some simple echo output, and the script now looks like:

### prepare-osdfs ###
if [ -n $prepareosdfs ]; then
<SNIP>
modprobe btrfs || true
echo RUNNING: mkfs.btrfs $btrfs_devs
mkfs.btrfs $btrfs_devs
btrfs device scan || btrfsctl -a
echo RUNNING: mount -t btrfs $btrfs_opt $first_dev $btrfs_path
mount -t btrfs $btrfs_opt $first_dev $btrfs_path
echo DID I GET HERE - OR CRASH OUT WITH mount ABOVE?
chown $osd_user $btrfs_path
chmod +w $btrfs_path
exit 0
fi

Per the modified script above, here is the output displayed when running it:

root@dsanb1-coy:/srv# /sbin/mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs -k /etc/ceph/keyring --crushmapsrc crushfile.txt -v
temp dir is /tmp/mkcephfs.uelzdJ82ej
preparing monmap in /tmp/mkcephfs.uelzdJ82ej/monmap
/usr/bin/monmaptool --create --clobber --add alpha 10.32.0.10:6789 --add bravo 10.32.0.25:6789 --add charlie 10.32.0.11:6789 --print /tmp/mkcephfs.uelzdJ82ej/monmap
/usr/bin/monmaptool: monmap file /tmp/mkcephfs.uelzdJ82ej/monmap
/usr/bin/monmaptool: generated fsid b254abdd-e036-4186-b6d5-e32b14e53b45
epoch 0
fsid b254abdd-e036-4186-b6d5-e32b14e53b45
last_changed 2012-07-06 12:31:38.416848
created 2012-07-06 12:31:38.416848
0: 10.32.0.10:6789/0 mon.alpha
1: 10.32.0.11:6789/0 mon.charlie
2: 10.32.0.25:6789/0 mon.bravo
/usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.uelzdJ82ej/monmap (3 monitors)
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.0 user
=== osd.0 ===
--- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.uelzdJ82ej --prepare-osdfs osd.0
umount: /srv/osd.0: not mounted
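The `sh -x` trace Sage asks for prints every command as the shell executes it, with variables already expanded, which is exactly what is needed to see the precise mount invocation mkcephfs runs. A self-contained illustration on a tiny stand-in script (the variable names mirror the mkcephfs fragment quoted above; no real mkfs or mount is performed):

```shell
#!/bin/sh
# Show what `sh -x <script>` tracing looks like, using a harmless
# stand-in for the prepare-osdfs fragment.
script=$(mktemp)
cat > "$script" <<'EOF'
btrfs_opt="-o noatime"
first_dev=/dev/sdc
btrfs_path=/srv/osd.0
echo "RUNNING: mount -t btrfs $btrfs_opt $first_dev $btrfs_path"
EOF

# Each executed command is echoed to stderr with a leading "+ " and
# variables expanded, so a failing mount line shows its exact arguments.
sh -x "$script" 2>&1
```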