Re: domino-style OSD crash
On 09/07/2012 19:14, Samuel Just wrote:
> Can you restart the node that failed to complete the upgrade with

Well, it's a little bit complicated; I now run those nodes with XFS, and I have long-running jobs on them right now, so I can't stop the ceph cluster at the moment.

As I've kept the original broken btrfs volumes, I tried this morning to run the old osds in parallel, using the $cluster variable. I only had partial success. I tried using different ports for the mons, but ceph wants to use the old mon map. I can edit it (epoch 1), but it seems to use 'latest' instead, whose format isn't compatible with monmaptool, and I don't know how to inject the modified map into a non-running cluster. Anyway, the osd seems to start fine, and I can reproduce the bug:

> debug filestore = 20
> debug osd = 20

I've put those in [global]; is that sufficient?

> and post the log after an hour or so of running? The upgrade process might
> legitimately take a while.
> -Sam

Only 15 minutes running, but ceph-osd is consuming lots of CPU, and strace shows lots of pread. Here is the log:

[..]
2012-07-10 11:33:29.560052 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount syncfs(2) syscall not support by glibc
2012-07-10 11:33:29.560062 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount no syncfs(2), but the btrfs SYNC ioctl will suffice
2012-07-10 11:33:29.560172 7f3e615ac780 -1 filestore(/CEPH-PROD/data/osd.1) FileStore::mount : stale version stamp detected: 2. Proceeding, do_update is set, performing disk format upgrade.
2012-07-10 11:33:29.560233 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount found snaps 3744666,3746725
2012-07-10 11:33:29.560263 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) current/ seq was 3746725
2012-07-10 11:33:29.560267 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) most recent snap from 3744666,3746725 is 3746725
2012-07-10 11:33:29.560280 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) mount rolling back to consistent snap 3746725
2012-07-10 11:33:29.839281 7f3e615ac780 5 filestore(/CEPH-PROD/data/osd.1) mount op_seq is 3746725

... and nothing more. I'll let it run for 3 hours. If I get another message, I'll let you know.

Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
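As an aside on the debug settings above: [global] is read by every daemon, so options placed there do reach the OSDs; an [osd] section does the same while keeping the mon and mds logs quieter. Below is a minimal illustrative ceph.conf fragment; the layout is a sketch, not Yann's actual configuration file.

    ; illustrative ceph.conf fragment -- either placement raises OSD log verbosity
    [global]
            ; picked up by every daemon (mon, mds and osd), so all logs get noisy
            debug filestore = 20
            debug osd = 20

    ; or, scoped to the OSD daemons only:
    [osd]
            debug filestore = 20
            debug osd = 20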
Re: domino-style OSD crash
On Tue, Jul 10, 2012 at 2:46 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
> As I've kept the original broken btrfs volumes, I tried this morning to run
> the old osds in parallel, using the $cluster variable. I only had partial
> success.

The cluster mechanism was never intended for moving existing osds to other clusters. Trying that might not be a good idea.
Re: domino-style OSD crash
On 10/07/2012 17:56, Tommi Virtanen wrote:
> On Tue, Jul 10, 2012 at 2:46 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>> As I've kept the original broken btrfs volumes, I tried this morning to run
>> the old osds in parallel, using the $cluster variable. I only had partial
>> success.
> The cluster mechanism was never intended for moving existing osds to other
> clusters. Trying that might not be a good idea.

OK, good to know. I saw that the leftover maps could lead to problems, but, in a few words, what are the other associated risks? Basically, if I use two distinct config files, with different, non-overlapping paths and different ports for OSD, MDS and MON, do I end up with two distinct and independent instances?

By the way, is running two mon instances with different ports supported?

Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Re: domino-style OSD crash
On Tue, Jul 10, 2012 at 9:39 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>> The cluster mechanism was never intended for moving existing osds to other
>> clusters. Trying that might not be a good idea.
> OK, good to know. I saw that the leftover maps could lead to problems, but, in
> a few words, what are the other associated risks? Basically, if I use two
> distinct config files, with different, non-overlapping paths and different
> ports for OSD, MDS and MON, do I end up with two distinct and independent
> instances?

Fundamentally, it comes down to this: the two clusters will still have the same fsid, and you won't be isolated from configuration errors or leftover state (such as the monmap) in any way. There's a high chance that your "let's poke around and debug" cluster wrecks your healthy cluster.

> By the way, is running two mon instances with different ports supported?

Monitors are identified by ip:port. You can have multiple monitors bind to the same IP address, as long as they get separate ports. Naturally, this practically means giving up on high availability.
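To make that concrete, here is a rough sketch of how two clusters could live side by side on the same hosts, each with its own config file, data paths and mon ports. The cluster name "old", the second port and the section layout are invented for illustration; only the hostname, IP and data paths come from elsewhere in the thread, and the exact daemon flags should be checked against the documentation for the Ceph version in use.

    # /etc/ceph/ceph.conf -- the healthy production cluster (default name "ceph")
    [mon.chichibu]
            host = chichibu
            mon addr = 172.20.14.130:6789
    [osd.0]
            host = chichibu
            osd data = /CEPH/data/osd.0

    # /etc/ceph/old.conf -- the debug cluster built from the kept btrfs volumes
    [mon.chichibu]
            host = chichibu
            mon addr = 172.20.14.130:6790      ; same IP, different port
    [osd.0]
            host = chichibu
            osd data = /CEPH-PROD/data/osd.0

    # daemons are then pointed at one cluster or the other, e.g. something like:
    #   ceph-osd --cluster=old -i 0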
Re: domino-style OSD crash
On 10/07/2012 19:11, Tommi Virtanen wrote:
> On Tue, Jul 10, 2012 at 9:39 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>>> The cluster mechanism was never intended for moving existing osds to other
>>> clusters. Trying that might not be a good idea.
>> OK, good to know. I saw that the leftover maps could lead to problems, but, in
>> a few words, what are the other associated risks? Basically, if I use two
>> distinct config files, with different, non-overlapping paths and different
>> ports for OSD, MDS and MON, do I end up with two distinct and independent
>> instances?
> Fundamentally, it comes down to this: the two clusters will still have the same
> fsid, and you won't be isolated from configuration errors or

Ah, I understand. That is not the case here. See:

root@chichibu:~# cat /CEPH/data/osd.0/fsid
f00139fe-478e-4c50-80e2-f7cb359100d4
root@chichibu:~# cat /CEPH-PROD/data/osd.0/fsid
43afd025-330e-4aa8-9324-3e9b0afce794

(CEPH-PROD is the old btrfs volume; /CEPH is the new xfs volume, completely redone and reformatted with mkcephfs.)

The volumes are totally independent. If you want the gory details:

root@chichibu:~# lvs
  LV        VG             Attr   LSize    Origin Snap%  Move Log Copy%  Convert
  ceph-osd  LocalDisk      -wi-a- 225,00g
  mon-btrfs LocalDisk      -wi-ao  10,00g
  mon-xfs   LocalDisk      -wi-ao  10,00g
  data      ceph-chichibu  -wi-ao   5,00t   <- OLD btrfs, mounted on /CEPH-PROD
  data      xceph-chichibu -wi-ao   4,50t   <- NEW xfs, mounted on /CEPH

> leftover state (such as the monmap) in any way. There's a high chance that
> your "let's poke around and debug" cluster wrecks your healthy cluster.

Yes, I understand the risk.

>> By the way, is running two mon instances with different ports supported?
> Monitors are identified by ip:port. You can have multiple monitors bind to the
> same IP address, as long as they get separate ports. Naturally, this
> practically means giving up on high availability.

The idea is not to have just 2 mons. I'll still use 3 different machines for the mons, but with 2 mon instances on each: one for the current ceph, the other for the old ceph. 2x3 mons.

Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Re: domino-style OSD crash
On Tue, Jul 10, 2012 at 10:36 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>> Fundamentally, it comes down to this: the two clusters will still have the
>> same fsid, and you won't be isolated from configuration errors or
> (CEPH-PROD is the old btrfs volume; /CEPH is the new xfs volume, completely
> redone and reformatted with mkcephfs.) The volumes are totally independent.

Ahh, you re-created the monitors too. That changes things: then you have a new random fsid. I understood you had only re-mkfsed the osds.

Doing it like that, your real worry is just the remembered state of monmaps, osdmaps etc. If the daemons accidentally talk to the wrong cluster, the fsid *should* protect you from damage; they should get rejected. Similarly, if you use cephx authentication, the keys won't match either.

>> Naturally, this practically means giving up on high availability.
> The idea is not to have just 2 mons. I'll still use 3 different machines for
> the mons, but with 2 mon instances on each: one for the current ceph, the
> other for the old ceph. 2x3 mons.

That should be perfectly doable.
Re: domino-style OSD crash
Can you restart the node that failed to complete the upgrade with

  debug filestore = 20
  debug osd = 20

and post the log after an hour or so of running? The upgrade process might legitimately take a while.
-Sam

On Sat, Jul 7, 2012 at 1:19 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
> On 06/07/2012 19:01, Gregory Farnum wrote:
>> On Fri, Jul 6, 2012 at 12:19 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>>> On 05/07/2012 23:32, Gregory Farnum wrote:
>>> [...]
>>> ok, so as all nodes were identical, I probably hit a btrfs bug (like an
>>> erroneous out-of-space) at more or less the same time. And when 1 osd was out,
>>> OH, I didn't finish the sentence... When 1 osd was out, missing data was
>>> copied onto other nodes, probably accelerating the btrfs problem on those
>>> nodes (I suspect erroneous out-of-space conditions).
>> Ah. How full are/were the disks?
> The OSD nodes were below 50% (all are 5 TB volumes):
>   osd.0 : 31%
>   osd.1 : 31%
>   osd.2 : 39%
>   osd.3 : 65%
>   no osd.4 :)
>   osd.5 : 35%
>   osd.6 : 60%
>   osd.7 : 42%
>   osd.8 : 34%
> All the volumes were using btrfs with lzo compression.
> [...]
>>> Oh, interesting. Are the broken nodes all on the same set of arrays?
>>> No. There are 4 completely independent raid arrays, in 4 different locations.
>>> They are similar (same brand and model, but slightly different disks, and one
>>> different firmware); all arrays are multipathed. I don't think the raid
>>> arrays are the problem. We have used those particular models for 2-3 years,
>>> and in the logs I don't see any problem that could be caused by the storage
>>> itself (like scsi or multipath errors).
>> I must have misunderstood then. What did you mean by "1 array for 2 OSD nodes"?
> I have 8 osd nodes, in 4 different locations (several km apart). In each
> location I have 2 nodes and 1 raid array. At each location the raid array has
> 16 2 TB disks and 2 controllers with 4x 8 Gb FC channels each. The 16 disks are
> organized in RAID 5 (8 disks for one set, 7 disks for the other). Each raid set
> is primarily attached to one controller, and each osd node at the location has
> access to the controller via 2 distinct paths.
> There was no correlation between the failed nodes and the raid arrays.
> Cheers,
Re: domino-style OSD crash
On Wed, Jul 4, 2012 at 1:06 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
> Well, I probably wasn't clear enough. I talked about a crashed FS, but I was
> talking about ceph. The underlying FS (btrfs in that case) of 1 node (and only
> one) has PROBABLY crashed in the past, causing corruption of the ceph data on
> this node, and then the subsequent crash of other nodes. RIGHT now btrfs on
> this node is OK. I can access the filesystem without errors.

But the LevelDB isn't. Its contents got corrupted, somehow, somewhere, and it really is up to the LevelDB library to tolerate those errors; we have a simple get/put interface we use, and LevelDB is triggering an internal error.

> One node had a problem with btrfs, leading first to kernel problems, probably
> corruption (on disk / in memory maybe?), and ultimately to a kernel oops.
> Before that final kernel oops, bad data was transmitted to other (sane) nodes,
> leading to ceph-osd crashes on those nodes.

The LevelDB binary contents are not transferred over to other nodes; this kind of corruption would not spread over the Ceph clustering mechanisms. It's more likely that you have 4 independently corrupted LevelDBs. Something in the workload Ceph runs makes that corruption quite likely.

The information here isn't enough to say whether the cause of the corruption is btrfs or LevelDB, but the recovery needs to be handled by LevelDB -- and upstream is working on making it more robust: http://code.google.com/p/leveldb/issues/detail?id=97
Re: domino-style OSD crash
On 09/07/2012 19:43, Tommi Virtanen wrote:
> On Wed, Jul 4, 2012 at 1:06 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>> Well, I probably wasn't clear enough. I talked about a crashed FS, but I was
>> talking about ceph. The underlying FS (btrfs in that case) of 1 node (and only
>> one) has PROBABLY crashed in the past, causing corruption of the ceph data on
>> this node, and then the subsequent crash of other nodes. RIGHT now btrfs on
>> this node is OK. I can access the filesystem without errors.
> But the LevelDB isn't. Its contents got corrupted, somehow, somewhere, and it
> really is up to the LevelDB library to tolerate those errors; we have a simple
> get/put interface we use, and LevelDB is triggering an internal error.

Yes, understood.

>> One node had a problem with btrfs, leading first to kernel problems, probably
>> corruption (on disk / in memory maybe?), and ultimately to a kernel oops.
>> Before that final kernel oops, bad data was transmitted to other (sane) nodes,
>> leading to ceph-osd crashes on those nodes.
> The LevelDB binary contents are not transferred over to other nodes;

OK, thanks for the clarification.

> this kind of corruption would not spread over the Ceph clustering mechanisms.
> It's more likely that you have 4 independently corrupted LevelDBs. Something in
> the workload Ceph runs makes that corruption quite likely.

Very likely: since I reformatted my nodes with XFS I haven't had problems so far.

> The information here isn't enough to say whether the cause of the corruption
> is btrfs or LevelDB, but the recovery needs to be handled by LevelDB -- and
> upstream is working on making it more robust:
> http://code.google.com/p/leveldb/issues/detail?id=97

Yes, I saw this. It's very important. Sometimes, s... happens. Given the size ceph volumes can reach, having a tool to restart damaged nodes (for whatever reason) is a must.

Thanks for the time you took to answer. It's much clearer for me now.

Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Re: domino-style OSD crash
On Mon, Jul 9, 2012 at 12:05 PM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>> The information here isn't enough to say whether the cause of the corruption
>> is btrfs or LevelDB, but the recovery needs to be handled by LevelDB -- and
>> upstream is working on making it more robust:
>> http://code.google.com/p/leveldb/issues/detail?id=97
> Yes, I saw this. It's very important. Sometimes, s... happens. Given the size
> ceph volumes can reach, having a tool to restart damaged nodes (for whatever
> reason) is a must.
> Thanks for the time you took to answer. It's much clearer for me now.

If it doesn't recover, you re-format the disk and thereby throw away the contents. Not really all that different from handling hardware failure. That's why we have replication.
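In practice, "re-format the disk and throw away the contents" means treating the OSD like a failed drive and rebuilding it while replication fills it back in. Below is a rough sketch of the usual steps for a Ceph of roughly this vintage; the osd id, device and paths are invented for illustration, the exact commands should be checked against the documentation for your version, and cephx keys would need re-creating if authentication is enabled.

    # stop the broken daemon and let the cluster re-replicate its PGs elsewhere
    /etc/init.d/ceph stop osd.1
    ceph osd out 1

    # wipe and re-create the backing filesystem (xfs here, as Yann chose)
    umount /CEPH/data/osd.1
    mkfs.xfs -f /dev/sdX1            # hypothetical device backing this OSD
    mount /dev/sdX1 /CEPH/data/osd.1

    # re-initialise the empty OSD data directory and bring the OSD back in
    ceph-osd -i 1 --mkfs --mkjournal
    /etc/init.d/ceph start osd.1
    ceph osd in 1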
Re: domino-style OSD crash
On 06/07/2012 19:01, Gregory Farnum wrote:
> On Fri, Jul 6, 2012 at 12:19 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>> On 05/07/2012 23:32, Gregory Farnum wrote:
>> [...]
>> ok, so as all nodes were identical, I probably hit a btrfs bug (like an
>> erroneous out-of-space) at more or less the same time. And when 1 osd was out,
>> OH, I didn't finish the sentence... When 1 osd was out, missing data was
>> copied onto other nodes, probably accelerating the btrfs problem on those
>> nodes (I suspect erroneous out-of-space conditions).
> Ah. How full are/were the disks?

The OSD nodes were below 50% (all are 5 TB volumes):

  osd.0 : 31%
  osd.1 : 31%
  osd.2 : 39%
  osd.3 : 65%
  no osd.4 :)
  osd.5 : 35%
  osd.6 : 60%
  osd.7 : 42%
  osd.8 : 34%

All the volumes were using btrfs with lzo compression.

> [...]
>> Oh, interesting. Are the broken nodes all on the same set of arrays?
>> No. There are 4 completely independent raid arrays, in 4 different locations.
>> They are similar (same brand and model, but slightly different disks, and one
>> different firmware); all arrays are multipathed. I don't think the raid arrays
>> are the problem. We have used those particular models for 2-3 years, and in
>> the logs I don't see any problem that could be caused by the storage itself
>> (like scsi or multipath errors).
> I must have misunderstood then. What did you mean by "1 array for 2 OSD nodes"?

I have 8 osd nodes, in 4 different locations (several km apart). In each location I have 2 nodes and 1 raid array. At each location the raid array has 16 2 TB disks and 2 controllers with 4x 8 Gb FC channels each. The 16 disks are organized in RAID 5 (8 disks for one set, 7 disks for the other). Each raid set is primarily attached to one controller, and each osd node at the location has access to the controller via 2 distinct paths.

There was no correlation between the failed nodes and the raid arrays.

Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Re: domino-style OSD crash
On 05/07/2012 23:32, Gregory Farnum wrote:
> [...]

ok, so as all nodes were identical, I probably hit a btrfs bug (like an erroneous out-of-space) at more or less the same time. And when 1 osd was out,

OH, I didn't finish the sentence... When 1 osd was out, missing data was copied onto other nodes, probably accelerating the btrfs problem on those nodes (I suspect erroneous out-of-space conditions).

I've reformatted the OSDs with xfs. Performance is slightly worse for the moment (well, it depends on the workload, and maybe the lack of syncfs is to blame), but at least I hope to have the storage layer rock-solid. BTW, I've managed to keep the faulty btrfs volumes.

> [...]
> I wonder if maybe there's a confounding factor here — are all your nodes
> similar to each other,

Yes. I designed the cluster that way. All nodes are identical hardware (PowerEdge M610, 10G Intel ethernet + Emulex fibre channel attached to storage; 1 array for 2 OSD nodes, 1 controller dedicated to each OSD).

> Oh, interesting. Are the broken nodes all on the same set of arrays?

No. There are 4 completely independent raid arrays, in 4 different locations. They are similar (same brand and model, but slightly different disks, and one different firmware); all arrays are multipathed. I don't think the raid arrays are the problem. We have used those particular models for 2-3 years, and in the logs I don't see any problem that could be caused by the storage itself (like scsi or multipath errors).

Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Re: domino-style OSD crash
On Fri, Jul 6, 2012 at 12:19 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
> On 05/07/2012 23:32, Gregory Farnum wrote:
> [...]
> ok, so as all nodes were identical, I probably hit a btrfs bug (like an
> erroneous out-of-space) at more or less the same time. And when 1 osd was out,
> OH, I didn't finish the sentence... When 1 osd was out, missing data was copied
> onto other nodes, probably accelerating the btrfs problem on those nodes (I
> suspect erroneous out-of-space conditions).

Ah. How full are/were the disks?

> I've reformatted the OSDs with xfs. Performance is slightly worse for the
> moment (well, it depends on the workload, and maybe the lack of syncfs is to
> blame), but at least I hope to have the storage layer rock-solid. BTW, I've
> managed to keep the faulty btrfs volumes.
> [...]
>> I wonder if maybe there's a confounding factor here — are all your nodes
>> similar to each other,
> Yes. I designed the cluster that way. All nodes are identical hardware
> (PowerEdge M610, 10G Intel ethernet + Emulex fibre channel attached to storage;
> 1 array for 2 OSD nodes, 1 controller dedicated to each OSD).
>> Oh, interesting. Are the broken nodes all on the same set of arrays?
> No. There are 4 completely independent raid arrays, in 4 different locations.
> They are similar (same brand and model, but slightly different disks, and one
> different firmware); all arrays are multipathed. I don't think the raid arrays
> are the problem. We have used those particular models for 2-3 years, and in the
> logs I don't see any problem that could be caused by the storage itself (like
> scsi or multipath errors).

I must have misunderstood then. What did you mean by "1 array for 2 OSD nodes"?
Re: domino-style OSD crash
On Wed, Jul 4, 2012 at 10:53 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
> On 04/07/2012 18:21, Gregory Farnum wrote:
>> On Wednesday, July 4, 2012 at 1:06 AM, Yann Dupont wrote:
>>> On 03/07/2012 23:38, Tommi Virtanen wrote:
>>>> On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>>>>> In the case I could repair it, do you think a crashed FS as it is right now
>>>>> would be valuable to you, for future reference, as I saw you can't reproduce
>>>>> the problem? I can make an archive (or a btrfs dump?), but it will be quite
>>>>> big.
>>>> At this point, it's more about the upstream developers (of btrfs etc.) than
>>>> us; we're on good terms with them but not experts on the on-disk format(s).
>>>> You might want to send an email to the relevant mailing lists before wiping
>>>> the disks.
>>> Well, I probably wasn't clear enough. I talked about a crashed FS, but I was
>>> talking about ceph. The underlying FS (btrfs in that case) of 1 node (and
>>> only one) has PROBABLY crashed in the past, causing corruption of the ceph
>>> data on this node, and then the subsequent crash of other nodes. RIGHT now
>>> btrfs on this node is OK. I can access the filesystem without errors.
>>> For the moment, of 8 nodes, 4 refuse to restart. 1 of the 4 was the crashed
>>> node; the 3 others didn't have problems with the underlying fs as far as I
>>> can tell. So I think the scenario is: one node had a problem with btrfs,
>>> leading first to kernel problems, probably corruption (on disk / in memory
>>> maybe?), and ultimately to a kernel oops. Before that final kernel oops, bad
>>> data was transmitted to other (sane) nodes, leading to ceph-osd crashes on
>>> those nodes.
>> I don't think that's actually possible — the OSDs all do quite a lot of
>> interpretation between what they get off the wire and what goes on disk. What
>> you've got here are 4 corrupted LevelDB databases, and we pretty much can't do
>> that through the interfaces we have. :/
> ok, so as all nodes were identical, I probably hit a btrfs bug (like an
> erroneous out-of-space) at more or less the same time. And when 1 osd was out,
>>> If you think this scenario is highly improbable in real life (that is, btrfs
>>> will probably be fixed for good, and then corruption can't happen), it's OK.
>>> But I wonder if this scenario can be triggered by other problems, with bad
>>> data transmitted to other sane nodes (power outage, out-of-memory condition,
>>> disk full... for example). That's why I offered you a crashed ceph volume
>>> image (I shouldn't have talked about a crashed fs, sorry for the confusion).
>> I appreciate the offer, but I don't think this will help much — it's a disk
>> state managed by somebody else, not our logical state, which has broken. If we
>> could figure out how that state got broken that'd be good, but a ceph image
>> won't really help in doing so.
> ok, no problem. I'll restart from scratch, freshly formatted.
>> I wonder if maybe there's a confounding factor here — are all your nodes
>> similar to each other,
> Yes. I designed the cluster that way. All nodes are identical hardware
> (PowerEdge M610, 10G Intel ethernet + Emulex fibre channel attached to storage;
> 1 array for 2 OSD nodes, 1 controller dedicated to each OSD).

Oh, interesting. Are the broken nodes all on the same set of arrays?

>> or are they running on different kinds of hardware? How did you do your Ceph
>> upgrades? What's ceph -s display when the cluster is running as best it can?
> Ceph was running 0.47.2 at that time (debian package for ceph). After the crash
> I couldn't restart all the nodes. Tried 0.47.3 and now 0.48 without success.
> Nothing particular for the upgrades, because for the moment ceph is broken, so
> just apt-get upgrade with the new version.
> ceph -s shows:
> root@label5:~# ceph -s
>    health HEALTH_WARN 260 pgs degraded; 793 pgs down; 785 pgs peering; 32 pgs recovering; 96 pgs stale; 793 pgs stuck inactive; 96 pgs stuck stale; 1092 pgs stuck unclean; recovery 267286/2491140 degraded (10.729%); 1814/1245570 unfound (0.146%)
>    monmap e1: 3 mons at {chichibu=172.20.14.130:6789/0,glenesk=172.20.14.131:6789/0,karuizawa=172.20.14.133:6789/0}, election epoch 12, quorum 0,1,2 chichibu,glenesk,karuizawa
>    osdmap e2404: 8 osds: 3 up, 3 in
>    pgmap v173701: 1728 pgs: 604 active+clean, 8 down, 5 active+recovering+remapped, 32 active+clean+replay, 11 active+recovering+degraded, 25 active+remapped, 710 down+peering, 222 active+degraded, 7 stale+active+recovering+degraded, 61 stale+down+peering, 20 stale+active+degraded, 6 down+remapped+peering, 8 stale+down+remapped+peering, 9 active+recovering; 4786 GB data, 7495 GB used, 7280 GB / 15360 GB avail; 267286/2491140 degraded (10.729%);
Re: domino-style OSD crash
On 03/07/2012 23:38, Tommi Virtanen wrote:
> On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>> In the case I could repair it, do you think a crashed FS as it is right now
>> would be valuable to you, for future reference, as I saw you can't reproduce
>> the problem? I can make an archive (or a btrfs dump?), but it will be quite
>> big.
> At this point, it's more about the upstream developers (of btrfs etc.) than us;
> we're on good terms with them but not experts on the on-disk format(s). You
> might want to send an email to the relevant mailing lists before wiping the
> disks.

Well, I probably wasn't clear enough. I talked about a crashed FS, but I was talking about ceph. The underlying FS (btrfs in that case) of 1 node (and only one) has PROBABLY crashed in the past, causing corruption of the ceph data on this node, and then the subsequent crash of other nodes. RIGHT now btrfs on this node is OK. I can access the filesystem without errors.

For the moment, of 8 nodes, 4 refuse to restart. 1 of the 4 was the crashed node; the 3 others didn't have problems with the underlying fs as far as I can tell. So I think the scenario is: one node had a problem with btrfs, leading first to kernel problems, probably corruption (on disk / in memory maybe?), and ultimately to a kernel oops. Before that final kernel oops, bad data was transmitted to other (sane) nodes, leading to ceph-osd crashes on those nodes.

If you think this scenario is highly improbable in real life (that is, btrfs will probably be fixed for good, and then corruption can't happen), it's OK. But I wonder if this scenario can be triggered by other problems, with bad data transmitted to other sane nodes (power outage, out-of-memory condition, disk full... for example). That's why I offered you a crashed ceph volume image (I shouldn't have talked about a crashed fs, sorry for the confusion).

Talking about btrfs, there are a lot of fixes in btrfs between 3.4 and 3.5rc. After the crash, I couldn't mount the btrfs volume. With 3.5rc I can, and there is no sign of problems on it. It doesn't mean the data there is safe, but I think it's a sign that, at least, some bugs have been corrected in the btrfs code.

Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Re: domino-style OSD crash
On Wednesday, July 4, 2012 at 1:06 AM, Yann Dupont wrote:
> On 03/07/2012 23:38, Tommi Virtanen wrote:
>> On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>>> In the case I could repair it, do you think a crashed FS as it is right now
>>> would be valuable to you, for future reference, as I saw you can't reproduce
>>> the problem? I can make an archive (or a btrfs dump?), but it will be quite
>>> big.
>> At this point, it's more about the upstream developers (of btrfs etc.) than
>> us; we're on good terms with them but not experts on the on-disk format(s).
>> You might want to send an email to the relevant mailing lists before wiping
>> the disks.
> Well, I probably wasn't clear enough. I talked about a crashed FS, but I was
> talking about ceph. The underlying FS (btrfs in that case) of 1 node (and only
> one) has PROBABLY crashed in the past, causing corruption of the ceph data on
> this node, and then the subsequent crash of other nodes. RIGHT now btrfs on
> this node is OK. I can access the filesystem without errors.
> For the moment, of 8 nodes, 4 refuse to restart. 1 of the 4 was the crashed
> node; the 3 others didn't have problems with the underlying fs as far as I can
> tell. So I think the scenario is: one node had a problem with btrfs, leading
> first to kernel problems, probably corruption (on disk / in memory maybe?), and
> ultimately to a kernel oops. Before that final kernel oops, bad data was
> transmitted to other (sane) nodes, leading to ceph-osd crashes on those nodes.

I don't think that's actually possible — the OSDs all do quite a lot of interpretation between what they get off the wire and what goes on disk. What you've got here are 4 corrupted LevelDB databases, and we pretty much can't do that through the interfaces we have. :/

> If you think this scenario is highly improbable in real life (that is, btrfs
> will probably be fixed for good, and then corruption can't happen), it's OK.
> But I wonder if this scenario can be triggered by other problems, with bad data
> transmitted to other sane nodes (power outage, out-of-memory condition, disk
> full... for example). That's why I offered you a crashed ceph volume image (I
> shouldn't have talked about a crashed fs, sorry for the confusion).

I appreciate the offer, but I don't think this will help much — it's a disk state managed by somebody else, not our logical state, which has broken. If we could figure out how that state got broken that'd be good, but a ceph image won't really help in doing so.

I wonder if maybe there's a confounding factor here — are all your nodes similar to each other, or are they running on different kinds of hardware? How did you do your Ceph upgrades? What's ceph -s display when the cluster is running as best it can?
-Greg
Re: domino-style OSD crash
On 04/07/2012 18:21, Gregory Farnum wrote:
> On Wednesday, July 4, 2012 at 1:06 AM, Yann Dupont wrote:
>> On 03/07/2012 23:38, Tommi Virtanen wrote:
>>> On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>>>> In the case I could repair it, do you think a crashed FS as it is right now
>>>> would be valuable to you, for future reference, as I saw you can't reproduce
>>>> the problem? I can make an archive (or a btrfs dump?), but it will be quite
>>>> big.
>>> At this point, it's more about the upstream developers (of btrfs etc.) than
>>> us; we're on good terms with them but not experts on the on-disk format(s).
>>> You might want to send an email to the relevant mailing lists before wiping
>>> the disks.
>> Well, I probably wasn't clear enough. I talked about a crashed FS, but I was
>> talking about ceph. The underlying FS (btrfs in that case) of 1 node (and only
>> one) has PROBABLY crashed in the past, causing corruption of the ceph data on
>> this node, and then the subsequent crash of other nodes. RIGHT now btrfs on
>> this node is OK. I can access the filesystem without errors.
>> For the moment, of 8 nodes, 4 refuse to restart. 1 of the 4 was the crashed
>> node; the 3 others didn't have problems with the underlying fs as far as I can
>> tell. So I think the scenario is: one node had a problem with btrfs, leading
>> first to kernel problems, probably corruption (on disk / in memory maybe?),
>> and ultimately to a kernel oops. Before that final kernel oops, bad data was
>> transmitted to other (sane) nodes, leading to ceph-osd crashes on those nodes.
> I don't think that's actually possible — the OSDs all do quite a lot of
> interpretation between what they get off the wire and what goes on disk. What
> you've got here are 4 corrupted LevelDB databases, and we pretty much can't do
> that through the interfaces we have. :/

ok, so as all nodes were identical, I probably hit a btrfs bug (like an erroneous out-of-space) at more or less the same time. And when 1 osd was out,

>> If you think this scenario is highly improbable in real life (that is, btrfs
>> will probably be fixed for good, and then corruption can't happen), it's OK.
>> But I wonder if this scenario can be triggered by other problems, with bad
>> data transmitted to other sane nodes (power outage, out-of-memory condition,
>> disk full... for example). That's why I offered you a crashed ceph volume
>> image (I shouldn't have talked about a crashed fs, sorry for the confusion).
> I appreciate the offer, but I don't think this will help much — it's a disk
> state managed by somebody else, not our logical state, which has broken. If we
> could figure out how that state got broken that'd be good, but a ceph image
> won't really help in doing so.

ok, no problem. I'll restart from scratch, freshly formatted.

> I wonder if maybe there's a confounding factor here — are all your nodes
> similar to each other,

Yes. I designed the cluster that way. All nodes are identical hardware (PowerEdge M610, 10G Intel ethernet + Emulex fibre channel attached to storage; 1 array for 2 OSD nodes, 1 controller dedicated to each OSD).

> or are they running on different kinds of hardware? How did you do your Ceph
> upgrades? What's ceph -s display when the cluster is running as best it can?

Ceph was running 0.47.2 at that time (debian package for ceph). After the crash I couldn't restart all the nodes. Tried 0.47.3 and now 0.48 without success. Nothing particular for the upgrades, because for the moment ceph is broken, so just apt-get upgrade with the new version.

ceph -s shows:

root@label5:~# ceph -s
   health HEALTH_WARN 260 pgs degraded; 793 pgs down; 785 pgs peering; 32 pgs recovering; 96 pgs stale; 793 pgs stuck inactive; 96 pgs stuck stale; 1092 pgs stuck unclean; recovery 267286/2491140 degraded (10.729%); 1814/1245570 unfound (0.146%)
   monmap e1: 3 mons at {chichibu=172.20.14.130:6789/0,glenesk=172.20.14.131:6789/0,karuizawa=172.20.14.133:6789/0}, election epoch 12, quorum 0,1,2 chichibu,glenesk,karuizawa
   osdmap e2404: 8 osds: 3 up, 3 in
   pgmap v173701: 1728 pgs: 604 active+clean, 8 down, 5 active+recovering+remapped, 32 active+clean+replay, 11 active+recovering+degraded, 25 active+remapped, 710 down+peering, 222 active+degraded, 7 stale+active+recovering+degraded, 61 stale+down+peering, 20 stale+active+degraded, 6 down+remapped+peering, 8 stale+down+remapped+peering, 9 active+recovering; 4786 GB data, 7495 GB used, 7280 GB / 15360 GB avail; 267286/2491140 degraded (10.729%); 1814/1245570 unfound (0.146%)
   mdsmap e172: 1/1/1 up {0=karuizawa=up:replay}, 2 up:standby

BTW, after the 0.48 upgrade, there was a disk format conversion. 1 of the 4 surviving OSD didn't
Re: domino-style OSD crash
On Tue, Jul 3, 2012 at 1:40 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
> Upgraded the kernel to 3.5.0-rc4 + some patches; it seems btrfs is OK right
> now. Tried to restart the osds with 0.47.3, then the next branch, and today
> with 0.48. 4 of the 8 nodes fail with the same message:
>
> ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
> 1: /usr/bin/ceph-osd() [0x701929]
> ...
> 13: (leveldb::InternalKeyComparator::FindShortestSeparator(std::string*, leveldb::Slice const&) const+0x4d) [0x6e811d]

That looks like http://tracker.newdream.net/issues/2563 and the best we have for that ticket is "looks like you have a corrupted leveldb file".

Is this reproducible with a freshly mkfs'ed data partition?
Re: domino-style OSD crash
On 03/07/2012 21:42, Tommi Virtanen wrote:
> On Tue, Jul 3, 2012 at 1:40 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>> Upgraded the kernel to 3.5.0-rc4 + some patches; it seems btrfs is OK right
>> now. Tried to restart the osds with 0.47.3, then the next branch, and today
>> with 0.48. 4 of the 8 nodes fail with the same message:
>>
>> ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
>> 1: /usr/bin/ceph-osd() [0x701929]
>> ...
>> 13: (leveldb::InternalKeyComparator::FindShortestSeparator(std::string*, leveldb::Slice const&) const+0x4d) [0x6e811d]
> That looks like http://tracker.newdream.net/issues/2563 and the best we have
> for that ticket is "looks like you have a corrupted leveldb file".
> Is this reproducible with a freshly mkfs'ed data partition?

Probably not. I have multiple data volumes on each node (I was planning xfs vs ext4 vs btrfs benchmarks before falling ill), and those nodes start OK with another data partition. It's very probable that there is corruption somewhere, due to a kernel bug, probably triggered by btrfs. Issue 2563 is probably the same.

I'd like to restart those nodes without formatting them, not because the data is valuable, but because if the same thing happens in production, a method similar to fsck'ing the node could be of great value. I saw the method to check the leveldb; I will try it tomorrow, without guarantees.

In the case I could repair it, do you think a crashed FS as it is right now would be valuable to you, for future reference, as I saw you can't reproduce the problem? I can make an archive (or a btrfs dump?), but it will be quite big.

Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Re: domino-style OSD crash
On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
> In the case I could repair it, do you think a crashed FS as it is right now
> would be valuable to you, for future reference, as I saw you can't reproduce
> the problem? I can make an archive (or a btrfs dump?), but it will be quite
> big.

At this point, it's more about the upstream developers (of btrfs etc.) than us; we're on good terms with them but not experts on the on-disk format(s). You might want to send an email to the relevant mailing lists before wiping the disks.
Re: domino-style OSD crash
On Mon, Jun 4, 2012 at 1:44 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
> Results: worked like a charm for two days, apart from btrfs warning messages,
> then the OSDs began to crash one after another, 'domino style'.

Sorry to hear that. Reading through your message, there seem to be several problems; whether they stem from the same root cause, I can't tell. Quick triage to benefit the other devs:

#1: kernel crash, no details available

> 1 of the physical machines was in a kernel oops state - nothing was remote

#2: leveldb corruption? may be memory corruption that started elsewhere.. Sam, does this look like the leveldb issue you saw?

> [push] v 1438'9416 snapset=0=[]:[] snapc=0=[]) v6 currently started
> 0 2012-06-03 12:55:33.088034 7ff1237f6700 -1 *** Caught signal (Aborted) **
> ...
> 13: (leveldb::InternalKeyComparator::FindShortestSeparator(std::string*, leveldb::Slice const&) const+0x4d) [0x6ef69d]
> 14: (leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&)+0x9f) [0x6fdd9f]

#3: PG::merge_log assertion while recovering from the above; Sam, any ideas?

> 0 2012-06-03 13:36:48.147020 7f74f58b6700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)' thread 7f74f58b6700 time 2012-06-03 13:36:48.100157
> osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)

#4: unknown btrfs warnings; there should be an actual message above this traceback; believed fixed in the latest kernel

> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479278] [a026fca5] ? btrfs_orphan_commit_root+0x105/0x110 [btrfs]
> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479328] [a026965a] ? commit_fs_roots.isra.22+0xaa/0x170 [btrfs]
> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479379] [a02bc9a0] ? btrfs_scrub_pause+0xf0/0x100 [btrfs]
> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479415] [a026a6f1] ? btrfs_commit_transaction+0x521/0x9d0 [btrfs]
> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479460] [8105a9f0] ? add_wait_queue+0x60/0x60
> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479493] [a026aba0] ? btrfs_commit_transaction+0x9d0/0x9d0 [btrfs]
> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479543] [a026abb1] ? do_async_commit+0x11/0x20 [btrfs]
> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479572]
Re: domino-style OSD crash
Can you send the osd logs? The merge_log crashes are probably fixable if I can see the logs. The leveldb crash is almost certainly a result of memory corruption.

Thanks
-Sam

On Mon, Jun 4, 2012 at 9:16 AM, Tommi Virtanen t...@inktank.com wrote:
> On Mon, Jun 4, 2012 at 1:44 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>> Results: worked like a charm for two days, apart from btrfs warning messages,
>> then the OSDs began to crash one after another, 'domino style'.
> Sorry to hear that. Reading through your message, there seem to be several
> problems; whether they stem from the same root cause, I can't tell. Quick
> triage to benefit the other devs:
>
> #1: kernel crash, no details available
>> 1 of the physical machines was in a kernel oops state - nothing was remote
> #2: leveldb corruption? may be memory corruption that started elsewhere..
> Sam, does this look like the leveldb issue you saw?
>> [push] v 1438'9416 snapset=0=[]:[] snapc=0=[]) v6 currently started
>> 0 2012-06-03 12:55:33.088034 7ff1237f6700 -1 *** Caught signal (Aborted) **
>> ...
>> 13: (leveldb::InternalKeyComparator::FindShortestSeparator(std::string*, leveldb::Slice const&) const+0x4d) [0x6ef69d]
>> 14: (leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&)+0x9f) [0x6fdd9f]
> #3: PG::merge_log assertion while recovering from the above; Sam, any ideas?
>> 0 2012-06-03 13:36:48.147020 7f74f58b6700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)' thread 7f74f58b6700 time 2012-06-03 13:36:48.100157
>> osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)
> #4: unknown btrfs warnings; there should be an actual message above this
> traceback; believed fixed in the latest kernel
>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479278] [a026fca5] ? btrfs_orphan_commit_root+0x105/0x110 [btrfs]
>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479328] [a026965a] ? commit_fs_roots.isra.22+0xaa/0x170 [btrfs]
>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479379] [a02bc9a0] ? btrfs_scrub_pause+0xf0/0x100 [btrfs]
>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479415] [a026a6f1] ? btrfs_commit_transaction+0x521/0x9d0 [btrfs]
>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479460] [8105a9f0] ? add_wait_queue+0x60/0x60
>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479493] [a026aba0] ? btrfs_commit_transaction+0x9d0/0x9d0 [btrfs]
>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479543] [a026abb1] ? do_async_commit+0x11/0x20 [btrfs]
>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479572]
Re: domino-style OSD crash
This is probably the same as, or similar to, http://tracker.newdream.net/issues/2462, no? There's a log there, though I've no idea how helpful it is.

On Monday, June 4, 2012 at 10:40 AM, Sam Just wrote:
> Can you send the osd logs? The merge_log crashes are probably fixable if I can
> see the logs. The leveldb crash is almost certainly a result of memory
> corruption.
> Thanks
> -Sam
>
> On Mon, Jun 4, 2012 at 9:16 AM, Tommi Virtanen t...@inktank.com wrote:
>> On Mon, Jun 4, 2012 at 1:44 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
>>> Results: worked like a charm for two days, apart from btrfs warning
>>> messages, then the OSDs began to crash one after another, 'domino style'.
>> Sorry to hear that. Reading through your message, there seem to be several
>> problems; whether they stem from the same root cause, I can't tell. Quick
>> triage to benefit the other devs:
>>
>> #1: kernel crash, no details available
>>> 1 of the physical machines was in a kernel oops state - nothing was remote
>> #2: leveldb corruption? may be memory corruption that started elsewhere..
>> Sam, does this look like the leveldb issue you saw?
>>> [push] v 1438'9416 snapset=0=[]:[] snapc=0=[]) v6 currently started
>>> 0 2012-06-03 12:55:33.088034 7ff1237f6700 -1 *** Caught signal (Aborted) **
>>> ...
>>> 13: (leveldb::InternalKeyComparator::FindShortestSeparator(std::string*, leveldb::Slice const&) const+0x4d) [0x6ef69d]
>>> 14: (leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&)+0x9f) [0x6fdd9f]
>> #3: PG::merge_log assertion while recovering from the above; Sam, any ideas?
>>> 0 2012-06-03 13:36:48.147020 7f74f58b6700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)' thread 7f74f58b6700 time 2012-06-03 13:36:48.100157
>>> osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)
>> #4: unknown btrfs warnings; there should be an actual message above this
>> traceback; believed fixed in the latest kernel
>>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479278] [a026fca5] ? btrfs_orphan_commit_root+0x105/0x110 [btrfs]
>>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479328] [a026965a] ? commit_fs_roots.isra.22+0xaa/0x170 [btrfs]
>>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479379] [a02bc9a0] ? btrfs_scrub_pause+0xf0/0x100 [btrfs]
>>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479415] [a026a6f1] ? btrfs_commit_transaction+0x521/0x9d0 [btrfs]
>>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479460] [8105a9f0] ? add_wait_queue+0x60/0x60
>>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479493] [a026aba0] ? btrfs_commit_transaction+0x9d0/0x9d0 [btrfs]
>>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479543] [a026abb1] ? do_async_commit+0x11/0x20 [btrfs]
>>> Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479572]