Strange behavior after upgrading to 0.48
Hi,

I put up a small cluster with 3 osds, 2 mds, 3 mons, on 3 machines. They were running 0.47.2, and this is a test of a rolling upgrade to 0.48. I shut down, upgraded the software, then restarted, one node at a time. The first two seemed to be ok. The third one gave me something weird. While it was doing the conversion and recovering, the command ceph -s gives things like this:

root@china:/tmp# ceph -s
2012-07-05 14:28:41.069470 7fa3c8443780 2 auth: KeyRing::load: loaded key file /etc/ceph/client.admin.keyring
2012-07-05 14:28:41.594229 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.596313 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.598949 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.601158 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.603069 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.605020 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.607436 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.609304 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.611047 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.667980 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.670283 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.672274 7fa3c030e700 0 monclient: hunting for new mon

And it never stopped. I was thinking, maybe it just behaved like that during recovery. But after the recovery is done, I still get the same thing:

root@china:/tmp# ceph health
2012-07-05 14:28:55.077364 7f8306a0d780 2 auth: KeyRing::load: loaded key file /etc/ceph/client.admin.keyring
HEALTH_OK
root@china:/tmp# ceph -s
2012-07-05 14:30:49.688017 7feb6338e780 2 auth: KeyRing::load: loaded key file /etc/ceph/client.admin.keyring
2012-07-05 14:30:49.691690 7feb5b259700 0 monclient: hunting for new mon
2012-07-05 14:30:49.694295 7feb5b259700 0 monclient: hunting for new mon
2012-07-05 14:30:49.696487 7feb5b259700 0 monclient: hunting for new mon
2012-07-05 14:30:49.698953 7feb5b259700 0 monclient: hunting for new mon
2012-07-05 14:30:49.700833 7feb5b259700 0 monclient: hunting for new mon

Upgrading the first two nodes caused no such problem. The first two nodes each run an osd, mds, and mon. The third runs only an osd and a mon.
The mon log on the 3rd node shows this, not sure if this is helpful:

925291 lease_expire=2012-07-05 02:38:14.149966 has v44 lc 44
2012-07-05 02:38:12.572107 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap active c 29531..30031) is_readable now=2012-07-05 02:38:12.572114 lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
2012-07-05 02:38:12.572128 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap active c 29531..30031) is_readable now=2012-07-05 02:38:12.572129 lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
2012-07-05 02:38:15.120439 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap active c 1..44) is_readable now=2012-07-05 02:38:15.120446 lease_expire=2012-07-05 02:38:17.149967 has v44 lc 44
2012-07-05 02:38:15.925349 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap active c 1..44) is_readable now=2012-07-05 02:38:15.925356 lease_expire=2012-07-05 02:38:20.149971 has v44 lc 44
2012-07-05 02:38:17.572181 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap active c 29531..30031) is_readable now=2012-07-05 02:38:17.572189 lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
2012-07-05 02:38:17.572204 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap active c 29531..30031) is_readable now=2012-07-05 02:38:17.572205 lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
2012-07-05 02:38:19.120463 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap active c 1..44) is_readable now=2012-07-05 02:38:19.120470 lease_expire=2012-07-05 02:38:23.149973 has v44 lc 44
2012-07-05 02:38:19.925323 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap active c 1..44) is_readable now=2012-07-05 02:38:19.925330 lease_expire=2012-07-05 02:38:23.149973 has v44 lc 44

Could someone give a hint on this?

Thanks

Xiaopong
Re: Strange behavior after upgrading to 0.48
When I run the command ceph -s, I see the following information in the mon log:

2012-07-05 02:44:13.298942 7f7d92b14700 0 can't decode unknown message type 54 MSG_AUTH=17
2012-07-05 02:44:13.301588 7f7d9401b700 1 mon.a@0(leader).paxos(auth active c 412..432) is_readable now=2012-07-05 02:44:13.301590 lease_expire=2012-07-05 02:44:17.566529 has v0 lc 432
2012-07-05 02:44:13.302113 7f7d9401b700 1 mon.a@0(leader).paxos(auth active c 412..432) is_readable now=2012-07-05 02:44:13.302114 lease_expire=2012-07-05 02:44:17.566529 has v0 lc 432
2012-07-05 02:44:13.303072 7f7d92b14700 0 can't decode unknown message type 54 MSG_AUTH=17
2012-07-05 02:44:13.309450 7f7d9401b700 1 mon.a@0(leader).paxos(auth active c 412..432) is_readable now=2012-07-05 02:44:13.309452 lease_expire=2012-07-05 02:44:17.566529 has v0 lc 432
2012-07-05 02:44:13.309845 7f7d9401b700 1 mon.a@0(leader).paxos(auth active c 412..432) is_readable now=2012-07-05 02:44:13.309847 lease_expire=2012-07-05 02:44:17.566529 has v0 lc 432

I couldn't find any helpful information regarding the "can't decode" error message, short of digging into the code.

Thanks for any hint.

Xiaopong

On 07/05/2012 02:41 PM, Xiaopong Tran wrote: [original report quoted in full above; snipped]
Re: [PATCH] librados: Bump the version to 0.48
On 07/04/2012 06:33 PM, Sage Weil wrote:
On Wed, 4 Jul 2012, Gregory Farnum wrote:

Hmmm -- we generally try to modify these versions when the API changes, not on every sprint. It looks to me like Sage added one function in 0.45 where we maybe should have bumped it, but that was a long time ago and at this point we should maybe just eat it?

Yeah, I went ahead and applied this to stable (argonaut) since it's as good a reference point as any. Moving forward, we should try to sync this up with API changes as they happen. Hmm, like that assert ObjectOperation that just went into master...

That was my reasoning. I compiled phprados against 0.48 and saw that librados was reporting 0.44 as its version. That could confuse users, who might think they still have an old library in place. Imho the version numbering should be totally different from Ceph's if you only want to bump the version on an API change.

Wido

sage
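For reference, the version being discussed is what librados reports at runtime via its rados_version() call. A minimal C check (this is just an illustration, not part of the patch; it prints whatever numbers the installed library reports):

#include <stdio.h>
#include <rados/librados.h>

int main(void)
{
	int major, minor, extra;

	/* ask the linked librados for its own version triple */
	rados_version(&major, &minor, &extra);
	printf("librados version: %d.%d.%d\n", major, minor, extra);
	return 0;
}

Build with something like "cc version.c -lrados"; with the bump applied this should report 0.48.x rather than the 0.44 that confused phprados.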
[PATCH] Allow URL-safe base64 cephx keys to be decoded.
In these cases + and / are replaced by - and _ to prevent problems when using the base64 strings in URLs.

Signed-off-by: Wido den Hollander w...@widodh.nl
---
 src/common/armor.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/common/armor.c b/src/common/armor.c
index d1d5664..e4b8b86 100644
--- a/src/common/armor.c
+++ b/src/common/armor.c
@@ -24,9 +24,9 @@ static int decode_bits(char c)
 		return c - 'a' + 26;
 	if (c >= '0' && c <= '9')
 		return c - '0' + 52;
-	if (c == '+')
+	if (c == '+' || c == '-')
 		return 62;
-	if (c == '/')
+	if (c == '/' || c == '_')
 		return 63;
 	if (c == '=')
 		return 0;  /* just non-negative, please */
--
1.7.9.5
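The mapping this patch tolerates is a plain character-for-character swap between the two base64 alphabets. A small standalone C sketch, not part of the patch, showing what a user would do to turn a standard key into the URL-safe form (the key shown is made up, not a real secret):

#include <stdio.h>

/* swap the two characters that differ between standard and URL-safe base64 */
static void to_urlsafe(char *s)
{
	for (; *s; s++) {
		if (*s == '+')
			*s = '-';
		else if (*s == '/')
			*s = '_';
	}
}

int main(void)
{
	char key[] = "AQD2s/VPyNDGChAAwzZ0o+jLpVXsp1N6MYJNIw==";	/* made-up key */

	to_urlsafe(key);
	printf("%s\n", key);	/* prints AQD2s_VPyNDGChAAwzZ0o-jLpVXsp1N6MYJNIw== */
	return 0;
}

With the patched decode_bits() above, Ceph accepts either form of the key.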
Re: [PATCH] Generate URL-safe base64 strings for keys.
On 04-07-12 18:18, Sage Weil wrote:
On Wed, 4 Jul 2012, Wido den Hollander wrote:
On Wed, 4 Jul 2012, Wido den Hollander wrote:

By using this we prevent scenarios where cephx keys are not accepted in various situations. By replacing the + and / with - and _ we generate URL-safe base64 keys.

Signed-off-by: Wido den Hollander w...@widodh.nl

Do we already properly decode URL-safe base64 encoding?

Yes, it decodes URL-safe base64 as well. See the if statements for 62 and 63: + and - are treated equally, just like / and _.

Oh, got it. The commit description confused me... I thought this was related to encoding only.

I think we should break the encode and decode patches into separate versions, and apply the decode to a stable branch (argonaut) and the encode to master. That should avoid most problems with a rolling/staggered upgrade...

I just submitted a patch for decoding only.

During some tests I did I found out that libvirt uses GNUlib and won't handle URL-safe base64 encoded keys. So, as long as Ceph allows them we're good. Users can always replace the + and / in their key knowing it will be accepted by Ceph. This works for me for now. The exact switch to base64url should be done at a later stage I think. The RFC on this: http://tools.ietf.org/html/rfc4648#page-7

Wido

sage

Wido

sage
---
 src/common/armor.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/common/armor.c b/src/common/armor.c
index d1d5664..7f73da1 100644
--- a/src/common/armor.c
+++ b/src/common/armor.c
@@ -9,7 +9,7 @@
  * base64 encode/decode.
  */

-const char *pem_key = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+const char *pem_key = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

 static int encode_bits(int c)
 {
@@ -24,9 +24,9 @@ static int decode_bits(char c)
 		return c - 'a' + 26;
 	if (c >= '0' && c <= '9')
 		return c - '0' + 52;
-	if (c == '+')
+	if (c == '+' || c == '-')
 		return 62;
-	if (c == '/')
+	if (c == '/' || c == '_')
 		return 63;
 	if (c == '=')
 		return 0;  /* just non-negative, please */
--
1.7.9.5
Re: Qemu fails to open RBD image when auth_supported is not set to 'none'
On 02-07-12 21:21, Wido den Hollander wrote:
On 06/25/2012 05:45 PM, Wido den Hollander wrote:
On 06/25/2012 05:20 PM, Wido den Hollander wrote:

Hi,

I just tried to start a VM with libvirt with the following disk:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='rbd' name='rbd/8489c04f-aab8-4796-a22a-ebaa7be247a7'>
    <host name='31.25.XX.XX' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>

That fails with: Operation not supported

I tried qemu-img:

qemu-img info rbd:rbd/8489c04f-aab8-4796-a22a-ebaa7be247a7:mon_host=31.25.XX.XX\\:6789

Same result. I then tried:

qemu-img info rbd:rbd/8489c04f-aab8-4796-a22a-ebaa7be247a7:mon_host=31.25.XX.XX\\:6789:auth_supported=none

And that worked :) This host does not have a local ceph.conf; all the parameters have to come from the command line.

I know that auth_supported recently started defaulting to cephx, but that now breaks the libvirt integration, since libvirt doesn't explicitly set auth_supported to none when no auth section is present. Should this be something that gets fixed in librados or in libvirt?

Thought about it, this is something in libvirt :) If it's libvirt, I'll write a patch for it :)

Just did so, very simple patch: https://www.redhat.com/archives/libvir-list/2012-June/msg01119.html

libvirt 0.9.13 just got out. The good news is that the RBD storage pool is in this release, but the patch above did not make it in time.

The patch just made it into libvirt: http://libvirt.org/git/?p=libvirt.git;a=commit;h=ccb94785007d33365d49dd566e194eb0a022148d

You will need this libvirt patch if you are going to run RBD without cephx enabled.

Wido

We'll have to wait for 0.9.14 to get that one in.

Wido

Wido
Re: rados mailbox? (was Re: Ceph for email storage)
On 04-07-12 22:40, Sage Weil wrote:

Although Ceph fs would technically work for storing mail with maildir, when you step back from the situation, Maildir + a distributed file system is a pretty terrible way to approach mail storage. Maildir was designed to work around the limited consistency of NFS, and manages that, but performs pretty horribly on almost any file system. Mostly this is due to the message-per-file approach and the fact that file systems' internal management of inodes and directories means lots and lots of seeks, even to read message headers. Ceph's MDS will probably do better than most due to its embedded inodes, but it's hardly ideal.

However, an idea that has been kicking around here is building a mail storage system directly on top of RADOS. In principle, it should be a relatively straightforward matter of implementing a library and plugging it into the storage backend for something like Dovecot, or any other mail system (delivery agent and/or IMAP/POP frontend) with a pluggable backend. (I think postfix has pluggable delivery agents, but that's about where my experience in this area runs out.)

When you first told me the idea a couple of months ago I took a look at the Dovecot code and it's not that trivial to implement. It seems that mbox and Maildir are pretty hardcoded in Dovecot, but there is an advantage: you can use Dovecot as your LDA/VDA (Local/Virtual Delivery Agent) for Postfix, so you'd only have to implement this library in Dovecot and you'd be able to handle IMAP, POP3 and delivery of e-mails to RADOS. Source: http://wiki.dovecot.org/LDA/Postfix

The basic idea is this:
- each mail message is a rados object, and immutable.
- each mailbox is an index of messages, stored in a rados object.
- the index consists of omap records, one for each message.
  - the key is some unique id
  - the value is a copy of (a useful subset of) the message headers

This has a number of nice properties:
- you can efficiently list messages in the mailbox using the omap operations
- you can (more) efficiently search messages (everything but the message body) based on the index contents (since it's all stored in one object)
- you can efficiently grab recent messages with the omap ops (e.g., list keys > last_seen_msgid)
- moving messages between folders involves updating the indices only; the message objects need not be copied/moved.
- no metadata bottleneck: mailbox indices are distributed across the entire cluster, just like the mail.
- all the scaling benefits of rados for a growing mail system.

I don't know enough about what exactly the mail storage backends need to support to know what issues will come up. Presumably there are several. E.g., if you delete a message, is the IMAP client expected to discover that efficiently? And do the mail storage backends attempt to do it efficiently?

With IMAP a message gets marked as deleted until you do a PURGE; that will actually remove the message. The problem with IMAP clients, however, is that there are a lot of bugs in them, especially Outlook. But if you can somehow plug into Dovecot and only handle the calls that it's doing you should be fine.

This also doesn't solve the problem of efficiently indexing/searching the bodies of messages, although I suspect that indexing could be efficiently implemented on top of this scheme.

Nowadays most clients keep a local cache; at least Thunderbird does, and uses that for local search. Much faster!
Webmail clients like RoundCube have a local cache as well, and applications like OpenXchange also have local caches.

So, a non-trivial project, but probably one that can be prototyped without that much pain, and one that would perform and scale drastically better than existing solutions I'm aware of.

Yes, MUCH better than Maildir over CephFS or NFS.

I'm hoping there are some motivated hackers lurking who understand the pain that is maildir/mail infrastructure...

Plenty of motivation, not enough time I think.

Wido

sage

On Wed, 4 Jul 2012, Mitsue Acosta Murakami wrote:

Hello,

We are examining Ceph to use as email storage. In our current system, several client servers with different services (imap, smtp, etc) access an NFS storage server. The mailboxes are stored in Maildir format, with many small files. We use Amazon AWS EC2 for the clients and the storage server. In this scenario, we have some questions about Ceph:

1. Is Ceph recommended for heavy write/read of small files?
2. Is there any problem in installing Ceph on Amazon instances?
3. Does Ceph already support quota?
4. What file system would you encourage us to use?

Thanks in advance,
--
Mitsue Acosta Murakami
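To make the message-object + mailbox-index idea discussed above concrete, here is a rough sketch against the librados C API. This is only an illustration of the proposed layout, not an actual design or a Dovecot backend: the write_op/omap calls are taken from the current librados C API (they did not exist in this form in the 0.48 era), the pool, object and key names are invented, and error handling is omitted.

#include <stdio.h>
#include <string.h>
#include <rados/librados.h>

int main(void)
{
	rados_t cluster;
	rados_ioctx_t io;
	rados_write_op_t op;

	const char *msgid = "1341500000.abc123";                 /* unique message id */
	const char *body  = "From: a@example.com\r\n\r\nhello";  /* full, immutable message */
	const char *hdrs  = "From: a@example.com|Subject: hi";   /* header subset for the index */
	const char * const keys[] = { msgid };
	const char * const vals[] = { hdrs };
	size_t lens[] = { strlen(hdrs) };

	rados_create(&cluster, "admin");
	rados_conf_read_file(cluster, NULL);
	rados_connect(cluster);
	rados_ioctx_create(cluster, "mail", &io);

	/* 1. the message itself becomes one immutable object */
	rados_write_full(io, msgid, body, strlen(body));

	/* 2. the mailbox is an index object; add one omap record per message */
	op = rados_create_write_op();
	rados_write_op_omap_set(op, keys, vals, lens, 1);
	rados_write_op_operate(op, io, "mbox.alice.INBOX", NULL, 0);
	rados_release_write_op(op);

	rados_ioctx_destroy(io);
	rados_shutdown(cluster);
	return 0;
}

Listing a mailbox or grabbing recent messages would then be an omap iteration over "mbox.alice.INBOX", and moving a message between folders only touches the index objects, exactly as described in the thread above.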
Re: OSD doesn't start
On 2012. July 4. 09:34:04 Gregory Farnum wrote:

Hrm, it looks like the OSD data directory got a little busted somehow. How did you perform your upgrade? (That is, how did you kill your daemons, in what order, and when did you bring them back up.)

Since it would be hard and long to describe in text, I've collected the relevant log entries, sorted by time, at http://pastebin.com/Ev3M4DQ9 . The short story is that after seeing that the OSDs won't start, I tried to bring down the whole cluster and start it up from scratch. It didn't change anything, so I rebooted the two machines (running all three daemons) to see if it changes anything. It didn't, and I gave up. My ceph config is available at http://pastebin.com/KKNjmiWM .

Since this is my test cluster, I'm not very concerned about the data on it. But the other one, with the same config, is dying I think. ceph-fuse is eating around 75% CPU on the sole monitor (cc) node, the monitor about 15%. On the other two nodes, the OSD eats around 50%, the MDS 15%, the monitor another 10%. No Ceph filesystem activity is going on at the moment. Blktrace reports about 1kB/s disk traffic on the partition hosting the OSD data dir. The data seems to be accessible at the moment, but I'm afraid that my production cluster will end up in a similar situation after the upgrade, so I don't dare to touch it.

Do you have any suggestion what I should check?

Thanks,
--
cc

On Wednesday, July 4, 2012 at 8:31 AM, Székelyi Szabolcs wrote:

Hi,

after upgrading to 0.48 Argonaut, my OSDs won't start up again. This problem might not be related to the upgrade, since the cluster had strange behavior before, too: ceph-fuse was spinning the CPU around 70%, and so did the OSDs. This happened to both of my clusters. I thought that upgrading might solve the problem, but it just got worse. I've copied the log of the OSD run to http://pastebin.com/XYRtfFMU . I've rebooted all the nodes, but they still don't work.

What should I do to resurrect my OSDs?

Thanks,
--
cc
Re: Strange behavior after upgrading to 0.48
Hi,

On Thu, 5 Jul 2012, Xiaopong Tran wrote:

Hi, I put up a small cluster with 3 osds, 2 mds, 3 mons, on 3 machines. They were running 0.47.2, and this is a test to do rolling upgrade to 0.48. I shutdown, upgraded the software, then restarted. One node at a time. The first two seemed to be ok. The third one gave me some weird thing. While it was doing the conversion and recovering, the command ceph -s gives things like this:

root@china:/tmp# ceph -s
2012-07-05 14:28:41.069470 7fa3c8443780 2 auth: KeyRing::load: loaded key file /etc/ceph/client.admin.keyring
2012-07-05 14:28:41.594229 7fa3c030e700 0 monclient: hunting for new mon
[repeated "hunting for new mon" lines snipped]

The problem is that the ceph utility itself is pre-0.48, but the monitors are running 0.48. You need to upgrade the utility as well. (There was a note about this in the release announcement.) This only affects the -s and -w commands.

sage

[rest of the original report quoted in full above; snipped]
Re: [PATCH] Generate URL-safe base64 strings for keys.
On Thu, 5 Jul 2012, Wido den Hollander wrote:
On 04-07-12 18:18, Sage Weil wrote:

[earlier discussion of splitting the encode and decode patches quoted above; snipped]

I just submitted a patch for decoding only.

Applied, thanks!

During some tests I did I found out that libvirt uses GNUlib and won't handle URL-safe base64 encoded keys. So, as long as Ceph allows them we're good. Users can always replace the + and / in their key knowing it will be accepted by Ceph. This works for me for now. The exact switch to base64url should be done at a later stage I think. The RFC on this: http://tools.ietf.org/html/rfc4648#page-7

We could:
- submit a patch for gnulib; someday it'll support it
- kludge the secret generation code in ceph so that it rejects secrets with problematic encoding... :/

(radosgw-admin does something similar with +'s in the s3-style user keys.)

sage

[quoted patch snipped]
Re: Strange behavior after upgrading to 0.48
Sage Weil s...@inktank.com wrote:

Hi,

On Thu, 5 Jul 2012, Xiaopong Tran wrote:
[original report quoted; snipped]

The problem is that the ceph utility itself is pre-0.48, but the monitors are running 0.48. You need to upgrade the utility as well. (There was a note about this in the release announcement.) This only affects the -s and -w commands.

sage

I have read the notes, and upgraded the utility first. There was no problem when the first two were upgraded and recovering. This only happened when the third node was upgraded. The nodes are running Debian wheezy, while the client admin node is running Ubuntu 12.04.

thanks

Xiaopong

[rest of the quoted report snipped]
Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem
On Wed, 4 Jul 2012, Sha Zhengju wrote:
On 07/02/2012 10:49 PM, Sage Weil wrote:
On Mon, 2 Jul 2012, Sha Zhengju wrote:
On 06/29/2012 01:21 PM, Sage Weil wrote:
On Thu, 28 Jun 2012, Sha Zhengju wrote:

From: Sha Zhengju <handai@taobao.com>

Following this we will treat SetPageDirty and dirty page accounting as an integrated operation. Filesystems had better use the vfs interface directly to avoid those details.

Signed-off-by: Sha Zhengju <handai@taobao.com>
---
 fs/buffer.c                 |  2 +-
 fs/ceph/addr.c              | 20 ++--
 include/linux/buffer_head.h |  2 ++
 3 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index e8d96b8..55522dd 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
  * If warn is true, then emit a warning if the page is not uptodate and has
  * not been truncated.
  */
-static int __set_page_dirty(struct page *page,
+int __set_page_dirty(struct page *page,
 		struct address_space *mapping, int warn)
 {
 	if (unlikely(!mapping))

This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to continue to build as a module. With that fixed, the ceph bits are a welcome cleanup!

Acked-by: Sage Weil <s...@inktank.com>

Further, I checked the path again; may it be reworked as follows to avoid the undo?

    __set_page_dirty();                 __set_page_dirty();
    ceph operations;            ==>     if (page->mapping)
    if (page->mapping)                      ceph operations;
        ;
    else
        undo = 1;
    if (undo)
        xxx;

Yep. Taking another look at the original code, though, I'm worried that one reason the __set_page_dirty() actions were spread out the way they are is because we wanted to ensure that the ceph operations were always performed when PagePrivate was set.

Sorry, I've lost something:

    __set_page_dirty();                 __set_page_dirty();
    ceph operations;            ==>     if (page->mapping) {
    if (page->mapping)                      SetPagePrivate;
        SetPagePrivate;                     ceph operations;
    else                                }
        undo = 1;
    if (undo)
        XXX;

I think this can ensure that ceph operations are performed together with SetPagePrivate.

Yeah, that looks right, as long as the ceph accounting operations happen before SetPagePrivate. I think it's no more or less racy than before, at least.

The patch doesn't apply without the previous ones in the series, it looks like. Do you want to prepare a new version or should I?

Thanks!
sage

It looks like invalidatepage won't get called if private isn't set, and presumably it handles the truncate race with __set_page_dirty() properly (right?). What about writeback? Do we need to worry about writepage[s] getting called with a NULL page->private?

__set_page_dirty does handle racing conditions with truncate and writeback. writepage[s] also take page->private into consideration, which is done inside specific filesystems. I notice that ceph has handled this in ceph_writepage(). Sorry, I'm not a vfs expert and maybe I've not caught your point...

Thanks,
Sha

Thanks!
sage

Thanks,
Sha

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 8b67304..d028fbe 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/pagemap.h>
 #include <linux/writeback.h>	/* generic_writepages */
+#include <linux/buffer_head.h>
 #include <linux/slab.h>
 #include <linux/pagevec.h>
 #include <linux/task_io_accounting_ops.h>
@@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page)
 	int undo = 0;
 	struct ceph_snap_context *snapc;

-	if (unlikely(!mapping))
-		return !TestSetPageDirty(page);
-
-	if (TestSetPageDirty(page)) {
-		dout("%p set_page_dirty %p idx %lu -- already dirty\n",
-		     mapping->host, page, page->index);
+	if (!__set_page_dirty(page, mapping, 1))
 		return 0;
-	}

 	inode = mapping->host;
 	ci = ceph_inode(inode);
@@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page)
 	     snapc, snapc->seq, snapc->num_snaps);
 	spin_unlock(&ci->i_ceph_lock);

-	/* now adjust page */
-	spin_lock_irq(&mapping->tree_lock);
 	if (page->mapping) {	/* Race with truncate? */
-		WARN_ON_ONCE(!PageUptodate(page));
-		account_page_dirtied(page, page->mapping);
-
Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem
On Thu, Jul 5, 2012 at 11:20 PM, Sage Weil s...@inktank.com wrote:
On Wed, 4 Jul 2012, Sha Zhengju wrote:

[earlier discussion of the __set_page_dirty rework quoted above; snipped]

The patch doesn't apply without the previous ones in the series, it looks like. Do you want to prepare a new version or should I?

Good. I'm doing some tests, then I'll send out a new version of the patchset; please wait a bit. : )

Thanks,
Sha
Setting a big maxosd kills all mons
Hi guys,

Someone I worked with today pointed me to a quick and easy way to bring down an entire cluster, by making all mons kill themselves in mass suicide:

ceph osd setmaxosd 2147483647
2012-07-05 16:29:41.893862 b5962b70 0 monclient: hunting for new mon

I don't know what the actual threshold is, but setting your maxosd to any sufficiently big number should do it. I had hoped 2^31-1 would be fine, but evidently it's not.

This is what's in the mon log -- the first line is obviously only on the leader at the time of the command, the others are on all mons.

-1 2012-07-05 16:29:41.829470 b41a1b70 0 mon.daisy@0(leader) e1 handle_command mon_command(osd setmaxosd 2147483647 v 0) v1
 0 2012-07-05 16:29:41.887590 b41a1b70 -1 *** Caught signal (Aborted) **
 in thread b41a1b70
 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: /usr/bin/ceph-mon() [0x816f461]
 2: [0xb7738400]
 3: [0xb7738424]
 4: (gsignal()+0x51) [0xb731a781]
 5: (abort()+0x182) [0xb731dbb2]
 6: (__gnu_cxx::__verbose_terminate_handler()+0x14f) [0xb753b53f]
 7: (()+0xbd405) [0xb7539405]
 8: (()+0xbd442) [0xb7539442]
 9: (()+0xbd581) [0xb7539581]
 10: (()+0x11dea) [0xb7582dea]
 11: (tc_new()+0x26) [0xb75a1636]
 12: (std::vector<unsigned char, std::allocator<unsigned char> >::_M_fill_insert(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned int, unsigned char const&)+0x79) [0x8185629]
 13: (OSDMap::set_max_osd(int)+0x497) [0x817c6b7]

From src/mon/OSDMonitor.cc:

    int newmax = atoi(m->cmd[2].c_str());
    if (newmax < osdmap.crush->get_max_devices()) {
      err = -ERANGE;
      ss << "cannot set max_osd to " << newmax
         << " which is < crush max_devices " << osdmap.crush->get_max_devices();
      goto out;
    }

I think that counts as unchecked user input, or has cmd[2] been sanitized at any time before it gets here?

Also, is there a way to recover from this, short of reinitializing all mons?

Cheers,
Florian
Writes to mounted Ceph FS fail silently if client has no write capability on data pool
Hi everyone,

please enlighten me if I'm misinterpreting something, but I think the Ceph FS layer could handle the following situation better.

How to reproduce (this is on a 3.2.0 kernel):

1. Create a client, mine is named test, with the following capabilities:

client.test
        key: key
        caps: [mds] allow
        caps: [mon] allow r
        caps: [osd] allow rw pool=testpool

Note the client only has access to a single pool, testpool.

2. Export the client's secret and mount a Ceph FS.

mount -t ceph -o name=test,secretfile=/etc/ceph/test.secret daisy,eric,frank:/ /mnt

This succeeds, despite us not even having read access to the data pool.

3. Write something to a file.

root@alice:/mnt# echo hello world > hello.txt
root@alice:/mnt# cat hello.txt

This too succeeds.

4. Sync and clear caches.

root@alice:/mnt# sync
root@alice:/mnt# echo 3 > /proc/sys/vm/drop_caches

5. Check file size and contents.

root@alice:/mnt# ls -la
total 5
drwxr-xr-x  1 root root    0 Jul  5 17:15 .
drwxr-xr-x 21 root root 4096 Jun 11 09:03 ..
-rw-r--r--  1 root root   12 Jul  5 17:15 hello.txt
root@alice:/mnt# cat hello.txt
root@alice:/mnt#

Note the reported file size is unchanged, but the file is empty. Checking the data pool with client.admin credentials obviously shows that that pool is empty, so objects are never written. Interestingly, cephfs hello.txt show_location does list an object_name, identifying an object which doesn't exist.

Is there any way to make the client fail with -EIO, -EPERM, -EOPNOTSUPP or whatever else is appropriate, rather than pretending to write when it can't?

Also, going down the rabbit hole, how would this behavior change if I used cephfs to set the default layout on some directory to use a different pool?

All thoughts appreciated.

Cheers,
Florian
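For contrast, one would expect the same write done directly through librados with the same restricted credentials to surface an error instead of appearing to succeed. A minimal C sketch of that check follows; the keyring path, object name and the exact errno are assumptions (and depending on the Ceph version the op may be rejected outright or simply never complete), and error handling is kept minimal:

#include <stdio.h>
#include <string.h>
#include <rados/librados.h>

int main(void)
{
	rados_t cluster;
	rados_ioctx_t io;
	const char *payload = "hello world\n";
	int r;

	/* connect as the restricted "test" client from step 1 */
	rados_create(&cluster, "test");
	rados_conf_read_file(cluster, NULL);
	rados_conf_set(cluster, "keyring", "/etc/ceph/test.keyring");
	rados_connect(cluster);

	r = rados_ioctx_create(cluster, "data", &io);
	if (r < 0) {
		fprintf(stderr, "ioctx_create(data) failed: %d\n", r);
		rados_shutdown(cluster);
		return 1;
	}

	/* attempt the write the FS client pretended to do */
	r = rados_write_full(io, "some-object", payload, strlen(payload));
	printf("write_full returned %d\n", r);	/* expect a negative errno such as -EPERM */

	rados_ioctx_destroy(io);
	rados_shutdown(cluster);
	return 0;
}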
cephfs show_location produces kernel divide error: 0000 [#1] when run against a directory that is not the filesystem root
And one more issue report for today... :)

Really easy to reproduce on my 3.2.0 Debian squeeze-backports kernel: mount a Ceph FS, create a directory in it. Then run cephfs <dir> show_location.

dmesg stacktrace:

[ 7153.714260] libceph: mon2 192.168.42.116:6789 session established
[ 7308.584193] divide error: 0000 [#1] SMP
[ 7308.584936] Modules linked in: cryptd aes_i586 aes_generic cbc ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc fuse joydev usbhid hid snd_pcm snd_timer snd processor soundcore snd_page_alloc thermal_sys button tpm_tis tpm tpm_bios psmouse i2c_piix4 evdev serio_raw i2c_core virtio_balloon pcspkr ext3 jbd mbcache btrfs zlib_deflate crc32c libcrc32c sg sr_mod cdrom ata_generic virtio_net virtio_blk ata_piix uhci_hcd ehci_hcd libata usbcore floppy scsi_mod virtio_pci usb_common [last unloaded: scsi_wait_scan]
[ 7308.588013]
[ 7308.588013] Pid: 1444, comm: cephfs Not tainted 3.2.0-0.bpo.2-686-pae #1 Bochs Bochs
[ 7308.588013] EIP: 0060:[<f848c6c2>] EFLAGS: 00010246 CPU: 0
[ 7308.588013] EIP is at ceph_calc_file_object_mapping+0x44/0xe8 [libceph]
[ 7308.588013] EAX: EBX: ECX: EDX:
[ 7308.588013] ESI: EDI: EBP: ESP: f7495ce4
[ 7308.588013] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 7308.588013] Process cephfs (pid: 1444, ti=f7494000 task=f7266a60 task.ti=f7494000)
[ 7308.588013] Stack:
[ 7308.588013]  0001b053 f5f20624 f5f203f0 f749a800 f5f20420
[ 7308.588013]  f84ca6a7 f7495d40 f7495d58 f7495d50 f7495d38 0001 0246 f5f20420
[ 7308.588013]  f749a90c bff6ff70 c14203a4 fffba978 000a0050 f79f0298 0001
[ 7308.588013] Call Trace:
[ 7308.588013]  [<f84ca6a7>] ? ceph_ioctl_get_dataloc+0x9e/0x213 [ceph]
[ 7308.588013]  [<c10b6781>] ? __do_fault+0x3ee/0x42b
[ 7308.588013]  [<c10b75f3>] ? handle_pte_fault+0x3aa/0xa67
[ 7308.588013]  [<c10e0844>] ? path_openat+0x27f/0x294
[ 7308.588013]  [<f84cac16>] ? ceph_ioctl+0x3fa/0x460 [ceph]
[ 7308.588013]  [<c10d9fdb>] ? cp_new_stat64+0xee/0x100
[ 7308.588013]  [<c10b7ebe>] ? handle_mm_fault+0x20e/0x224
[ 7308.588013]  [<f84ca81c>] ? ceph_ioctl_get_dataloc+0x213/0x213 [ceph]

I unfortunately don't have a more recent kernel to test with, so if this has been fixed upstream feel free to ignore me. Otherwise, perhaps something that could go into the 3.5-rc cycle.

Doing show_location on a file, and on the root directory of the fs, both work fine.

Cheers,
Florian
Re: Setting a big maxosd kills all mons
On Thu, Jul 5, 2012 at 10:39 AM, Florian Haas flor...@hastexo.com wrote:

Hi guys, Someone I worked with today pointed me to a quick and easy way to bring down an entire cluster, by making all mons kill themselves in mass suicide: ceph osd setmaxosd 2147483647
[mon log and backtrace quoted above; snipped]

Ungh. Can you file a bug report? The problem here is that the monitor is trying to allocate a number of maps and arrays with that many entries; we probably need to put an artificial cap in place as a config option.

I think that counts as unchecked user input, or has cmd[2] been sanitized at any time before it gets here?

Yeah, there's all kinds of unsanitized user input in the monitor command-parsing code.

Also, is there a way to recover from this, short of reinitializing all mons?

Hmm. We can do it by manipulating the disk format, but there's not any programmatic way to do so. I *think* that if you turn off all the monitors, and:
1) delete the latest osdmap and osdmap_full entries,
2) edit the osdmap and osdmap_full last_committed entries to be one prior to what they are,
3) start the monitors
then you should be okay. But it's possible that the latest entry got updated, in which case you'd also have to modify that to be an older map.
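To illustrate the kind of cap and input sanitization being suggested, here is a standalone C sketch. This is not the monitor code and the names are made up; the 10000 ceiling is an arbitrary stand-in for whatever the configurable limit would be, and strtol with error checking stands in for the unchecked atoi shown above.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_OSD_CAP 10000	/* stand-in for a config option */

/* Parse a max_osd argument strictly and refuse absurd values. */
static int parse_max_osd(const char *arg, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno || end == arg || *end != '\0')
		return -EINVAL;		/* not a clean decimal integer */
	if (v < 1 || v > MAX_OSD_CAP)
		return -ERANGE;		/* reject values like 2147483647 */
	*out = (int)v;
	return 0;
}

int main(int argc, char **argv)
{
	const char *arg = argc > 1 ? argv[1] : "2147483647";
	int newmax;
	int r = parse_max_osd(arg, &newmax);

	if (r < 0)
		printf("setmaxosd %s rejected (%d)\n", arg, r);
	else
		printf("setmaxosd accepted, max_osd = %d\n", newmax);
	return 0;
}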
Re: Slow request warnings on 0.48
On 07/04/2012 11:58 AM, Alexandre DERUMIER wrote:

Hi, I see the same messages here after upgrading to 0.48, with a random write benchmark. I have more lags than before with 0.47 (but the disks are at 100% usage, so I can't tell if it's normal or not).

----- Mail original -----
De: David Blundell david.blund...@100percentit.com
À: ceph-devel@vger.kernel.org
Envoyé: Mercredi 4 Juillet 2012 18:53:02
Objet: Slow request warnings on 0.48

I have three servers running mon and osd using Ubuntu 12.04 that I have been testing with RADOS storing RBD KVM instances. 0.47.3 worked extremely well (once I got over a few btrfs issues). The same servers running 0.48 give a large number of [WRN] slow request messages whenever I generate a lot of random IO in the KVM instances using iozone. The slow responses eventually lead to disk timeouts on the KVM instances.

I have erased the osds and recreated them on new btrfs volumes with the same result. I have also tried switching to xfs using mkfs.xfs -n size=64k with noatime,inode64,delaylog,logbufs=8,logbsize=256k. Xfs gives the same result - the iozone tests run fine until the random IO starts and then there are lots of slow request warnings.

Does anyone have any ideas about the best place to start troubleshooting / debugging?

Thanks,
David

Hi David and Alexandre,

Does this only happen with random writes or also sequential writes? If it happens with sequential writes as well, does it happen with rados bench?

--
Mark Nelson
Performance Engineer
Inktank
RE: Slow request warnings on 0.48
Hi David and Alexandre,

Does this only happen with random writes or also sequential writes? If it happens with sequential writes as well, does it happen with rados bench?

--
Mark Nelson
Performance Engineer
Inktank

Hi Mark,

I have only ever seen it with random writes. I'll retry rados bench in a few minutes to double check - are there any other tests you would like me to run? I'm currently waiting for some iozone tests to finish.

When the sequential tests are running the logs are fine; as soon as the random writes start, the logs start to fill with messages like:

2012-07-05 19:10:00.599250 osd.6 10.0.1.42:6802/2145 1151 : [WRN] slow request 37.933071 seconds old, received at 2012-07-05 19:09:22.665917: osd_op(client.96416.0:91965 rb.0.1.015f [write 4022272~4096] 2.3777e91a snapc 11=[11,10]) v4 currently waiting for sub ops
2012-07-05 19:10:00.599258 osd.6 10.0.1.42:6802/2145 1152 : [WRN] slow request 37.932836 seconds old, received at 2012-07-05 19:09:22.666152: osd_op(client.96416.0:91966 rb.0.1.015f [write 4034560~4096] 2.3777e91a snapc 11=[11,10]) v4 currently waiting for sub ops
2012-07-05 19:10:03.278141 mon.0 10.0.1.40:6789/0 493 : [INF] pgmap v7564: 1344 pgs: 1344 active+clean; 5183 MB data, 11066 MB used, 1367 GB / 1377 GB avail
2012-07-05 19:09:55.388448 osd.3 10.0.1.41:6802/2540 160 : [WRN] 6 slow requests, 6 included below; oldest blocked for 32.622016 secs
2012-07-05 19:09:55.388463 osd.3 10.0.1.41:6802/2540 161 : [WRN] slow request 32.622016 seconds old, received at 2012-07-05 19:09:22.766269: osd_op(client.96416.0:92308 rb.0.1.017b [write 4001792~4096] 2.f606a6c6 snapc 11=[11,10]) v4 currently waiting for sub ops
Re: Slow request warnings on 0.48
It was during a random write (fio benchmark). I can't reproduce it now; I'll try to do tests again this week.

----- Mail original -----
De: Mark Nelson mark.nel...@inktank.com
À: Alexandre DERUMIER aderum...@odiso.com
Cc: David Blundell david.blund...@100percentit.com, ceph-devel@vger.kernel.org
Envoyé: Jeudi 5 Juillet 2012 19:58:27
Objet: Re: Slow request warnings on 0.48

[earlier discussion quoted in full above; snipped]

--
Alexandre DERUMIER
Ingénieur Systèmes et Réseaux
Fixe : 03 20 68 88 85
Fax : 03 20 68 90 88
45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris
RE: Slow request warnings on 0.48
Hi David and Alexandre, Does this only happen with random writes or also sequential writes? If it happens with sequential writes as well, does it happen with rados bench? -- Mark Nelson Performance Engineer Inktank Hi Mark, I just ran rados -p data bench 60 write -t 16 and a few dd tests with no problems at all so at the moment it looks like only random IO triggers the slow writes. Please do let me know if there are any other tests that I can do to help track down the cause. David
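(The dd tests aren't spelled out; a typical sequential direct-IO write test along these lines would be something like the following - the target file, block size and count are illustrative only.)

  dd if=/dev/zero of=/mnt/test/seqwrite.bin bs=4M count=1024 oflag=direct conv=fsync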
Re: Slow request warnings on 0.48
On 07/05/2012 01:43 PM, David Blundell wrote: Hi David and Alexandre, Does this only happen with random writes or also sequential writes? If it happens with sequential writes as well, does it happen with rados bench? -- Mark Nelson Performance Engineer Inktank Hi Mark, I just ran rados -p data bench 60 write -t 16 and a few dd tests with no problems at all so at the moment it looks like only random IO triggers the slow writes. Please do let me know if there are any other tests that I can do to help track down the cause. David Thanks David! We've got some people internally taking a look at this. I'll let you guys know if there is anything else we need! Thanks, Mark -- Mark Nelson Performance Engineer Inktank
Re: Writes to mounted Ceph FS fail silently if client has no write capability on data pool
On Thu, Jul 5, 2012 at 10:40 AM, Florian Haas flor...@hastexo.com wrote: Hi everyone, please enlighten me if I'm misinterpreting something, but I think the Ceph FS layer could handle the following situation better. How to reproduce (this is on a 3.2.0 kernel): 1. Create a client, mine is named test, with the following capabilities: client.test key: key caps: [mds] allow caps: [mon] allow r caps: [osd] allow rw pool=testpool Note the client only has access to a single pool, testpool. 2. Export the client's secret and mount a Ceph FS. mount -t ceph -o name=test,secretfile=/etc/ceph/test.secret daisy,eric,frank:/ /mnt This succeeds, despite us not even having read access to the data pool. 3. Write something to a file. root@alice:/mnt# echo hello world > hello.txt root@alice:/mnt# cat hello.txt This too succeeds. 4. Sync and clear caches. root@alice:/mnt# sync root@alice:/mnt# echo 3 > /proc/sys/vm/drop_caches 5. Check file size and contents. root@alice:/mnt# ls -la total 5 drwxr-xr-x 1 root root 0 Jul 5 17:15 . drwxr-xr-x 21 root root 4096 Jun 11 09:03 .. -rw-r--r-- 1 root root 12 Jul 5 17:15 hello.txt root@alice:/mnt# cat hello.txt root@alice:/mnt# Note the reported file size is unchanged, but the file is empty. Checking the data pool with client.admin credentials obviously shows that that pool is empty, so objects are never written. Interestingly, cephfs hello.txt show_location does list an object_name, identifying an object which doesn't exist. Is there any way to make the client fail with -EIO, -EPERM, -EOPNOTSUPP or whatever else is appropriate, rather than pretending to write when it can't? There definitely are, but I don't think we're going to fix that until we get to working seriously on the filesystem. Create a bug! ;) Also, going down the rabbit hole, how would this behavior change if I used cephfs to set the default layout on some directory to use a different pool? I'm not sure what you're asking here — if you have access to the metadata server, you can change the pool that new files go into, and I think you can set the pool to be whatever you like (and we should probably harden all this, too). So you can fix it if it's a problem, but you can also turn it into a problem. Is that what you were after? -Greg
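(How the client.test entity above was created isn't shown. One plausible way to end up with those capabilities - the commands below are an assumption, not taken from Florian's setup - is to generate a keyring with ceph-authtool, register it with ceph auth add, and extract the bare secret for the secretfile mount option:)

  ceph-authtool --create-keyring /etc/ceph/test.keyring --gen-key -n client.test --cap mds 'allow' --cap mon 'allow r' --cap osd 'allow rw pool=testpool'
  ceph auth add client.test -i /etc/ceph/test.keyring
  ceph-authtool -n client.test --print-key /etc/ceph/test.keyring > /etc/ceph/test.secret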
Re: cephfs show_location produces kernel divide error: 0000 [#1] when run against a directory that is not the filesystem root
On Thu, Jul 5, 2012 at 10:04 PM, Gregory Farnum g...@inktank.com wrote: But I have a few more queries while this is fresh. If you create a directory, unmount and remount, and get the location, does that work? Nope, same error. (actually, just flushing caches would probably do it.) Idem. If you create a directory on one node, and then go look at it on another node and try to get the location from there, does that work? No. Cheers, Florian
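(For anyone trying to reproduce this, the sequence under discussion boils down to something like the following on the client, with the mount point and directory name as placeholders; on the affected kernels the last command is what triggers the divide error.)

  mkdir /mnt/somedir
  sync
  echo 3 > /proc/sys/vm/drop_caches
  cephfs /mnt/somedir show_location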
Re: Slow request warnings on 0.48
David, Could you try rados -p data bench 60 write -t 16 -b 4096? rados bench defaults to 4MB objects, this'll give us results for 4k objects. If you could give me the latency too, that would help. -Sam On Thu, Jul 5, 2012 at 12:49 PM, Mark Nelson mark.nel...@inktank.com wrote: On 07/05/2012 01:43 PM, David Blundell wrote: Hi David and Alexandre, Does this only happen with random writes or also sequential writes? If it happens with sequential writes as well, does it happen with rados bench? -- Mark Nelson Performance Engineer Inktank Hi Mark, I just ran rados -p data bench 60 write -t 16 and a few dd tests with no problems at all so at the moment it looks like only random IO triggers the slow writes. Please do let me know if there are any other tests that I can do to help track down the cause. David Thanks David! We've got some people internally taking a look at this. I'll let you guys know if there is anything else we need! Thanks, Mark -- Mark Nelson Performance Engineer Inktank
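(For completeness, the requested run would be the command below; rados bench prints average and maximum latency in its summary, so capturing the output with tee is one easy way to report it back. The log file name is arbitrary.)

  rados -p data bench 60 write -t 16 -b 4096 2>&1 | tee rados-bench-4k.log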
Re: Writes to mounted Ceph FS fail silently if client has no write capability on data pool
On Thu, Jul 5, 2012 at 10:01 PM, Gregory Farnum g...@inktank.com wrote: Also, going down the rabbit hole, how would this behavior change if I used cephfs to set the default layout on some directory to use a different pool? I'm not sure what you're asking here — if you have access to the metadata server, you can change the pool that new files go into, and I think you can set the pool to be whatever you like (and we should probably harden all this, too). So you can fix it if it's a problem, but you can also turn it into a problem. I am aware that I would be able to do this. My question was more along the lines of: if the pool that data is written to can be set on a per-file or per-directory basis, and we can also set read and write permissions per pool, what would the proper filesystem behavior be? Hide files the mounting user doesn't have read access to? Return -EIO or -EPERM on writes to files stored in pools we can't write to? Fail the mount if we're missing some permission on any file or directory in the fs? All of these sound painful in one way or another, so I'm having trouble envisioning what the correct behavior would look like. Florian
Re: cephfs show_location produces kernel divide error: 0000 [#1] when run against a directory that is not the filesystem root
On Thu, Jul 5, 2012 at 1:19 PM, Florian Haas flor...@hastexo.com wrote: On Thu, Jul 5, 2012 at 10:04 PM, Gregory Farnum g...@inktank.com wrote: But I have a few more queries while this is fresh. If you create a directory, unmount and remount, and get the location, does that work? Nope, same error. (actually, just flushing caches would probably do it.) Idem. If you create a directory on one node, and then go look at it on another node and try to get the location from there, does that work? No. Cheers, Florian Okay, this used to work at least some, so something definitely got broken in the kernel. :/ Thanks for checking... -Greg
Re: Writes to mounted Ceph FS fail silently if client has no write capability on data pool
On Thu, Jul 5, 2012 at 1:25 PM, Florian Haas flor...@hastexo.com wrote: On Thu, Jul 5, 2012 at 10:01 PM, Gregory Farnum g...@inktank.com wrote: Also, going down the rabbit hole, how would this behavior change if I used cephfs to set the default layout on some directory to use a different pool? I'm not sure what you're asking here — if you have access to the metadata server, you can change the pool that new files go into, and I think you can set the pool to be whatever you like (and we should probably harden all this, too). So you can fix it if it's a problem, but you can also turn it into a problem. I am aware that I would be able to do this. My question was more along the lines of: if the pool that data is written to can be set on a per-file or per-directory basis, and we can also set read and write permissions per pool, how would the filesystem behave properly? Hide files the mounting user doesn't have read access to? Return -EIO or -EPERM on writes to files stored in pools we can't write to? Failing a mount if we're missing some permission on any file or directory in the fs? All of these sound painful in one way or another, so I'm having trouble envisioning what the correct behavior would look like. Ah, yes. My feeling would be that we want to treat it like a local file they aren't allowed to access — ie, return EPERM. I *think* that is what will actually happen if they try to read those files, but the write path works a bit differently (since the writes are flushed out asynchronously) and so we would need to introduce some smarts into the client to check the pool permissions and proactively apply them on any attempted access. -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: domino-style OSD crash
On Wed, Jul 4, 2012 at 10:53 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote: Le 04/07/2012 18:21, Gregory Farnum a écrit : On Wednesday, July 4, 2012 at 1:06 AM, Yann Dupont wrote: Le 03/07/2012 23:38, Tommi Virtanen a écrit : On Tue, Jul 3, 2012 at 1:54 PM, Yann Dupont yann.dup...@univ-nantes.fr (mailto:yann.dup...@univ-nantes.fr) wrote: In the case I could repair, do you think a crashed FS as it is right now is valuable for you, for future reference , as I saw you can't reproduce the problem ? I can make an archive (or a btrfs dump ?), but it will be quite big. At this point, it's more about the upstream developers (of btrfs etc) than us; we're on good terms with them but not experts on the on-disk format(s). You might want to send an email to the relevant mailing lists before wiping the disks. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org (mailto:majord...@vger.kernel.org) More majordomo info at http://vger.kernel.org/majordomo-info.html Well, I probably wasn't clear enough. I talked about crashed FS, but i was talking about ceph. The underlying FS (btrfs in that case) of 1 node (and only one) has PROBABLY crashed in the past, causing corruption in ceph data on this node, and then the subsequent crash of other nodes. RIGHT now btrfs on this node is OK. I can access the filesystem without errors. For the moment, on 8 nodes, 4 refuse to restart . 1 of the 4 nodes was the crashed node , the 3 others didn't had broblem with the underlying fs as far as I can tell. So I think the scenario is : One node had problem with btrfs, leading first to kernel problem , probably corruption (in disk/ in memory maybe ?) ,and ultimately to a kernel oops. Before that ultimate kernel oops, bad data has been transmitted to other (sane) nodes, leading to ceph-osd crash on thoses nodes. I don't think that's actually possible — the OSDs all do quite a lot of interpretation between what they get off the wire and what goes on disk. What you've got here are 4 corrupted LevelDB databases, and we pretty much can't do that through the interfaces we have. :/ ok, so as all nodes were identical, I probably have hit a btrfs bug (like a erroneous out of space ) in more or less the same time. And when 1 osd was out, If you think this scenario is highly improbable in real life (that is, btrfs will probably be fixed for good, and then, corruption can't happen), it's ok. But I wonder if this scenario can be triggered with other problem, and bad data can be transmitted to other sane nodes (power outage, out of memory condition, disk full... for example) That's why I proposed you a crashed ceph volume image (I shouldn't have talked about a crashed fs, sorry for the confusion) I appreciate the offer, but I don't think this will help much — it's a disk state managed by somebody else, not our logical state, which has broken. If we could figure out how that state got broken that'd be good, but a ceph image won't really help in doing so. ok, no problem. I'll restart from scratch, freshly formated. I wonder if maybe there's a confounding factor here — are all your nodes similar to each other, Yes. I designed the cluster that way. All nodes are identical hardware (powerEdge M610, 10G intel ethernet + emulex fibre channel attached to storage (1 Array for 2 OSD nodes, 1 controller dedicated for each OSD) Oh, interesting. Are the broken nodes all on the same set of arrays? or are they running on different kinds of hardware? How did you do your Ceph upgrades? 
What's ceph -s display when the cluster is running as best it can? Ceph was running 0.47.2 at that time - (debian package for ceph). After the crash I couldn't restart all the nodes. Tried 0.47.3 and now 0.48 without success. Nothing particular for upgrades, because for the moment ceph is broken, so just apt-get upgrade with new version. ceph -s show that : root@label5:~# ceph -s health HEALTH_WARN 260 pgs degraded; 793 pgs down; 785 pgs peering; 32 pgs recovering; 96 pgs stale; 793 pgs stuck inactive; 96 pgs stuck stale; 1092 pgs stuck unclean; recovery 267286/2491140 degraded (10.729%); 1814/1245570 unfound (0.146%) monmap e1: 3 mons at {chichibu=172.20.14.130:6789/0,glenesk=172.20.14.131:6789/0,karuizawa=172.20.14.133:6789/0}, election epoch 12, quorum 0,1,2 chichibu,glenesk,karuizawa osdmap e2404: 8 osds: 3 up, 3 in pgmap v173701: 1728 pgs: 604 active+clean, 8 down, 5 active+recovering+remapped, 32 active+clean+replay, 11 active+recovering+degraded, 25 active+remapped, 710 down+peering, 222 active+degraded, 7 stale+active+recovering+degraded, 61 stale+down+peering, 20 stale+active+degraded, 6 down+remapped+peering, 8 stale+down+remapped+peering, 9 active+recovering; 4786 GB data, 7495 GB used, 7280 GB / 15360 GB avail; 267286/2491140 degraded (10.729%);
Re: speedup ceph / scaling / find the bottleneck
Could you send over the ceph.conf on your KVM host, as well as how you're configuring KVM to use rbd? On Tue, Jul 3, 2012 at 11:20 AM, Stefan Priebe s.pri...@profihost.ag wrote: I'm sorry but this is the KVM Host Machine there is no ceph running on this machine. If i change the admin socket to: admin_socket=/var/run/ceph_$name.sock i don't have any socket at all ;-( Am 03.07.2012 17:31, schrieb Sage Weil: On Tue, 3 Jul 2012, Stefan Priebe - Profihost AG wrote: Hello, Am 02.07.2012 22:30, schrieb Josh Durgin: If you add admin_socket=/path/to/admin_socket for your client running qemu (in that client's ceph.conf section or manually in the qemu command line) you can check that caching is enabled: ceph --admin-daemon /path/to/admin_socket show config | grep rbd_cache And see statistics it generates (look for cache) with: ceph --admin-daemon /path/to/admin_socket perfcounters_dump This doesn't work for me: ceph --admin-daemon /var/run/ceph.sock show config read only got 0 bytes of 4 expected for response length; invalid command?2012-07-03 09:46:57.931821 7fa75d129700 -1 asok(0x8115a0) AdminSocket: request 'show config' not defined Oh, it's 'config show'. Also, 'help' will list the supported commands. Also perfcounters does not show anything: # ceph --admin-daemon /var/run/ceph.sock perfcounters_dump {} There may be another daemon that tried to attach to the same socket file. You might want to set 'admin socket = /var/run/ceph/$name.sock' or something similar, or whatever else is necessary to make it a unique file. ~]# ceph -v ceph version 0.48argonaut-2-gb576faa (commit:b576faa6f24356f4d3ec7205e298d58659e29c68) Out of curiousity, what patches are you applying on top of the release? sage -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
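(Putting the suggestions in this thread together, a sketch of what the KVM host's ceph.conf and the corrected commands might look like - the [client] section, the socket path and the client.admin name are illustrative and assume qemu connects as client.admin:)

  [client]
      admin socket = /var/run/ceph/$name.sock

  ceph --admin-daemon /var/run/ceph/client.admin.sock config show | grep rbd_cache
  ceph --admin-daemon /var/run/ceph/client.admin.sock perfcounters_dump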
Re: Slow request warnings on 0.48
On 5 Jul 2012, at 21:21, Samuel Just wrote: David, Could you try rados -p data bench 60 write -t 16 -b 4096? rados bench defaults to 4MB objects, this'll give us results for 4k objects. If you could give me the latency too, that would help. -Sam Hi Sam, I first ran this with the standard ceph settings giving http://pastebin.com/MWLxEazS This did not cause any slow request warnings so I set filestore queue max ops = 5000 to increase the number of requests in flight. This resulted in http://pastebin.com/yFnALmGW and also a small number of slow request warnings. I ran it again with similar results http://pastebin.com/VnKSVmsq If there's anything else you need, please let me know. David
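(For reference, the setting David mentions is typically placed in the [osd] section of ceph.conf; this is simply the value he used, not a recommendation:)

  [osd]
      filestore queue max ops = 5000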
Re: mkcephfs failing on v0.48 argonaut
Hi Paul, On Wed, 4 Jul 2012, Paul Pettigrew wrote: Firstly, well done guys on achieving this version milestone. I successfully upgraded to the 0.48 format uneventfully on a live (test) system. The same system was then going through rebuild testing, to confirm that also worked fine. Unfortunately, the mkcephfs command is failing: root@dsanb1-coy:~# mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs -k /etc/ceph/keyring --crushmapsrc crushfile.txt -v temp dir is /tmp/mkcephfs.GaRCZ9i06a preparing monmap in /tmp/mkcephfs.GaRCZ9i06a/monmap /usr/bin/monmaptool --create --clobber --add alpha 10.32.0.10:6789 --add bravo 10.32.0.25:6789 --add charlie 10.32.0.11:6789 --print /tmp/mkcephfs.GaRCZ9i06a/monmap /usr/bin/monmaptool: monmap file /tmp/mkcephfs.GaRCZ9i06a/monmap /usr/bin/monmaptool: generated fsid c7202495-468c-4678-b678-115c3ee33402 epoch 0 fsid c7202495-468c-4678-b678-115c3ee33402 last_changed 2012-07-04 15:02:31.732275 created 2012-07-04 15:02:31.732275 0: 10.32.0.10:6789/0 mon.alpha 1: 10.32.0.11:6789/0 mon.charlie 2: 10.32.0.25:6789/0 mon.bravo /usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.GaRCZ9i06a/monmap (3 monitors) /usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.0 user === osd.0 === --- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.GaRCZ9i06a --prepare-osdfs osd.0 umount: /srv/osd.0: not mounted umount: /dev/disk/by-wwn/wwn-0x50014ee601246234: not mounted WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL WARNING! - see http://btrfs.wiki.kernel.org before using fs created label (null) on /dev/disk/by-wwn/wwn-0x50014ee601246234 nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB Btrfs Btrfs v0.19 Scanning for Btrfs filesystems mount: wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so failed: '/sbin/mkcephfs -d /tmp/mkcephfs.GaRCZ9i06a --prepare-osdfs osd.0' Hmm. Can you try running with -v? That will tell us exactly which command it is running, and hopefully we can work backwards from there. 
dmesg/syslog is spitting out at the time of this failure: Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.751945] device fsid 7de0d192-b710-4629-a201-849df1d9db17 devid 1 transid 27109 /dev/sdp Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.751987] device fsid 08fc3479-2fa2-4388-8b61-83e2a742a13e devid 1 transid 28699 /dev/sdo Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.752023] device fsid 8b4a7c43-1a05-4dcb-bbed-de2a5c933996 devid 1 transid 24346 /dev/sdn Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.752068] device fsid ba5fb1ca-c642-49b1-8a41-7f56f8e59fbd devid 1 transid 27274 /dev/sdm Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.761453] device fsid 7fe8c5cf-bf8c-4276-90f2-c3f57f5275fb devid 1 transid 28724 /dev/sdi Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.761518] device fsid 93fa3631-1202-4d42-8908-e5ef4d3e600d devid 1 transid 25201 /dev/sdh Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.761579] device fsid b9a1b5e4-3e5e-4381-a29a-33470f4b870f devid 1 transid 23375 /dev/sdg Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.761635] device fsid 280ea990-23f8-4c43-9e56-140c82340fdc devid 1 transid 25559 /dev/sdf Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.761693] device fsid 2f724cde-6de5-4262-b195-1ba3eea2256e devid 1 transid 176 /dev/sde Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.761732] device fsid a66f890f-8b08-4393-aab0-f222637ca5a4 devid 1 transid 7 /dev/sdd Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.761769] device fsid 6c181a94-697c-4e0c-af0d-05eb04d3626c devid 1 transid 7 /dev/sdc Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.775931] device fsid 6c181a94-697c-4e0c-af0d-05eb04d3626c devid 1 transid 7 /dev/sdc Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.779716] btrfs bad fsid on block 20971520 Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.791594] btrfs bad fsid on block 20971520 Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.803608] btrfs bad fsid on block 20971520 Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.815541] btrfs bad fsid on block 20971520 Jul 4 15:02:31 dsanb1-coy kernel: [ 2306.815878] btrfs bad fsid on block 20971520 Jul 4 15:02:32 dsanb1-coy kernel: [ 2306.823554] btrfs bad fsid on block 20971520 Jul 4 15:02:32 dsanb1-coy kernel: [ 2306.823797] btrfs bad fsid on block 20971520 Jul 4 15:02:32 dsanb1-coy kernel: [ 2306.823887] btrfs: failed to read chunk root on sdc Jul 4 15:02:32 dsanb1-coy kernel: [ 2306.825622] btrfs: open_ctree failed Long shot, but is the kernel on that machine recent? Also fails if not forcing to use btrfs, eg: root@dsanb1-coy:~# mkcephfs -c /etc/ceph/ceph.conf --allhosts -k /etc/ceph/keyring --crushmapsrc crushfile.txt -v temp dir is /tmp/mkcephfs.ZOh6tBPAH0 preparing monmap in /tmp/mkcephfs.ZOh6tBPAH0/monmap /usr/bin/monmaptool --create --clobber --add alpha 10.32.0.10:6789 --add bravo 10.32.0.25:6789 --add
Re: Strange behavior after upgrading to 0.48
On 07/05/2012 10:38 PM, Sage Weil wrote: On Thu, 5 Jul 2012, Xiaopong Tran wrote: The problem is that the ceph utility itself is pre-0.48, but the monitors are running 0.48. You need to upgrade the utility as well. (There was a note about this in the release announcement.) This only affects the -s and -w commands. sage I have read the notes, and upgraded the utility first. There was no problem when the first two were upgraded and recovering. This only happened when the third node was upgraded. The nodes are running debian wheezy, while the client admin node is running ubuntu 12.04. Oooh, maybe the package for wheezy in the repo is wrong. Can you confirm which version the ceph utility is with 'ceph -v'? Thanks! sage Thanks for the quick reply, I didn't have the computer with me last night. But you were right. I checked the version of ceph on ubuntu, and it's still stuck with 0.47.3, despite upgrading. I redid the upgrade, and it's still stuck with that version. That's something I didn't pay attention to. I had to purge the ceph, ceph-common and other related packages, and re-install them, then I got 0.48. And now ceph -s works just as it should. So, somehow, the upgrade on ubuntu does not work properly. Thinking about this issue just now, I think ceph -s still worked right because there was still an older version of mon when the first two nodes were being upgraded. When the last one was upgraded, there was no mon of the same version anymore. Sorry, should have checked if apt upgrade was done properly first :) Thanks Xiaopong
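(The workaround Xiaopong describes amounts to something like the following on the admin node; the exact package list depends on what else was installed from the ceph repository.)

  apt-get purge ceph ceph-common
  apt-get update
  apt-get install ceph ceph-common
  ceph -v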
RE: mkcephfs failing on v0.48 argonaut
Hi Sage - thanks so much for the quick response :-) Firstly, and it is a bit hard to see, but the command output below is run with the -v option. To help isolate what command line in the script is failing, I have added in some simple echo output, and the script now looks like: ### prepare-osdfs ### if [ -n $prepareosdfs ]; then SNIP modprobe btrfs || true echo RUNNING: mkfs.btrfs $btrfs_devs mkfs.btrfs $btrfs_devs btrfs device scan || btrfsctl -a echo RUNNING: mount -t btrfs $btrfs_opt $first_dev $btrfs_path mount -t btrfs $btrfs_opt $first_dev $btrfs_path echo DID I GET HERE - OR CRASH OUT WITH mount ABOVE? chown $osd_user $btrfs_path chmod +w $btrfs_path exit 0 fi Per the modified script the above, here is the output displayed when running the script: root@dsanb1-coy:/srv# /sbin/mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs -k /etc/ceph/keyring --crushmapsrc crushfile.txt -v temp dir is /tmp/mkcephfs.uelzdJ82ej preparing monmap in /tmp/mkcephfs.uelzdJ82ej/monmap /usr/bin/monmaptool --create --clobber --add alpha 10.32.0.10:6789 --add bravo 10.32.0.25:6789 --add charlie 10.32.0.11:6789 --print /tmp/mkcephfs.uelzdJ82ej/monmap /usr/bin/monmaptool: monmap file /tmp/mkcephfs.uelzdJ82ej/monmap /usr/bin/monmaptool: generated fsid b254abdd-e036-4186-b6d5-e32b14e53b45 epoch 0 fsid b254abdd-e036-4186-b6d5-e32b14e53b45 last_changed 2012-07-06 12:31:38.416848 created 2012-07-06 12:31:38.416848 0: 10.32.0.10:6789/0 mon.alpha 1: 10.32.0.11:6789/0 mon.charlie 2: 10.32.0.25:6789/0 mon.bravo /usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.uelzdJ82ej/monmap (3 monitors) /usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.0 user === osd.0 === --- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.uelzdJ82ej --prepare-osdfs osd.0 umount: /srv/osd.0: not mounted umount: /dev/sdc: not mounted RUNNING: mkfs.btrfs /dev/sdc WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL WARNING! - see http://btrfs.wiki.kernel.org before using fs created label (null) on /dev/sdc nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB Btrfs Btrfs v0.19 Scanning for Btrfs filesystems RUNNING: mount -t btrfs -o noatime /dev/sdc /srv/osd.0 mount: wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so failed: '/sbin/mkcephfs -d /tmp/mkcephfs.uelzdJ82ej --prepare-osdfs osd.0' Which clearly isolates the issue to the mount command line. The trouble is, I can run this precise line on the command line directly without error: root@dsanb1-coy:/srv# mount -t btrfs -o noatime /dev/sdc /srv/osd.0 root@dsanb1-coy:/srv# mount | grep btrfs /dev/sdc on /srv/osd.0 type btrfs (rw,noatime) Therefore, what could possibly be preventing the mkcephfs running a simple mount command on the first OSD disk it gets to, that otherwise works fine from the command line? Many thanks Sage Paul PS: changing the btrfs device scan || btrfsctl -a line as proposed had no effect, and neither did putting in a sleep 10 immediately before the mount line. PPS: zerofilling the /dev/sdc and then re-creating a partition and mounting manually, then writing data to it is all fine. Same errors if we substitute any of the other HDD's in the server as 1st/osd.0. Ie, cannot see any issues with the hardware. 
-Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil Sent: Friday, 6 July 2012 8:18 AM To: Paul Pettigrew Cc: ceph-devel@vger.kernel.org Subject: Re: mkcephfs failing on v0.48 argonaut Hi Paul, On Wed, 4 Jul 2012, Paul Pettigrew wrote: Firstly, well done guys on achieving this version milestone. I successfully upgraded to the 0.48 format uneventfully on a live (test) system. The same system was then going through rebuild testing, to confirm that also worked fine. Unfortunately, the mkcephfs command is failing: root@dsanb1-coy:~# mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs -k /etc/ceph/keyring --crushmapsrc crushfile.txt -v temp dir is /tmp/mkcephfs.GaRCZ9i06a preparing monmap in /tmp/mkcephfs.GaRCZ9i06a/monmap /usr/bin/monmaptool --create --clobber --add alpha 10.32.0.10:6789 --add bravo 10.32.0.25:6789 --add charlie 10.32.0.11:6789 --print /tmp/mkcephfs.GaRCZ9i06a/monmap /usr/bin/monmaptool: monmap file /tmp/mkcephfs.GaRCZ9i06a/monmap /usr/bin/monmaptool: generated fsid c7202495-468c-4678-b678-115c3ee33402 epoch 0 fsid c7202495-468c-4678-b678-115c3ee33402 last_changed 2012-07-04 15:02:31.732275 created 2012-07-04 15:02:31.732275 0: 10.32.0.10:6789/0 mon.alpha 1: 10.32.0.11:6789/0 mon.charlie 2: 10.32.0.25:6789/0 mon.bravo /usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.GaRCZ9i06a/monmap (3 monitors) /usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.0 user === osd.0 === --- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.GaRCZ9i06a
Re: speedup ceph / scaling / find the bottleneck
Hi, Stefan is on vacation for the moment, I don't know if he can reply to you. But I can reply for him for the kvm part (as we do the same tests together in parallel). - kvm is 1.1 - rbd 0.48 - drive option rbd:pool/volume:auth_supported=cephx;none;keyring=/etc/pve/priv/ceph/ceph.keyring:mon_host=X.X.X.X; - using writeback. writeback tuning in ceph.conf on the kvm host: rbd_cache_size = 33554432 rbd_cache_max_age = 2.0 benchmark used in the kvm guest: fio --filename=$DISK --direct=1 --rw=randwrite --bs=4k --size=200G --numjobs=50 --runtime=90 --group_reporting --name=file1 results show max 14000 io/s with 1 vm, 7000 io/s per vm with 2 vms,... so it doesn't scale (the bench uses directio, so maybe the writeback cache doesn't help). hardware for ceph is 3 nodes with 4 intel ssds each (1 drive can handle 4io/s randwrite locally). - Alexandre ----- Original Message ----- From: Gregory Farnum g...@inktank.com To: Stefan Priebe s.pri...@profihost.ag Cc: ceph-devel@vger.kernel.org, Sage Weil s...@inktank.com Sent: Thursday, 5 July 2012 23:33:18 Subject: Re: speedup ceph / scaling / find the bottleneck Could you send over the ceph.conf on your KVM host, as well as how you're configuring KVM to use rbd? -- Alexandre Derumier, Systems and Network Engineer
Re: Strange behavior after upgrading to 0.48
On Fri, 6 Jul 2012, Mark Kirkwood wrote: On 06/07/12 14:38, Xiaopong Tran wrote: Thanks for the quick reply, I didn't have the computer with me last night. But you were right. I checked the version of ceph on ubuntu, and it's still stuck with 0.47.3, despite upgrading. I redid the upgrade, and it's still stuck with that version. That's something I didn't pay attention to. I had to purge the ceph, ceph-common and other related packages, and re-install it, then I got 0.48. And now ceph -s works just as it should. So, somehow, the upgrade on ubuntu does not work properly. Thinking about this issue just right now, I think ceph -s still worked right because there was still an older version of mon when the first two nodes were being upgraded. When the last one was upgraded, there's no mon of the same version anymore. Sorry, should have checked if apt upgrade was done properly first :) FYI: I ran into this too - you need to do: apt-get dist-upgrade for the 0.47-2 packages to be replaced by 0.48 (of course purging 'em and reinstalling works too...just a bit more drastic)! That's strange... anyone know why? sage -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
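(The upgrade path Mark describes is simply the following, run on each node with the ceph repository configured; ceph -v afterwards confirms the tools were actually replaced.)

  apt-get update
  apt-get dist-upgrade
  ceph -v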
Re: Strange behavior after upgrading to 0.48
On 06/07/12 16:17, Sage Weil wrote: On Fri, 6 Jul 2012, Mark Kirkwood wrote: FYI: I ran into this too - you need to do: apt-get dist-upgrade for the 0.47-2 packages to be replaced by 0.48 (of course purging 'em and reinstalling works too...just a bit more drastic)! That's strange... anyone know why? sage From the apt-get manual: upgrade - upgrade is used to install the newest versions of all packages currently installed on the system from the sources enumerated in /etc/apt/sources.list. Packages currently installed with new versions available are retrieved and upgraded; under no circumstances are currently installed packages removed, or packages not already installed retrieved and installed. New versions of currently installed packages that cannot be upgraded without changing the install status of another package will be left at their current version. An update must be performed first so that apt-get knows that new versions of packages are available. dist-upgrade - dist-upgrade in addition to performing the function of upgrade, also intelligently handles changing dependencies with new versions of packages; apt-get has a smart conflict resolution system, and it will attempt to upgrade the most important packages at the expense of less important ones if necessary. So, dist-upgrade command may remove some packages. The /etc/apt/sources.list file contains a list of locations from which to retrieve desired package files. See also apt_preferences(5) for a mechanism for overriding the general settings for individual packages. Does 0.48 have new dependencies perhaps?
Re: Strange behavior after upgrading to 0.48
On Fri, 6 Jul 2012, Mark Kirkwood wrote: On 06/07/12 16:17, Sage Weil wrote: On Fri, 6 Jul 2012, Mark Kirkwood wrote: FYI: I ran into this too - you need to do: apt-get dist-upgrade for the 0.47-2 packages to be replaced by 0.48 (of course purging 'em and reinstalling works too...just a bit more drastic)! That's strange... anyone know why? sage -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html From the apt-get manual: upgrade upgrade is used to install the newest versions of all packages currently installed on the system from the sources enumerated in /etc/apt/sources.list. Packages currently installed with new versions available are retrieved and upgraded; under no circumstances are currently installed packages removed, or packages not already installed retrieved and installed. New versions of currently installed packages that cannot be upgraded without changing the install status of another package will be left at their current version. An update must be performed first so that apt-get knows that new versions of packages are available. dist-upgrade dist-upgrade in addition to performing the function of upgrade, also intelligently handles changing dependencies with new versions of packages; apt-get has a smart conflict resolution system, and it will attempt to upgrade the most important packages at the expense of less important ones if necessary. So, dist-upgrade command may remove some packages. The /etc/apt/sources.list file contains a list of locations from which to retrieve desired package files. See also apt_preferences(5) for a mechanism for overriding the general settings for individual packages. Does 0.48 have new dependancies perhaps? Oh, yeah. We switched to libnss from libcrypto++ by default, among other things; that would explain it! Thanks- sage -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html