Re: [ceph-users] OSD Segfaults after Bluestore conversion
I'm following up from a while ago. I don't think this is the same bug. The bug referenced shows "abort: Corruption: block checksum mismatch", and I'm not seeing that on mine. Now I've had 8 OSDs down on this one server for a couple of weeks, and I just tried to start it back up. Here's a link to the log of that OSD (which segfaulted right after starting up): http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log

To me, it looks like the logs are providing surprisingly few hints as to where the problem lies. Is there a way I can turn up logging to see if I can get any more info as to why this is happening?

On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor <m...@oeg.com.au> wrote:
> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
> > We had a 26-node production ceph cluster which we upgraded to Luminous
> > a little over a month ago. I added a 27th node with Bluestore and
> > didn't have any issues, so I began converting the others, one at a
> > time. The first two went off pretty smoothly, but the 3rd is doing
> > something strange.
> >
> > Initially, all the OSDs came up fine, but then some started to
> > segfault. Out of curiosity more than anything else, I did reboot the
> > server to see if it would get better or worse, and it pretty much
> > stayed the same - 12 of the 18 OSDs did not properly come up. Of
> > those, 3 again segfaulted.
> >
> > I picked one that didn't properly come up and copied the log to where
> > anybody can view it:
> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
> >
> > You can contrast that with one that is up:
> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
> >
> > (which is still showing segfaults in the logs, but seems to be
> > recovering from them OK?)
> >
> > Any ideas?
> Ideas? Yes.
>
> There is a bug which is hitting a small number of systems and at this
> time there is no solution. Issue details at
> http://tracker.ceph.com/issues/22102.
>
> Please submit more details of your problem on the ticket.
>
> Mike

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
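On the logging question: debug levels can be raised for just the affected OSD. This is a sketch rather than a definitive recipe - osd.414 is taken from the log linked above, and the debug levels shown are common choices for BlueStore triage, not official recommendations:

```shell
# In /etc/ceph/ceph.conf on the affected host, before restarting the OSD
# (runtime injection is no help if the daemon dies during startup):
#
#   [osd.414]
#   debug osd = 20
#   debug bluestore = 20
#   debug bluefs = 20
#   debug rocksdb = 5
#
# For daemons that stay up long enough, the same settings can be injected
# at runtime instead:
ceph tell osd.414 injectargs '--debug-osd 20 --debug-bluestore 20 --debug-bluefs 20'
```

Be warned that level 20 is extremely verbose, so watch log growth and turn it back down once you have captured a crash.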
[ceph-users] OSD Segfaults after Bluestore conversion
We had a 26-node production ceph cluster which we upgraded to Luminous a little over a month ago. I added a 27th node with Bluestore and didn't have any issues, so I began converting the others, one at a time. The first two went off pretty smoothly, but the 3rd is doing something strange.

Initially, all the OSDs came up fine, but then some started to segfault. Out of curiosity more than anything else, I did reboot the server to see if it would get better or worse, and it pretty much stayed the same - 12 of the 18 OSDs did not properly come up. Of those, 3 again segfaulted.

I picked one that didn't properly come up and copied the log to where anybody can view it: http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log

You can contrast that with one that is up: http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log

(which is still showing segfaults in the logs, but seems to be recovering from them OK?)

Any ideas?
Re: [ceph-users] v9.1.0 Infernalis release candidate released
Nice! Thanks! On Wed, Oct 14, 2015 at 1:23 PM, Sage Weil <s...@newdream.net> wrote: > On Wed, 14 Oct 2015, Kyle Hutson wrote: > > > Which bug? We want to fix hammer, too! > > > > This > > one: > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg23915.html > > > > (Adam sits about 5' from me.) > > Oh... that fix is already in the hammer branch and will be in 0.94.4. > Since you have to go to that anyway before infernalis you may as well stop > there (unless there is something else you want from Infernalis!). > > sage
Re: [ceph-users] v9.1.0 Infernalis release candidate released
A couple of questions related to this, especially since we have a hammer bug that's biting us so we're anxious to upgrade to Infernalis. 1) RE: librbd and librados ABI compatibility is broken. Be careful installing this RC on client machines (e.g., those running qemu). It will be fixed in the final v9.2.0 release. We have several qemu clients. If we upgrade the ceph servers (and not the qemu clients), will this affect us? 2) RE: Upgrading directly from Firefly v0.80.z is not possible. All clusters must first upgrade to Hammer v0.94.4 or a later v0.94.z release; only then is it possible to upgrade to Infernalis 9.2.z. I think I understand this, but want to verify. We're on 0.94.3. Can we upgrade to the RC 9.1.0 and then safely upgrade to 9.2.z when it is finalized? Any foreseen issues with this upgrade path? On Wed, Oct 14, 2015 at 7:30 AM, Sage Weil wrote: > On Wed, 14 Oct 2015, Dan van der Ster wrote: > > Hi Goncalo, > > > > On Wed, Oct 14, 2015 at 6:51 AM, Goncalo Borges > > wrote: > > > Hi Sage... > > > > > > I've seen that the rh6 derivatives have been ruled out. > > > > > > This is a problem in our case since the OS choice in our systems is, > > > somehow, imposed by CERN. The experiments software is certified for SL6 and > > > the transition to SL7 will take some time. > > > > Are you accessing Ceph directly from "physics" machines? Here at CERN > > we run CentOS 7 on the native clients (e.g. qemu-kvm hosts) and by the > > time we upgrade to Infernalis the servers will all be CentOS 7 as > > well. Batch nodes running SL6 don't (currently) talk to Ceph directly > > (in the future they might talk to Ceph-based storage via an xroot > > gateway). But if there are use-cases then perhaps we could find a > > place to build and distribute the newer ceph clients. > > > > There's a ML ceph-t...@cern.ch where we could take this discussion. > > Mail me if you have trouble joining that e-Group. 
> > Also note that it *is* possible to build infernalis on el6, but it > requires a lot more effort... enough that we would rather spend our time > elsewhere (at least as far as ceph.com packages go). If someone else > wants to do that work we'd be happy to take patches to update the build and/or > release process. > > IIRC the thing that eventually made me stop going down this path was the > fact that the newer gcc had a runtime dependency on the newer libstdc++, > which wasn't part of the base distro... which means we'd need also to > publish those packages in the ceph.com repos, or users would have to > add some backport repo or ppa or whatever to get things running. Bleh. > > sage > > > > > > Cheers, Dan > > CERN IT-DSS > > > > > This is kind of a showstopper especially if we can't deploy clients in SL6 / > > > Centos6. > > > > > > Is there any alternative? > > > > > > TIA > > > Goncalo > > > > > > > > > > > > On 10/14/2015 08:01 AM, Sage Weil wrote: > > >> > > >> This is the first Infernalis release candidate. There have been some > > >> major changes since hammer, and the upgrade process is non-trivial. > > >> Please read carefully. > > >> > > >> Getting the release candidate > > >> - > > >> > > >> The v9.1.0 packages are pushed to the development release repositories:: > > >> > > >>http://download.ceph.com/rpm-testing > > >>http://download.ceph.com/debian-testing > > >> > > >> For more info, see:: > > >> > > >>http://docs.ceph.com/docs/master/install/get-packages/ > > >> > > >> Or install with ceph-deploy via:: > > >> > > >>ceph-deploy install --testing HOST > > >> > > >> Known issues > > >> > > >> > > >> * librbd and librados ABI compatibility is broken. Be careful > > >>installing this RC on client machines (e.g., those running qemu). > > >>It will be fixed in the final v9.2.0 release. 
> > >> > > >> Major Changes from Hammer > > >> - > > >> > > >> * *General*: > > >>* Ceph daemons are now managed via systemd (with the exception of > > >> Ubuntu Trusty, which still uses upstart). > > >>* Ceph daemons run as 'ceph' user instead of root. > > >>* On Red Hat distros, there is also an SELinux policy. > > >> * *RADOS*: > > >>* The RADOS cache tier can now proxy write operations to the base > > >> tier, allowing writes to be handled without forcing migration of > > >> an object into the cache. > > >>* The SHEC erasure coding support is no longer flagged as > > >> experimental. SHEC trades some additional storage space for faster > > >> repair. > > >>* There is now a unified queue (and thus prioritization) of client > > >> IO, recovery, scrubbing, and snapshot trimming. > > >>* There have been many improvements to low-level repair tooling > > >> (ceph-objectstore-tool). > > >>* The internal ObjectStore API has been significantly cleaned up in > > >> order > > >> to facilitate new storage backends like
Re: [ceph-users] v9.1.0 Infernalis release candidate released
> Which bug? We want to fix hammer, too! This one: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg23915.html (Adam sits about 5' from me.)
Re: [ceph-users] CephFS and caching
A 'rados -p cachepool ls' takes about 3 hours - not exactly useful. I'm intrigued that you say a single read may not promote it into the cache. My understanding is that if you have an EC-backed pool the clients can't talk to them directly, which means they would necessarily be promoted to the cache pool so the client could read it. Is my understanding wrong? I'm also wondering if it's possible to use RAM as a read-cache layer. Obviously, we don't want this for write-cache because of power outages, motherboard failures, etc., but it seems to make sense for a read-cache. Is that something that's being done, can be done, is going to be done, or has even been considered? On Wed, Sep 9, 2015 at 10:33 AM, Gregory Farnum <gfar...@redhat.com> wrote: > On Wed, Sep 9, 2015 at 4:26 PM, Kyle Hutson <kylehut...@ksu.edu> wrote: > > > > > > On Wed, Sep 9, 2015 at 9:34 AM, Gregory Farnum <gfar...@redhat.com> > wrote: > >> > >> On Wed, Sep 9, 2015 at 3:27 PM, Kyle Hutson <kylehut...@ksu.edu> wrote: > >> > We are using Hammer - latest released version. How do I check if it's > >> > getting promoted into the cache? > >> > >> Umm...that's a good question. You can run rados ls on the cache pool, > >> but that's not exactly scalable; you can turn up logging and dig into > >> them to see if redirects are happening, or watch the OSD operations > >> happening via the admin socket. But I don't know if there's a good > >> interface for users to just query the cache state of a single object. > >> :/ > > > > > > even using 'rados ls', I (naturally) get cephfs object names - is there a > > way to see a filename -> objectname conversion ... or objectname -> > filename > > ? > > The object name is .. So you can > look at the file inode and then see which of its objects are actually > in the pool. > -Greg > > > > >> > >> > We're using the latest ceph kernel client. Where do I poke at > readahead > >> > settings there? 
> >> > >> Just the standard kernel readahead settings; I'm not actually familiar > >> with how to configure those but I don't believe Ceph's are in any way > >> special. What do you mean by "latest ceph kernel client"; are you > >> running one of the developer testing kernels or something? > > > > > > No, just what comes with the latest stock kernel. Sorry for any > confusion. > > > >> > >> I think > >> Ilya might have mentioned some issues with readahead being > >> artificially blocked, but that might have only been with RBD. > >> > >> Oh, are the files you're using sparse? There was a bug with sparse > >> files not filling in pages that just got patched yesterday or > >> something. > > > > > > No, these are not sparse files. Just really big. > > > >> > >> > > >> > On Tue, Sep 8, 2015 at 8:29 AM, Gregory Farnum <gfar...@redhat.com> > >> > wrote: > >> >> > >> >> On Thu, Sep 3, 2015 at 11:58 PM, Kyle Hutson <kylehut...@ksu.edu> > >> >> wrote: > >> >> > I was wondering if anybody could give me some insight as to how > >> >> > CephFS > >> >> > does > >> >> > its caching - read-caching in particular. > >> >> > > >> >> > We are using CephFS with an EC pool on the backend with a > replicated > >> >> > cache > >> >> > pool in front of it. We're seeing some very slow read times. Trying > >> >> > to > >> >> > compute an md5sum on a 15GB file twice in a row (so it should be in > >> >> > cache) > >> >> > takes the time from 23 minutes down to 17 minutes, but this is > over a > >> >> > 10Gbps > >> >> > network and with a crap-ton of OSDs (over 300), so I would expect > it > >> >> > to > >> >> > be > >> >> > down in the 2-3 minute range. > >> >> > >> >> A single sequential read won't necessarily promote an object into the > >> >> cache pool (although if you're using Hammer I think it will), so you > >> >> want to check if it's actually getting promoted into the cache before > >> >> assuming that's happened. 
> >> >> > >> >> > > >> >> > I'm just trying to figure out what we can do to increase the > >> >> > performance. I > >> >> > have over 30
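A concrete way to apply the object-naming scheme Greg describes above: CephFS names a file's backing RADOS objects with the file's inode number in hex plus a sequence suffix (e.g. 10000000000.00000000), so you can compute the prefix from a stat and grep the pool listing for it. A sketch - the demo below runs against a throwaway local file just to show the conversion, and "cachepool" is a placeholder for your cache pool's name:

```shell
# Derive the RADOS object-name prefix for a file from its inode number.
# On the cluster you would stat a file on the CephFS mount instead of a
# temp file.
f=$(mktemp)
ino=$(stat -c %i "$f")          # decimal inode number
prefix=$(printf '%x' "$ino")    # object names use the hex form
echo "object name prefix: $prefix"
rm -f "$f"
# Then check whether any of that file's objects are in the cache tier:
#   rados -p cachepool ls | grep "^${prefix}\."
```

Grepping the full `rados ls` output is still slow on a big pool, but at least it answers the question for one known file without eyeballing three hours of listing.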
Re: [ceph-users] CephFS and caching
We are using Hammer - latest released version. How do I check if it's getting promoted into the cache? We're using the latest ceph kernel client. Where do I poke at readahead settings there? On Tue, Sep 8, 2015 at 8:29 AM, Gregory Farnum <gfar...@redhat.com> wrote: > On Thu, Sep 3, 2015 at 11:58 PM, Kyle Hutson <kylehut...@ksu.edu> wrote: > > I was wondering if anybody could give me some insight as to how CephFS > does > > its caching - read-caching in particular. > > > > We are using CephFS with an EC pool on the backend with a replicated > cache > > pool in front of it. We're seeing some very slow read times. Trying to > > compute an md5sum on a 15GB file twice in a row (so it should be in > cache) > > takes the time from 23 minutes down to 17 minutes, but this is over a > 10Gbps > > network and with a crap-ton of OSDs (over 300), so I would expect it to > be > > down in the 2-3 minute range. > > A single sequential read won't necessarily promote an object into the > cache pool (although if you're using Hammer I think it will), so you > want to check if it's actually getting promoted into the cache before > assuming that's happened. > > > > > I'm just trying to figure out what we can do to increase the > performance. I > > have over 300 TB of live data that I have to be careful with, though, so > I > > have to have some level of caution. > > > > Is there some other caching we can do (client-side or server-side) that > > might give us a decent performance boost? > > Which client are you using for this testing? Have you looked at the > readahead settings? That's usually the big one; if you're only asking > for 4KB at once then stuff is going to be slow no matter what (a > single IO takes at minimum about 2 milliseconds right now, although > the RADOS team is working to improve that). 
> -Greg > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > >
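On the readahead question raised above: for the CephFS kernel client, the readahead window is bounded by the `rasize` mount option (in bytes), and the mount's backing-device entry under /sys/class/bdi exposes a tunable as well. A sketch under assumptions - the monitor address, mount point, and 64 MiB figure are purely illustrative, and the exact `ceph-N` bdi name varies per mount:

```shell
# Mount with a larger maximum readahead window (rasize is in bytes):
mount -t ceph 10.5.38.1:6789:/ /mnt/cephfs -o name=admin,rasize=67108864

# Or inspect the readahead of an existing mount via its backing-device
# entry (value is in KiB and can be written to as well):
grep . /sys/class/bdi/ceph-*/read_ahead_kb
```

For large sequential reads like an md5sum over a 15GB file, a too-small readahead is exactly the kind of thing that turns a 10Gbps link into minutes of round-trip latency.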
Re: [ceph-users] CephFS and caching
On Wed, Sep 9, 2015 at 9:34 AM, Gregory Farnum <gfar...@redhat.com> wrote: > On Wed, Sep 9, 2015 at 3:27 PM, Kyle Hutson <kylehut...@ksu.edu> wrote: > > We are using Hammer - latest released version. How do I check if it's > > getting promoted into the cache? > > Umm...that's a good question. You can run rados ls on the cache pool, > but that's not exactly scalable; you can turn up logging and dig into > them to see if redirects are happening, or watch the OSD operations > happening via the admin socket. But I don't know if there's a good > interface for users to just query the cache state of a single object. > :/ > even using 'rados ls', I (naturally) get cephfs object names - is there a way to see a filename -> objectname conversion ... or objectname -> filename ? > > We're using the latest ceph kernel client. Where do I poke at readahead > > settings there? > > Just the standard kernel readahead settings; I'm not actually familiar > with how to configure those but I don't believe Ceph's are in any way > special. What do you mean by "latest ceph kernel client"; are you > running one of the developer testing kernels or something? No, just what comes with the latest stock kernel. Sorry for any confusion. > I think > Ilya might have mentioned some issues with readahead being > artificially blocked, but that might have only been with RBD. > > Oh, are the files you're using sparse? There was a bug with sparse > files not filling in pages that just got patched yesterday or > something. > No, these are not sparse files. Just really big. > > > > On Tue, Sep 8, 2015 at 8:29 AM, Gregory Farnum <gfar...@redhat.com> > wrote: > >> > >> On Thu, Sep 3, 2015 at 11:58 PM, Kyle Hutson <kylehut...@ksu.edu> > wrote: > >> > I was wondering if anybody could give me some insight as to how CephFS > >> > does > >> > its caching - read-caching in particular. > >> > > >> > We are using CephFS with an EC pool on the backend with a replicated > >> > cache > >> > pool in front of it. 
We're seeing some very slow read times. Trying to > >> > compute an md5sum on a 15GB file twice in a row (so it should be in > >> > cache) > >> > takes the time from 23 minutes down to 17 minutes, but this is over a > >> > 10Gbps > >> > network and with a crap-ton of OSDs (over 300), so I would expect it > to > >> > be > >> > down in the 2-3 minute range. > >> > >> A single sequential read won't necessarily promote an object into the > >> cache pool (although if you're using Hammer I think it will), so you > >> want to check if it's actually getting promoted into the cache before > >> assuming that's happened. > >> > >> > > >> > I'm just trying to figure out what we can do to increase the > >> > performance. I > >> > have over 300 TB of live data that I have to be careful with, though, > so > >> > I > >> > have to have some level of caution. > >> > > >> > Is there some other caching we can do (client-side or server-side) > that > >> > might give us a decent performance boost? > >> > >> Which client are you using for this testing? Have you looked at the > >> readahead settings? That's usually the big one; if you're only asking > >> for 4KB at once then stuff is going to be slow no matter what (a > >> single IO takes at minimum about 2 milliseconds right now, although > >> the RADOS team is working to improve that). > >> -Greg > >> > >> > > >> > ___ > >> > ceph-users mailing list > >> > ceph-users@lists.ceph.com > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > > > > > >
[ceph-users] CephFS and caching
I was wondering if anybody could give me some insight as to how CephFS does its caching - read-caching in particular.

We are using CephFS with an EC pool on the backend with a replicated cache pool in front of it. We're seeing some very slow read times. Trying to compute an md5sum on a 15GB file twice in a row (so it should be in cache) takes the time from 23 minutes down to 17 minutes, but this is over a 10Gbps network and with a crap-ton of OSDs (over 300), so I would expect it to be down in the 2-3 minute range.

I'm just trying to figure out what we can do to increase the performance. I have over 300 TB of live data that I have to be careful with, though, so I have to have some level of caution.

Is there some other caching we can do (client-side or server-side) that might give us a decent performance boost?
[ceph-users] mds crashing
I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well. Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have.

If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again.

I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log

For the possibly, but not necessarily, useful background info:
- Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up.
- We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem.
- Before we realized the mds crashes, we had just changed the size of our metadata pool from 2 to 4.
Re: [ceph-users] mds crashing
Thank you, John! That was exactly the bug we were hitting. My Google-fu didn't lead me to this one. On Wed, Apr 15, 2015 at 4:16 PM, John Spray john.sp...@redhat.com wrote: On 15/04/2015 20:02, Kyle Hutson wrote: I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well. Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have. If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again. I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log For the possibly, but not necessarily, useful background info. - Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up. - We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem. - Before we realized the mds crashes, we had just changed the size of our metadata pool from 2 to 4. It looks like you're seeing http://tracker.ceph.com/issues/10449, which is a situation where the SessionMap object becomes too big for the MDS to save. The cause of it in that case was stuck requests from a misbehaving client running a slightly older kernel. Assuming you're using the kernel client and having a similar problem, you could try to work around this situation by forcibly unmounting the clients while the MDS is offline, such that during clientreplay the MDS will remove them from the SessionMap after timing out, and then next time it tries to save the map it won't be oversized. 
If that works, you could then look into getting newer kernels on the clients to avoid hitting the issue again -- the #10449 ticket has some pointers about which kernel changes were relevant. Cheers, John
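The unmount-and-expire workaround John describes can be sketched as follows - the mds name hobbit01 is taken from the log filename above, the mount point is a placeholder, and the session listing's exact output varies by release:

```shell
# On each CephFS client, force-unmount while the MDS is down:
umount -f /mnt/cephfs

# After restarting an MDS, list the sessions it is still carrying via its
# admin socket; stuck sessions from misbehaving clients are what bloat
# the SessionMap:
ceph daemon mds.hobbit01 session ls
```

Once the stale sessions have timed out and disappeared from the listing, the SessionMap write should succeed again.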
[ceph-users] protocol feature mismatch after upgrading to Hammer
I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message:

2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff peer 13fff missing 1

It isn't always the same IP for the destination - here's another:

2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff peer 13fff missing 1

Some details about our install: We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages.

Any ideas?
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
http://people.beocat.cis.ksu.edu/~kylehutson/crushmap On Thu, Apr 9, 2015 at 11:25 AM, Gregory Farnum g...@gregs42.com wrote: Hmmm. That does look right and neither I nor Sage can come up with anything via code inspection. Can you post the actual binary crush map somewhere for download so that we can inspect it with our tools? -Greg On Thu, Apr 9, 2015 at 7:57 AM, Kyle Hutson kylehut...@ksu.edu wrote: Here 'tis: https://dpaste.de/POr1 On Thu, Apr 9, 2015 at 9:49 AM, Gregory Farnum g...@gregs42.com wrote: Can you dump your crush map and post it on pastebin or something? On Thu, Apr 9, 2015 at 7:26 AM, Kyle Hutson kylehut...@ksu.edu wrote: Nope - it's 64-bit. (Sorry, I missed the reply-all last time.) On Thu, Apr 9, 2015 at 9:24 AM, Gregory Farnum g...@gregs42.com wrote: [Re-added the list] Hmm, I'm checking the code and that shouldn't be possible. What's your client? (In particular, is it 32-bit? That's the only thing I can think of that might have slipped through our QA.) On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson kylehut...@ksu.edu wrote: I did nothing to enable anything else. Just changed my ceph repo from 'giant' to 'hammer', then did 'yum update' and restarted services. On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote: Did you enable the straw2 stuff? CRUSHV4 shouldn't be required by the cluster unless you made changes to the layout requiring it. If you did, the clients have to be upgraded to understand it. You could disable all the v4 features; that should let them connect again. 
-Greg On Thu, Apr 9, 2015 at 7:07 AM, Kyle Hutson kylehut...@ksu.edu wrote: This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client: libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca server's 102b84a042aca, missing 1 It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4 How do I make my ceph cluster backward-compatible with the old cephfs client? On Thu, Apr 9, 2015 at 8:58 AM, Kyle Hutson kylehut...@ksu.edu wrote: I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message: 2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff peer 13fff missing 1 It isn't always the same IP for the destination - here's another: 2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff peer 13fff missing 1 Some details about our install: We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages. Any ideas?
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
Here 'tis: https://dpaste.de/POr1 On Thu, Apr 9, 2015 at 9:49 AM, Gregory Farnum g...@gregs42.com wrote: Can you dump your crush map and post it on pastebin or something? On Thu, Apr 9, 2015 at 7:26 AM, Kyle Hutson kylehut...@ksu.edu wrote: Nope - it's 64-bit. (Sorry, I missed the reply-all last time.) On Thu, Apr 9, 2015 at 9:24 AM, Gregory Farnum g...@gregs42.com wrote: [Re-added the list] Hmm, I'm checking the code and that shouldn't be possible. What's your client? (In particular, is it 32-bit? That's the only thing I can think of that might have slipped through our QA.) On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson kylehut...@ksu.edu wrote: I did nothing to enable anything else. Just changed my ceph repo from 'giant' to 'hammer', then did 'yum update' and restarted services. On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote: Did you enable the straw2 stuff? CRUSHV4 shouldn't be required by the cluster unless you made changes to the layout requiring it. If you did, the clients have to be upgraded to understand it. You could disable all the v4 features; that should let them connect again. -Greg On Thu, Apr 9, 2015 at 7:07 AM, Kyle Hutson kylehut...@ksu.edu wrote: This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client: libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca server's 102b84a042aca, missing 1 It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4 How do I make my ceph cluster backward-compatible with the old cephfs client? 
On Thu, Apr 9, 2015 at 8:58 AM, Kyle Hutson kylehut...@ksu.edu wrote: I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message: 2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff peer 13fff missing 1 It isn't always the same IP for the destination - here's another: 2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff peer 13fff missing 1 Some details about our install: We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages. Any ideas?
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
Nope - it's 64-bit. (Sorry, I missed the reply-all last time.) On Thu, Apr 9, 2015 at 9:24 AM, Gregory Farnum g...@gregs42.com wrote: [Re-added the list] Hmm, I'm checking the code and that shouldn't be possible. What's your client? (In particular, is it 32-bit? That's the only thing I can think of that might have slipped through our QA.) On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson kylehut...@ksu.edu wrote: I did nothing to enable anything else. Just changed my ceph repo from 'giant' to 'hammer', then did 'yum update' and restarted services. On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote: Did you enable the straw2 stuff? CRUSHV4 shouldn't be required by the cluster unless you made changes to the layout requiring it. If you did, the clients have to be upgraded to understand it. You could disable all the v4 features; that should let them connect again. -Greg On Thu, Apr 9, 2015 at 7:07 AM, Kyle Hutson kylehut...@ksu.edu wrote: This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client: libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca server's 102b84a042aca, missing 1 It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4 How do I make my ceph cluster backward-compatible with the old cephfs client? 
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client: libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca < server's 102b84a042aca, missing 1 It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4 How do I make my ceph cluster backward-compatible with the old cephfs client? On Thu, Apr 9, 2015 at 8:58 AM, Kyle Hutson kylehut...@ksu.edu wrote: I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message: 2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 >> 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff < peer 13fff missing 1 It isn't always the same IP for the destination - here's another: 2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 >> 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff < peer 13fff missing 1 Some details about our install: We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages. Any ideas? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] how do I destroy cephfs? (interested in cephfs + tiering + erasure coding)
For what it's worth, I don't think being patient was the answer. I was having the same problem a couple of weeks ago, and I waited from before 5pm one day until after 8am the next, and still got the same errors. I ended up adding a new cephfs pool with a newly-created small pool, but was never able to actually remove cephfs altogether. On Thu, Mar 26, 2015 at 12:45 PM, Jake Grimmett j...@mrc-lmb.cam.ac.uk wrote: On 03/25/2015 05:44 PM, Gregory Farnum wrote: On Wed, Mar 25, 2015 at 10:36 AM, Jake Grimmett j...@mrc-lmb.cam.ac.uk wrote: Dear All, Please forgive this post if it's naive, I'm trying to familiarise myself with cephfs! I'm using Scientific Linux 6.6 with Ceph 0.87.1. My first steps with cephfs using a replicated pool worked OK. Now I'm trying to test cephfs via a replicated caching tier on top of an erasure pool. I've created an erasure pool, but cannot put it under the existing replicated pool. My thoughts were to delete the existing cephfs and start again, however I cannot delete the existing cephfs; the errors are as follows: [root@ceph1 ~]# ceph fs rm cephfs2 Error EINVAL: all MDS daemons must be inactive before removing filesystem I've tried killing the ceph-mds process, but this does not prevent the above error. I've also tried this, which also errors: [root@ceph1 ~]# ceph mds stop 0 Error EBUSY: must decrease max_mds or else MDS will immediately reactivate Right, so did you run ceph mds set_max_mds 0 and then repeat the stop command? :) This also fails... [root@ceph1 ~]# ceph-deploy mds destroy [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf [ceph_deploy.cli][INFO ] Invoked (1.5.21): /usr/bin/ceph-deploy mds destroy [ceph_deploy.mds][ERROR ] subcommand destroy not implemented Am I doing the right thing in trying to wipe the original cephfs config before attempting to use an erasure cold tier? Or can I just redefine the cephfs?
Yeah, unfortunately you need to recreate it if you want to try and use an EC pool with cache tiering, because CephFS knows what pools it expects data to belong to. Things are unlikely to behave correctly if you try and stick an EC pool under an existing one. :( Sounds like this is all just testing, which is good because the suitability of EC+cache is very dependent on how much hot data you have, etc...good luck! -Greg many thanks, Jake Grimmett ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com Thanks for your help - much appreciated. The set_max_mds 0 command worked, but only after I rebooted the server, and restarted ceph twice. Before this I still got an mds active error, and so was unable to destroy the cephfs. Possibly I was being impatient, and needed to let mds go inactive? there were ~1 million files on the system. [root@ceph1 ~]# ceph mds set_max_mds 0 max_mds = 0 [root@ceph1 ~]# ceph mds stop 0 telling mds.0 10.1.0.86:6811/3249 to deactivate [root@ceph1 ~]# ceph mds stop 0 Error EEXIST: mds.0 not active (up:stopping) [root@ceph1 ~]# ceph fs rm cephfs2 Error EINVAL: all MDS daemons must be inactive before removing filesystem There shouldn't be any other mds servers running.. [root@ceph1 ~]# ceph mds stop 1 Error EEXIST: mds.1 not active (down:dne) At this point I rebooted the server, did a service ceph restart twice. Shutdown ceph, then restarted ceph before this command worked: [root@ceph1 ~]# ceph fs rm cephfs2 --yes-i-really-mean-it Anyhow, I've now been able to create an erasure coded pool, with a replicated tier which cephfs is running on :) *Lots* of testing to go! Again, many thanks Jake ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
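Pulling Jake's eventually-working sequence together, a minimal sketch of the teardown (filesystem name cephfs2 as in the thread; on 0.87 the rank can sit in up:stopping for a while, so the rm may need to be retried):

```
# stop all MDS ranks, then remove the filesystem; the underlying pools
# survive and can be reused or re-tiered afterwards
ceph mds set_max_mds 0                     # prevent stopped ranks from reactivating
ceph mds stop 0                            # repeat for each active rank
# wait for the rank to go inactive (watch 'ceph mds stat'), then:
ceph fs rm cephfs2 --yes-i-really-mean-it
# recreate on top of a replicated cache tier over the EC pool, e.g.:
# ceph fs new cephfs2 <metadata-pool> <cached-data-pool>
```

This is a sketch of the commands already quoted in the thread, not a tested recipe; the restarts Jake needed suggest the inactive transition is the fragile step.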
[ceph-users] New EC pool undersized
Last night I blew away my previous ceph configuration (this environment is pre-production) and have 0.87.1 installed. I've manually edited the crushmap so it now looks like https://dpaste.de/OLEa I currently have 144 OSDs on 8 nodes. After increasing pg_num and pgp_num to a more suitable 1024 (due to the high number of OSDs), everything looked happy. So, now I'm trying to play with an erasure-coded pool. I did: ceph osd erasure-code-profile set ec44profile k=4 m=4 ruleset-failure-domain=rack ceph osd pool create ec44pool 8192 8192 erasure ec44profile After settling for a bit 'ceph status' gives cluster 196e5eb8-d6a7-4435-907e-ea028e946923 health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck unclean; 7 pgs stuck undersized; 7 pgs undersized monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14 osdmap e409: 144 osds: 144 up, 144 in pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects 90598 MB used, 640 TB / 640 TB avail 7 active+undersized+degraded 12281 active+clean So to troubleshoot the undersized pgs, I issued 'ceph pg dump_stuck' ok pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp 1.d77 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:57.502849 0'0 408:12 [15,95,58,73,52,31,116,2147483647] 15 [15,95,58,73,52,31,116,2147483647] 15 0'0 2015-03-04 11:33:42.100752 0'0 2015-03-04 11:33:42.100752 1.10fa 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:29.362554 0'0 408:12 [23,12,99,114,132,53,56,2147483647] 23 [23,12,99,114,132,53,56,2147483647] 23 0'0 2015-03-04 11:33:42.168571 0'0 2015-03-04 11:33:42.168571 1.1271 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:48.795742 0'0 408:12 [135,112,69,4,22,95,2147483647,83] 135
[135,112,69,4,22,95,2147483647,83] 135 0'0 2015-03-04 11:33:42.139555 0'0 2015-03-04 11:33:42.139555 1.2b5 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:32.189738 0'0 408:12 [11,115,139,19,76,52,94,2147483647] 11 [11,115,139,19,76,52,94,2147483647] 11 0'0 2015-03-04 11:33:42.079673 0'0 2015-03-04 11:33:42.079673 1.7ae 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:26.848344 0'0 408:12 [27,5,132,119,94,56,52,2147483647] 27 [27,5,132,119,94,56,52,2147483647] 27 0'0 2015-03-04 11:33:42.109832 0'0 2015-03-04 11:33:42.109832 1.1a97 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:25.457454 0'0 408:12 [20,53,14,54,102,118,2147483647,72] 20 [20,53,14,54,102,118,2147483647,72] 20 0'0 2015-03-04 11:33:42.833850 0'0 2015-03-04 11:33:42.833850 1.10a6 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:30.059936 0'0 408:12 [136,22,4,2147483647,72,52,101,55] 136 [136,22,4,2147483647,72,52,101,55] 136 0'0 2015-03-04 11:33:42.125871 0'0 2015-03-04 11:33:42.125871 This appears to have a number on all these (2147483647) that is way out of line from what I would expect. Thoughts? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
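About that out-of-line number: 2147483647 is not a real OSD id. It is the largest signed 32-bit integer, which CRUSH reports in the up/acting sets when it could not fill a slot, so each of these PGs is simply one OSD short of the k+m=8 it needs. A one-liner to confirm the value (pure arithmetic, no cluster needed):

```shell
# CRUSH prints 2147483647 (2^31 - 1) in up/acting when no OSD could be mapped
echo $(( (1 << 31) - 1 ))   # prints: 2147483647
```

That points the diagnosis at CRUSH failing to find an 8th OSD under the rack failure domain, which is where the rest of this thread goes.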
Re: [ceph-users] New EC pool undersized
So it sounds like I should figure out at 'how many nodes' do I need to increase pg_num to 4096, and again for 8192, and increase those incrementally as I add more hosts, correct? On Wed, Mar 4, 2015 at 3:04 PM, Don Doerner don.doer...@quantum.com wrote: Sorry, I missed your other questions, down at the bottom. See here http://ceph.com/docs/master/rados/operations/placement-groups/ (look for “number of replicas for replicated pools or the K+M sum for erasure coded pools”) for the formula; 38400/8 probably implies 8192. The thing is, you’ve got to think about how many ways you can form combinations of 8 unique OSDs (with replacement) that match your failure domain rules. If you’ve only got 8 hosts, and your failure domain is hosts, it severely limits this number. And I have read that too many isn’t good either – a serialization issue, I believe. -don- *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Don Doerner *Sent:* 04 March, 2015 12:49 *To:* Kyle Hutson *Cc:* ceph-users@lists.ceph.com *Subject:* Re: [ceph-users] New EC pool undersized Hmmm, I just struggled through this myself. How many racks do you have? If not more than 8, you might want to make your failure domain smaller? I.e., maybe host?
That, at least, would allow you to debug the situation… -don- *From:* Kyle Hutson [mailto:kylehut...@ksu.edu] *Sent:* 04 March, 2015 12:43 *To:* Don Doerner *Cc:* Ceph Users *Subject:* Re: [ceph-users] New EC pool undersized It wouldn't let me simply change the pg_num, giving Error EEXIST: specified pg_num 2048 <= current 8192 But that's not a big deal, I just deleted the pool and recreated with 'ceph osd pool create ec44pool 2048 2048 erasure ec44profile' ...and the result is quite similar: 'ceph status' is now ceph status cluster 196e5eb8-d6a7-4435-907e-ea028e946923 health HEALTH_WARN 4 pgs degraded; 4 pgs stuck unclean; 4 pgs undersized monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14 osdmap e412: 144 osds: 144 up, 144 in pgmap v6798: 6144 pgs, 2 pools, 0 bytes data, 0 objects 90590 MB used, 640 TB / 640 TB avail 4 active+undersized+degraded 6140 active+clean 'ceph pg dump_stuck' results in ok pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp 2.296 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:26.672224 0'0 412:9 [5,55,91,2147483647,83,135,53,26] 5 [5,55,91,2147483647,83,135,53,26] 5 0'0 2015-03-04 14:33:15.649911 0'0 2015-03-04 14:33:15.649911 2.69c 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:24.984802 0'0 412:9 [93,134,1,74,112,28,2147483647,60] 93 [93,134,1,74,112,28,2147483647,60] 93 0'0 2015-03-04
14:33:15.695747 0'0 2015-03-04 14:33:15.695747 2.36d 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:21.937620 0'0 412:9 [12,108,136,104,52,18,63,2147483647] 12 [12,108,136,104,52,18,63,2147483647] 12 0'0 2015-03-04 14:33:15.652480 0'0 2015-03-04 14:33:15.652480 2.5f7 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:26.169242 0'0 412:9 [94,128,73,22,4,60,2147483647,113] 94 [94,128,73,22,4,60,2147483647,113] 94 0'0 2015-03-04 14:33:15.687695 0'0 2015-03-04 14:33:15.687695 I do have questions for you, even at this point, though. 1) Where did you find the formula (14400/(k+m))? 2) I was really trying to size this for when it goes to production, at which point it may have as many as 384 OSDs. Doesn't that imply I should have even more pgs? On Wed, Mar 4, 2015 at 2:15 PM, Don Doerner don.doer...@quantum.com wrote: Oh duh… OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 2048. -don- *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Don Doerner *Sent:* 04 March, 2015 12:14 *To:* Kyle Hutson; Ceph Users *Subject:* Re: [ceph-users] New EC pool undersized In this case, that number means that there is not an OSD that can be assigned. What’s your k, m from your erasure coded pool? You’ll need
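Don's rule of thumb (roughly 100 PGs per OSD, divided by the pool size, i.e. the replica count or K+M, then rounded up to the next power of two) is easy to script. A sketch; the helper name pg_count is my own, and the constant 100 is the docs' target-PGs-per-OSD guidance, not a hard rule:

```shell
# pg_count <num_osds> <pool_size>  ->  suggested pg_num
# target = (100 * OSDs) / size, rounded up to the next power of two
pg_count() {
  local target=$(( $1 * 100 / $2 )) pgs=1
  while [ "$pgs" -lt "$target" ]; do pgs=$(( pgs * 2 )); done
  echo "$pgs"
}
pg_count 144 8   # 14400/8 = 1800 -> prints 2048
pg_count 384 8   # 38400/8 = 4800 -> prints 8192
```

This reproduces both numbers from the thread: 2048 for the current 144 OSDs, 8192 for the planned 384.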
Re: [ceph-users] New EC pool undersized
That did it. 'step set_choose_tries 200' fixed the problem right away. Thanks Yann! On Wed, Mar 4, 2015 at 2:59 PM, Yann Dupont y...@objoo.org wrote: Le 04/03/2015 21:48, Don Doerner a écrit : Hmmm, I just struggled through this myself. How many racks do you have? If not more than 8, you might want to make your failure domain smaller? I.e., maybe host? That, at least, would allow you to debug the situation… -don- Hello, I think I already had this problem. It's explained here http://tracker.ceph.com/issues/10350 And solution is probably here : http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/ Section : CRUSH gives up too soon Cheers, Yann ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
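For the archives, the fix from the troubleshooting-pg doc amounts to decompiling the crushmap, raising the retry budget in the EC rule, and loading it back. A sketch (rule body abbreviated; only the set_choose_tries line is the actual change, and 200 is the value that worked here):

```
# ceph osd getcrushmap -o crush.bin
# crushtool -d crush.bin -o crush.txt
# ...then edit the ec44pool rule in crush.txt:
rule ec44pool {
        ...
        step set_chooseleaf_tries 5
        step set_choose_tries 200     # give CRUSH more attempts to find all 8 OSDs
        step take default
        ...
}
# crushtool -c crush.txt -o crush.new
# ceph osd setcrushmap -i crush.new
```

With few failure-domain buckets relative to k+m, CRUSH's default retry budget can run out before it finds a full set, which is why the undersized PGs showed the 2147483647 placeholder.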
Re: [ceph-users] New EC pool undersized
It wouldn't let me simply change the pg_num, giving Error EEXIST: specified pg_num 2048 <= current 8192 But that's not a big deal, I just deleted the pool and recreated with 'ceph osd pool create ec44pool 2048 2048 erasure ec44profile' ...and the result is quite similar: 'ceph status' is now ceph status cluster 196e5eb8-d6a7-4435-907e-ea028e946923 health HEALTH_WARN 4 pgs degraded; 4 pgs stuck unclean; 4 pgs undersized monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14 osdmap e412: 144 osds: 144 up, 144 in pgmap v6798: 6144 pgs, 2 pools, 0 bytes data, 0 objects 90590 MB used, 640 TB / 640 TB avail 4 active+undersized+degraded 6140 active+clean 'ceph pg dump_stuck' results in ok pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp 2.296 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:26.672224 0'0 412:9 [5,55,91,2147483647,83,135,53,26] 5 [5,55,91,2147483647,83,135,53,26] 5 0'0 2015-03-04 14:33:15.649911 0'0 2015-03-04 14:33:15.649911 2.69c 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:24.984802 0'0 412:9 [93,134,1,74,112,28,2147483647,60] 93 [93,134,1,74,112,28,2147483647,60] 93 0'0 2015-03-04 14:33:15.695747 0'0 2015-03-04 14:33:15.695747 2.36d 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:21.937620 0'0 412:9 [12,108,136,104,52,18,63,2147483647] 12 [12,108,136,104,52,18,63,2147483647] 12 0'0 2015-03-04 14:33:15.652480 0'0 2015-03-04 14:33:15.652480 2.5f7 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:26.169242 0'0 412:9 [94,128,73,22,4,60,2147483647,113] 94 [94,128,73,22,4,60,2147483647,113] 94 0'0 2015-03-04 14:33:15.687695 0'0 2015-03-04 14:33:15.687695 I do have questions for you, even at this point, though.
1) Where did you find the formula (14400/(k+m))? 2) I was really trying to size this for when it goes to production, at which point it may have as many as 384 OSDs. Doesn't that imply I should have even more pgs? On Wed, Mar 4, 2015 at 2:15 PM, Don Doerner don.doer...@quantum.com wrote: Oh duh… OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 2048. -don- *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Don Doerner *Sent:* 04 March, 2015 12:14 *To:* Kyle Hutson; Ceph Users *Subject:* Re: [ceph-users] New EC pool undersized In this case, that number means that there is not an OSD that can be assigned. What’s your k, m from your erasure coded pool? You’ll need approximately (14400/(k+m)) PGs, rounded up to the next power of 2… -don- *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Kyle Hutson *Sent:* 04 March, 2015 12:06 *To:* Ceph Users *Subject:* [ceph-users] New EC pool undersized Last night I blew away my previous ceph configuration (this environment is pre-production) and have 0.87.1 installed. I've manually edited the crushmap so it now looks like https://dpaste.de/OLEa I currently have 144 OSDs on 8 nodes. After increasing pg_num and pgp_num to a more suitable 1024 (due to the high number of OSDs), everything looked happy. So, now I'm trying to play with an erasure-coded pool.
I did: ceph osd erasure-code-profile set ec44profile k=4 m=4 ruleset-failure-domain=rack ceph osd pool create ec44pool 8192 8192 erasure ec44profile After settling for a bit 'ceph status' gives cluster 196e5eb8-d6a7-4435-907e-ea028e946923 health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck unclean; 7 pgs stuck undersized; 7 pgs undersized monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14 osdmap e409: 144 osds: 144 up, 144 in pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects 90598 MB used, 640 TB / 640 TB avail 7 active+undersized+degraded 12281 active+clean So
Re: [ceph-users] New EC pool undersized
My lowest level (other than OSD) is 'disktype' (based on the crushmaps at http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ ) since I have SSDs and HDDs on the same host. I just made that change (deleted the pool, deleted the profile, deleted the crush ruleset), then re-created using ruleset-failure-domain=disktype. Very similar results. health HEALTH_WARN 3 pgs degraded; 3 pgs stuck unclean; 3 pgs undersized 'ceph pg dump_stuck' looks very similar to the last one I posted. On Wed, Mar 4, 2015 at 2:48 PM, Don Doerner don.doer...@quantum.com wrote: Hmmm, I just struggled through this myself. How many racks do you have? If not more than 8, you might want to make your failure domain smaller? I.e., maybe host? That, at least, would allow you to debug the situation… -don- *From:* Kyle Hutson [mailto:kylehut...@ksu.edu] *Sent:* 04 March, 2015 12:43 *To:* Don Doerner *Cc:* Ceph Users *Subject:* Re: [ceph-users] New EC pool undersized It wouldn't let me simply change the pg_num, giving Error EEXIST: specified pg_num 2048 <= current 8192 But that's not a big deal, I just deleted the pool and recreated with 'ceph osd pool create ec44pool 2048 2048 erasure ec44profile' ...and the result is quite similar: 'ceph status' is now ceph status cluster 196e5eb8-d6a7-4435-907e-ea028e946923 health HEALTH_WARN 4 pgs degraded; 4 pgs stuck unclean; 4 pgs undersized monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14 osdmap e412: 144 osds: 144 up, 144
in pgmap v6798: 6144 pgs, 2 pools, 0 bytes data, 0 objects 90590 MB used, 640 TB / 640 TB avail 4 active+undersized+degraded 6140 active+clean 'ceph pg dump_stuck' results in ok pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp 2.296 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:26.672224 0'0 412:9 [5,55,91,2147483647,83,135,53,26] 5 [5,55,91,2147483647,83,135,53,26] 5 0'0 2015-03-04 14:33:15.649911 0'0 2015-03-04 14:33:15.649911 2.69c 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:24.984802 0'0 412:9 [93,134,1,74,112,28,2147483647,60] 93 [93,134,1,74,112,28,2147483647,60] 93 0'0 2015-03-04 14:33:15.695747 0'0 2015-03-04 14:33:15.695747 2.36d 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:21.937620 0'0 412:9 [12,108,136,104,52,18,63,2147483647] 12 [12,108,136,104,52,18,63,2147483647] 12 0'0 2015-03-04 14:33:15.652480 0'0 2015-03-04 14:33:15.652480 2.5f7 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:26.169242 0'0 412:9 [94,128,73,22,4,60,2147483647,113] 94 [94,128,73,22,4,60,2147483647,113] 94 0'0 2015-03-04 14:33:15.687695 0'0 2015-03-04 14:33:15.687695 I do have questions for you, even at this point, though. 1) Where did you find the formula (14400/(k+m))? 2) I was really trying to size this for when it goes to production, at which point it may have as many as 384 OSDs. Doesn't that imply I should have even more pgs? On Wed, Mar 4, 2015 at 2:15 PM, Don Doerner don.doer...@quantum.com wrote: Oh duh… OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 2048. -don- *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Don Doerner *Sent:* 04 March, 2015 12:14 *To:* Kyle Hutson; Ceph Users *Subject:* Re: [ceph-users] New EC pool undersized In this case, that number means that there is not an OSD that can be assigned. What’s your k, m from your erasure coded pool?
You’ll need approximately (14400/(k+m)) PGs, rounded up to the next power of 2… -don- *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Kyle Hutson *Sent:* 04 March, 2015 12:06 *To:* Ceph Users *Subject:* [ceph-users] New EC pool undersized Last night I blew away my previous ceph configuration (this environment is pre-production) and have 0.87.1 installed. I've manually edited the crushmap so it now looks like https://dpaste.de/OLEa
Re: [ceph-users] Centos 7 OSD silently fail to start
I'm having a similar issue. I'm following http://ceph.com/docs/master/install/manual-deployment/ to a T. I have OSDs on the same host deployed with the short-form and they work fine. I am trying to deploy some more via the long form (because I want them to appear in a different location in the crush map). Everything through step 10 (i.e. ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...] ) works just fine. When I go to step 11 (sudo /etc/init.d/ceph start osd.{osd-num}) I get: /etc/init.d/ceph: osd.16 not found (/etc/ceph/ceph.conf defines mon.hobbit01 osd.7 osd.15 osd.10 osd.9 osd.1 osd.14 osd.2 osd.3 osd.13 osd.8 osd.12 osd.6 osd.11 osd.5 osd.4 osd.0 , /var/lib/ceph defines mon.hobbit01 osd.7 osd.15 osd.10 osd.9 osd.1 osd.14 osd.2 osd.3 osd.13 osd.8 osd.12 osd.6 osd.11 osd.5 osd.4 osd.0) On Wed, Feb 25, 2015 at 11:55 AM, Travis Rhoden trho...@gmail.com wrote: Also, did you successfully start your monitor(s), and define/create the OSDs within the Ceph cluster itself? There are several steps to creating a Ceph cluster manually. I'm unsure if you have done the steps to actually create and register the OSDs with the cluster. - Travis On Wed, Feb 25, 2015 at 9:49 AM, Leszek Master keks...@gmail.com wrote: Check firewall rules and selinux. It sometimes is a pain in the ... :) 25 lut 2015 01:46 Barclay Jameson almightybe...@gmail.com napisał(a): I have tried to install ceph using ceph-deploy but sgdisk seems to have too many issues so I did a manual install. After mkfs.btrfs on the disks and journals and mounted them I then tried to start the osds which failed. 
The first error was: #/etc/init.d/ceph start osd.0 /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines ) I then manually added the osds to the conf file with the following as an example: [osd.0] osd_host = node01 Now when I run the command : # /etc/init.d/ceph start osd.0 There is no error or output from the command and in fact when I do a ceph -s no osds are listed as being up. Doing a ps aux | grep -i ceph or ps aux | grep -i osd shows there are no osds running. I also have done htop to see if any processes are running and none are shown. I had this working on SL6.5 with Firefly but Giant on Centos 7 has been nothing but a giant pain. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
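One hedged observation on the [osd.0] stanza above: the sysvinit script decides which daemons belong to a node by a host line in each section, so a key spelled osd_host may simply never match and the OSD is silently skipped. A minimal sketch of the sysvinit-style stanza (hostname node01 taken from the message; verify the exact key against your init script before relying on this):

```
[osd.0]
host = node01
```

If the script still reports "osd.0 not found", the daemon likely also needs its data directory present under /var/lib/ceph/osd/, which is the other place the error message says it looks.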
Re: [ceph-users] Centos 7 OSD silently fail to start
But I already issued that command (back in step 6). The interesting part is that ceph-disk activate apparently does it correctly. Even after reboot, the services start as they should. On Wed, Feb 25, 2015 at 3:54 PM, Robert LeBlanc rob...@leblancnet.us wrote: I think that your problem lies with systemd (even though you are using SysV syntax, systemd is really doing the work). Systemd does not like multiple arguments and I think this is why it is failing. There is supposed to be some work done to get systemd working ok, but I think it has the limitation of only working with a cluster named 'ceph' currently. What I did to get around the problem was to run the osd command manually: ceph-osd -i osd# Once I understood the under-the-hood stuff, I moved to ceph-disk and now because of the GPT partition IDs, udev automatically starts up the OSD process at boot/creation and moves it to the appropriate CRUSH location (configurable in ceph.conf http://ceph.com/docs/master/rados/operations/crush-map/#crush-location, an example: crush location = host=test rack=rack3 row=row8 datacenter=local region=na-west root=default). To restart an OSD process, I just kill the PID for the OSD then issue ceph-disk activate /dev/sdx1 to restart the OSD process. You probably could stop it with systemctl since I believe udev creates a resource for it (I should probably look into that now that this system will be going production soon). On Wed, Feb 25, 2015 at 2:13 PM, Kyle Hutson kylehut...@ksu.edu wrote: I'm having a similar issue. I'm following http://ceph.com/docs/master/install/manual-deployment/ to a T. I have OSDs on the same host deployed with the short-form and they work fine. I am trying to deploy some more via the long form (because I want them to appear in a different location in the crush map). Everything through step 10 (i.e. ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...] ) works just fine.
When I go to step 11 (sudo /etc/init.d/ceph start osd.{osd-num}) I get: /etc/init.d/ceph: osd.16 not found (/etc/ceph/ceph.conf defines mon.hobbit01 osd.7 osd.15 osd.10 osd.9 osd.1 osd.14 osd.2 osd.3 osd.13 osd.8 osd.12 osd.6 osd.11 osd.5 osd.4 osd.0 , /var/lib/ceph defines mon.hobbit01 osd.7 osd.15 osd.10 osd.9 osd.1 osd.14 osd.2 osd.3 osd.13 osd.8 osd.12 osd.6 osd.11 osd.5 osd.4 osd.0) On Wed, Feb 25, 2015 at 11:55 AM, Travis Rhoden trho...@gmail.com wrote: Also, did you successfully start your monitor(s), and define/create the OSDs within the Ceph cluster itself? There are several steps to creating a Ceph cluster manually. I'm unsure if you have done the steps to actually create and register the OSDs with the cluster. - Travis On Wed, Feb 25, 2015 at 9:49 AM, Leszek Master keks...@gmail.com wrote: Check firewall rules and selinux. It sometimes is a pain in the ... :) 25 lut 2015 01:46 Barclay Jameson almightybe...@gmail.com napisał(a): I have tried to install ceph using ceph-deploy but sgdisk seems to have too many issues so I did a manual install. After mkfs.btrfs on the disks and journals and mounted them I then tried to start the osds which failed. The first error was: #/etc/init.d/ceph start osd.0 /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines ) I then manually added the osds to the conf file with the following as an example: [osd.0] osd_host = node01 Now when I run the command : # /etc/init.d/ceph start osd.0 There is no error or output from the command and in fact when I do a ceph -s no osds are listed as being up. Doing as ps aux | grep -i ceph or ps aux | grep -i osd shows there are no osd running. I also have done htop to see if any process are running and none are shown. I had this working on SL6.5 with Firefly but Giant on Centos 7 has been nothing but a giant pain. 
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Centos 7 OSD silently fail to start
So I issue it twice? e.g.

ceph-osd -i X --mkfs --mkkey
...other commands...
ceph-osd -i X

?

On Wed, Feb 25, 2015 at 4:03 PM, Robert LeBlanc rob...@leblancnet.us wrote:

Step #6 in http://ceph.com/docs/master/install/manual-deployment/#long-form only sets up the file structure for the OSD; it doesn't start the long-running process.

On Wed, Feb 25, 2015 at 2:59 PM, Kyle Hutson kylehut...@ksu.edu wrote:

But I already issued that command (back in step 6). The interesting part is that ceph-disk activate apparently does it correctly. Even after a reboot, the services start as they should.

On Wed, Feb 25, 2015 at 3:54 PM, Robert LeBlanc rob...@leblancnet.us wrote:

I think your problem lies with systemd (even though you are using SysV syntax, systemd is really doing the work). Systemd does not like multiple arguments, and I think this is why it is failing. There is supposed to be some work done to get systemd working OK, but I think it currently has the limitation of only working with a cluster named 'ceph'.

What I did to get around the problem was to run the osd command manually: ceph-osd -i osd#

Once I understood the under-the-hood stuff, I moved to ceph-disk, and now, because of the GPT partition IDs, udev automatically starts up the OSD process at boot/creation and moves it to the appropriate CRUSH location (configurable in ceph.conf, see http://ceph.com/docs/master/rados/operations/crush-map/#crush-location; an example: crush location = host=test rack=rack3 row=row8 datacenter=local region=na-west root=default).

To restart an OSD process, I just kill the PID for the OSD and then issue ceph-disk activate /dev/sdx1 to restart the OSD process. You could probably stop it with systemctl, since I believe udev creates a resource for it (I should probably look into that now that this system will be going to production soon).

On Wed, Feb 25, 2015 at 2:13 PM, Kyle Hutson kylehut...@ksu.edu wrote:

I'm having a similar issue.
I'm following http://ceph.com/docs/master/install/manual-deployment/ to a T. I have OSDs on the same host deployed with the short form, and they work fine. I am trying to deploy some more via the long form (because I want them to appear in a different location in the CRUSH map).

Everything through step 10 (i.e. ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]) works just fine. When I go to step 11 (sudo /etc/init.d/ceph start osd.{osd-num}) I get:

/etc/init.d/ceph: osd.16 not found (/etc/ceph/ceph.conf defines mon.hobbit01 osd.7 osd.15 osd.10 osd.9 osd.1 osd.14 osd.2 osd.3 osd.13 osd.8 osd.12 osd.6 osd.11 osd.5 osd.4 osd.0 , /var/lib/ceph defines mon.hobbit01 osd.7 osd.15 osd.10 osd.9 osd.1 osd.14 osd.2 osd.3 osd.13 osd.8 osd.12 osd.6 osd.11 osd.5 osd.4 osd.0)

On Wed, Feb 25, 2015 at 11:55 AM, Travis Rhoden trho...@gmail.com wrote:

Also, did you successfully start your monitor(s), and define/create the OSDs within the Ceph cluster itself? There are several steps to creating a Ceph cluster manually. I'm unsure if you have done the steps to actually create and register the OSDs with the cluster.

- Travis

On Wed, Feb 25, 2015 at 9:49 AM, Leszek Master keks...@gmail.com wrote:

Check firewall rules and selinux. It sometimes is a pain in the ... :)

On 25 Feb 2015 at 01:46, Barclay Jameson almightybe...@gmail.com wrote:

I have tried to install ceph using ceph-deploy, but sgdisk seems to have too many issues, so I did a manual install. After running mkfs.btrfs on the disks and journals and mounting them, I tried to start the OSDs, which failed. The first error was:

# /etc/init.d/ceph start osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

I then manually added the OSDs to the conf file, with the following as an example:

[osd.0]
osd_host = node01

Now when I run the command:

# /etc/init.d/ceph start osd.0

there is no error or output from the command, and in fact when I do a ceph -s, no OSDs are listed as being up.
Doing a ps aux | grep -i ceph or ps aux | grep -i osd shows there are no OSDs running. I have also run htop to see if any processes are running, and none are shown. I had this working on SL6.5 with Firefly, but Giant on CentOS 7 has been nothing but a giant pain.
Re: [ceph-users] Centos 7 OSD silently fail to start
Thank you, Thomas. You at least made me look in the right spot. The long-form docs are showing what to do for a mon, not an osd. At the bottom of step 11, instead of

sudo touch /var/lib/ceph/mon/{cluster-name}-{hostname}/sysvinit

it should read

sudo touch /var/lib/ceph/osd/{cluster-name}-{osd-num}/sysvinit

Once I did that, 'service ceph status' correctly shows that I have that OSD available, and I can start or stop it from there.

On Wed, Feb 25, 2015 at 4:55 PM, Thomas Foster thomas.foste...@gmail.com wrote:

I am using the long form and have it working. The one thing that I saw was to change from osd_host to just host. See if that works.

On Feb 25, 2015 5:44 PM, Kyle Hutson kylehut...@ksu.edu wrote:

I just tried it, and that does indeed get the OSD to start. However, it doesn't add it to the appropriate place, so it wouldn't survive a reboot. In my case, running 'service ceph status osd.16' still results in the same line I posted above. There's still something broken such that 'ceph-disk activate' works correctly, but using the long-form version does not.

On Wed, Feb 25, 2015 at 4:03 PM, Robert LeBlanc rob...@leblancnet.us wrote:

Step #6 in http://ceph.com/docs/master/install/manual-deployment/#long-form only sets up the file structure for the OSD; it doesn't start the long-running process.

On Wed, Feb 25, 2015 at 2:59 PM, Kyle Hutson kylehut...@ksu.edu wrote:

But I already issued that command (back in step 6). The interesting part is that ceph-disk activate apparently does it correctly. Even after a reboot, the services start as they should.

On Wed, Feb 25, 2015 at 3:54 PM, Robert LeBlanc rob...@leblancnet.us wrote:

I think your problem lies with systemd (even though you are using SysV syntax, systemd is really doing the work). Systemd does not like multiple arguments, and I think this is why it is failing. There is supposed to be some work done to get systemd working OK, but I think it currently has the limitation of only working with a cluster named 'ceph'.
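The fix described above can be sketched end-to-end. To keep it runnable outside a real cluster, this uses a scratch directory in place of /var/lib/ceph; the cluster name "ceph" and osd id 16 come from the thread:

```shell
# Sketch of the documented fix: the sysvinit marker belongs under the
# OSD's data directory (osd/{cluster}-{id}), not the mon's directory.
# A scratch root stands in for the real filesystem here.
root=$(mktemp -d)
mkdir -p "$root/var/lib/ceph/osd/ceph-16"
# This empty marker file is what lets "service ceph status" enumerate
# the OSD on sysvinit systems:
touch "$root/var/lib/ceph/osd/ceph-16/sysvinit"
ls "$root/var/lib/ceph/osd/ceph-16"
```

On a real node the path would be /var/lib/ceph/osd/ceph-16/sysvinit, created after the OSD's data directory has been prepared.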
What I did to get around the problem was to run the osd command manually: ceph-osd -i osd#

Once I understood the under-the-hood stuff, I moved to ceph-disk, and now, because of the GPT partition IDs, udev automatically starts up the OSD process at boot/creation and moves it to the appropriate CRUSH location (configurable in ceph.conf, see http://ceph.com/docs/master/rados/operations/crush-map/#crush-location; an example: crush location = host=test rack=rack3 row=row8 datacenter=local region=na-west root=default).

To restart an OSD process, I just kill the PID for the OSD and then issue ceph-disk activate /dev/sdx1 to restart the OSD process. You could probably stop it with systemctl, since I believe udev creates a resource for it (I should probably look into that now that this system will be going to production soon).

On Wed, Feb 25, 2015 at 2:13 PM, Kyle Hutson kylehut...@ksu.edu wrote:

I'm having a similar issue. I'm following http://ceph.com/docs/master/install/manual-deployment/ to a T. I have OSDs on the same host deployed with the short form, and they work fine. I am trying to deploy some more via the long form (because I want them to appear in a different location in the CRUSH map).

Everything through step 10 (i.e. ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]) works just fine. When I go to step 11 (sudo /etc/init.d/ceph start osd.{osd-num}) I get:

/etc/init.d/ceph: osd.16 not found (/etc/ceph/ceph.conf defines mon.hobbit01 osd.7 osd.15 osd.10 osd.9 osd.1 osd.14 osd.2 osd.3 osd.13 osd.8 osd.12 osd.6 osd.11 osd.5 osd.4 osd.0 , /var/lib/ceph defines mon.hobbit01 osd.7 osd.15 osd.10 osd.9 osd.1 osd.14 osd.2 osd.3 osd.13 osd.8 osd.12 osd.6 osd.11 osd.5 osd.4 osd.0)

On Wed, Feb 25, 2015 at 11:55 AM, Travis Rhoden trho...@gmail.com wrote:

Also, did you successfully start your monitor(s), and define/create the OSDs within the Ceph cluster itself? There are several steps to creating a Ceph cluster manually.
I'm unsure if you have done the steps to actually create and register the OSDs with the cluster.

- Travis

On Wed, Feb 25, 2015 at 9:49 AM, Leszek Master keks...@gmail.com wrote:

Check firewall rules and selinux. It sometimes is a pain in the ... :)

On 25 Feb 2015 at 01:46, Barclay Jameson almightybe...@gmail.com wrote:

I have tried to install ceph using ceph-deploy, but sgdisk seems to have too many issues, so I did a manual install. After running mkfs.btrfs on the disks and journals and mounting them, I tried to start the OSDs, which failed. The first error was:

# /etc/init.d/ceph start osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

I then manually added the OSDs to the conf file, with the following as an example:

[osd.0
Re: [ceph-users] Fixing a crushmap
Here was the process I went through.

1) I created an EC pool, which created ruleset 1
2) I edited the crushmap to approximately its current form
3) I discovered my previous EC pool wasn't doing what I meant for it to do, so I deleted it.
4) I created a new EC pool with the parameters I wanted and told it to use ruleset 3

On Fri, Feb 20, 2015 at 10:55 AM, Luis Periquito periqu...@gmail.com wrote:

The process of creating an erasure-coded pool and a replicated one is slightly different. You can use Sebastien's guide to create/manage the osd tree, but you should follow this guide http://ceph.com/docs/giant/dev/erasure-coded-pool/ to create the EC pool. I'm not sure (i.e. I never tried) whether you can create an EC pool the way you did. The normal replicated ones do work like this.

On Fri, Feb 20, 2015 at 4:49 PM, Kyle Hutson kylehut...@ksu.edu wrote:

I manually edited my crushmap, basing my changes on http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

I have SSDs and HDDs in the same box and was wanting to separate them by ruleset. My current crushmap can be seen at http://pastie.org/9966238

I had it installed and everything looked good until I created a new pool. All of the new pgs are stuck in creating. I first tried creating an erasure-coded pool using ruleset 3, then created another pool using ruleset 0. Same result.

I'm not opposed to an 'RTFM' answer, so long as you can point me to the right one. I've seen very little documentation on crushmap rules, in particular.
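Luis's point about EC pools being created differently can be sketched. In the Giant-era docs, an erasure-coded pool is created from an erasure-code profile, which generates its own CRUSH ruleset, rather than by attaching a hand-edited ruleset afterwards. Profile and pool names and pg counts below are placeholders, and these commands only make sense against a live cluster:

```shell
# Sketch of the documented EC-pool flow on Giant (not runnable outside
# a real cluster). The profile defines k data / m coding chunks and is
# what generates the matching CRUSH ruleset for the pool.
ceph osd erasure-code-profile set myprofile k=4 m=2
ceph osd pool create ecpool 256 256 erasure myprofile
ceph osd pool ls detail   # shows which crush ruleset the pool got
```

If a pool must use a hand-edited ruleset, the profile's generated ruleset and the custom one have to agree on the failure domain, which is one plausible reason pgs can get stuck in creating.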
Re: [ceph-users] Fixing a crushmap
Oh, and I don't yet have any important data here, so I'm not worried about losing anything at this point. I just need to get my cluster happy again so I can play with it some more.

On Fri, Feb 20, 2015 at 11:00 AM, Kyle Hutson kylehut...@ksu.edu wrote:

Here was the process I went through.

1) I created an EC pool, which created ruleset 1
2) I edited the crushmap to approximately its current form
3) I discovered my previous EC pool wasn't doing what I meant for it to do, so I deleted it.
4) I created a new EC pool with the parameters I wanted and told it to use ruleset 3

On Fri, Feb 20, 2015 at 10:55 AM, Luis Periquito periqu...@gmail.com wrote:

The process of creating an erasure-coded pool and a replicated one is slightly different. You can use Sebastien's guide to create/manage the osd tree, but you should follow this guide http://ceph.com/docs/giant/dev/erasure-coded-pool/ to create the EC pool. I'm not sure (i.e. I never tried) whether you can create an EC pool the way you did. The normal replicated ones do work like this.

On Fri, Feb 20, 2015 at 4:49 PM, Kyle Hutson kylehut...@ksu.edu wrote:

I manually edited my crushmap, basing my changes on http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

I have SSDs and HDDs in the same box and was wanting to separate them by ruleset. My current crushmap can be seen at http://pastie.org/9966238

I had it installed and everything looked good until I created a new pool. All of the new pgs are stuck in creating. I first tried creating an erasure-coded pool using ruleset 3, then created another pool using ruleset 0. Same result.

I'm not opposed to an 'RTFM' answer, so long as you can point me to the right one. I've seen very little documentation on crushmap rules, in particular.
[ceph-users] Fixing a crushmap
I manually edited my crushmap, basing my changes on http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

I have SSDs and HDDs in the same box and was wanting to separate them by ruleset. My current crushmap can be seen at http://pastie.org/9966238

I had it installed and everything looked good until I created a new pool. All of the new pgs are stuck in creating. I first tried creating an erasure-coded pool using ruleset 3, then created another pool using ruleset 0. Same result.

I'm not opposed to an 'RTFM' answer, so long as you can point me to the right one. I've seen very little documentation on crushmap rules, in particular.
Re: [ceph-users] v0.67 Dumpling released
Thanks for that bit, too, Ian.

For what it's worth, I updated /etc/yum.repos.d/ceph.repo, installed the latest version (from Cuttlefish), restarted (monitors first, then everything else), and everything looks great.

On Wed, Aug 14, 2013 at 1:28 PM, Ian Colle ian.co...@inktank.com wrote:

There are version-specific repos, but you shouldn't need them if you want the latest. In fact, http://ceph.com/rpm/ is simply a link to http://ceph.com/rpm-dumpling

Ian R. Colle
Director of Engineering
Inktank
Cell: +1.303.601.7713
Email: i...@inktank.com
Delivering the Future of Storage
http://www.linkedin.com/in/ircolle
http://www.twitter.com/ircolle

On 8/14/13 8:28 AM, Kyle Hutson kylehut...@k-state.edu wrote:

Ah, didn't realize the repos were version-specific. Thanks, Dan!

On Wed, Aug 14, 2013 at 9:20 AM, Dan van der Ster daniel.vanders...@cern.ch wrote:

http://ceph.com/rpm-dumpling/el6/x86_64/
--
Dan van der Ster
CERN IT-DSS

On Wednesday, August 14, 2013 at 4:17 PM, Kyle Hutson wrote:

Any suggestions for upgrading CentOS/RHEL? The yum repos don't appear to have been updated yet. I thought maybe with the improved support for Red Hat platforms that would be the easy way of going about it.

On Wed, Aug 14, 2013 at 5:08 AM, pe...@2force.nl wrote:

On 2013-08-14 07:32, Sage Weil wrote:

Another three months have gone by, and the next stable release of Ceph is ready: Dumpling! Thank you to everyone who has contributed to this release!
This release focuses on a few major themes since v0.61 (Cuttlefish):

* rgw: multi-site, multi-datacenter support for S3/Swift object storage
* new RESTful API endpoint for administering the cluster, based on a new and improved management API and updated CLI
* mon: stability and performance
* osd: stability and performance
* cephfs: open-by-ino support (for improved NFS reexport)
* improved support for Red Hat platforms
* use of the Intel CRC32c instruction when available

As with previous stable releases, you can upgrade from previous versions of Ceph without taking the entire cluster offline, as long as a few simple guidelines are followed.

* For Dumpling, we have tested upgrades from both Bobtail and Cuttlefish. If you are running Argonaut, please upgrade to Bobtail and then to Dumpling.
* Please upgrade daemons/hosts in the following order:
  1. Upgrade ceph-common on all nodes that will use the command line ceph utility.
  2. Upgrade all monitors (upgrade ceph package, restart ceph-mon daemons). This can happen one daemon or host at a time. Note that because Cuttlefish and Dumpling monitors can't talk to each other, all monitors should be upgraded in relatively short succession to minimize the risk that an untimely failure will reduce availability.
  3. Upgrade all osds (upgrade ceph package, restart ceph-osd daemons). This can happen one daemon or host at a time.
  4. Upgrade radosgw (upgrade radosgw package, restart radosgw daemons).

There are several small compatibility changes between Cuttlefish and Dumpling, particularly with the CLI interface. Please see the complete release notes for a summary of the changes since v0.66 and v0.61 Cuttlefish, and other possible issues that should be considered before upgrading: http://ceph.com/docs/master/release-notes/#v0-67-dumpling

Dumpling is the second Ceph release on our new three-month stable release cycle. We are very pleased to have pulled everything together on schedule.
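As a rough sketch, the upgrade order above translates to something like the following on an RPM-based host. The package and service invocations are my assumption for sysvinit-era installs, not taken from the announcement, and these commands are only meaningful on real cluster nodes:

```shell
# Sketch of the rolling upgrade order (run per host, in this order
# across the cluster; not runnable outside a real deployment):
yum update ceph-common                 # 1. nodes using the ceph CLI
yum update ceph && service ceph restart mon   # 2. each monitor host,
                                              #    in quick succession
yum update ceph && service ceph restart osd   # 3. each OSD host
yum update ceph-radosgw && service ceph-radosgw restart  # 4. gateways
```

The key property, per the announcement, is that steps 2 and 3 can proceed one daemon or host at a time, but monitors should not be left mixed between Cuttlefish and Dumpling for long.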
The next stable release, which will be code-named Emperor, is slated for three months from now (beginning of November). You can download v0.67 Dumpling from the usual locations:

* Git at git://github.com/ceph/ceph.git (http://github.com/ceph/ceph.git)
* Tarball at http://ceph.com/download/ceph-0.67.tar.gz
* For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
* For RPMs, see http://ceph.com/docs/master/install/rpm

Hi Sage,

I just upgraded, and everything went quite smoothly with osds, mons, and mds. Good work, guys! :) The only problem I have run into is with radosgw. It is unable to start after the upgrade, with the following message:

2013-08-14 11:57:25.841310 7ffd0d2ae780 0 ceph version 0.67 (e3b7bc5bce8ab330ec1661381072368af3c218a0), process radosgw, pid 5612
2013-08-14 11:57:25.841328 7ffd0d2ae780 -1 WARNING: libcurl doesn't support curl_multi_wait()
2013-08-14 11:57:25.841335 7ffd0d2ae780 -1 WARNING: cross zone / region transfer performance may be affected
2013-08-14 11:57
[ceph-users] Stuck pages and other bad things
I'm presuming this is the correct list (rather than the -devel list); please correct me if I'm wrong there.

I set up ceph (0.56.4) a few months ago with two disk servers and one dedicated monitor. The disk servers also have monitors, so there are a total of 3 monitors for the cluster. Each of the disk servers has 8 OSDs. I didn't actually get a 'ceph osd tree' output from that, but cutting-and-pasting relevant parts from what I have now, it probably looked like this:

# id    weight  type name               up/down reweight
-1      16      root default
-3      16        rack unknownrack
-2      0           host leviathan
100     1             osd.100           up      1
101     1             osd.101           up      1
102     1             osd.102           up      1
103     1             osd.103           up      1
104     1             osd.104           up      1
105     1             osd.105           up      1
106     1             osd.106           up      1
107     1             osd.107           up      1
-4      8           host minotaur
200     1             osd.200           up      1
201     1             osd.201           up      1
202     1             osd.202           up      1
203     1             osd.203           up      1
204     1             osd.204           up      1
205     1             osd.205           up      1
206     1             osd.206           up      1
207     1             osd.207           up      1

A couple of weeks ago, for valid reasons that aren't relevant here, we decided to repurpose one of the disk servers (leviathan) and replace the ceph fileserver with some other hardware. I created a new server (aergia). That changed the 'ceph osd tree' to this:

# id    weight  type name               up/down reweight
-1      16      root default
-3      16        rack unknownrack
-2      0           host leviathan
100     1             osd.100           up      1
101     1             osd.101           up      1
102     1             osd.102           up      1
103     1             osd.103           up      1
104     1             osd.104           up      1
105     1             osd.105           up      1
106     1             osd.106           up      1
107     1             osd.107           up      1
-4      8           host minotaur
200     1             osd.200           up      1
201     1             osd.201           up      1
202     1             osd.202           up      1
203     1             osd.203           up      1
204     1             osd.204           up      1
205     1             osd.205           up      1
206     1             osd.206           up      1
207     1             osd.207           up      1
0       1       osd.0                   up      1
1       1       osd.1                   up      1
2       1       osd.2                   up      1
3       1       osd.3                   up      1
4       1       osd.4                   up      1
5       1       osd.5                   up      1
6       1       osd.6                   up      1
7       1       osd.7                   up      1

Everything was looking happy, so I began removing the OSDs on leviathan. That's when the problems started. 'ceph health detail' shows that there are several PGs that either existed only on that disk server, e.g.
pg 0.312 is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [103]

or PGs that were only replicated back onto the same host, e.g.

pg 0.2f4 is stuck unclean since forever, current state stale+active+remapped, last acting [106,101]

I brought leviathan back up, and I *think* everything is at least responding now. But 'ceph health' still shows

HEALTH_WARN 302 pgs degraded; 810 pgs stale; 810 pgs stuck stale; 3562 pgs stuck unclean; recovery 44951/2289634 degraded (1.963%)

...and it's been stuck there for a long time. So my question is, how do I force data off the to-be-decommissioned server safely and get back to HEALTH_OK?
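For reference, one hedged sketch of the usual Bobtail-era sequence for draining and removing a single OSD (osd.100 is the first leviathan OSD from the tree above; the same steps would be repeated per OSD, waiting for recovery to finish between OSDs). These are cluster commands and are not runnable outside a live cluster, and they assume the CRUSH map actually allows replicas to land on another host:

```shell
# Sketch: drain one OSD before decommissioning its host.
ceph osd out 100                  # mark out; data starts migrating off
# Wait until "ceph -s" shows recovery complete (active+clean), then:
/etc/init.d/ceph stop osd.100     # run on leviathan itself
ceph osd crush remove osd.100     # drop it from the CRUSH map
ceph auth del osd.100             # remove its auth key
ceph osd rm 100                   # remove it from the OSD map
```

Note that the "last acting [106,101]" line above (two OSDs on the same host) suggests the CRUSH rules are placing both replicas on one host, in which case draining alone cannot make those PGs clean until placement is fixed.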