Re: [ceph-users] Uniquely identifying a Ceph client
On Tue, Nov 1, 2016 at 11:45 AM, Sage Weil <s...@newdream.net> wrote:
> On Tue, 1 Nov 2016, Travis Rhoden wrote:
>> Hello,
>> Is there a consistent, reliable way to identify a Ceph client? I'm looking
>> for a string/ID (a UUID, for example) that can be traced back to a client
>> doing RBD maps.
>>
>> There are a couple of possibilities out there, but they aren't quite what
>> I'm looking for. When checking "rbd status", for example, the output is the
>> following:
>>
>> # rbd status travis2
>> Watchers:
>> watcher=172.21.12.10:0/1492902152 client.4100 cookie=1
>> # rbd status travis3
>> Watchers:
>> watcher=172.21.12.10:0/1492902152 client.4100 cookie=2
>>
>> The IP:port/nonce string is an option, and so is the "client.<id>" string,
>> but neither of these is actually that helpful because they don't show the
>> same strings when an advisory lock is added to the RBD images. For example:
>
> Both are sufficient. The <id> in client.<id> is the most concise and is
> unique per client instance.
>
> I think the problem you're seeing is actually that qemu is using two
> different librbd/librados instances, one for each mapped device?

Not using qemu in this scenario. Just rbd map && rbd lock. It's more that I can't match the output from "rbd lock" against the output from "rbd status", because they are using different librados instances. I'm just trying to capture who has an image mapped and locked, and to those not in the know, it would be a surprise that client.<X> and client.<Y> are actually the same host. :) I understand why it is; I was checking to see if there was another field or indicator that I should use instead. I think I'm just going to have to use the IP address, because that's the value that will have real meaning to people. Thanks!

>> # rbd lock list travis2
>> There is 1 exclusive lock on this image.
>> Locker      ID    Address
>> client.4201 test  172.21.12.100:0/967432549
>> # rbd lock list travis3
>> There is 1 exclusive lock on this image.
>> Locker      ID    Address
>> client.4240 test  172.21.12.10:0/2888955091
>>
>> Note that neither the nonce nor the client ID match -- so by looking at the
>> rbd lock output, you can't match that information against the output from
>> "rbd status". I believe this is because the nonce and the client identifier
>> reflect the CephX session between client and cluster, and while this is
>> persistent across "rbd map" calls (because the rbd kmod has a shared session
>> by default, though that can be changed as well), each call to "rbd lock"
>> initiates a new session. Hence a new nonce and client ID.
>>
>> That pretty much leaves the IP address. That would seem to be problematic
>> as an identifier if the client happened to be behind NAT.
>>
>> I am trying to definitively determine which client has an RBD mapped
>> and locked, but I'm not seeing a way to guarantee that you've uniquely
>> identified a client. Am I missing something obvious?
>>
>> Perhaps my concern about NAT is overblown -- I've never mounted an RBD from
>> a client that is behind NAT, and I'm not sure how common that would be
>> (though I think it would work).
>
> It should work, but it's untested. :)
>
> sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
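The reply above settles on the IP address as the only field common to the "rbd status" and "rbd lock list" output. A minimal sketch of extracting that IP from an entity address of the form IP:port/nonce (a hypothetical helper, not part of any Ceph tooling; assumes IPv4 addresses as shown in the thread):

```python
def client_ip(addr):
    """Extract the IP from a Ceph entity address like "172.21.12.10:0/1492902152".

    Hypothetical helper: the nonce (after '/') and port differ per librados
    session, so only the IP is stable across "rbd status" and "rbd lock list".
    """
    host_port = addr.split('/')[0]      # drop the per-session nonce
    return host_port.rsplit(':', 1)[0]  # drop the port

# The watcher and the locker resolve to the same host:
watcher = client_ip("172.21.12.10:0/1492902152")
locker = client_ip("172.21.12.10:0/2888955091")
```

With NAT in play even the IP is ambiguous, which is the residual caveat the thread ends on.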
[ceph-users] Uniquely identifying a Ceph client
Hello,

Is there a consistent, reliable way to identify a Ceph client? I'm looking for a string/ID (a UUID, for example) that can be traced back to a client doing RBD maps.

There are a couple of possibilities out there, but they aren't quite what I'm looking for. When checking "rbd status", for example, the output is the following:

# rbd status travis2
Watchers:
watcher=172.21.12.10:0/1492902152 client.4100 cookie=1
# rbd status travis3
Watchers:
watcher=172.21.12.10:0/1492902152 client.4100 cookie=2

The IP:port/nonce string is an option, and so is the "client.<id>" string, but neither of these is actually that helpful because they don't show the same strings when an advisory lock is added to the RBD images. For example:

# rbd lock list travis2
There is 1 exclusive lock on this image.
Locker      ID    Address
client.4201 test  172.21.12.100:0/967432549
# rbd lock list travis3
There is 1 exclusive lock on this image.
Locker      ID    Address
client.4240 test  172.21.12.10:0/2888955091

Note that neither the nonce nor the client ID match -- so by looking at the rbd lock output, you can't match that information against the output from "rbd status". I believe this is because the nonce and the client identifier reflect the CephX session between client and cluster, and while this is persistent across "rbd map" calls (because the rbd kmod has a shared session by default, though that can be changed as well), each call to "rbd lock" initiates a new session. Hence a new nonce and client ID.

That pretty much leaves the IP address. That would seem to be problematic as an identifier if the client happened to be behind NAT.

I am trying to definitively determine which client has an RBD mapped and locked, but I'm not seeing a way to guarantee that you've uniquely identified a client. Am I missing something obvious?

Perhaps my concern about NAT is overblown -- I've never mounted an RBD from a client that is behind NAT, and I'm not sure how common that would be (though I think it would work).
- Travis
Re: [ceph-users] ceph-deploy: too many argument: --setgroup 10
Hi Noah,

What is the ownership on /var/lib/ceph? ceph-deploy should only be trying to use --setgroup if /var/lib/ceph is owned by non-root. On a fresh install of Hammer, this should be root:root. The --setgroup flag was added to ceph-deploy in 1.5.26.

- Travis

On Wed, Sep 2, 2015 at 1:59 PM, Noah Watkins wrote:
> I'm getting the following error using ceph-deploy to setup a cluster.
> It's CentOS 6.6 and I'm using Hammer and the latest ceph-deploy. It
> looks like setgroup wasn't an option in Hammer, but ceph-deploy adds
> it. Is there a trick or older version of ceph-deploy I should try?
>
> - Noah
>
> [cn67][INFO ] Running command: sudo ceph-mon --cluster ceph --mkfs -i
> cn67 --keyring /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
> [cn67][WARNIN] too many arguments: [--setgroup,10]
> [cn67][DEBUG ] --conf/-c FILE  read configuration from the given
> configuration file
> [cn67][WARNIN] usage: ceph-mon -i monid [flags]
> [cn67][DEBUG ] --id/-i ID  set ID portion of my name
> [cn67][WARNIN] --debug_mon n
> [cn67][DEBUG ] --name/-n TYPE.ID  set name
> [cn67][WARNIN] debug monitor level (e.g. 10)
> [cn67][DEBUG ] --cluster NAME  set cluster name (default: ceph)
> [cn67][WARNIN] --mkfs
> [cn67][DEBUG ] --version  show version and quit
> [cn67][WARNIN] build fresh monitor fs
> [cn67][DEBUG ]
> [cn67][WARNIN] --force-sync
> [cn67][DEBUG ] -d  run in foreground, log to stderr.
> [cn67][WARNIN] force a sync from another mon by wiping local
> data (BE CAREFUL)
> [cn67][DEBUG ] -f  run in foreground, log to usual
> location.
> [cn67][WARNIN] --yes-i-really-mean-it
> [cn67][DEBUG ] --debug_ms N  set message debug level (e.g.
1)
> [cn67][WARNIN] mandatory safeguard for --force-sync
> [cn67][WARNIN] --compact
> [cn67][WARNIN] compact the monitor store
> [cn67][WARNIN] --osdmap
> [cn67][WARNIN] only used when --mkfs is provided: load the
> osdmap from
> [cn67][WARNIN] --inject-monmap
> [cn67][WARNIN] write the monmap to the local
> monitor store and exit
> [cn67][WARNIN] --extract-monmap
> [cn67][WARNIN] extract the monmap from the local monitor store and
> exit
> [cn67][WARNIN] --mon-data
> [cn67][WARNIN] where the mon store and keyring are located
> [cn67][ERROR ] RuntimeError: command returned non-zero exit status: 1
> [ceph_deploy.mon][ERROR ] Failed to execute command: ceph-mon
> --cluster ceph --mkfs -i cn67 --keyring
> /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
> [ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ceph-users] Error while installing ceph
A couple of things here... Looks like you are on RHEL. If you are on RHEL, but *not* trying to install RHCS (Red Hat Ceph Storage), a few extra flags are required. You must use --release. For example:

ceph-deploy install --release hammer

in order to get the Hammer upstream release. The docs need to make this more clear (I don't think it's mentioned anywhere -- upstream Ceph on RHEL is not a very common case, but it is supposed to work. :))

That will at least install the right packages. However, there is still one more issue you will hit: when installing upstream Ceph on RHEL, ceph-deploy knows that it needs EPEL (EPEL is not needed with RHCS), and it will try to install it by name with "yum install epel-release". But that doesn't work on RHEL. Until that is fixed, you will also have to install EPEL by hand on your nodes.

On Fri, Aug 28, 2015 at 5:02 PM, Brad Hubbard bhubb...@redhat.com wrote:

- Original Message -
From: pavana bhat pavanakrishnab...@gmail.com
To: Brad Hubbard bhubb...@redhat.com
Cc: ceph-users@lists.ceph.com
Sent: Saturday, 29 August, 2015 9:40:50 AM
Subject: Re: [ceph-users] Error while installing ceph

Yes I did follow all the preflight steps. After yum install (sudo yum update, sudo yum install ceph-deploy), it did show the following are installed:

rhel-7-ha-rpms rhel-7-optional-rpms rhel-7-server-rpms rhel-7-supplemental-rpms rhel-7-server-rpms/primary_db ceph-noarch

Installed: ceph-deploy.noarch 0:1.5.28-0

Perhaps the --repo and/or --release flags are required?

Thanks,
Pavana

On Fri, Aug 28, 2015 at 4:29 PM, Brad Hubbard bhubb...@redhat.com wrote:

Did you follow this first? http://docs.ceph.com/docs/v0.80.5/start/quick-start-preflight/

It doesn't seem to be able to locate the repos for the ceph rpms.

- Original Message -
From: pavana bhat pavanakrishnab...@gmail.com
To: ceph-users@lists.ceph.com
Sent: Saturday, 29 August, 2015 8:55:14 AM
Subject: [ceph-users] Error while installing ceph

Hi,

I'm getting an error while installing ceph.
Can you please help me? I'm exactly following the steps given in http://docs.ceph.com/docs/v0.80.5/start/quick-ceph-deploy/ to install ceph.

These are pretty old docs (see the version number in the URL). It's probably always best to start at http://docs.ceph.com/docs/master instead. How did you get to this old version? If it was from a link, we would want to check that that link still made sense.

But when I execute "ceph-deploy install {ceph-node} [{ceph-node} ...]", I'm getting the following error:

[ceph-vm-mon1][DEBUG ] Cleaning up everything
[ceph-vm-mon1][DEBUG ] Cleaning up list of fastest mirrors
[ceph-vm-mon1][INFO ] Running command: sudo yum -y install ceph-osd ceph-mds ceph-mon ceph-radosgw
[ceph-vm-mon1][DEBUG ] Loaded plugins: fastestmirror
[ceph-vm-mon1][DEBUG ] Determining fastest mirrors
[ceph-vm-mon1][DEBUG ] * rhel-7-ha-rpms: 203.36.4.124
[ceph-vm-mon1][DEBUG ] * rhel-7-optional-rpms: 203.36.4.124
[ceph-vm-mon1][DEBUG ] * rhel-7-server-rpms: 203.36.4.124
[ceph-vm-mon1][DEBUG ] * rhel-7-supplemental-rpms: 203.36.4.124
[ceph-vm-mon1][DEBUG ] No package ceph-osd available.
[ceph-vm-mon1][DEBUG ] No package ceph-mds available.
[ceph-vm-mon1][DEBUG ] No package ceph-mon available.
[ceph-vm-mon1][DEBUG ] No package ceph-radosgw available.
[ceph-vm-mon1][WARNIN] Error: Nothing to do
[ceph-vm-mon1][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install ceph-osd ceph-mds ceph-mon ceph-radosgw

I have finished the preflight steps and I'm able to connect to the internet from my nodes.

Thanks,
Pavana
[ceph-users] [ANN] ceph-deploy 1.5.28 released
Hi everyone,

A new version of ceph-deploy has been released. Version 1.5.28 includes the following:

- A fix for a regression introduced in 1.5.27 that prevented importing GPG keys on CentOS 6 only.
- Will prevent Ceph daemon deployment on nodes that don't have Ceph installed on them.
- Makes it possible to go from 1 monitor daemon to 2 without a 5 minute hang/delay.
- More systemd enablement work.

Full changelog is at [1]. Updated packages have been uploaded to {rpm,debian}-{firefly,hammer,testing} repos on ceph.com, and to PyPI.

Cheers,
- Travis

[1] http://ceph.com/ceph-deploy/docs/changelog.html#id2
Re: [ceph-users] [ANN] ceph-deploy 1.5.27 released
Hi Nigel,

On Wed, Aug 5, 2015 at 9:00 PM, Nigel Williams nigel.willi...@utas.edu.au wrote:
> On 6/08/2015 9:45 AM, Travis Rhoden wrote:
>> A new version of ceph-deploy has been released. Version 1.5.27 includes
>> the following:
>
> Has the syntax for use of --zap-disk changed? I moved it around but it is
> no longer recognised; worked around by doing a ceph-disk zap before
> running ceph-deploy.

A few things in this area changed with 1.5.26. ceph-deploy's options are much more strictly attached only to the commands where they make sense.

This worked previously:

ceph-deploy --overwrite-conf osd --zap-disk prepare ceph05:/dev/sdb:/dev/sdd

--zap-disk is an option to 'prepare', not to 'osd'. "ceph-deploy osd --zap-disk list" doesn't make any sense, for example. The help menus should make this clear:

# ceph-deploy osd --help
usage: ceph-deploy osd [-h] {list,create,prepare,activate} ...

# ceph-deploy osd prepare --help
usage: ceph-deploy osd prepare [-h] [--zap-disk] [--fs-type FS_TYPE]
                               [--dmcrypt] [--dmcrypt-key-dir KEYDIR]
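The subcommand-scoped option behaviour described above can be illustrated with a small argparse sketch (illustrative parser only, not ceph-deploy's actual code): an option registered on the 'prepare' subparser is accepted only after the subcommand name.

```python
import argparse

# Illustrative reconstruction of the 1.5.26 behaviour: --zap-disk belongs
# to the 'prepare' subparser, not to 'osd', so placement matters.
parser = argparse.ArgumentParser(prog="ceph-deploy")
sub = parser.add_subparsers(dest="cmd")

osd = sub.add_parser("osd")
osd_sub = osd.add_subparsers(dest="subcmd")

prepare = osd_sub.add_parser("prepare")
prepare.add_argument("--zap-disk", action="store_true")
prepare.add_argument("disk", nargs="+")

# Accepted: the option follows the subcommand it belongs to.
args = parser.parse_args(["osd", "prepare", "--zap-disk",
                          "ceph05:/dev/sdb:/dev/sdd"])
```

Placing --zap-disk before 'prepare' (as in the previously-working command) makes the 'osd' parser reject it, which matches the "no longer recognised" symptom Nigel hit.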
[ceph-users] [ANN] ceph-deploy 1.5.27 released
Hi everyone,

A new version of ceph-deploy has been released. Version 1.5.27 includes the following:

- A new "ceph-deploy repo" command that allows for adding and removing custom repo definitions.
- Makes commands like "ceph-deploy install --rgw" only install the RGW component of Ceph. This works for daemons/components such as --rgw, --mds, and --cli, depending on how packages are split on your distro. For example, Debian packages the Ceph MDS into a separate 'ceph-mds' package, and therefore if you use "install --mds" only the ceph-mds package will be installed. RPM packages do not do this, so it has to install ceph, which includes the MDS, MON, and OSD daemons. Further package splits are coming, but right now we do what we can.
- Some fixes around using DNF (Fedora >= 22).
- Early support for systemd (Fedora 22 and development Ceph builds only).
- Loads of internal changes.

Full changelog is at [1]. Updated packages have been uploaded to {rpm,debian}-{firefly,hammer,testing} repos on ceph.com, and to PyPI.

Cheers,
- Travis

[1] http://ceph.com/ceph-deploy/docs/changelog.html#id2
Re: [ceph-users] injectargs not working?
Hi Quentin,

It may be the specific option you are trying to tweak. osd-scrub-begin-hour was first introduced in development release v0.93, which means it would be in 0.94.x (Hammer), but your cluster is 0.87.1 (Giant).

Cheers,
- Travis

On Wed, Jul 29, 2015 at 4:28 PM, Quentin Hartman qhart...@direwolfdigital.com wrote:
> I'm running a 0.87.1 cluster, and my ceph tell seems to not be working:
>
> # ceph tell osd.0 injectargs '--osd-scrub-begin-hour 1'
> failed to parse arguments: --osd-scrub-begin-hour,1
>
> I've also tried the daemon config set variant and it also fails:
>
> # ceph daemon osd.0 config set osd_scrub_begin_hour 1
> { "error": "error setting 'osd_scrub_begin_hour' to '1': (2) No such file or directory"}
>
> I'm guessing I have something goofed in my admin socket client config:
>
> [client]
> rbd cache = true
> rbd cache writethrough until flush = true
> admin socket = /var/run/ceph/$cluster-$type.$id.asok
>
> but that seems to correlate with the structure that exists:
>
> # ls
> ceph-osd.24.asok ceph-osd.25.asok ceph-osd.26.asok
> # pwd
> /var/run/ceph
>
> I can show my configs all over the place, but changing them seems to
> always fail. It behaves the same if I'm working on a local daemon, or on
> my config node trying to make changes globally.
>
> Thanks in advance for any ideas
>
> QH
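The version mismatch Travis describes can be made mechanical. A sketch (hypothetical helper; the version numbers come from the reply above) comparing the cluster version against the release that introduced an option:

```python
def option_supported(cluster_version, introduced_in=(0, 93)):
    """Return True if a config option is available on this cluster.

    Sketch only: osd_scrub_begin_hour first appeared in v0.93 (per the
    reply above), so 0.87.x (Giant) rejects it while 0.94.x (Hammer)
    accepts it. Versions are passed as tuples, e.g. (0, 87, 1).
    """
    return tuple(cluster_version) >= tuple(introduced_in)
```

This is why injectargs fails with "failed to parse arguments" on Giant: the daemon simply has no such option to set.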
Re: [ceph-users] debugging ceph-deploy warning: could not open file descriptor -1
Hi Noah,

It does look like the two things are unrelated. But you are right, ceph-deploy stopped accepting that trailing hostname with the "ceph-deploy mon create-initial" command in 1.5.26. It was never a needed argument, and accepting it led to confusion. I tightened up the argument parsing for ceph-deploy quite a bit in 1.5.26.

Looking at your logfile, I do not know what caused the apt-get errors. It does seem like the install proceeds successfully, and that the ceph setup will proceed once the extra arg to mon create-initial is removed. Here's hoping that it is indeed nothing to worry about. :)

- Travis

On Tue, Jul 21, 2015 at 2:27 PM, Noah Watkins noahwatk...@gmail.com wrote:
> Nevermind. I see that `ceph-deploy mon create-initial` has stopped
> accepting the trailing hostname, which was causing the failure. I don't
> know if those problems I showed above are actually anything to worry
> about :)
>
> On Tue, Jul 21, 2015 at 3:17 PM, Noah Watkins noahwatk...@gmail.com wrote:
>> The docker/distribution project runs a continuous integration VM using
>> CircleCI, and part of the VM setup installs Ceph packages using
>> ceph-deploy. This has been working well for quite a while, but we are
>> seeing a failure running `ceph-deploy install --release hammer`. The
>> snippet is here where it looks like the first problem shows up.
>>
>> ...
>> [box156][DEBUG ] Get:24 http://ceph.com/debian-hammer/ precise/main ceph-mds amd64 0.94.2-1precise [10.5 MB]
>> [box156][DEBUG ] Get:25 http://ceph.com/debian-hammer/ precise/main radosgw amd64 0.94.2-1precise [3,619 kB]
>> [box156][WARNIN] E: Could not open file descriptor -1
>> [box156][WARNIN] E: Prior errors apply to /var/cache/apt/archives/parted_2.3-19ubuntu1_amd64.deb
>> ...
>>
>> On the surface it seems that the problem is coming from apt-get under
>> the hood. Any pointers here? It doesn't seem like anything has changed
>> configuration wise.
>> The full build log can be found here which starts off with the
>> ceph-deploy command that is failing:
>> https://circleci.com/gh/docker/distribution/1848
>>
>> Thanks,
>> -Noah
Re: [ceph-users] ceph-deploy on ubuntu 15.04
Hi Bernhard,

Thanks for your email. systemd support for Ceph in general is still a work in progress. It is actively being worked on, but the packages hosted on ceph.com are still using sysvinit (for RPM systems) and Upstart on Ubuntu. It is definitely a known issue.

Along those lines, ceph.com only hosts packages for precise and trusty (the LTS releases), so there is no support for 15.04 either.

- Travis

On Fri, Jul 24, 2015 at 4:01 AM, Bernhard Duebi boom...@inbox.com wrote:
> Hi,
>
> I have a problem with ceph-deploy on Ubuntu 15.04. In the file
> /usr/local/lib/python2.7/dist-packages/ceph_deploy/hosts/debian/__init__.py:
>
> def choose_init():
>     """Select an init system.
>
>     Returns the name of an init system (upstart, sysvinit ...).
>     """
>     if distro.lower() == 'ubuntu':
>         return 'upstart'
>     return 'sysvinit'
>
> This function assumes that Ubuntu is using upstart, but at least Ubuntu
> 15.04 Server is using systemd by default. I'm not a python hacker. As a
> quick fix I commented out the if statement and now it always returns
> 'sysvinit'. But for a real fix there should be something like
> "if os == ubuntu and osrelease >= 15.04".
>
> I first noticed this problem in the package that came with the
> distribution. A few days ago I removed the package and installed the
> latest ceph-deploy using pip install.
>
> Maybe this is a known problem, then I'm sorry for the spam.
>
> Regards
> Bernhard
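Bernhard's suggested "real fix" could look something like the sketch below. This is only an illustration of the version-aware check he proposes; the actual ceph_deploy module obtains the distro name and release differently, and upstream later addressed init-system detection properly.

```python
def choose_init(distro, release):
    """Select an init system name ('upstart', 'sysvinit', or 'systemd').

    Sketch of the check suggested above: Ubuntu switched to systemd
    with 15.04, so only older Ubuntu releases should get upstart.
    Assumes release strings like "14.04"; non-Ubuntu distros fall back
    to sysvinit as in the original function.
    """
    if distro.lower() == 'ubuntu':
        major, minor = (int(x) for x in release.split('.')[:2])
        if (major, minor) >= (15, 4):
            return 'systemd'
        return 'upstart'
    return 'sysvinit'
```

Comparing (major, minor) tuples avoids the string-comparison trap where "15.10" < "15.4".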
[ceph-users] [ANN] ceph-deploy 1.5.26 released
Hi everyone,

This is announcing a new release of ceph-deploy that focuses on usability improvements.

- Most of the help menus for ceph-deploy subcommands (e.g. "ceph-deploy mon" and "ceph-deploy osd") have been improved to be more context aware, such that help for "ceph-deploy osd create --help" and "ceph-deploy osd zap --help" returns different output specific to the command. Previously it would show generic help for "ceph-deploy osd". Additionally, the list of optional arguments shown for the command is always correct for the subcommand in question. Previously the options shown were the aggregate of all options.
- ceph-deploy now points to git.ceph.com for downloading GPG keys.
- ceph-deploy will now work on the Mint Linux distribution (by pointing to Ubuntu packages).
- SUSE distro users will now be pointed to SUSE packages by default, as there have not been updated SUSE packages on ceph.com in quite some time.

Full changelog is available at: http://ceph.com/ceph-deploy/docs/changelog.html#id1

New packages are available in the usual places of ceph.com hosted repos and PyPI.

Cheers,
- Travis
Re: [ceph-users] ceph-deploy for Hammer
Hi Pankaj,

While there have been times in the past where ARM binaries were hosted on ceph.com, there is not currently any ARM hardware for builds. I don't think you will see any ARM binaries in http://ceph.com/debian-hammer/pool/main/c/ceph/, for example. Combine that with the fact that ceph-deploy is not intended to work with locally compiled binaries (only packages, as it relies on paths, conventions, and service definitions from the packages), and it is a very tricky combo to use ceph-deploy and ARM together.

Your most recent error is indicative of the ceph-mon service not coming up successfully. When ceph-mon (the service, not the daemon) is started, it also calls ceph-create-keys, which waits for the monitor daemon to come up and then creates the keys that are necessary for the cluster to run when using cephx (the admin key, the bootstrap keys).

- Travis

On Wed, May 27, 2015 at 8:27 PM, Garg, Pankaj pankaj.g...@caviumnetworks.com wrote:
> Actually the ARM binaries do exist and I have been using them for previous
> releases. Somehow this library is the one that doesn't load. Anyway I did
> compile my own Ceph for ARM, and now getting the following issue:
>
> [ceph_deploy.gatherkeys][WARNIN] Unable to find /etc/ceph/ceph.client.admin.keyring on ceph1
> [ceph_deploy][ERROR ] KeyNotFoundError: Could not find keyring file: /etc/ceph/ceph.client.admin.keyring on host ceph1
>
> From: Somnath Roy [mailto:somnath@sandisk.com]
> Sent: Wednesday, May 27, 2015 4:29 PM
> To: Garg, Pankaj
> Cc: ceph-users@lists.ceph.com
> Subject: RE: ceph-deploy for Hammer
>
> If you are trying to install the ceph repo hammer binaries, I don't think
> it is built for ARM. Both the binary and the .so need to be built on ARM
> to make this work, I guess. Try to build the hammer code base on your ARM
> server and then retry.
>
> Thanks & Regards
> Somnath
>
> From: Pankaj Garg [mailto:pankaj.g...@caviumnetworks.com]
> Sent: Wednesday, May 27, 2015 4:17 PM
> To: Somnath Roy
> Cc: ceph-users@lists.ceph.com
> Subject: RE: ceph-deploy for Hammer
>
> Yes I am on ARM.
-Pankaj On May 27, 2015 3:58 PM, Somnath Roy somnath@sandisk.com wrote: Are you running this on ARM ? If not, it should not go for loading this library. Thanks Regards Somnath From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Garg, Pankaj Sent: Wednesday, May 27, 2015 2:26 PM To: Garg, Pankaj; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph-deploy for Hammer I seem to be getting these errors in the Monitor Log : 2015-05-27 21:17:41.908839 3ff907368e0 -1 erasure_code_init(jerasure,/usr/lib/aarch64-linux-gnu/ceph/erasure-code): (5) Input/output error 2015-05-27 21:17:41.978113 3ff969168e0 0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-mon, pid 16592 2015-05-27 21:17:41.984383 3ff969168e0 -1 ErasureCodePluginSelectJerasure: load dlopen(/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so): /usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so: cannot open shared object file: No such file or directory 2015-05-27 21:17:41.98 3ff969168e0 -1 erasure_code_init(jerasure,/usr/lib/aarch64-linux-gnu/ceph/erasure-code): (5) Input/output error 2015-05-27 21:17:42.052415 3ff90cf68e0 0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-mon, pid 16604 2015-05-27 21:17:42.058656 3ff90cf68e0 -1 ErasureCodePluginSelectJerasure: load dlopen(/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so): /usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so: cannot open shared object file: No such file or directory 2015-05-27 21:17:42.058715 3ff90cf68e0 -1 erasure_code_init(jerasure,/usr/lib/aarch64-linux-gnu/ceph/erasure-code): (5) Input/output error 2015-05-27 21:17:42.125279 3ffac4368e0 0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-mon, pid 16616 2015-05-27 21:17:42.131666 3ffac4368e0 -1 ErasureCodePluginSelectJerasure: load dlopen(/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so): 
/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so: cannot open shared object file: No such file or directory 2015-05-27 21:17:42.131726 3ffac4368e0 -1 erasure_code_init(jerasure,/usr/lib/aarch64-linux-gnu/ceph/erasure-code): (5) Input/output error The lib file exists, so not sure why this is happening. Any help appreciated. Thanks Pankaj From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Garg, Pankaj Sent: Wednesday, May 27, 2015 1:37 PM To: ceph-users@lists.ceph.com Subject: [ceph-users] ceph-deploy for Hammer Hi, Is there a particular verion of Ceph-Deploy that should be used with Hammer release? This is a brand new cluster. I’m getting the following error when running command : ceph-deploy mon create-initial [ceph_deploy.conf][DEBUG ] found configuration file at:
[ceph-users] [ANN] ceph-deploy 1.5.25 released
Hi everyone,

This is announcing a new release of ceph-deploy that fixes a security related issue, improves SUSE support, and improves support for RGW on RPM systems. ceph-deploy can be installed from ceph.com hosted repos for Firefly, Giant, Hammer, and testing, and is also available on PyPI.

Eagle-eyed readers may notice that there was not an announcement for 1.5.24 -- this was due to package build infrastructure issues that prevented the creation of RPM and DEB packages. By the time the issues were resolved, 1.5.25 was imminent, so 1.5.24 packages were not created even though 1.5.24 was available through PyPI.

Full changelog is available at [1], but here are the highlights for both 1.5.25 and 1.5.24:

- Fix CVE where the 'ceph-deploy admin' command resulted in the admin keyring being pushed to remote nodes with world readable (0644) permissions.
- Fix reference to package name ceph-radosgw on RPM systems.
- Fix possible truncated output of "ceph-deploy disk list".
- More robust deployment of RGW on RPM systems.

Please update!

Cheers,
- Travis

[1] http://ceph.com/ceph-deploy/docs/changelog.html#id2
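The CVE fixed here (admin keyrings pushed with world-readable 0644 permissions) can be checked for on already-deployed nodes. A small sketch (hypothetical helper, not part of ceph-deploy) flagging any keyring readable by group or other:

```python
import os
import stat

def keyring_too_open(path):
    """Return True if the keyring at `path` is readable by group or other.

    The 1.5.25/1.5.23 fix concerns keyrings written as 0644; anything
    beyond owner-read exposes the cephx key to local users. Hypothetical
    checker sketching that condition.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))
```

A `chmod 600 /etc/ceph/ceph.client.admin.keyring` on affected nodes restores the expected permissions.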
Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy
Hi Vickey,

The easiest way I know of to get around this right now is to add the following line in the section for epel in /etc/yum.repos.d/epel.repo:

exclude=python-rados python-rbd

So this is what my epel.repo file looks like: http://fpaste.org/208681/

It is those two packages in EPEL that are causing problems. I also tried enabling epel-testing, but that didn't work either. Unfortunately you would need to add this line on each node where Ceph Giant is being installed.

- Travis

On Wed, Apr 8, 2015 at 4:11 PM, Vickey Singh vickey.singh22...@gmail.com wrote:
> Community, need help.
>
> -VS-
>
> On Wed, Apr 8, 2015 at 4:36 PM, Vickey Singh vickey.singh22...@gmail.com wrote:
>> Any suggestions, geeks?
>>
>> VS
>>
>> On Wed, Apr 8, 2015 at 2:15 PM, Vickey Singh vickey.singh22...@gmail.com wrote:
>>> Hi,
>>>
>>> The below suggestion also didn't work. Full logs here: http://paste.ubuntu.com/10771939/
>>>
>>> [root@rgw-node1 yum.repos.d]# yum --showduplicates list ceph
>>> Loaded plugins: fastestmirror, priorities
>>> Loading mirror speeds from cached hostfile
>>> * base: mirror.zetup.net
>>> * epel: ftp.fi.muni.cz
>>> * extras: mirror.zetup.net
>>> * updates: mirror.zetup.net
>>> 25 packages excluded due to repository priority protections
>>> Available Packages
>>> ceph.x86_64 0.80.6-0.el7.centos Ceph
>>> ceph.x86_64 0.80.7-0.el7.centos Ceph
>>> ceph.x86_64 0.80.8-0.el7.centos Ceph
>>> ceph.x86_64 0.80.9-0.el7.centos Ceph
>>> [root@rgw-node1 yum.repos.d]#
>>>
>>> It's not able to install the latest available package; yum is getting
>>> confused with the other DOT releases. Any other suggestion to fix this?
--> Processing Dependency: libboost_system-mt.so.1.53.0()(64bit) for package: librbd1-0.80.9-0.el7.centos.x86_64
--> Processing Dependency: libboost_thread-mt.so.1.53.0()(64bit) for package: librbd1-0.80.9-0.el7.centos.x86_64
--> Finished Dependency Resolution
Error: Package: librbd1-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_system-mt.so.1.53.0()(64bit)
Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_system-mt.so.1.53.0()(64bit)
Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libaio.so.1(LIBAIO_0.4)(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_thread-mt.so.1.53.0()(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: librados2 = 0.80.7-0.el7.centos
       Available: librados2-0.80.6-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.6-0.el7.centos
       Available: librados2-0.80.7-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.7-0.el7.centos
       Available: librados2-0.80.8-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.8-0.el7.centos
       Installing: librados2-0.80.9-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.9-0.el7.centos
Error: Package: libcephfs1-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_thread-mt.so.1.53.0()(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: python-requests
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: librbd1 = 0.80.7-0.el7.centos
       Available: librbd1-0.80.6-0.el7.centos.x86_64 (Ceph)
           librbd1 = 0.80.6-0.el7.centos
       Available: librbd1-0.80.7-0.el7.centos.x86_64 (Ceph)
           librbd1 = 0.80.7-0.el7.centos
       Available: librbd1-0.80.8-0.el7.centos.x86_64 (Ceph)
           librbd1 = 0.80.8-0.el7.centos
       Installing: librbd1-0.80.9-0.el7.centos.x86_64 (Ceph)
           librbd1 = 0.80.9-0.el7.centos
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_system-mt.so.1.53.0()(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: python-ceph = 0.80.7-0.el7.centos
       Available: python-ceph-0.80.6-0.el7.centos.x86_64 (Ceph)
           python-ceph = 0.80.6-0.el7.centos
       Available: python-ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
           python-ceph = 0.80.7-0.el7.centos
       Available: python-ceph-0.80.8-0.el7.centos.x86_64 (Ceph)
           python-ceph = 0.80.8-0.el7.centos
       Installing: python-ceph-0.80.9-0.el7.centos.x86_64 (Ceph)
           python-ceph = 0.80.9-0.el7.centos
Error: Package: libcephfs1-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_system-mt.so.1.53.0()(64bit)
Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: python-requests
Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: librados2 = 0.80.7-0.el7.centos
       Available: librados2-0.80.6-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.6-0.el7.centos
       Available: librados2-0.80.7-0.el7.centos.x86_64 (Ceph)
[ceph-users] [ANN] ceph-deploy 1.5.23 released
Hi All, This is a new release of ceph-deploy that includes a new feature for Hammer and bugfixes. ceph-deploy can be installed from the ceph.com hosted repos for Firefly, Giant, Hammer, or testing, and is also available on PyPI.

ceph-deploy now defaults to installing the Hammer release. If you need to install a different release, use the --release flag.

To go along with the Hammer release, ceph-deploy now includes support for a drastically simplified deployment for RGW. See further details at [1] and [2].

This release also fixes an issue where keyrings pushed to remote nodes ended up with world-readable permissions.

The full changelog can be seen at [3]. Please update! Cheers, - Travis

[1] http://ceph.com/docs/master/start/quick-ceph-deploy/#add-an-rgw-instance
[2] http://ceph.com/ceph-deploy/docs/rgw.html
[3] http://ceph.com/ceph-deploy/docs/changelog.html#id2

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
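A quick hedged illustration of the two items above (hostnames are placeholders):

```shell
ceph-deploy install node1                    # now defaults to Hammer
ceph-deploy install --release giant node1    # pin a different release
ceph-deploy rgw create node1                 # new simplified RGW deployment
```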
Re: [ceph-users] Inconsistent ceph-deploy disk list command results
Hi Frederic, Thanks for the report! Do you mind putting these details into a bug report at http://tracker.ceph.com/ ? I have seen the same thing once before, but at the time didn't have the chance to check whether the inconsistency was coming from ceph-deploy or from ceph-disk. This certainly seems to point at ceph-deploy! - Travis

On Wed, Apr 8, 2015 at 4:15 AM, f...@univ-lr.fr f...@univ-lr.fr wrote: Hi all, I want to alert on a command we've learned to avoid for its inconsistent results, on Giant 0.87.1 and Hammer 0.93.0 (ceph-deploy-1.5.22-0.noarch was used in both cases): the ceph-deploy disk list command has a problem. We should get an exhaustive list of device entries, like this one:

../..
/dev/sdk :
 /dev/sdk1 ceph data, active, cluster ceph, osd.34, journal /dev/sda9
../..

But from the admin node, when we count how many disks we have on our nodes, the results are incorrect and differ each time:

$ ceph-deploy disk list osdnode1 2>&1 | grep 'active,' | wc -l
8
$ ceph-deploy disk list osdnode1 2>&1 | grep 'active,' | wc -l
12
$ ceph-deploy disk list osdnode1 2>&1 | grep 'active,' | wc -l
10
$ ceph-deploy disk list osdnode1 2>&1 | grep 'active,' | wc -l
15
$ ceph-deploy disk list osdnode1 2>&1 | grep 'active,' | wc -l
12

From the nodes themselves, the results are correct (15) and always the same:

$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15
$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15
$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15
$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15
$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15
$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15

But a pretty similar 'ceph-deploy osd list' command works fine. Frederic

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] [ANN] ceph-deploy 1.5.22 released
Hi All, This is a new release of ceph-deploy that changes a couple of behaviors.

On RPM-based distros, ceph-deploy will now automatically enable check_obsoletes in the Yum priorities plugin. This resolves an issue many community members hit where package dependency resolution was breaking due to conflicts between upstream packaging (hosted on ceph.com) and downstream packaging (i.e., Fedora or EPEL).

The other important change is that when using ceph-deploy to install Ceph packages on a RHEL machine, the --release flag *must* be used if you want to install upstream packages. In other words, if you want to install Giant on a RHEL machine, you would need to use ceph-deploy install --release giant. If the --release flag is not used, ceph-deploy will expect to use downstream packages on RHEL. This is documented at [1].

The full changelog can be seen at [2]. Please update! - Travis

[1] http://ceph.com/ceph-deploy/docs/install.html#distribution-notes
[2] http://ceph.com/ceph-deploy/docs/changelog.html#id1

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
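To make the RHEL behavior concrete (the hostname is a placeholder):

```shell
# On RHEL, upstream ceph.com packages now require an explicit release:
ceph-deploy install --release giant cephnode1

# Without --release, ceph-deploy assumes downstream (RHEL-provided)
# packages should be used instead:
ceph-deploy install cephnode1
```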
Re: [ceph-users] problem with yum install ceph-deploy command
Hi Khyati, On Sat, Mar 7, 2015 at 5:18 AM, khyati joshi kpjosh...@gmail.com wrote:

Hello ceph-users, I am new to Ceph. I am using CentOS 5.11 (i386) for deploying Ceph, and epel-release-5.4.noarch.rpm is successfully installed.

Ceph (and ceph-deploy) is not packaged for CentOS 5. You'll need to use 6 or 7.

But running the yum install ceph-deploy command is giving the following error: ceph-deploy-1.5.21-0.noarch from ceph-noarch has a depsolving problem -- missing dependencies: python-distribute is needed by package ceph-deploy-1.5.21-0.noarch.

This is Yum saying it can't find a python-distribute package in your configured repos. Again, not sure where/if this is available on CentOS 5.

Then I removed python-argparse and ran the command yum install snappy leveldb gdisk python-argparse gperftools-libs and got another error: rpmlib(FileDigests) needed by python-argparse-1.2.1-2.el6.noarch; rpmlib(PayloadIsXz) needed by python-argparse-1.2.1.1-2.el6.noarch.

Not sure why you would remove argparse. It's required.

The main cause of both errors is related to Python, but I don't know how to resolve it. Does anyone know how to solve this error?

While it *may* be possible to get ceph-deploy working on a CentOS 5 box (I would install using pip, pointing to PyPI instead of using Yum/EPEL for this), it would only be useful as a place to launch installs on remote machines from. Your best bet is to run a much newer distribution. - Travis

Any help will be appreciated. Thanks, khyati joshi M.tech Student, Gujarat, India.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
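For reference, the pip route Travis mentions looks like this (ceph-deploy is published on PyPI, so this bypasses Yum/EPEL entirely):

```shell
# Install ceph-deploy from PyPI instead of Yum/EPEL:
pip install ceph-deploy
ceph-deploy --version
```

Note this only installs the admin tool itself; the target nodes it deploys to must still run a distribution Ceph is packaged for.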
Re: [ceph-users] Ceph User Teething Problems
On Wed, Mar 4, 2015 at 4:43 PM, Lionel Bouton lionel-subscript...@bouton.name wrote: On 03/04/15 22:18, John Spray wrote: On 04/03/2015 20:27, Datatone Lists wrote: [...] [Please don't mention ceph-deploy]

This kind of comment isn't very helpful unless there is a specific issue with ceph-deploy that is preventing you from using it and causing you to resort to manual steps.

As a new maintainer of ceph-deploy, I'm happy to hear all gripes. :)

ceph-deploy is a subject I never took the time to give feedback on. We can't use it (we use Gentoo, which isn't supported by ceph-deploy), and even if we could, I probably wouldn't allow it: I believe that for important pieces of infrastructure like Ceph, you have to understand its inner workings to the point where you can hack your way out of problems and build tools to integrate it better with your environment (you can understand one of the reasons why we use Gentoo in production alongside other distributions...). I believe using ceph-deploy makes it more difficult to acquire the knowledge to do so.

For example, we have a script to replace a defective OSD (destroying an existing one and replacing it with a new one) that locks data in place as long as we can, so that CRUSH map changes don't trigger movements until the map reaches its original state again; this minimizes the total amount of data copied around. It might have been possible to achieve this with ceph-deploy, but I doubt we would have achieved it as easily (from understanding the causes of data movements, through understanding the OSD identifier allocation process, to implementing the script) if we hadn't created OSDs by hand repeatedly before scripting some of the processes.

Thanks for this feedback. I share a lot of your sentiments, especially that it is good to understand as much of the system as you can. Everyone's skill level and use-case is different, and ceph-deploy is targeted more towards PoC use-cases.
It tries to make things as easy as possible, but that necessarily abstracts most of the details away.

The last time I searched for documentation on manual configuration, it was much harder to find (manual MDS configuration was indeed something I didn't find at all). Best regards, Lionel

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Centos 7 OSD silently fail to start
Also, did you successfully start your monitor(s), and define/create the OSDs within the Ceph cluster itself? There are several steps to creating a Ceph cluster manually. I'm unsure if you have done the steps to actually create and register the OSDs with the cluster. - Travis

On Wed, Feb 25, 2015 at 9:49 AM, Leszek Master keks...@gmail.com wrote: Check firewall rules and selinux. It sometimes is a pain in the ... :)

On 25 Feb 2015 at 01:46, Barclay Jameson almightybe...@gmail.com wrote: I have tried to install Ceph using ceph-deploy, but sgdisk seems to have too many issues, so I did a manual install. After running mkfs.btrfs on the disks and journals and mounting them, I then tried to start the OSDs, which failed. The first error was:

#/etc/init.d/ceph start osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

I then manually added the OSDs to the conf file, with the following as an example:

[osd.0]
osd_host = node01

Now when I run the command:

# /etc/init.d/ceph start osd.0

there is no error or output from the command, and in fact when I do a ceph -s, no OSDs are listed as being up. Running ps aux | grep -i ceph or ps aux | grep -i osd shows there are no OSDs running. I have also run htop to see if any processes are running, and none are shown. I had this working on SL6.5 with Firefly, but Giant on CentOS 7 has been nothing but a giant pain.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
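For anyone following along, the registration steps Travis is referring to look roughly like this. This is a hedged sketch of the Giant-era manual "short form" procedure, not a complete recipe: the id, hostname, and weight are examples, and the data directory must already be formatted and mounted.

```shell
ceph osd create                              # allocates a new OSD id, e.g. 0
mkdir -p /var/lib/ceph/osd/ceph-0            # mount the data disk here first
ceph-osd -i 0 --mkfs --mkkey                 # initialize the OSD data directory
ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' \
    -i /var/lib/ceph/osd/ceph-0/keyring      # register its key with the cluster
ceph osd crush add osd.0 1.0 host=node01     # place it in the CRUSH map
/etc/init.d/ceph start osd.0                 # sysvinit, as used by Giant on CentOS 7
```

Without the create/auth/crush steps, starting the daemon does nothing visible in ceph -s, which matches the symptom described above.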
Re: [ceph-users] Ceph-deploy issues
Hi Pankaj, I can't say that it will fix the issue, but the first thing I would encourage is to use the latest ceph-deploy. you are using 1.4.0, which is quite old. The latest is 1.5.21. - Travis On Wed, Feb 25, 2015 at 3:38 PM, Garg, Pankaj pankaj.g...@caviumnetworks.com wrote: Hi, I had a successful ceph cluster that I am rebuilding. I have completely uninstalled ceph and any remnants and directories and config files. While setting up the new cluster, I follow the Ceph-deploy documentation as described before. I seem to get an error now (tried many times) : ceph-deploy mon create-initial command fails in gather keys step. This never happened before, and I’m not sure why its failing now. cephuser@ceph1:~/my-cluster$ ceph-deploy mon create-initial [ceph_deploy.cli][INFO ] Invoked (1.4.0): /usr/bin/ceph-deploy mon create-initial [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph1 [ceph_deploy.mon][DEBUG ] detecting platform for host ceph1 ... [ceph1][DEBUG ] connected to host: ceph1 [ceph1][DEBUG ] detect platform information from remote host [ceph1][DEBUG ] detect machine type [ceph_deploy.mon][INFO ] distro info: Ubuntu 14.04 trusty [ceph1][DEBUG ] determining if provided host has same hostname in remote [ceph1][DEBUG ] get remote short hostname [ceph1][DEBUG ] deploying mon to ceph1 [ceph1][DEBUG ] get remote short hostname [ceph1][DEBUG ] remote hostname: ceph1 [ceph1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [ceph1][DEBUG ] create the mon path if it does not exist [ceph1][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph1/done [ceph1][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-ceph1/done [ceph1][INFO ] creating keyring file: /var/lib/ceph/tmp/ceph-ceph1.mon.keyring [ceph1][DEBUG ] create the monitor keyring file [ceph1][INFO ] Running command: sudo ceph-mon --cluster ceph --mkfs -i ceph1 --keyring /var/lib/ceph/tmp/ceph-ceph1.mon.keyring [ceph1][DEBUG ] ceph-mon: set fsid to 
099013d5-126d-45b4-a98e-5f0c386805a4 [ceph1][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-ceph1 for mon.ceph1 [ceph1][INFO ] unlinking keyring file /var/lib/ceph/tmp/ceph-ceph1.mon.keyring [ceph1][DEBUG ] create a done file to avoid re-doing the mon deployment [ceph1][DEBUG ] create the init path if it does not exist [ceph1][DEBUG ] locating the `service` executable... [ceph1][INFO ] Running command: sudo initctl emit ceph-mon cluster=ceph id=ceph1 [ceph1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph1.asok mon_status [ceph1][DEBUG ] [ceph1][DEBUG ] status for monitor: mon.ceph1 [ceph1][DEBUG ] { [ceph1][DEBUG ] election_epoch: 2, [ceph1][DEBUG ] extra_probe_peers: [ [ceph1][DEBUG ] 192.168.240.101:6789/0 [ceph1][DEBUG ] ], [ceph1][DEBUG ] monmap: { [ceph1][DEBUG ] created: 0.00, [ceph1][DEBUG ] epoch: 1, [ceph1][DEBUG ] fsid: 099013d5-126d-45b4-a98e-5f0c386805a4, [ceph1][DEBUG ] modified: 0.00, [ceph1][DEBUG ] mons: [ [ceph1][DEBUG ] { [ceph1][DEBUG ] addr: 10.18.240.101:6789/0, [ceph1][DEBUG ] name: ceph1, [ceph1][DEBUG ] rank: 0 [ceph1][DEBUG ] } [ceph1][DEBUG ] ] [ceph1][DEBUG ] }, [ceph1][DEBUG ] name: ceph1, [ceph1][DEBUG ] outside_quorum: [], [ceph1][DEBUG ] quorum: [ [ceph1][DEBUG ] 0 [ceph1][DEBUG ] ], [ceph1][DEBUG ] rank: 0, [ceph1][DEBUG ] state: leader, [ceph1][DEBUG ] sync_provider: [] [ceph1][DEBUG ] } [ceph1][DEBUG ] [ceph1][INFO ] monitor: mon.ceph1 is running [ceph1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph1.asok mon_status [ceph_deploy.mon][INFO ] processing monitor mon.ceph1 [ceph1][DEBUG ] connected to host: ceph1 [ceph1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph1.asok mon_status [ceph_deploy.mon][INFO ] mon.ceph1 monitor has reached quorum! [ceph_deploy.mon][INFO ] all initial monitors are running and have formed quorum [ceph_deploy.mon][INFO ] Running gatherkeys... 
[ceph_deploy.gatherkeys][DEBUG ] Checking ceph1 for /etc/ceph/ceph.client.admin.keyring [ceph1][DEBUG ] connected to host: ceph1 [ceph1][DEBUG ] detect platform information from remote host [ceph1][DEBUG ] detect machine type [ceph1][DEBUG ] fetch remote file [ceph_deploy.gatherkeys][WARNIN] Unable to find /etc/ceph/ceph.client.admin.keyring on ['ceph1'] [ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring [ceph_deploy.gatherkeys][DEBUG ] Checking ceph1 for /var/lib/ceph/bootstrap-osd/ceph.keyring [ceph1][DEBUG ] connected to host: ceph1 [ceph1][DEBUG ] detect platform
Re: [ceph-users] ceph-giant installation error on centos 6.6
Note that ceph-deploy would enable EPEL for you automatically on CentOS. When doing a manual installation, the requirement for EPEL is called out here: http://ceph.com/docs/master/install/get-packages/#id8 Though looking at that, we could probably update it to use the now much simpler yum install epel-release. :) - Travis

On Wed, Feb 18, 2015 at 12:25 PM, Wenxiao He wenx...@gmail.com wrote: Thanks Brad. That solved the problem. I had mistakenly assumed all dependencies were in http://ceph.com/rpm-giant/el6/x86_64/. Regards, Wenxiao

On Tue, Feb 17, 2015 at 10:37 PM, Brad Hubbard bhubb...@redhat.com wrote: On 02/18/2015 12:43 PM, Wenxiao He wrote: Hello, I need some help, as I am getting package dependency errors when trying to install ceph-giant on CentOS 6.6. See below for the repo files and also the yum install output.

---> Package python-imaging.x86_64 0:1.1.6-19.el6 will be installed
--> Finished Dependency Resolution
Error: Package: 1:librbd1-0.87-0.el6.x86_64 (Ceph)
       Requires: liblttng-ust.so.0()(64bit)
Error: Package: gperftools-libs-2.0-11.el6.3.x86_64 (Ceph)
       Requires: libunwind.so.8()(64bit)
Error: Package: 1:librados2-0.87-0.el6.x86_64 (Ceph)
       Requires: liblttng-ust.so.0()(64bit)
Error: Package: 1:ceph-0.87-0.el6.x86_64 (Ceph)
       Requires: liblttng-ust.so.0()(64bit)

Looks like you may need to install libunwind and lttng-ust from EPEL 6? They seem to be the packages that supply liblttng-ust.so and libunwind.so, so you could try installing those from EPEL 6 and see how that goes? Note that this should not be taken as the, or even a, authoritative answer :) Cheers, Brad

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
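Putting the two suggestions together, the manual fix on CentOS 6.6 would look something like this (a hedged sketch; the package names are the ones named in the thread):

```shell
sudo yum install -y epel-release          # enable the EPEL repo
sudo yum install -y libunwind lttng-ust   # the missing shared libraries
sudo yum install -y ceph                  # ceph.com packages should now resolve
```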
Re: [ceph-users] Installation failure
Hi Paul, Would you mind sharing/posting the contents of your .repo files for ceph, ceph-el7, and ceph-noarch repos? I see that python-rbd is getting pulled in from EPEL, which I don't think is what you want. My guess is that you need the fix documented in http://tracker.ceph.com/issues/10476, though that was specifically addressing Fedora downstream packaging of Ceph competing with current upstream packaging hosted on ceph.com repos. This may be something similar with EPEL. - Travis On Mon, Feb 16, 2015 at 7:19 AM, HEWLETT, Paul (Paul)** CTR ** paul.hewl...@alcatel-lucent.com wrote: Hi all I have been installing ceph giant quite happily for the past 3 months on various systems and use an ansible recipe to do so. The OS is RHEL7. This morning on one of my test systems installation fails with: [root@octopus ~]# yum install ceph ceph-deploy Loaded plugins: langpacks, priorities, product-id, subscription-manager Ceph-el7 | 951 B 00:00:00 ceph | 951 B 00:00:00 ceph-noarch | 951 B 00:00:00 14 packages excluded due to repository priority protections Package ceph-deploy-1.5.21-0.noarch already installed and latest version Resolving Dependencies -- Running transaction check --- Package ceph.x86_64 1:0.87-0.el7.centos will be installed -- Processing Dependency: librbd1 = 1:0.87-0.el7.centos for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: ceph-common = 1:0.87-0.el7.centos for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libcephfs1 = 1:0.87-0.el7.centos for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: python-ceph = 1:0.87-0.el7.centos for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: librados2 = 1:0.87-0.el7.centos for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: python-flask for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: python-requests for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: hdparm for package: 
1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libtcmalloc.so.4()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libleveldb.so.1()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libcephfs.so.1()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: librados.so.2()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libboost_system-mt.so.1.53.0()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libboost_thread-mt.so.1.53.0()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Running transaction check --- Package boost-system.x86_64 0:1.53.0-18.el7 will be installed --- Package boost-thread.x86_64 0:1.53.0-18.el7 will be installed --- Package ceph-common.x86_64 1:0.87-0.el7.centos will be installed -- Processing Dependency: redhat-lsb-core for package: 1:ceph-common-0.87-0.el7.centos.x86_64 --- Package gperftools-libs.x86_64 0:2.1-1.el7 will be installed -- Processing Dependency: libunwind.so.8()(64bit) for package: gperftools-libs-2.1-1.el7.x86_64 --- Package hdparm.x86_64 0:9.43-5.el7 will be installed --- Package leveldb.x86_64 0:1.12.0-5.el7 will be installed --- Package libcephfs1.x86_64 1:0.87-0.el7.centos will be installed --- Package librados2.x86_64 1:0.87-0.el7.centos will be installed --- Package librbd1.x86_64 1:0.87-0.el7.centos will be installed --- Package python-ceph-compat.x86_64 1:0.80.7-0.4.el7 will be installed -- Processing Dependency: python-rbd = 1:0.80.7 for package: 1:python-ceph-compat-0.80.7-0.4.el7.x86_64 -- Processing Dependency: python-rados = 1:0.80.7 for package: 1:python-ceph-compat-0.80.7-0.4.el7.x86_64 -- Processing Dependency: python-cephfs = 1:0.80.7 for package: 1:python-ceph-compat-0.80.7-0.4.el7.x86_64 --- Package python-flask.noarch 1:0.10.1-4.el7 will be installed -- Processing Dependency: python-werkzeug for package: 1:python-flask-0.10.1-4.el7.noarch -- Processing 
Dependency: python-jinja2 for package: 1:python-flask-0.10.1-4.el7.noarch -- Processing Dependency: python-itsdangerous for package: 1:python-flask-0.10.1-4.el7.noarch --- Package python-requests.noarch 0:1.1.0-8.el7 will be installed -- Processing Dependency: python-urllib3 for package: python-requests-1.1.0-8.el7.noarch -- Running transaction check --- Package libunwind.x86_64 0:1.1-3.el7 will be installed --- Package python-cephfs.x86_64 1:0.80.7-0.4.el7 will be installed -- Processing Dependency: libcephfs1 = 1:0.80.7 for package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 --- Package python-itsdangerous.noarch 0:0.23-2.el7 will be installed --- Package python-jinja2.noarch 0:2.7.2-2.el7 will be installed -- Processing Dependency: python-babel = 0.8 for package: python-jinja2-2.7.2-2.el7.noarch -- Processing Dependency: python-markupsafe for package: python-jinja2-2.7.2-2.el7.noarch --- Package
Re: [ceph-users] Installation failure
Hi Paul, Looking a bit closer, I do believe it is the same issue. It looks like python-rbd in EPEL (and others like python-rados) were updated in EPEL on January 21st, 2015. This update included some changes to how dependencies were handled between EPEL and RHEL for Ceph. See http://pkgs.fedoraproject.org/cgit/ceph.git/commit/?h=epel7 Fedora and EPEL both split out the older python-ceph package into smaller subsets (python-{rados,cephfs,rbd}), but these changes are not upstream yet (from the ceph.com hosted packages). So if repos enable both ceph.com and EPEL, the EPEL packages will override the ceph.com packages because the RPMs have obsoletes: python-ceph in them, even though the EPEL packages are older. It's a bit of a problematic transition period until the upstream packaging splits in the same way. I do believe that using check_obsoletes=1 in /etc/yum/pluginconf.d/priorities.conf will take care of the problem for you. However, it may be the case that you would need to make your ceph .repo files that point to rpm-giant be priority=1. That's my best advice of something to try for now. Thanks, - Travis On Mon, Feb 16, 2015 at 10:16 AM, HEWLETT, Paul (Paul)** CTR ** paul.hewl...@alcatel-lucent.com wrote: Hi Travis Thanks for the reply. My only doubt is that this was all working until this morning. Has anything changed in the Ceph repository? I tried commenting out various repos but this did not work. 
If I delete the epel repos, then ceph installation fails because tcmalloc and leveldb are not found. My repos are:

[root@octopus ~]# ls -l /etc/yum.repos.d/
total 40
-rw-r--r-- 1 root root   700 Feb 16 12:08 ceph.repo
-rw-r--r-- 1 root root   957 Nov 25 16:23 epel.repo
-rw-r--r-- 1 root root  1056 Nov 25 16:23 epel-testing.repo
-rw-r--r-- 1 root root 26533 Feb 16 11:55 redhat.repo

and the contents of ceph.repo:

[ceph]
name=Ceph packages for $basearch
baseurl=http://ceph.com/rpm-giant/el7/$basearch
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-giant/el7/noarch
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=http://ceph.com/rpm-giant/el7/SRPMS
enabled=0
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[Ceph-el7]
name=Ceph-el7
baseurl=http://eu.ceph.com/rpms/rhel7/noarch/
enabled=1
priority=2
gpgcheck=0

[root@octopus ~]# cat /etc/yum.repos.d/epel.repo
[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
#baseurl=http://download.fedoraproject.org/pub/epel/7/$basearch
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch
failovermethod=priority
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7

[epel-debuginfo]
name=Extra Packages for Enterprise Linux 7 - $basearch - Debug
#baseurl=http://download.fedoraproject.org/pub/epel/7/$basearch/debug
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-debug-7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1

[epel-source]
name=Extra Packages for Enterprise Linux 7 - $basearch - Source
#baseurl=http://download.fedoraproject.org/pub/epel/7/SRPMS
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-source-7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1

[root@octopus ~]# cat /etc/yum.repos.d/epel-testing.repo
[epel-testing]
name=Extra Packages for Enterprise Linux 7 - Testing - $basearch
#baseurl=http://download.fedoraproject.org/pub/epel/testing/7/$basearch
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=testing-epel7&arch=$basearch
failovermethod=priority
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7

[epel-testing-debuginfo]
name=Extra Packages for Enterprise Linux 7 - Testing - $basearch - Debug
#baseurl=http://download.fedoraproject.org/pub/epel/testing/7/$basearch/debug
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=testing-debug-epel7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1

[epel-testing-source]
name=Extra Packages for Enterprise Linux 7 - Testing - $basearch - Source
#baseurl=http://download.fedoraproject.org/pub/epel/testing/7/SRPMS
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=testing-source-epel7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1

Regards Paul Hewlett Senior Systems Engineer Velocix, Cambridge Alcatel-Lucent t: +44 1223 435893 m: +44 7985327353

From: Travis Rhoden [trho...@gmail.com] Sent
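Travis's check_obsoletes suggestion from earlier in this thread can be applied like this. The sketch operates on a temporary copy so it is safe to run; the real file is /etc/yum/pluginconf.d/priorities.conf, edited with sudo.

```shell
# Append check_obsoletes=1 to the yum priorities plugin config if missing.
# CONF is a temp stand-in; use /etc/yum/pluginconf.d/priorities.conf for real.
CONF=$(mktemp)
printf '[main]\nenabled = 1\n' > "$CONF"
grep -q '^check_obsoletes' "$CONF" || echo 'check_obsoletes = 1' >> "$CONF"
grep '^check_obsoletes' "$CONF"   # -> check_obsoletes = 1
```

With check_obsoletes enabled, the priorities plugin re-checks obsoletes against repo priority, so the older-but-obsoleting EPEL packages no longer override the ceph.com ones.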
Re: [ceph-users] error in sys.exitfunc
Hi Karl, Sorry that I missed this go by. If you are still hitting this issue, I'd like to help you figure this one out, especially since you are not the only person to have hit it. Can you pass along your system details (OS, version, etc.)? I'd also like to know how you installed ceph-deploy (via RPM, or pip?). - Travis

On Tue, Jan 20, 2015 at 10:46 AM, Blake, Karl D karl.d.bl...@intel.com wrote: The error is the same as in this posted link: http://www.spinics.net/lists/ceph-devel/msg21388.html

From: Blake, Karl D Sent: Tuesday, January 20, 2015 4:29 AM To: ceph-us...@ceph.com Subject: RE: error in sys.exitfunc Please advise. Thanks, -Karl

From: Blake, Karl D Sent: Monday, January 19, 2015 7:23 PM To: 'ceph-us...@ceph.com' Subject: error in sys.exitfunc Anytime I run ceph-deploy I get the above error. Can you help resolve it? Thanks, -Karl

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RHEL 7 Installs
Hi John, For the last part, there being two different versions of packages in Giant, I don't think that's the actual problem. What's really happening there is that python-ceph has been obsoleted by other packages that are getting picked up by Yum. See the line that says Package python-ceph is obsoleted by python-rados... It's the same deal as http://tracker.ceph.com/issues/10476 You could try the same fix there. On Fri, Jan 9, 2015 at 4:50 PM, John Wilkins john.wilk...@inktank.com wrote: Ken, I had a number of issues installing Ceph on RHEL 7, which I think are mostly due to dependencies. I followed the quick start guide, which gets the latest major release--e.g., Firefly, Giant. ceph.conf is here: http://goo.gl/LNjFp3 ceph.log common errors included: http://goo.gl/yL8UsM To resolve these, I had to download and install libunwind and python-jinja2. It also seems that the Giant repo had 0.86 and 0.87 packages for python-ceph, and ceph-deploy didn't like that. ceph.log error: http://goo.gl/oeKGUv To resolve this, I had to download and install python-ceph v0.87. Then, run the ceph-deploy install command again. -- John Wilkins Red Hat jowil...@redhat.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy dependency errors on fc20 with firefly
Re: [ceph-users] ceph-deploy dependency errors on fc20 with firefly
Hi Noah, The root cause has been found. Please see http://tracker.ceph.com/issues/10476 for details. In short, it's an issue between RPM obsoletes and the yum priorities plugin. A final solution is pending, but details of a workaround are in the issue comments. - Travis
On Wed, Jan 7, 2015 at 4:05 PM, Travis Rhoden trho...@gmail.com wrote: Hi Noah, I'll try to recreate this on a fresh FC20 install as well. It looks to me like there might be a repo priority issue. It's mixing packages from the Fedora downstream repos and the ceph.com upstream repos. That's not supposed to happen. - Travis
On Wed, Jan 7, 2015 at 2:15 PM, Noah Watkins noah.watk...@inktank.com wrote: I'm trying to install Firefly on an up-to-date FC20 box. I'm getting the following errors:
[nwatkins@kyoto cluster]$ ../ceph-deploy/ceph-deploy install --release firefly kyoto
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/nwatkins/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.21): ../ceph-deploy/ceph-deploy install --release firefly kyoto
[ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster ceph hosts kyoto
[ceph_deploy.install][DEBUG ] Detecting platform for host kyoto ...
[kyoto][DEBUG ] connection detected need for sudo
[kyoto][DEBUG ] connected to host: kyoto
[kyoto][DEBUG ] detect platform information from remote host
[kyoto][DEBUG ] detect machine type
[ceph_deploy.install][INFO ] Distro info: Fedora 20 Heisenbug
[kyoto][INFO ] installing ceph on kyoto
[kyoto][INFO ] Running command: sudo yum -y install yum-plugin-priorities
[kyoto][DEBUG ] Loaded plugins: langpacks, priorities, refresh-packagekit
[kyoto][DEBUG ] Package yum-plugin-priorities-1.1.31-27.fc20.noarch already installed and latest version
[kyoto][DEBUG ] Nothing to do
[kyoto][INFO ] Running command: sudo rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[kyoto][INFO ] Running command: sudo rpm -Uvh --replacepkgs --force --quiet http://ceph.com/rpm-firefly/fc20/noarch/ceph-release-1-0.fc20.noarch.rpm
[kyoto][DEBUG ] Updating / installing...
[kyoto][WARNIN] ensuring that /etc/yum.repos.d/ceph.repo contains a high priority
[kyoto][WARNIN] altered ceph.repo priorities to contain: priority=1
[kyoto][INFO ] Running command: sudo yum -y -q install ceph
[kyoto][WARNIN] Error: Package: 1:python-cephfs-0.80.7-1.fc20.x86_64 (updates)
[kyoto][WARNIN]   Requires: libcephfs1 = 1:0.80.7-1.fc20
[kyoto][WARNIN]   Available: libcephfs1-0.80.1-0.fc20.x86_64 (Ceph)
[kyoto][DEBUG ] You could try using --skip-broken to work around the problem
[kyoto][WARNIN]     libcephfs1 = 0.80.1-0.fc20
[kyoto][WARNIN]   Available: libcephfs1-0.80.3-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     libcephfs1 = 0.80.3-0.fc20
[kyoto][WARNIN]   Available: libcephfs1-0.80.4-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     libcephfs1 = 0.80.4-0.fc20
[kyoto][WARNIN]   Available: libcephfs1-0.80.5-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     libcephfs1 = 0.80.5-0.fc20
[kyoto][WARNIN]   Available: libcephfs1-0.80.6-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     libcephfs1 = 0.80.6-0.fc20
[kyoto][WARNIN]   Installing: libcephfs1-0.80.7-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     libcephfs1 = 0.80.7-0.fc20
[kyoto][WARNIN] Error: Package: 1:python-rbd-0.80.7-1.fc20.x86_64 (updates)
[kyoto][WARNIN]   Requires: librbd1 = 1:0.80.7-1.fc20
[kyoto][WARNIN]   Available: librbd1-0.80.1-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.1-0.fc20
[kyoto][WARNIN]   Available: librbd1-0.80.3-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.3-0.fc20
[kyoto][WARNIN]   Available: librbd1-0.80.4-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.4-0.fc20
[kyoto][WARNIN]   Available: librbd1-0.80.5-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.5-0.fc20
[kyoto][WARNIN]   Available: librbd1-0.80.6-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.6-0.fc20
[kyoto][WARNIN]   Installing: librbd1-0.80.7-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.7-0.fc20
[kyoto][WARNIN] Error: Package: 1:python-rados-0.80.7-1.fc20.x86_64 (updates)
[kyoto][WARNIN]   Requires: librados2 = 1:0.80.7-1.fc20
[kyoto][WARNIN]   Available: librados2-0.80.1-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librados2 = 0.80.1-0.fc20
[kyoto][WARNIN]   Available: librados2-0.80.3-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librados2 = 0.80.3-0.fc20
[kyoto][WARNIN]   Available: librados2-0.80.4-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librados2 = 0.80.4-0.fc20
[kyoto][WARNIN]   Available: librados2-0.80.5-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librados2 = 0.80.5-0.fc20
[kyoto][WARNIN]   Available: librados2-0.80.6-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librados2 = 0.80.6-0.fc20
[kyoto][WARNIN]   Installing: librados2-0.80.7-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]
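For reference, the workaround in the tracker issue boils down to making sure the ceph.com repo wins the priority comparison against Fedora's own repos. A hypothetical /etc/yum.repos.d/ceph.repo illustrating the idea (the exact stanza ceph-deploy writes may differ):

```ini
# /etc/yum.repos.d/ceph.repo -- illustrative sketch only.
# priority=1 makes this repo beat the Fedora "updates" repo (default
# priority 99) when yum-plugin-priorities is enabled, so yum stops
# mixing downstream 1:0.80.7-1.fc20 packages with upstream 0.80.7-0.
[Ceph]
name=Ceph packages for $basearch
baseurl=http://ceph.com/rpm-firefly/fc20/$basearch
enabled=1
gpgcheck=1
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
priority=1
```

Note the epoch mismatch in the errors above: the Fedora packages carry epoch 1 (1:0.80.7-1.fc20) while the ceph.com packages have no epoch, so priorities alone may not be enough without the RPM-obsoletes fix tracked in the issue.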
Re: [ceph-users] Ceph on Centos 7
Hello, Can you give me a link to the exact instructions you followed? For CentOS 7 (EL7), ceph-extras should not be necessary. The instructions at [1] do not have you enable the ceph-extras repo. You will find that there are EL7 packages at [2]. I recently found a README that was incorrectly referencing ceph-extras when it came to ceph-deploy. I'm wondering if there may be other incorrect instructions floating around. I'm guessing the confusion may be coming from [3]. I think a note should be added there that ceph-extras is not needed for EL7. Right now it just says it is needed for some Ceph deployments, but as you have found, if you enable it on EL7, it won't work. Can you try removing the ceph-extras repo definition and see if that fixes things? - Travis [1] http://ceph.com/docs/master/start/quick-start-preflight/#red-hat-package-manager-rpm [2] http://ceph.com/rpm-giant/ [3] http://ceph.com/docs/master/install/get-packages/#add-ceph-extras On Tue, Jan 6, 2015 at 2:40 AM, Nur Aqilah aqi...@impact-multimedia.com wrote: Hi all, I was wondering if anyone can give me some guidelines for installing Ceph on CentOS 7. I followed the guidelines on ceph.com for the Quick Installation, but there was always this one particular error. When I typed in these commands:
sudo yum update
sudo yum install ceph-deploy
a long error popped up. I later checked and found out that el7/CentOS 7 is not listed here: http://ceph.com/packages/ceph-extras/rpm/ Attached is a screenshot of the error I was talking about.
I would really appreciate it if someone would kindly help me out. Thank you and regards, Nur Aqilah Abdul Rahman, Systems Engineer, impact business solutions Sdn Bhd, E303, Level 3 East Wing Metropolitan Square, Jalan PJU 8/1, Damansara Perdana, 47820 Petaling Jaya, Selangor Darul Ehsan P: 03 7728 6826 F: 03 7728 5826 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
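The suggested fix can be sketched as a short command sequence (shown as a transcript, since it needs root and a real EL7 host; the repo file name is an assumption -- check what is actually in /etc/yum.repos.d/):

```
$ ls /etc/yum.repos.d/ | grep -i ceph          # find the repo definitions
$ sudo rm -f /etc/yum.repos.d/ceph-extras.repo # drop the unneeded extras repo
$ sudo yum clean all                           # flush cached metadata
$ sudo yum install ceph-deploy                 # retry the install
```

The key point is only the ceph-extras definition is removed; the main Ceph repo (which does have EL7 packages) stays in place.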
Re: [ceph-users] ceph-deploy Errors - Fedora 21
Hello, I believe this is a problem specific to Fedora packaging. The Fedora package for ceph-deploy is a bit different from the ones hosted at ceph.com. Can you please tell me the output of rpm -q python-remoto? I believe the problem is that the python-remoto package is too old, and there is not a correct versioned dependency on it. The minimum version should be 0.0.22, but the latest in Fedora is 0.0.21 (and the latest upstream is 0.0.23). I'll push to get this updated correctly. The Fedora package maintainers will need to put out a new release of python-remoto, and hopefully update the spec file for ceph-deploy to require >= 0.0.22. - Travis On Mon, Dec 29, 2014 at 10:24 PM, deeepdish deeepd...@gmail.com wrote: Hello. I'm having an issue with ceph-deploy on Fedora 21. - Installed ceph-deploy via 'yum install ceph-deploy' - created non-root user - assigned sudo privs as per documentation - http://ceph.com/docs/master/rados/deployment/preflight-checklist/
$ ceph-deploy install smg01.erbus.kupsta.net
[ceph_deploy.conf][DEBUG ] found configuration file at: /cephfs/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.20): /bin/ceph-deploy install [hostname]
[ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster ceph hosts [hostname]
[ceph_deploy.install][DEBUG ] Detecting platform for host [hostname] ...
[ceph_deploy][ERROR ] RuntimeError: connecting to host: [hostname] resulted in errors: TypeError __init__() got an unexpected keyword argument 'detect_sudo'
Thank you. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
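A quick way to check whether the installed python-remoto meets the 0.0.22 minimum is to compare version strings with sort -V. A small sketch (the rpm query is left in a comment because it assumes an RPM-based host; the hard-coded value below is the version Fedora 21 shipped at the time):

```shell
# On a real host you would query the package manager, e.g.:
#   installed=$(rpm -q --qf '%{VERSION}' python-remoto)
installed="0.0.21"   # value reported on Fedora 21 at the time
required="0.0.22"

# sort -V orders version strings numerically; if the smallest of the
# two is the required version, the installed one is new enough.
if [ "$(printf '%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
  echo "python-remoto $installed satisfies >= $required"
else
  echo "python-remoto $installed is too old; need >= $required"
fi
# -> prints: python-remoto 0.0.21 is too old; need >= 0.0.22
```

The same one-liner works for checking any minimum-version requirement without relying on distro-specific comparison tools.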
Re: [ceph-users] Ceph-deploy install and pinning on Ubuntu 14.04
Hi Giuseppe, ceph-deploy does try to do some pinning for the Ceph packages. Those settings should be found at /etc/apt/preferences.d/ceph.pref If you find something is incorrect there, please let us know what it is and we can look into it! - Travis On Sat, Dec 20, 2014 at 11:32 AM, Giuseppe Civitella giuseppe.civite...@gmail.com wrote: Hi all, I'm using ceph-deploy on Ubuntu 14.04. When I do a ceph-deploy install I see packages getting installed from the Ubuntu repositories instead of Ceph's, am I missing something? Do I need to do some pinning on the repositories? Thanks ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
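For context, an apt pin file like the one ceph-deploy writes looks roughly like this. This is a hypothetical sketch of /etc/apt/preferences.d/ceph.pref; the exact stanza ceph-deploy generates may differ:

```
Package: *
Pin: origin ceph.com
Pin-Priority: 1001
```

A pin priority above 1000 lets the pinned origin win even when the Ubuntu archive carries a newer version; `apt-cache policy ceph` shows which candidate version and pin are actually in effect.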
Re: [ceph-users] Ceph Block device and Trim/Discard
One question re: discard support for kRBD -- does it matter which format the RBD is? Are Format 1 and Format 2 both okay, or is it just Format 2? - Travis On Mon, Dec 15, 2014 at 8:58 AM, Max Power mailli...@ferienwohnung-altenbeken.de wrote: On 12 December 2014 at 18:00, Ilya Dryomov ilya.dryo...@inktank.com wrote: Just a note, discard support went into 3.18, which was released a few days ago. I recently compiled 3.18 on Debian 7 and, what can I say... it works perfectly well. The used space goes up and down again. So I think this will be my choice. Thank you! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
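As a concrete illustration of using discard with kRBD (assuming a 3.18+ kernel; the device name and mountpoint below are hypothetical), there are two common approaches: mount with the discard option for online discard, or trim in batches:

```
# /etc/fstab -- online discard on a mapped RBD (hypothetical device/mountpoint,
# assumes /dev/rbd0 is already formatted ext4):
/dev/rbd0  /mnt/rbd  ext4  defaults,discard  0 0

# Alternatively, leave 'discard' off and trim periodically (e.g. from cron):
#   fstrim /mnt/rbd
```

Batched fstrim is often preferred over the discard mount option, since per-delete discards can add latency on busy filesystems.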
[ceph-users] [ANN] ceph-deploy 1.5.21 released
Hi All, This is a new release of ceph-deploy that defaults to installing the Giant release of Ceph. Additionally, there are a couple of bug fixes that make sure calls to 'gatherkeys' return non-zero upon failure, and that the EPEL repo is properly enabled as a prerequisite to installation on CentOS and Scientific Linux distros. The full changelog can be seen here: http://ceph.com/ceph-deploy/docs/changelog.html#id1 Please update! - Travis ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install
Hi Massimiliano, On Tue, Nov 25, 2014 at 6:02 AM, Massimiliano Cuttini m...@phoenixweb.it wrote: Hi Travis, can I have a developer or tester account in order to submit issues myself? Registration for the Ceph tracker is open -- anyone can sign up for an account to report issues. If you visit http://tracker.ceph.com, in the top right-hand corner is a link to Register. Hope that helps! - Travis Thanks, Massimiliano Cuttini On 18/11/2014 at 23:03, Travis Rhoden wrote: I've captured this at http://tracker.ceph.com/issues/10133 On Tue, Nov 18, 2014 at 4:48 PM, Travis Rhoden trho...@gmail.com wrote: Hi Massimiliano, I just recreated this bug myself. ceph-deploy is supposed to install EPEL automatically on the platforms that need it. I just confirmed that it is not doing so, and will be opening a bug in the Ceph tracker. I'll paste it here when I do so you can follow it. Thanks for the report! - Travis On Tue, Nov 18, 2014 at 4:41 PM, Massimiliano Cuttini m...@phoenixweb.it wrote: I solved it by installing the EPEL repo with yum. I think that somebody should write down in the documentation that EPEL is mandatory. On 18/11/2014 at 14:29, Massimiliano Cuttini wrote: Dear all, I try to install Ceph but I get errors:
#ceph-deploy install node1
[...]
[ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster ceph hosts node1
[ceph_deploy.install][DEBUG ] Detecting platform for host node1 ...
[...]
[node1][DEBUG ] ---> Package libXxf86vm.x86_64 0:1.1.3-2.1.el7 set to be installed
[node1][DEBUG ] ---> Package mesa-libgbm.x86_64 0:9.2.5-6.20131218.el7_0 set to be installed
[node1][DEBUG ] ---> Package mesa-libglapi.x86_64 0:9.2.5-6.20131218.el7_0 set to be installed
[node1][DEBUG ] --> Finished dependency resolution
[node1][WARNIN] Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][WARNIN]   Requires: libtcmalloc.so.4()(64bit)
[node1][WARNIN] Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][DEBUG ] You could try using --skip-broken to work around the problem
[node1][WARNIN]   Requires: libleveldb.so.1()(64bit)
[node1][WARNIN] Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][WARNIN]   Requires: libtcmalloc.so.4()(64bit)
[node1][DEBUG ] You could try running: rpm -Va --nofiles --nodigest
[node1][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install ceph
I installed the GIANT version, not FIREFLY, on the admin node. Is it a typo in the config file, or is it truly trying to install FIREFLY instead of GIANT? About the error, I see that it's related to missing libraries. It seems that Ceph requires libraries not available in the current distro:
[node1][WARNIN] Requires: libtcmalloc.so.4()(64bit)
[node1][WARNIN] Requires: libleveldb.so.1()(64bit)
[node1][WARNIN] Requires: libtcmalloc.so.4()(64bit)
This seems strange. Can you fix this? Thanks, Massimiliano Cuttini ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install
Hi Massimiliano, On Tue, Nov 18, 2014 at 5:23 PM, Massimiliano Cuttini m...@phoenixweb.it wrote: Then ...very good! :) OK, the next bad thing is that I have installed GIANT on the admin node. However, ceph-deploy ignored the admin node installation and installed FIREFLY. Now I have ceph-deploy of Giant on my admin node and my first OSD node with FIREFLY. How did you do the install on the admin node? Was it using ceph-deploy, or installing manually? ceph-deploy does indeed still default to Firefly, but that will change in the next version to Giant. The ceph-deploy admin command should only push keys and config files, so it doesn't do an actual install of packages. A call to ceph-deploy install would have installed Firefly, unless given the --release giant option. It seems odd to me. Is it fine, or should I prepare myself to format again? That will depend on your goals. A mixed-version cluster is viable, but if you want Giant everywhere, you'll need to upgrade the packages on the node running your OSDs and restart the OSDs themselves. An actual disk re-format is not necessary. - Travis On 18/11/2014 at 23:03, Travis Rhoden wrote: I've captured this at http://tracker.ceph.com/issues/10133 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
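The mixed-version upgrade path described above can be sketched as a transcript (hedged: the restart command varies by distro and init system, and monitors should always be upgraded before OSDs):

```
# On the EL7 OSD node still running Firefly:
$ sudo yum update ceph                 # pulls the Giant packages from the Ceph repo
$ sudo /etc/init.d/ceph restart osd    # sysvinit; 'service ceph restart osd' also works,
                                       # Ubuntu uses 'restart ceph-osd-all' under upstart
$ ceph -w                              # watch the cluster return to HEALTH_OK
$ ceph tell osd.* version              # confirm every OSD reports the new version
```

No data migration or disk re-format happens here; the OSDs replay their journals and rejoin the cluster after the restart.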
Re: [ceph-users] v0.80.4 Firefly released
Hi Andrija, I'm running a cluster with both CentOS and Ubuntu machines in it. I just did some upgrades to 0.80.4, and I can confirm that doing yum update ceph on the CentOS machines did result in having all OSDs on those machines restarted automatically. I actually did not know that would happen, as the CentOS machines were new additions (this was the first update since deploying them with 0.80.1), and I'm used to the Ubuntu behavior where I can update the package first, then restart things at will. So yeah, that still happens with RPM. :/ On Wed, Jul 16, 2014 at 3:55 AM, Andrija Panic andrija.pa...@gmail.com wrote: Hi Sage, can anyone confirm whether there is still a bug in the RPMs that does an automatic Ceph service restart after updating packages? We are instructed to first update/restart MONs, and after that OSDs - but that is impossible if we have MON+OSDs on the same host, since Ceph is automatically restarted by YUM/RPM, but NOT automatically restarted on Ubuntu/Debian (as reported by some other list member...) Thanks On 16 July 2014 01:45, Sage Weil s...@inktank.com wrote: This Firefly point release fixes a potential data corruption problem when ceph-osd daemons run on top of XFS and service Firefly librbd clients. A recently added allocation hint that RBD utilizes triggers an XFS bug on some kernels (Linux 3.2, and likely others) that leads to data corruption and deep-scrub errors (and inconsistent PGs). This release avoids the situation by disabling the allocation hint until we can validate which kernels are affected and/or are known to be safe to use the hint on. We recommend that all v0.80.x Firefly users urgently upgrade, especially if they are using RBD.
Notable Changes:
* osd: disable XFS extsize hint by default (#8830, Samuel Just)
* rgw: fix extra data pool default name (Yehuda Sadeh)
For more detailed information, see: http://ceph.com/docs/master/_downloads/v0.80.4.txt
Getting Ceph:
* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.80.4.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
-- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] scrub error on firefly
I can also say that after a recent upgrade to Firefly, I have experienced a massive uptick in scrub errors. The cluster was on Cuttlefish for about a year, and had maybe one or two scrub errors. After upgrading to Firefly, we've probably seen 3 to 4 dozen in the last month or so (we were getting 2-3 a day for a few weeks until the whole cluster was rescrubbed, it seemed). What I cannot determine, however, is how to know which object is busted. For example, just today I ran into a scrub error. The object has two copies; it is an 8MB piece of an RBD, and has identical timestamps and identical xattr names and values across both copies. But it definitely has a different MD5 sum. How do I know which one is correct? I've just been kicking off pg repair each time, which seems to just use the primary copy to overwrite the others. I haven't run into any issues with that so far, but it does make me nervous. - Travis On Tue, Jul 8, 2014 at 1:06 AM, Gregory Farnum g...@inktank.com wrote: It's not very intuitive or easy to look at right now (there are plans from the recent developer summit to improve things), but the central log should have output about exactly what objects are busted. You'll then want to compare the copies manually to determine which ones are good or bad, get the good copy on the primary (make sure you preserve xattrs), and run repair. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jul 7, 2014 at 6:48 PM, Randy Smith rbsm...@adams.edu wrote: Greetings, I upgraded to Firefly last week and I suddenly received this error: health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors. ceph health detail shows the following:
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 3.c6 is active+clean+inconsistent, acting [2,5]
1 scrub errors
The docs say that I can run `ceph pg repair 3.c6` to fix this. What I want to know is: what are the risks of data loss if I run that command in this state, and how can I mitigate them? -- Randall Smith Computing Services Adams State University http://www.adams.edu/ 719-587-7741 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] scrub error on firefly
And actually, just to follow up: it does seem like there are some additional smarts beyond just using the primary to overwrite the secondaries. Since I captured MD5 sums before and after the repair, I can say that in this particular instance the secondary copy was used to overwrite the primary. So I'm just trusting Ceph to do the right thing, and so far it seems to, but the comments here about needing to determine the correct object and place it on the primary PG make me wonder if I've been missing something. - Travis On Thu, Jul 10, 2014 at 10:19 AM, Travis Rhoden trho...@gmail.com wrote: I can also say that after a recent upgrade to Firefly, I have experienced a massive uptick in scrub errors. The cluster was on Cuttlefish for about a year, and had maybe one or two scrub errors. After upgrading to Firefly, we've probably seen 3 to 4 dozen in the last month or so (we were getting 2-3 a day for a few weeks until the whole cluster was rescrubbed, it seemed). What I cannot determine, however, is how to know which object is busted. For example, just today I ran into a scrub error. The object has two copies; it is an 8MB piece of an RBD, and has identical timestamps and identical xattr names and values across both copies. But it definitely has a different MD5 sum. How do I know which one is correct? I've just been kicking off pg repair each time, which seems to just use the primary copy to overwrite the others. I haven't run into any issues with that so far, but it does make me nervous. - Travis On Tue, Jul 8, 2014 at 1:06 AM, Gregory Farnum g...@inktank.com wrote: It's not very intuitive or easy to look at right now (there are plans from the recent developer summit to improve things), but the central log should have output about exactly what objects are busted. You'll then want to compare the copies manually to determine which ones are good or bad, get the good copy on the primary (make sure you preserve xattrs), and run repair.
-Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jul 7, 2014 at 6:48 PM, Randy Smith rbsm...@adams.edu wrote: Greetings, I upgraded to Firefly last week and I suddenly received this error: health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors. ceph health detail shows the following:
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 3.c6 is active+clean+inconsistent, acting [2,5]
1 scrub errors
The docs say that I can run `ceph pg repair 3.c6` to fix this. What I want to know is: what are the risks of data loss if I run that command in this state, and how can I mitigate them? -- Randall Smith Computing Services Adams State University http://www.adams.edu/ 719-587-7741 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
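The manual comparison Greg describes can be sketched as a transcript. Paths, the PG id, and the object name below are hypothetical, and the on-disk layout assumes FileStore-era OSDs where each PG has a directory under the OSD's data path:

```
$ ceph health detail                   # e.g. "pg 3.c6 is active+clean+inconsistent, acting [2,5]"
$ grep 3.c6 /var/log/ceph/ceph.log     # the central log names the inconsistent object

# On each OSD host in the acting set, locate and checksum the object's file:
$ find /var/lib/ceph/osd/ceph-2/current/3.c6_head -name '*<object-name>*' \
      -exec md5sum {} \;
$ getfattr -d -m '.*' <path-to-object> # compare xattrs too, not just data

# Once satisfied that the primary (first OSD in the acting set) holds
# the good copy -- copying the good file into place if it does not:
$ ceph pg repair 3.c6
```

The follow-up observation in this thread (the secondary overwriting the primary) suggests repair is not purely "primary wins", so checking both copies before repairing remains the safe habit.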
Re: [ceph-users] 'osd pool set-quota' behaviour with CephFS
Hi George, I actually asked Sage about a similar scenario at the OpenStack summit in Atlanta this year -- namely, whether I could use the new pool quota functionality to enforce quotas on CephFS. The answer was no: the pool quota functionality is mostly intended for radosgw, and the existing CephFS clients have no support for it. He said the quota should work, actually, but that you were likely to see some very strange behavior in CephFS. That sounds like what you've seen. It won't be a graceful failure at all. Quotas in CephFS are a different task, and one that I'm following as well. See here: https://github.com/ceph/ceph/pull/1122 The pull request is old, but Sage did mention he was in contact with the team working on the code and was hopeful to see it finished. - Travis On Tue, Jun 24, 2014 at 7:06 AM, george.ry...@stfc.ac.uk wrote: Last week I decided to take a look at the ‘osd pool set-quota’ option. I have a directory in CephFS that uses a pool called pool-2 (configured by following this: http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/). I have a directory in it filled with cat pictures. I ran ‘rados df’. I then copied a couple more cat pictures into my directory using ‘cp file destination && sync’. I then ran ‘rados df’ again; this showed an increase in the object count for the pool equal to the number of additional cat pictures, and an increase in the pool size equal to the size of the cat pictures, as expected. I then used the command ‘ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]’, as per http://ceph.com/docs/master/rados/operations/pools/, and set an object limit a couple of objects bigger than the pool's current object count. I then ran a loop copying more cat pictures one at a time (again with ‘&& sync’) each time. Whilst doing this I ran ‘rados df’; the number of objects in the pool increased up to the limit and stopped.
However, on the machine copying the cat pictures, the copying appeared to work fine, and running ls showed more pictures than the ‘rados df’ command would suggest should be there. If I accessed the same directory from a different machine, then I saw only the pictures that were copied up to the limit. If I then removed the limit, the images would appear in the directory and ‘rados df’ would report a larger number of objects. Similar behaviour was observed when setting a size limit. What’s going on? Is this expected behaviour? George Ryall Scientific Computing | STFC Rutherford Appleton Laboratory | Harwell Oxford | Didcot | OX11 0QX (01235 44) 5021
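For reference, George's quota commands look like the following; the pool name and values are just the ones from his experiment, and a quota is cleared by setting it back to 0:

```shell
# Cap a pool by object count and/or total bytes.
ceph osd pool set-quota pool-2 max_objects 100
ceph osd pool set-quota pool-2 max_bytes 10737418240

# Observe the pool's current object count and size.
rados df

# Remove a quota again by setting it to 0.
ceph osd pool set-quota pool-2 max_objects 0
ceph osd pool set-quota pool-2 max_bytes 0
```

The behaviour George saw is consistent with the quota being enforced at the RADOS layer while the CephFS client keeps dirty data buffered: writes queue in the client cache and only become visible elsewhere once the quota is lifted and the flush succeeds.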
[ceph-users] Multiple L2 LAN segments with Ceph
Hi folks, Does anybody know if there are any issues running Ceph with multiple L2 LAN segments? I'm picturing a large multi-rack/multi-row deployment where you may give each rack (or row) its own L2 segment, then connect them all with L3/ECMP in a leaf-spine architecture. I'm wondering how cluster_network (or public_network) in ceph.conf works in this case. Does that directive just tell a daemon starting on a particular node which network to bind to? Or is it a CIDR that has to be accurate for every OSD and MON in the entire cluster? Thanks, - Travis
Re: [ceph-users] Multiple L2 LAN segments with Ceph
Thanks to you all! You confirmed everything I thought I knew, but it is nice to be sure! On Wed, May 28, 2014 at 1:37 PM, Mike Dawson mike.daw...@cloudapt.com wrote: Travis, We run a routed ECMP spine-leaf network architecture with Ceph and have no issues on the network side whatsoever. Each leaf switch has an L2 cidr block inside a common L3 supernet. We do not currently split cluster_network and public_network. If we did, we'd likely build a separate spine-leaf network with its own L3 supernet. A simple IPv4 example: - ceph-cluster: 10.1.0.0/16 - cluster-leaf1: 10.1.1.0/24 - node1: 10.1.1.1/24 - node2: 10.1.1.2/24 - cluster-leaf2: 10.1.2.0/24 - ceph-public: 10.2.0.0/16 - public-leaf1: 10.2.1.0/24 - node1: 10.2.1.1/24 - node2: 10.2.1.2/24 - public-leaf2: 10.2.2.0/24 ceph.conf would be: cluster_network: 10.1.0.0/255.255.0.0 public_network: 10.2.0.0/255.255.0.0 - Mike Dawson On 5/28/2014 1:01 PM, Travis Rhoden wrote: Hi folks, Does anybody know if there are any issues running Ceph with multiple L2 LAN segments? I'm picturing a large multi-rack/multi-row deployment where you may give each rack (or row) its own L2 segment, then connect them all with L3/ECMP in a leaf-spine architecture. I'm wondering how cluster_network (or public_network) in ceph.conf works in this case. Does that directive just tell a daemon starting on a particular node which network to bind to? Or is it a CIDR that has to be accurate for every OSD and MON in the entire cluster? Thanks, - Travis
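As an actual ceph.conf fragment, Mike's example would be written in ini syntax (`key = value` rather than the `key: value` shorthand above); CIDR notation is accepted for the netmask:

```ini
; Hypothetical /etc/ceph/ceph.conf fragment using the supernets above.
; Each daemon binds to whichever local interface falls inside the range.
[global]
public network = 10.2.0.0/16
cluster network = 10.1.0.0/16
```

Because the directive only needs to cover the address the local daemon should bind to, a supernet spanning all per-leaf /24s works fine in a routed leaf-spine design.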
Re: [ceph-users] ceph cinder compute-nodes
You can define the UUID in the secret.xml file. That way you can generate one yourself, or let it autogenerate the first one for you and then use the same one on all the other compute nodes. In the Ceph docs, it actually generates one using uuidgen, then puts that UUID in the secret.xml file itself. See the very last part here: http://ceph.com/docs/master/rbd/rbd-openstack/#setup-ceph-client-authentication Hope that's clear. It's definitely possible (and pretty much required) to have all the nodes share the same secret UUID -- as you saw, you only put one UUID into cinder. - Travis On Sat, May 24, 2014 at 9:39 AM, 10 minus t10te...@gmail.com wrote: Hi, I went through the docs for setting up cinder with ceph. From the docs, I have to perform this on every compute node: virsh secret-define --file secret.xml The issue I see is that I have to perform this on 5 compute nodes, and cinder expects to have only one rbd_secret_uuid=<uuid>, as the former command will generate 5 uuids. How can I pass 5 uuids to cinder? Cheers
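The flow Travis describes from the Ceph docs looks roughly like this; the secret name `client.cinder secret` and key file path are examples, and the same secret.xml (with the same embedded UUID) is reused on all five compute nodes so virsh never invents a new UUID:

```shell
# Generate ONE uuid and reuse it everywhere.
uuidgen > secret.uuid

# Embed it in secret.xml so virsh secret-define adopts it.
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>$(cat secret.uuid)</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF

# On each compute node:
virsh secret-define --file secret.xml
virsh secret-set-value --secret $(cat secret.uuid) \
    --base64 $(cat client.cinder.key)
```

cinder.conf then carries the single shared value as `rbd_secret_uuid = <that same UUID>`.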
Re: [ceph-users] Red Hat to acquire Inktank
Sage, Congrats to you and Inktank! - Travis On Wed, Apr 30, 2014 at 9:27 AM, Haomai Wang haomaiw...@gmail.com wrote: Congratulation! On Wed, Apr 30, 2014 at 8:18 PM, Sage Weil s...@inktank.com wrote: Today we are announcing some very big news: Red Hat is acquiring Inktank. We are very excited about what this means for Ceph, the community, the team, our partners, and our customers. Ceph has come a long way in the ten years since the first line of code has been written, particularly over the last two years that Inktank has been focused on its development. The fifty members of the Inktank team, our partners, and the hundreds of other contributors have done amazing work in bringing us to where we are today. We believe that, as part of Red Hat, the Inktank team will be able to build a better quality Ceph storage platform that will benefit the entire ecosystem. Red Hat brings a broad base of expertise in building and delivering hardened software stacks as well as a wealth of resources that will help Ceph become the transformative and ubiquitous storage platform that we always believed it could be. For existing Inktank customers, this is going to mean turning a reliable and robust storage system into something that delivers even more value. In particular, joining forces with the Red Hat team will improve our ability to address problems at all layers of the storage stack, including in the kernel. We naturally recognize that many customers and users have built platforms based on other Linux distributions. We will continue to support these installations while we determine how to provide the best customer experience moving forward and how the next iteration of the enterprise Ceph product will be structured. In the meantime, our team remains committed to keeping Ceph an open, multiplatform project that works in any environment where it makes sense, including other Linux distributions and non-Linux operating systems. 
Red Hat is one of only a handful of companies that I trust to steward the Ceph project. When we started Inktank two years ago, our goal was to build the business by making Ceph successful as a broad-based, collaborative open source project with a vibrant user, developer, and commercial community. Red Hat shares this vision. They are passionate about open source, and have demonstrated that they are strong and fair stewards with other critical projects (like KVM). Red Hat intends to administer the Ceph trademark in a manner that protects the ecosystem as a whole and creates a level playing field where everyone is held to the same standards of use. Similarly, policies like upstream first ensure that bug fixes and improvements that go into Ceph-derived products are always shared with the community to streamline development and benefit all members of the ecosystem. One important change that will take place involves Inktank's product strategy, in which some add-on software we have developed is proprietary. In contrast, Red Hat favors a pure open source model. That means that Calamari, the monitoring and diagnostics tool that Inktank has developed as part of the Inktank Ceph Enterprise product, will soon be open sourced. This is a big step forward for the Ceph community. Very little will change on day one as it will take some time to integrate the Inktank business and for any significant changes to happen with our engineering activities. However, we are very excited about what is coming next for Ceph and are looking forward to this new chapter. I'd like to thank everyone who has helped Ceph get to where we are today: the amazing research group at UCSC where it began, DreamHost for supporting us for so many years, the incredible Inktank team, and the many contributors and users that have helped shape the system. 
We continue to believe that robust, scalable, and completely open storage platforms like Ceph will transform a storage industry that is still dominated by proprietary systems. Let's make it happen! sage -- Best Regards, Wheat
Re: [ceph-users] packages for Trusty
Thanks guys. I don't know why I didn't try that. I guess just too much habit of setting up the additional repo. =) On Fri, Apr 25, 2014 at 4:09 PM, Cédric Lemarchand c.lemarch...@yipikai.org wrote: Yes, just apt-get install ceph ;-) Cheers -- Cédric Lemarchand On 25 Apr 2014, at 21:07, Drew Weaver drew.wea...@thenap.com wrote: You can actually just install it using the Ubuntu packages. I did it yesterday on Trusty. Thanks, -Drew *From:* ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Travis Rhoden *Sent:* Friday, April 25, 2014 3:06 PM *To:* ceph-users *Subject:* [ceph-users] packages for Trusty Are there packages for Trusty being built yet? I don't see it listed at http://ceph.com/debian-emperor/dists/ Thanks, - Travis
Re: [ceph-users] cephx key for CephFS access only
Thanks for the response Greg. Unfortunately, I appear to be missing something. If I use my cephfs key with these perms: client.cephfs key: redacted caps: [mds] allow rwx caps: [mon] allow r caps: [osd] allow rwx pool=data This is what happens when I mount: # ceph-fuse -k /etc/ceph/ceph.client.cephfs.keyring -m ceph0-10g /data ceph-fuse[13533]: starting ceph client ceph-fuse[13533]: ceph mount failed with (1) Operation not permitted ceph-fuse[13531]: mount failed: (1) Operation not permitted But using the admin key works just fine: # ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m ceph0-10g /data ceph-fuse[13548]: starting ceph client ceph-fuse[13548]: starting fuse The admin key has the following perms: client.admin key: redacted caps: [mds] allow caps: [mon] allow * caps: [osd] allow * Since the mds permissions are functionally equivalent, either I need extra rights on the monitor, or the OSDs. Does a client need to access the metadata pool in order to do a CephFS mount? I'll experiment a bit and report back. On Mon, Mar 31, 2014 at 1:36 PM, Gregory Farnum g...@inktank.com wrote: At present, the only security permission on the MDS is 'allowed to do stuff', so rwx and * are synonymous. In general * means 'is an admin', though, so you'll be happier in the future if you use rwx. You may also want a more restrictive set of monitor capabilities as somebody else recently pointed out, but [3] will give you the filesystem access you're looking for. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Mar 28, 2014 at 9:40 AM, Travis Rhoden trho...@gmail.com wrote: Hi Folks, What would be the right set of capabilities to set for a new client key that has access to CephFS only? I've seen a few different examples: [1] mds 'allow *' mon 'allow r' osd 'allow rwx pool=data' [2] mon 'allow r' osd 'allow rwx pool=data' [3] mds 'allow rwx' mon 'allow r' osd 'allow rwx pool=data' I'm inclined to go with [3]. [1] seems weird for using *; I like seeing rwx.
Are these synonymous? [2] seems wrong because it doesn't include anything for MDS. - Travis ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
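Option [3] from the thread can be created in one step with `ceph auth get-or-create`; the client name `client.cephfs` and keyring path are just examples:

```shell
# CephFS-only key: full MDS access, read-only monitor access,
# and rwx limited to the 'data' pool (option [3] above).
ceph auth get-or-create client.cephfs \
    mds 'allow rwx' \
    mon 'allow r' \
    osd 'allow rwx pool=data' \
    -o /etc/ceph/ceph.client.cephfs.keyring
```

The resulting keyring is then passed to ceph-fuse with -k, together with the matching --id (as worked out later in this thread).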
Re: [ceph-users] cephx key for CephFS access only
Ah, I figured it out. My original key worked, but I needed to use the --id option with ceph-fuse to tell it to use the cephfs user rather than the admin user. Tailing the log on my monitor pointed out that it was logging in with client.admin, but providing the key for client.cephfs. So, the final working command is: ceph-fuse -k /etc/ceph/ceph.client.cephfs.keyring --id cephfs -m ceph0-10g /data I will note that neither the -k nor the --id option is present in man ceph-fuse, ceph-fuse --help, or in the Ceph docs, really. An example using -k is found here: http://ceph.com/docs/master/start/quick-cephfs/#filesystem-in-user-space-fuse, but there is never any mention of needing to change users if you are not using client.admin. In fact, using the search functionality on ceph-fuse returns zero results. If I'm ambitious I'll submit changes for the docs... Thanks for the help! - Travis On Wed, Apr 2, 2014 at 12:00 PM, Travis Rhoden trho...@gmail.com wrote: Thanks for the response Greg. Unfortunately, I appear to be missing something. If I use my cephfs key with these perms: client.cephfs key: redacted caps: [mds] allow rwx caps: [mon] allow r caps: [osd] allow rwx pool=data This is what happens when I mount: # ceph-fuse -k /etc/ceph/ceph.client.cephfs.keyring -m ceph0-10g /data ceph-fuse[13533]: starting ceph client ceph-fuse[13533]: ceph mount failed with (1) Operation not permitted ceph-fuse[13531]: mount failed: (1) Operation not permitted But using the admin key works just fine: # ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m ceph0-10g /data ceph-fuse[13548]: starting ceph client ceph-fuse[13548]: starting fuse The admin key has the following perms: client.admin key: redacted caps: [mds] allow caps: [mon] allow * caps: [osd] allow * Since the mds permissions are functionally equivalent, either I need extra rights on the monitor, or the OSDs. Does a client need to access the metadata pool in order to do a CephFS mount? I'll experiment a bit and report back.
On Mon, Mar 31, 2014 at 1:36 PM, Gregory Farnum g...@inktank.com wrote: At present, the only security permission on the MDS is 'allowed to do stuff', so rwx and * are synonymous. In general * means 'is an admin', though, so you'll be happier in the future if you use rwx. You may also want a more restrictive set of monitor capabilities as somebody else recently pointed out, but [3] will give you the filesystem access you're looking for. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Mar 28, 2014 at 9:40 AM, Travis Rhoden trho...@gmail.com wrote: Hi Folks, What would be the right set of capabilities to set for a new client key that has access to CephFS only? I've seen a few different examples: [1] mds 'allow *' mon 'allow r' osd 'allow rwx pool=data' [2] mon 'allow r' osd 'allow rwx pool=data' [3] mds 'allow rwx' mon 'allow r' osd 'allow rwx pool=data' I'm inclined to go with [3]. [1] seems weird for using *; I like seeing rwx. Are these synonymous? [2] seems wrong because it doesn't include anything for MDS. - Travis
[ceph-users] Monitors stuck in electing
Hello, I just deployed a new Emperor cluster using ceph-deploy 1.4. All went very smoothly, until I rebooted all the nodes. After reboot, the monitors no longer form a quorum. I followed the troubleshooting steps here: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/ Specifically, I'm in the state described in this section: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#most-common-monitor-issues The state for all the monitors is electing. The docs say this is most likely clock skew, but I do have all nodes synch'd with NTP. I've confirmed this multiple times. I've also confirmed the monitors can reach each other (by telnetting to IP:PORT, and I can see established connections via netstat). I'm baffled. Here is a sample mon_status output: root@ceph0:~# ceph daemon mon.ceph0 quorum_status { election_epoch: 31, quorum: [], quorum_names: [], quorum_leader_name: , monmap: { epoch: 2, fsid: XXX, (redacted) modified: 2014-03-24 14:35:22.332646, created: 0.00, mons: [ { rank: 0, name: ceph0, addr: 10.10.30.0:6789\/0}, { rank: 1, name: ceph1, addr: 10.10.30.1:6789\/0}, { rank: 2, name: ceph2, addr: 10.10.30.2:6789\/0}]}} They all look identical to that. Any ideas what I can look at besides NTP? The docs really stress that it should be clock skew, so I'll keep looking at that... - Travis
Re: [ceph-users] Monitors stuck in electing
Just to emphasize that I don't think it's clock skew, here is the NTP state of all three monitors: # ansible ceph_mons -m command -a ntpq -p -kK SSH password: sudo password [defaults to SSH password]: ceph0 | success | rc=0 remote refid st t when poll reach delay offset jitter == *controller-10g 198.60.73.8 2 u 43 64 3770.2360.057 0.097 ceph1 | success | rc=0 remote refid st t when poll reach delay offset jitter == *controller-10g 198.60.73.8 2 u 39 64 3770.2730.035 0.064 ceph2 | success | rc=0 remote refid st t when poll reach delay offset jitter == *controller-10g 198.60.73.8 2 u 30 64 3770.201 -0.063 0.063 I think they are pretty well in synch. - Travis On Tue, Mar 25, 2014 at 11:09 AM, Travis Rhoden trho...@gmail.com wrote: Hello, I just deployed a new Emperor cluster using ceph-deploy 1.4. All went very smooth, until I rebooted all the nodes. After reboot, the monitors no longer form a quorum. I followed the troubleshooting steps here: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/ Specifically, Im in the stat described in this section: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#most-common-monitor-issues The state for all the monitors is electing. The docs say this is most likely clock skew, but I do have all nodes synch'd with NTP. I've confirmed this multiple times. I've also confirmed the monitors can reach each other (by telneting to IP:PORT, and I can see established connections via netstat). I'm baffled. here is a sample mon_status output: root@ceph0:~# ceph daemon mon.ceph0 quorum_status { election_epoch: 31, quorum: [], quorum_names: [], quorum_leader_name: , monmap: { epoch: 2, fsid: XXX, (redacted) modified: 2014-03-24 14:35:22.332646, created: 0.00, mons: [ { rank: 0, name: ceph0, addr: 10.10.30.0:6789\/0}, { rank: 1, name: ceph1, addr: 10.10.30.1:6789\/0}, { rank: 2, name: ceph2, addr: 10.10.30.2:6789\/0}]}} They all look identical to that. Any ideas what I can look at besides NTP? 
The docs really stress that it should be clock skew, so I'll keep looking at that... - Travis
Re: [ceph-users] Monitors stuck in electing
Well since I spammed the list earlier, I should fess up to my mistakes. I forgot to change MTU sizes on the 10G switch after I switched to jumbo frames. So yes, I had a very unhappy networking stack. On the upside, playing with Cumulus Linux on switches is fun. On Tue, Mar 25, 2014 at 1:12 PM, Travis Rhoden trho...@gmail.com wrote: Thanks for the feedback -- I'll post back with more detailed logs if anything looks fishy! On Tue, Mar 25, 2014 at 1:10 PM, Gregory Farnum g...@inktank.com wrote: Well, you could try running with messenger debugging cranked all the way up and see if there's something odd happening there (eg, not handling incoming messages), but based on not having any other reports of this, I think your networking stack is unhappy in some way. *shrug* (Higher log levels showing what the individual pipes are doing will narrow it down on the Ceph side.) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Mar 25, 2014 at 10:05 AM, Travis Rhoden trho...@gmail.com wrote: On Tue, Mar 25, 2014 at 12:53 PM, Gregory Farnum g...@inktank.com wrote: On Tue, Mar 25, 2014 at 9:24 AM, Travis Rhoden trho...@gmail.com wrote: Okay, last one until I get some guidance. Sorry for the spam, but wanted to paint a full picture. Here are debug logs from all three mons, capturing what looks like an election sequence to me: ceph0: 2014-03-25 16:17:24.324846 7fa5c53fc700 5 mon.ceph0@0(electing).elector(35) start -- can i be leader? 
2014-03-25 16:17:24.324900 7fa5c53fc700 1 mon.ceph0@0(electing).elector(35) init, last seen epoch 35 2014-03-25 16:17:24.324913 7fa5c53fc700 1 -- 10.10.30.0:6789/0 -- mon.1 10.10.30.1:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x263d480 2014-03-25 16:17:24.324948 7fa5c53fc700 1 -- 10.10.30.0:6789/0 -- mon.2 10.10.30.2:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x263d6c0 2014-03-25 16:17:25.353975 7fa5c4bfb700 1 -- 10.10.30.0:6789/0 == mon.2 10.10.30.2:6789/0 493 election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 537+0+0 (4036841703 0 0) 0x265fd80 con 0x1df0c60 2014-03-25 16:17:25.354042 7fa5c4bfb700 5 mon.ceph0@0(electing).elector(35) handle_propose from mon.2 2014-03-25 16:17:29.325107 7fa5c53fc700 5 mon.ceph0@0(electing).elector(35) election timer expired ceph1: 2014-03-25 16:17:24.325529 7ffe48cc1700 5 mon.ceph1@1(electing).elector(35) handle_propose from mon.0 2014-03-25 16:17:24.325535 7ffe48cc1700 5 mon.ceph1@1(electing).elector(35) defer to 0 2014-03-25 16:17:24.325546 7ffe48cc1700 1 -- 10.10.30.1:6789/0 -- mon.0 10.10.30.0:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 ack 35) v4 -- ?+0 0x1bbfb40 2014-03-25 16:17:25.354038 7ffe48cc1700 1 -- 10.10.30.1:6789/0 == mon.2 10.10.30.2:6789/0 489 election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 537+0+0 (4036841703 0 0) 0x1bbf6c0 con 0x14d9b00 2014-03-25 16:17:25.354102 7ffe48cc1700 5 mon.ceph1@1(electing).elector(35) handle_propose from mon.2 2014-03-25 16:17:25.354113 7ffe48cc1700 5 mon.ceph1@1(electing).elector(35) no, we already acked 0 ceph2: 2014-03-25 16:17:20.353135 7f80d0013700 5 mon.ceph2@2(electing).elector(35) election timer expired 2014-03-25 16:17:20.353154 7f80d0013700 5 mon.ceph2@2(electing).elector(35) start -- can i be leader? 
2014-03-25 16:17:20.353225 7f80d0013700 1 mon.ceph2@2(electing).elector(35) init, last seen epoch 35 2014-03-25 16:17:20.353238 7f80d0013700 1 -- 10.10.30.2:6789/0 -- mon.0 10.10.30.0:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x18e7900 2014-03-25 16:17:20.353272 7f80d0013700 1 -- 10.10.30.2:6789/0 -- mon.1 10.10.30.1:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x18e7d80 2014-03-25 16:17:25.353559 7f80d0013700 5 mon.ceph2@2(electing).elector(35) election timer expired 2014-03-25 16:17:25.353578 7f80d0013700 5 mon.ceph2@2(electing).elector(35) start -- can i be leader? 2014-03-25 16:17:25.353647 7f80d0013700 1 mon.ceph2@2(electing).elector(35) init, last seen epoch 35 2014-03-25 16:17:25.353660 7f80d0013700 1 -- 10.10.30.2:6789/0 -- mon.0 10.10.30.0:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x19b7240 2014-03-25 16:17:25.353695 7f80d0013700 1 -- 10.10.30.2:6789/0 -- mon.1 10.10.30.1:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x19b76c0 2014-03-25 16:17:30.354040 7f80d0013700 5 mon.ceph2@2(electing).elector(35) election timer expired Oddly, it looks to me like mon.2 (ceph2) never handles/receives the proposal from mon.0 (ceph0). But I admit I have no clue how
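Greg's suggestion to crank messenger debugging can be applied through a monitor's admin socket even when there is no quorum (injectargs via `ceph tell` needs a working quorum, the local socket does not). Monitor name and socket path below match the ceph0 example; adjust for your cluster:

```shell
# Raise messenger debug output on one monitor via its admin socket.
ceph daemon mon.ceph0 config set debug_ms 20

# Equivalent form using the socket path directly:
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph0.asok config set debug_ms 20

# Then watch what the individual pipes are doing:
tail -f /var/log/ceph/ceph-mon.ceph0.log
```

In this thread the logs ultimately pointed at the network: a mismatched MTU after enabling jumbo frames, so large election messages were silently dropped between monitors.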
Re: [ceph-users] ceph-deploy, single mon not in quorum
Hi Mordur, I'm definitely straining my memory on this one, but happy to help if I can. I'm pretty sure I did not figure it out -- you can see I didn't get any feedback from the list. What I did do, however, was uninstall everything and try the same setup with mkcephfs, which worked fine at the time. This was 8 months ago, though, and I have since used ceph-deploy many times with great success. I am not sure if I have ever tried a similar set up, though, with just one node and one monitor. Fortuitously, I may be trying that very setup today or tomorrow. If I still have issues, I will be sure to post them here. Are you using both the latest ceph-deploy and the latest Ceph packages (Emperor or newer dev packages)? There have been lots of changes in the monitor area, including in the upstart scripts, that made many things more robust in this area. I did have a cluster a few months ago that had a flaky monitor that refused to join quorum after install, and I had to just blow it away and re-install/deploy it and then it was fine, which I thought was odd. Sorry that's probably not much help. - Travis On Thu, Jan 9, 2014 at 12:40 AM, Mordur Ingolfsson r...@1984.is wrote: Hi Travis, Did you figure this out? I'm dealing with exactly the same thing over here. Best, Moe
Re: [ceph-users] ceph-deploy, single mon not in quorum
On Thu, Jan 9, 2014 at 9:48 AM, Alfredo Deza alfredo.d...@inktank.com wrote: On Thu, Jan 9, 2014 at 9:45 AM, Travis Rhoden trho...@gmail.com wrote: Hi Mordur, I'm definitely straining my memory on this one, but happy to help if I can. I'm pretty sure I did not figure it out -- you can see I didn't get any feedback from the list. What I did do, however, was uninstall everything and try the same setup with mkcephfs, which worked fine at the time. This was 8 months ago, though, and I have since used ceph-deploy many times with great success. I am not sure if I have ever tried a similar set up, though, with just one node and one monitor. Fortuitously, I may be trying that very setup today or tomorrow. If I still have issues, I will be sure to post them here. Are you using both the latest ceph-deploy and the latest Ceph packages (Emperor or newer dev packages)? There have been lots of changes in the monitor area, including in the upstart scripts, that made many things more robust in this area. I did have a cluster a few months ago that had a flaky monitor that refused to join quorum after install, and I had to just blow it away and re-install/deploy it and then it was fine, which I thought was odd. Sorry that's probably not much help. - Travis On Thu, Jan 9, 2014 at 12:40 AM, Mordur Ingolfsson r...@1984.is wrote: Hi Travis, Did you figure this out? I'm dealing with exactly the same thing over here. Can you share what exactly you are having problems with? ceph-deploy's log output has been much improved and it is super useful to have that when dealing with possible issues. I do not; it was long, long ago... And in case it was ambiguous, let me explicitly say I was not recommending the use of mkcephfs at all (is that even still possible?). ceph-deploy is certainly the tool to use.
Best, Moe
Re: [ceph-users] Multiple kernel RBD clients failures
Eric, Yeah, your OSD weights are a little crazy... For example, looking at one host from your output of ceph osd tree... -3 31.5 host tca23 1 3.63 osd.1 up 1 7 0.26 osd.7 up 1 13 2.72 osd.13 up 1 19 2.72 osd.19 up 1 25 0.26 osd.25 up 1 31 3.63 osd.31 up 1 37 2.72 osd.37 up 1 43 0.26 osd.43 up 1 49 3.63 osd.49 up 1 55 0.26 osd.55 up 1 61 3.63 osd.61 up 1 67 0.26 osd.67 up 1 73 3.63 osd.73 up 1 79 0.26 osd.79 up 1 85 3.63 osd.85 up 1 osd.7 is set to 0.26, with others set to 3.63 or 2.72. Under normal circumstances, the rule of thumb would be to set weights equal to the disk size in TB. So, a 2TB disk would have a weight of 2, a 1.5TB disk == 1.5, etc. These weights control what proportion of data is directed to each OSD. I'm guessing you do have very different size disks, though, as it looks like the disks that are reporting near full all have relatively small weights (OSD 43 is at 91%, weight = 0.26). Is this really a 260GB disk? A mix of HDDs and SSDs? Or maybe just a small partition? Either way, you probably have something wrong with the weights. I'd look into that. Having a single pool made of disks of such varied size may not be a good option, but I'm not sure if that's your setup or not. To the best of my knowledge, Ceph halts IO operations when any disk reaches the near full scenario (85% by default). I'm not 100% certain on that one, but I believe that is true. Hope that helps, - Travis On Tue, Oct 1, 2013 at 2:51 AM, Yan, Zheng uker...@gmail.com wrote: On Mon, Sep 30, 2013 at 11:50 PM, Eric Eastman eri...@aol.com wrote: Thank you for the reply -28 == -ENOSPC (No space left on device). I think it is due to the fact that some osds are near full.
Yan, Zheng I thought that may be the case, but I would expect that ceph health would tell me I had full OSDs, but it is only saying they are near full: # ceph health detail HEALTH_WARN 9 near full osd(s) osd.9 is near full at 85% osd.29 is near full at 85% osd.43 is near full at 91% osd.45 is near full at 88% osd.47 is near full at 88% osd.55 is near full at 94% osd.59 is near full at 94% osd.67 is near full at 94% osd.83 is near full at 94% Are these OSDs' disks smaller than the other OSDs'? If so, you need to lower these OSDs' weights. Regards Yan, Zheng As I still have lots of space: # ceph df GLOBAL: SIZE AVAIL RAW USED %RAW USED 249T 118T 131T 52.60 POOLS: NAME ID USED %USED OBJECTS data 0 0 0 0 metadata 1 0 0 0 rbd 2 8 0 1 rbd-pool 3 67187G 26.30 17713336 And I set up lots of Placement Groups: # ceph osd dump | grep 'rep size' | grep rbd-pool pool 3 'rbd-pool' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 4500 pgp_num 4500 last_change 360 owner 0 Why did the OSDs fill up long before I ran out of space? Thanks, Eric
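The rule of thumb from Travis's reply -- CRUSH weight roughly equal to the disk's capacity in TB -- can be sketched as a tiny helper. This is a convention, not a Ceph API; the function name is made up for illustration:

```python
def crush_weight_for_disk(size_bytes: int) -> float:
    """Suggested CRUSH weight: disk capacity in TB (1 TB = 10**12 bytes),
    per the common rule of thumb discussed on the list."""
    return round(size_bytes / 10**12, 2)

# A 2 TB spinner vs. a ~260 GB device (like the 0.26-weighted OSDs above):
print(crush_weight_for_disk(2_000_000_000_000))  # 2.0
print(crush_weight_for_disk(260_000_000_000))    # 0.26
```

An individual OSD's weight can then be corrected with `ceph osd crush reweight osd.7 <weight>`, so that small devices receive proportionally less data and stop filling up first.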
[ceph-users] RBD Snap removal priority
Hello everyone,

I'm running a Cuttlefish cluster that hosts a lot of RBDs. I recently removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed that all of the clients had markedly decreased performance. Looking at iostat on the OSD nodes showed most disks pegged at 100% util.

I know there are thread priorities that can be set for client vs. recovery operations, but I'm not sure which category deleting a snapshot falls under. I couldn't really find anything relevant. Is there anything I can tweak to lower the priority of such an operation? I don't need it to complete fast -- rbd snap rm returns immediately and the actual deletion is done asynchronously. I'd be fine with it taking longer at a lower priority, but as it stands now it brings my cluster to a crawl and is causing issues with several VMs.

I see an "osd snap trim thread timeout" option in the docs -- is the operation occurring here what you would call snap trimming? If so, any chance of adding an "osd snap trim priority" option, just like there is for client and recovery ops?

Hope what I am saying makes sense...

- Travis
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
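For readers finding this thread later: releases after Cuttlefish did grow knobs in exactly this area. A hedged ceph.conf sketch, assuming a version that actually supports these options (check `ceph daemon osd.N config show | grep snap_trim` before relying on either):

```ini
[osd]
; Sleep (in seconds) between snap-trim work items so client I/O can
; get through. Available in later releases than Cuttlefish.
osd snap trim sleep = 0.05
; Work-queue priority for snap trimming relative to client/recovery
; ops. Added later still (Jewel-era); verify before setting.
osd snap trim priority = 1
```

Both values are illustrative starting points, not tuned recommendations; the sleep in particular trades trim duration for client latency.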
Re: [ceph-users] RBD Snap removal priority
Hi Mike,

Thanks for the info. I had seen some of the previous reports of reduced performance during various recovery tasks (and certainly experienced them), but you summarized them all quite nicely.

Yes, I'm running XFS on the OSDs. I checked fragmentation on a few of my OSDs -- all came back ~38% (better than I thought!).

- Travis

On Fri, Sep 27, 2013 at 2:05 PM, Mike Dawson mike.daw...@cloudapt.com wrote:

[cc ceph-devel]

Travis,

RBD doesn't behave well when Ceph maintenance operations create spindle contention (i.e. 100% util from iostat). More about that below.

Do you run XFS under your OSDs? If so, can you check for extent fragmentation? Should be something like:

xfs_db -c frag -r /dev/sdb1

We recently saw a fragmentation factor of over 80%, with lots of inodes having hundreds of extents. After 24+ hours of defrag'ing, we got it under control, but we're seeing the fragmentation factor grow by ~1.5% daily. We experienced spindle contention issues even after the defrag.

Sage, Sam, etc.: I think the real issue is that Ceph has several states where it performs what I would call maintenance operations that saturate the underlying storage without properly yielding to client I/O (which should have a higher priority). I have experienced or seen reports of Ceph maintenance affecting RBD client I/O in several ways:

- QEMU/RBD client I/O stalls or halts due to spindle contention from Ceph maintenance [1]
- Recovery and/or backfill cause QEMU/RBD reads to hang [2]
- rbd snap rm (Travis' report below)

[1] http://tracker.ceph.com/issues/6278
[2] http://tracker.ceph.com/issues/6333

I think this family of issues speaks to the need for Ceph to have more visibility into the underlying storage's limitations (especially spindle contention) when performing known expensive maintenance operations.

Thanks,
Mike Dawson

On 9/27/2013 12:25 PM, Travis Rhoden wrote:
Hello everyone, I'm running a Cuttlefish cluster that hosts a lot of RBDs.
I recently removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed that all of the clients had markedly decreased performance. Looking at iostat on the OSD nodes had most disks pegged at 100% util. I know there are thread priorities that can be set for clients vs recovery, but I'm not sure what deleting a snapshot falls under. I couldn't really find anything relevant. Is there anything I can tweak to lower the priority of such an operation? I didn't need it to complete fast, as rbd snap rm returns immediately and the actual deletion is done asynchronously. I'd be fine with it taking longer at a lower priority, but as it stands now it brings my cluster to a crawl and is causing issues with several VMs. I see an osd snap trim thread timeout option in the docs -- Is the operation occuring here what you would call snap trimming? If so, any chance of adding an option for osd snap trim priority just like there is for osd client op and osd recovery op? Hope what I am saying makes sense... - Travis ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Scaling RBD module
This noshare option may have just helped me a ton -- I sure wish I would have asked similar questions sooner, because I have seen the same failure to scale. =)

One question -- when using the noshare option (or really, even without it), are there any practical limits on the number of RBDs that can be mounted? I have servers with ~100 RBDs on each, and am wondering whether, if I switch them all over to using noshare, anything is going to blow up, use a ton more memory, etc. Even without noshare, are there any known limits to how many RBDs can be mapped?

Thanks!

- Travis

On Thu, Sep 19, 2013 at 8:03 PM, Somnath Roy somnath@sandisk.com wrote:

Thanks Josh! I am able to successfully add this noshare option in the image mapping now. Looking at dmesg output, I found that it was indeed the secret key problem. Block performance is scaling now.

Regards,
Somnath

-Original Message-
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Josh Durgin
Sent: Thursday, September 19, 2013 12:24 PM
To: Somnath Roy
Cc: Sage Weil; ceph-de...@vger.kernel.org; Anirban Ray; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Scaling RBD module

On 09/19/2013 12:04 PM, Somnath Roy wrote:
Hi Josh,
Thanks for the information. I am trying to add the following but hitting some permission issue.

root@emsclient:/etc# echo "mon-1:6789,mon-2:6789,mon-3:6789 name=admin,key=client.admin,noshare test_rbd ceph_block_test" > /sys/bus/rbd/add
-bash: echo: write error: Operation not permitted

If you check dmesg, it will probably show an error trying to authenticate to the cluster. Instead of key=client.admin, you can pass the base64 secret value as shown in 'ceph auth list' with the secret=X option.

BTW, there's a ticket for adding the noshare option to rbd map so using the sysfs interface like this is never necessary: http://tracker.ceph.com/issues/6264

Josh

Here is the contents of the rbd directory..
root@emsclient:/sys/bus/rbd# ll
total 0
drwxr-xr-x  4 root root    0 Sep 19 11:59 ./
drwxr-xr-x 30 root root    0 Sep 13 11:41 ../
--w-------  1 root root 4096 Sep 19 11:59 add
drwxr-xr-x  2 root root    0 Sep 19 12:03 devices/
drwxr-xr-x  2 root root    0 Sep 19 12:03 drivers/
-rw-r--r--  1 root root 4096 Sep 19 12:03 drivers_autoprobe
--w-------  1 root root 4096 Sep 19 12:03 drivers_probe
--w-------  1 root root 4096 Sep 19 12:03 remove
--w-------  1 root root 4096 Sep 19 11:59 uevent

I checked that even when I am logged in as root, I can't write anything on /sys. Here is the Ubuntu version I am using:

root@emsclient:/etc# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 13.04
Release:        13.04
Codename:       raring

Here is the mount information:

root@emsclient:/etc# mount
/dev/mapper/emsclient--vg-root on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
/dev/sda1 on /boot type ext2 (rw)
/dev/mapper/emsclient--vg-home on /home type ext4 (rw)

Any idea what went wrong here?
Thanks & Regards,
Somnath

-Original Message-
From: Josh Durgin [mailto:josh.dur...@inktank.com]
Sent: Wednesday, September 18, 2013 6:10 PM
To: Somnath Roy
Cc: Sage Weil; ceph-de...@vger.kernel.org; Anirban Ray; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Scaling RBD module

On 09/17/2013 03:30 PM, Somnath Roy wrote:
Hi,
I am running Ceph on a 3-node cluster and each of my server nodes is running 10 OSDs, one for each disk. I have one admin node and all the nodes are connected with 2 x 10G network. One network is for cluster and the other one is configured as public network. Here is the status of my cluster.

~/fio_test# ceph -s
  cluster b2e0b4db-6342-490e-9c28-0aadf0188023
   health HEALTH_WARN clock skew detected on mon.server-name-2, mon.server-name-3
   monmap e1: 3 mons at {server-name-1=xxx.xxx.xxx.xxx:6789/0, server-name-2=xxx.xxx.xxx.xxx:6789/0, server-name-3=xxx.xxx.xxx.xxx:6789/0}, election epoch 64, quorum 0,1,2 server-name-1,server-name-2,server-name-3
   osdmap e391: 30 osds: 30 up, 30 in
    pgmap v5202: 30912 pgs: 30912 active+clean; 8494 MB data, 27912 MB used, 11145 GB / 11172 GB avail
   mdsmap e1: 0/0/1 up

I
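A sketch of the corrected sysfs write Josh describes, with the string built up separately so the pieces are visible. The monitor addresses, pool (test_rbd), image (ceph_block_test), and the secret are all placeholders; the real base64 secret comes from `ceph auth get-key client.admin`:

```shell
# Build the /sys/bus/rbd/add line: "mon_addrs options pool image".
# Note the secret is the base64 key itself, NOT the entity name
# "client.admin" -- that was the bug in the original attempt.
secret="AQBzbWRTMEJ1FhAAxxxx"   # placeholder; use `ceph auth get-key client.admin`
spec="mon-1:6789,mon-2:6789,mon-3:6789 name=admin,secret=$secret,noshare test_rbd ceph_block_test"
echo "$spec"
# As root on a real client you would then run:
#   echo "$spec" > /sys/bus/rbd/add
```

The write to /sys/bus/rbd/add is left as a comment so the sketch is safe to run anywhere; on a kernel with the rbd module loaded it creates /dev/rbd0.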
Re: [ceph-users] Scaling RBD module
On Tue, Sep 24, 2013 at 5:16 PM, Sage Weil s...@inktank.com wrote:
On Tue, 24 Sep 2013, Travis Rhoden wrote:
This noshare option may have just helped me a ton -- I sure wish I would have asked similar questions sooner, because I have seen the same failure to scale. =)
One question -- when using the noshare option (or really, even without it) are there any practical limits on the number of RBDs that can be mounted? I have servers with ~100 RBDs on them each, and am wondering if I switch them all over to using noshare if anything is going to blow up, use a ton more memory, etc. Even without noshare, are there any known limits to how many RBDs can be mapped?

With noshare each mapped image will appear as a separate client instance, which means it will have its own session with the monitors and its own TCP connections to the OSDs. It may be a viable workaround for now, but in general I would not recommend it.

Good to know. We are still playing with CephFS as our ultimate solution, but in the meantime this may indeed be a good workaround for me.

I'm very curious what the scaling issue is with the shared client. Do you have a working perf that can capture callgraph information on this machine?

Not currently, but I could certainly work on it. The issue that we see is basically what the OP showed -- there seems to be a finite amount of bandwidth that I can read/write from a machine, regardless of how many RBDs are involved. I.e., if I can get 1GB/sec writes on one RBD when everything else is idle, running the same test on two RBDs in parallel *from the same machine* ends up with the sum of the two at ~1GB/sec, split fairly evenly. However, if I run the same test on two RBDs, each hosted on a separate machine, I definitely see increased bandwidth. Monitoring network traffic and the Ceph OSD nodes seems to imply that they are not overloaded -- there is more bandwidth to be had; the clients just aren't able to push the data fast enough.
That's why I'm hoping creating a new client for each RBD will improve things. I'm not going to enable this everywhere just yet, we will test things on a few RBDs and test, and perhaps enable on some RBDs that are particularly heavily loaded. I'll work on the perf capture! Thanks for the feedback, as always. - Travis sage Thanks! - Travis On Thu, Sep 19, 2013 at 8:03 PM, Somnath Roy somnath@sandisk.com wrote: Thanks Josh ! I am able to successfully add this noshare option in the image mapping now. Looking at dmesg output, I found that was indeed the secret key problem. Block performance is scaling now. Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Josh Durgin Sent: Thursday, September 19, 2013 12:24 PM To: Somnath Roy Cc: Sage Weil; ceph-de...@vger.kernel.org; Anirban Ray; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Scaling RBD module On 09/19/2013 12:04 PM, Somnath Roy wrote: Hi Josh, Thanks for the information. I am trying to add the following but hitting some permission issue. root@emsclient:/etc# echo mon-1:6789,mon-2:6789,mon-3:6789 name=admin,key=client.admin,noshare test_rbd ceph_block_test' /sys/bus/rbd/add -bash: echo: write error: Operation not permitted If you check dmesg, it will probably show an error trying to authenticate to the cluster. Instead of key=client.admin, you can pass the base64 secret value as shown in 'ceph auth list' with the secret=X option. BTW, there's a ticket for adding the noshare option to rbd map so using the sysfs interface like this is never necessary: http://tracker.ceph.com/issues/6264 Josh Here is the contents of rbd directory.. 
root@emsclient:/sys/bus/rbd# ll total 0 drwxr-xr-x 4 root root0 Sep 19 11:59 ./ drwxr-xr-x 30 root root0 Sep 13 11:41 ../ --w--- 1 root root 4096 Sep 19 11:59 add drwxr-xr-x 2 root root0 Sep 19 12:03 devices/ drwxr-xr-x 2 root root0 Sep 19 12:03 drivers/ -rw-r--r-- 1 root root 4096 Sep 19 12:03 drivers_autoprobe --w--- 1 root root 4096 Sep 19 12:03 drivers_probe --w--- 1 root root 4096 Sep 19 12:03 remove --w--- 1 root root 4096 Sep 19 11:59 uevent I checked even if I am logged in as root , I can't write anything on /sys. Here is the Ubuntu version I am using.. root@emsclient:/etc# lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description:Ubuntu 13.04 Release:13.04 Codename: raring
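The single-machine scaling test Travis describes can be sketched like this. The targets here are scratch files so the sketch is safe to run anywhere; on a real client you would point them at the mapped devices (/dev/rbd0, /dev/rbd1), add oflag=direct, and time the runs (or watch iostat) to compare aggregate bandwidth against a single-device run:

```shell
# Write to two targets in parallel and report the size written to each.
# Replace the temp files with /dev/rbdN devices for the real test.
targets="/tmp/rbd-bw-0 /tmp/rbd-bw-1"
for t in $targets; do
    dd if=/dev/zero of="$t" bs=4M count=8 2>/dev/null &
done
wait
stat -c '%s' $targets    # 33554432 bytes each (8 x 4 MiB)
rm -f $targets
```

If the shared-client bottleneck is in play, the per-device rate roughly halves as targets are added; with noshare (one client instance per mapping) the aggregate should climb instead.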
Re: [ceph-users] 1 particular ceph-mon never joins on 0.67.2
Hi James,

Yes, all configured using the interfaces file. Only two interfaces, eth0 and eth1:

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet dhcp

I took a single node and rebooted it several times, and it really was about 50/50 whether the OSDs showed up under 'localhost' or n0.

I tried a few different things last night with no luck. I modified when ceph-all starts by writing different "start on" values to /etc/init/ceph-all.override. I was grasping at straws a bit, as I just kept adding (and'ing) events, hoping to find something that works. I tried:

start on (local-filesystems and net-device-up IFACE=eth0)
start on (local-filesystems and net-device-up IFACE=eth0 and net-device-up IFACE=eth1)
start on (local-filesystems and net-device-up IFACE=eth0 and net-device-up IFACE=eth1 and started network-services)

Oddly, the last one seemed to work at first. When I added "started network-services" to the list, the OSDs came up correctly each time! But the monitor never started. If I started it directly with "start ceph-mon id=n0", it came up fine, but not during boot. I spent a couple hours trying to debug *that* before I gave up and switched to static hostnames. =/ I had even thrown --verbose onto the kernel command line so I could see all the upstart events happening, but didn't see anything obvious.

So now I'm back to the stock upstart scripts, using static hostnames, and I don't have any issues with OSDs moving in the crushmap, or any new problems with the monitors.

Sage, I do think I still saw a weird issue with my third mon not starting (same as the original email -- even now with static hostnames), but it was late, and I lost access to the cluster right about then and haven't regained it. I'll double-check that when I get access again and hopefully will find that problem has gone away too.

- Travis
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
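For anyone landing here from a search, the override mechanism being discussed is a drop-in file that replaces the job's start condition without editing the packaged job. A sketch of the variant that got Travis's OSDs placing correctly (whether it fits your boxes depends on which interfaces carry the cluster network):

```
# /etc/init/ceph-all.override -- overrides only the "start on" stanza of
# the packaged /etc/init/ceph-all.conf; upstart allows the condition to
# be split across lines.
start on (local-filesystems
          and net-device-up IFACE=eth0
          and net-device-up IFACE=eth1
          and started network-services)
```

Note that, per the thread, this fixed the OSD/hostname race but not the monitor startup; the reliable fix was static hostnames.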
Re: [ceph-users] 1 particular ceph-mon never joins on 0.67.2
Cool. So far I have tried:

start on (local-filesystems and net-device-up IFACE=eth0)
start on (local-filesystems and net-device-up IFACE=eth0 and net-device-up IFACE=eth1)

About to try:

start on (local-filesystems and net-device-up IFACE=eth0 and net-device-up IFACE=eth1 and started network-services)

The local-filesystems + network-device combination is billed as an alternative to runlevel if you need to do something *after* networking... No luck so far. I'll keep trying things out.

On Mon, Aug 26, 2013 at 2:31 PM, Sage Weil s...@inktank.com wrote:
On Mon, 26 Aug 2013, Travis Rhoden wrote:
Hi Sage,

Thanks for the response. I noticed that as well, and suspected hostname/DHCP/DNS shenanigans. What's weird is that all nodes are identically configured. I also have monitors running on n0 and n12, and they come up fine, every time. Here's the mon_host line from ceph.conf:

mon_initial_members = n0, n12, n24
mon_host = 10.0.1.0,10.0.1.12,10.0.1.24

Just to test /etc/hosts and name resolution...

root@n24:~# getent hosts n24
10.0.1.24 n24
root@n24:~# hostname -s
n24

The only loopback entry in /etc/hosts is "127.0.0.1 localhost", so that should be fine. Upon rebooting this node, I've had the monitor come up okay once, maybe out of 12 tries. So it appears to be some kind of race... No clue what is going on. If I stop and start the monitor (or restart), it doesn't appear to change anything.

However, on the topic of races, I'm having one other more pressing issue. Each OSD host is having its hostname assigned via DHCP. Until that assignment is made (during init), the hostname is "localhost", and then it switches over to nX, for some node number X. The issue I am seeing is that there is a race between this hostname assignment and the Ceph upstart scripts, such that sometimes ceph-osd starts while the hostname is still 'localhost'. This then causes the OSD location to change in the crushmap, which is going to be a very bad thing.
=) When rebooting all my nodes at once (there are several dozen), about 50% move from being under nX to localhost. Restarting all the ceph-osd jobs moves them back (because the hostname is defined by then). I'm wondering what kind of delay, or additional start-on logic, I can add to the upstart script to work around this.

Hmm, this is beyond my upstart-fu, unfortunately. This has come up before, actually. Previously we would wait for any interface to come up and then start, but that broke with multi-NIC machines, and I ended up just making things start in runlevel [2345]. James, do you know what should be done to make the job wait for *all* network interfaces to be up? Is that even the right solution here?

sage

On Fri, Aug 23, 2013 at 4:47 PM, Sage Weil s...@inktank.com wrote:
Hi Travis,
On Fri, 23 Aug 2013, Travis Rhoden wrote:
Hey folks,

I've just done a brand new install of 0.67.2 on a cluster of Calxeda nodes. I have one particular monitor that never joins the quorum when I restart the node. Looks to me like it has something to do with the create-keys task, which never seems to finish:

root 1240  1  4 13:03 ?  00:00:02 /usr/bin/ceph-mon --cluster=ceph -i n24 -f
root 1244  1  0 13:03 ?  00:00:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i n24

I don't see that task on my other monitors.
Additionally, that task is periodically querying the monitor status:

root 1240     1  2 13:03 ?  00:00:02 /usr/bin/ceph-mon --cluster=ceph -i n24 -f
root 1244     1  0 13:03 ?  00:00:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i n24
root 1982  1244 15 13:04 ?  00:00:00 /usr/bin/python /usr/bin/ceph --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.n24.asok mon_status

Checking that status myself, I see:

# ceph --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.n24.asok mon_status
{ "name": "n24",
  "rank": 2,
  "state": "probing",
  "election_epoch": 0,
  "quorum": [],
  "outside_quorum": [
        "n24"],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
      "fsid": "f0b0d4ec-1ac3-4b24-9eab-c19760ce4682",
      "modified": "2013-08-23 12:55:34.374650",
      "created": "0.00",
      "mons": [
            { "rank": 0,
              "name": "n0",
              "addr": "10.0.1.0:6789\/0"},
            { "rank": 1,
              "name": "n12",
              "addr": "10.0.1.12:6789\/0"
Re: [ceph-users] Backporting the kernel client
I built the 3.10-rc rbd module for a 3.8 kernel yesterday, and only have one thing to add (I know I'm reviving an old thread). There is one folder missing from the original list of files to use:

include/linux/crush/*

That would bring everything to:

include/keys/ceph-type.h
include/linux/ceph/*
include/linux/crush/*
fs/ceph/*
net/ceph/*
drivers/block/rbd.c
drivers/block/rbd_types.h

RBD built without a hitch. Getting CephFS to build was going to be a bit more work, but I didn't need it so I just skipped it.

- Travis

On Mon, Apr 29, 2013 at 8:41 PM, James Harper james.har...@bendigoit.com.au wrote:

I'm probably not the only one who would like to run a distribution-provided kernel (which for Debian Wheezy/Ubuntu Precise is 3.2) and still have a recent-enough Ceph kernel client. So I'm wondering whether it's feasible to backport the kernel client to an earlier kernel.

You can grab the 3.8 kernel from Debian experimental: http://packages.debian.org/search?keywords=linux-image-3.8
I'm using it on a bunch of machines and I know of a few others using it too.

The plan is as follows:
1) Grab the Ceph files from https://github.com/ceph/ceph-client (and put them over the older kernel sources). If I got it right, the files are:
include/keys/ceph-type.h
include/linux/ceph/*
fs/ceph/*
net/ceph/*
drivers/block/rbd.c
drivers/block/rbd_types.h
2) Make (trivial) adjustments to the source code to account for changed kernel interfaces.
3) Compile as modules and install the new Ceph modules under /lib/modules.
4) Reboot to a standard distribution kernel with an up-to-date Ceph client.

I would think you should be able to build a dkms package pretty easily; it would be a lot faster to build than building an entire kernel, and much easier to maintain. Of course that depends on the degree of integration with the kernel...
James ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
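The dkms route James suggests would look roughly like the fragment below: a dkms.conf alongside the copied sources under /usr/src/. Everything here is hypothetical (package name, version, module list) and only illustrates the shape; you would still need the source tree and Makefiles arranged so each module actually builds out of tree:

```
# /usr/src/ceph-backport-3.10rc/dkms.conf -- hypothetical sketch
PACKAGE_NAME="ceph-backport"
PACKAGE_VERSION="3.10rc"
AUTOINSTALL="yes"

# libceph first, so rbd (and optionally ceph.ko) can link against it.
BUILT_MODULE_NAME[0]="libceph"
BUILT_MODULE_LOCATION[0]="net/ceph"
DEST_MODULE_LOCATION[0]="/kernel/net/ceph"

BUILT_MODULE_NAME[1]="rbd"
BUILT_MODULE_LOCATION[1]="drivers/block"
DEST_MODULE_LOCATION[1]="/kernel/drivers/block"
```

With that in place, `dkms add -m ceph-backport -v 3.10rc`, then `dkms build` and `dkms install` with the same flags, would rebuild the modules automatically for each installed kernel.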
Re: [ceph-users] Upgrade from bobtail
I'm actually planning this same upgrade on Saturday. Is the memory leak from Bobtail during deep-scrub known to be squashed? I've been seeing that a lot lately.

I know Bobtail to Cuttlefish is one-way only, due to the mon re-architecting. But in general, whenever we do upgrades we usually have a fall-back/reversion plan in case things go wrong. Is that ever going to be possible with Ceph?

- Travis

On Mon, Jun 17, 2013 at 12:27 PM, Sage Weil s...@inktank.com wrote:
On Mon, 17 Jun 2013, Wolfgang Hennerbichler wrote:
Hi,
I'm planning to upgrade my bobtail (latest) cluster to cuttlefish. Are there any outstanding issues that I should be aware of? Anything that could break my production setup?

There will be another point release out in the next day or two that resolves a rare sequence of errors during the upgrade that can be problematic (see the 0.61.3 release notes). There are also several fixes for udev/ceph-disk/ceph-deploy on rpm-based distros that will be included. If you can wait a couple days I would suggest that.

sage
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] OSD crash during script, 0.56.4
I'm afraid I don't. I don't think I looked when it happened, and searching for one just now came up empty. :/ If it happens again, I'll be sure to keep my eye out for one.

FWIW, this particular server (1 out of 5) has 8GB *less* RAM than the others (one bad stick, it seems), and this has happened twice. But it still has 40GB for 12 OSDs, so I think it should be plenty.

Thanks for responding.

- Travis

On Mon, May 13, 2013 at 4:49 PM, Gregory Farnum g...@inktank.com wrote:
On Tue, May 7, 2013 at 9:44 AM, Travis Rhoden trho...@gmail.com wrote:
Hey folks,
Saw this crash the other day:

ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
1: /usr/bin/ceph-osd() [0x788fba]
2: (()+0xfcb0) [0x7f19d1889cb0]
3: (gsignal()+0x35) [0x7f19d0248425]
4: (abort()+0x17b) [0x7f19d024bb8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f19d0b9a69d]
6: (()+0xb5846) [0x7f19d0b98846]
7: (()+0xb5873) [0x7f19d0b98873]
8: (()+0xb596e) [0x7f19d0b9896e]
9: (operator new[](unsigned long)+0x47e) [0x7f19d102db1e]
10: (ceph::buffer::create(unsigned int)+0x67) [0x834727]
11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x834a95]
12: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned long, ceph::buffer::list&)+0x1ae) [0x6fbdde]
13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool)+0x347) [0x69ac57]
14: (PG::chunky_scrub()+0x375) [0x69faf5]
15: (PG::scrub()+0x145) [0x6a0e95]
16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x6384ec]
17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8297e6]
18: (ThreadPool::WorkThread::entry()+0x10) [0x82b610]
19: (()+0x7e9a) [0x7f19d1881e9a]
20: (clone()+0x6d) [0x7f19d0305cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Appears to have gone down during a scrub? I don't see anything interesting in /var/log/syslog or anywhere else from the same time. It's actually the second time I've seen this exact stack trace. First time was reported here...
(was going to insert GMane link, but search.gmane.org appears to be down for me). Well, for those inclined, the thread was titled question about mon memory usage, and was also started by me. Any thoughts? I do plan to upgrade to 0.56.6 when I can. I'm a little leery of doing it on a production system without a maintenance window, though. When I went from 0.56.3 -- 0.56.4 on a live system, a system using the RBD kernel module kpanic'd. =) Do you have a core from when this happened? It was indeed during a scrub, but it didn't fail an assert or anything — looks like maybe it tried to allocate too much memory or something... :/ -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] distinguish administratively down OSDs
Hey folks,

This is either a feature request, or a request for guidance on handling something that must be common... =)

I have a cluster with dozens of OSDs, and one started having read errors (media errors) from the hard disk. Ceph complained, and I took it out of service by marking it down and out. "ceph osd tree" showed it as down, with a weight of 0 (out). Perfect. In the meantime, I RMA'd the disk. The replacement is on-hand, but we haven't done the swap-out yet. Woohoo, rot in place. =)

Fast forward a few days, and we had a server failure. This took a bunch of OSDs with it. We were able to bring the server back online, but not before normal recovery operations had started. The failed server came back up, and things started to migrate *back*. All this is normal. However, the loads were pretty intense, and I actually saw a few OSDs on *other* servers fail. Seemingly randomly; only 3 or 4. Thankfully I was watching for that, and restarted them before hitting the default 5-minute timeout and kicking off *more* recovery.

On to my question... During this time where I was watching for newly down OSDs, I had no way of knowing which OSDs were newly down (and potentially out), and which was the one I had set down on purpose. At least not from the CLI. I figured it out from some notes I had taken when I RMA'd the drive, but (sheepishly) not before I tried restarting the OSD with the bad hard drive behind it.

So, from the CLI, how could one distinguish OSDs that are down *on purpose* and should be left that way? My first thought would be to allow a "note" field to be attached to an OSD, and have that displayed in the output of "ceph osd tree". If anyone is familiar with HPC, and specifically PBS (the pbsnodes command), this would be similar to "pbsnodes -ln", which shows notes an administrator has attached to compute nodes that are down.
Examples I see from this on one of our current compute clusters are "bad RAM", "bad scratch disk", "does not POST", etc. Anyone else want to be able to track such a thing? Is there an existing method I could achieve such a goal with? As things scale to hundreds of OSDs or more, it seems like a useful thing to note which OSDs have failed, and why.

Thanks,

- Travis
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] OSD crash during script, 0.56.4
Hey folks,

Saw this crash the other day:

ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
1: /usr/bin/ceph-osd() [0x788fba]
2: (()+0xfcb0) [0x7f19d1889cb0]
3: (gsignal()+0x35) [0x7f19d0248425]
4: (abort()+0x17b) [0x7f19d024bb8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f19d0b9a69d]
6: (()+0xb5846) [0x7f19d0b98846]
7: (()+0xb5873) [0x7f19d0b98873]
8: (()+0xb596e) [0x7f19d0b9896e]
9: (operator new[](unsigned long)+0x47e) [0x7f19d102db1e]
10: (ceph::buffer::create(unsigned int)+0x67) [0x834727]
11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x834a95]
12: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned long, ceph::buffer::list&)+0x1ae) [0x6fbdde]
13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool)+0x347) [0x69ac57]
14: (PG::chunky_scrub()+0x375) [0x69faf5]
15: (PG::scrub()+0x145) [0x6a0e95]
16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x6384ec]
17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8297e6]
18: (ThreadPool::WorkThread::entry()+0x10) [0x82b610]
19: (()+0x7e9a) [0x7f19d1881e9a]
20: (clone()+0x6d) [0x7f19d0305cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Appears to have gone down during a scrub? I don't see anything interesting in /var/log/syslog or anywhere else from the same time. It's actually the second time I've seen this exact stack trace. First time was reported here... (was going to insert GMane link, but search.gmane.org appears to be down for me). Well, for those inclined, the thread was titled "question about mon memory usage", and was also started by me.

Any thoughts? I do plan to upgrade to 0.56.6 when I can. I'm a little leery of doing it on a production system without a maintenance window, though. When I went from 0.56.3 to 0.56.4 on a live system, a system using the RBD kernel module kpanic'd. =)

- Travis
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph osd tell bench
I have a question about the "tell bench" command. When I run this, is it behaving more or less like a dd on the drive? It appears to be, but I wanted to confirm whether or not it bypasses the normal Ceph stack that would be writing metadata, calculating checksums, etc.

One bit of behavior I noticed a while back that I was not expecting is that this command does write to the journal. It made sense once I thought about it, but when I have an SSD journal in front of an OSD, I can't get the "tell bench" command to show me accurate numbers for the raw speed of the OSD -- instead I get the write speed of the SSD. Just a small caveat there. The upside is that when you do something like "tell \* bench", you are able to see if that SSD becomes a bottleneck from hosting multiple journals, so I'm not really complaining. But it does make it a bit tough to see whether one OSD is performing much differently than others.

But really, I'm mainly curious if it skips any normal metadata/checksum overhead that may be there otherwise.

Thanks,

- Travis
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
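For reference, the command in question is `ceph osd tell N bench` (newer releases spell it `ceph tell osd.N bench`), and it reports bytes written and elapsed time. A small sketch of turning that into MB/s; the sample line is hard-coded so the parse is demonstrable without a cluster, and the exact output wording varies by release:

```shell
# Parse a `ceph osd tell N bench` result line into throughput. The line
# below is a made-up sample in the old plain-text style; newer releases
# emit JSON with per-run byte and time counts instead.
line='bench: wrote 1024 MB in blocks of 4194304 bytes in 9.21 sec'
echo "$line" | awk '{ printf "%.1f MB/s\n", $3 / $(NF-1) }'
# -> 111.2 MB/s
```

As the thread notes, the number measured this way includes the journal write, so with an SSD journal it reflects the journal device more than the data disk.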
Re: [ceph-users] Failed assert when starting new OSDs in 0.60
Hi Guys,

Any additional thoughts on this? There is a bit of information shared off-list I wanted to bring back:

Sam mentioned that the metadata looked odd, and suspected some form of 32-bit shenanigans in the key name construction. However, that might not have been the case, because he later came in with:

"Hmm. Based on the omap and logs, the omap directory is simply a bunch of updates behind. Was the node rebooted as part of the osd restart? FS is xfs? What are your fs mount options?"

There was no node restart. We are using XFS. From ceph.conf:

osd mount options xfs = rw,noatime,inode64,logbufs=8,logbsize=256k

And of course as soon as I paste that, I look at inode64 on these 32-bit ARM systems and think, hmm. I know 64-bit inodes are recommended for filesystems larger than 1TB (these are 4TB drives), but I have never thought about whether this is supported on a 32-bit system. Quick web searches appear to indicate it may be okay...

Sorry, some of this may be a duplicate. I wanted to bring it back on-list in case someone looks at that and says "no, you can't use those XFS options on 32-bit ARM." =)

On a side note, I've been using the cluster heavily the last couple days with no other problems. I just am not doing any cluster or OSD restarts for fear of an OSD not coming back.

- Travis

On Tue, Apr 30, 2013 at 12:17 PM, Travis Rhoden trho...@gmail.com wrote:

On the OSD node:

root@cepha0:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu Description: Ubuntu 12.10 Release: 12.10 Codename: quantal root@cepha0:~# dpkg -l *leveldb* Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Architecture Description ii libleveldb1:armhf 0+20120530.gitdd0d562-2 armhf fast key-value storage library root@cepha0:~# uname -a Linux cepha0 3.5.0-27-highbank #46-Ubuntu SMP Mon Mar 25 23:19:40 UTC 2013 armv7l armv7l armv7l GNU/Linux On the MON node: # lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 12.10 Release: 12.10 Codename: quantal # uname -a Linux 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux # dpkg -l *leveldb* un leveldb-doc (none) (no description available) ii libleveldb-dev:amd64 0+20120530.gitdd0d562-2 amd64 fast key-value storage library (development files) ii libleveldb1:amd64 0+20120530.gitdd0d562-2 amd64 fast key-value storage library On Tue, Apr 30, 2013 at 12:11 PM, Samuel Just sam.j...@inktank.com wrote: What version of leveldb is installed? Ubuntu/version? -Sam On Tue, Apr 30, 2013 at 8:50 AM, Travis Rhoden trho...@gmail.com wrote: Interestingly, the down OSD does not get marked out after 5 minutes. Probably that is already fixed by http://tracker.ceph.com/issues/4822. On Tue, Apr 30, 2013 at 11:42 AM, Travis Rhoden trho...@gmail.com wrote: Hi Sam, I was prepared to write in and say that the problem had gone away. I tried restarting several OSDs last night in the hopes of capturing the problem on an OSD that hadn't failed yet, but didn't have any luck.
So I did indeed re-create the cluster from scratch (using mkcephfs), and what do you know -- everything worked. I got everything in a nice stable state, then decided to do a full cluster restart, just to be sure. Sure enough, one OSD failed to come up, and has the same stack trace. So I believe I have the log you want -- just from the OSD that failed, right? Question -- any feeling for what parts of the log you need? It's 688MB uncompressed (two hours!), so I'd like to be able to trim some off for you before making it available. Do you only need/want the part from after the OSD was restarted? Or perhaps the corruption happens on OSD shutdown and you need some before that? If you are fine with that large of a file, I can just make that available too. Let me know. - Travis On Mon, Apr 29, 2013 at 6:26 PM, Travis Rhoden trho...@gmail.com wrote
Re: [ceph-users] Failed assert when starting new OSDs in 0.60
On the OSD node: root@cepha0:~# lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 12.10 Release: 12.10 Codename: quantal root@cepha0:~# dpkg -l *leveldb* Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Architecture Description ii libleveldb1:armhf 0+20120530.gitdd0d562-2 armhf fast key-value storage library root@cepha0:~# uname -a Linux cepha0 3.5.0-27-highbank #46-Ubuntu SMP Mon Mar 25 23:19:40 UTC 2013 armv7l armv7l armv7l GNU/Linux On the MON node: # lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 12.10 Release: 12.10 Codename: quantal # uname -a Linux 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux # dpkg -l *leveldb* un leveldb-doc (none) (no description available) ii libleveldb-dev:amd64 0+20120530.gitdd0d562-2 amd64 fast key-value storage library (development files) ii libleveldb1:amd64 0+20120530.gitdd0d562-2 amd64 fast key-value storage library On Tue, Apr 30, 2013 at 12:11 PM, Samuel Just sam.j...@inktank.com wrote: What version of leveldb is installed? Ubuntu/version? -Sam On Tue, Apr 30, 2013 at 8:50 AM, Travis Rhoden trho...@gmail.com wrote: Interestingly, the down OSD does not get marked out after 5 minutes. Probably that is already fixed by http://tracker.ceph.com/issues/4822. On Tue, Apr 30, 2013 at 11:42 AM, Travis Rhoden trho...@gmail.com wrote: Hi Sam, I was prepared to write in and say that the problem had gone away.
I tried restarting several OSDs last night in the hopes of capturing the problem on an OSD that hadn't failed yet, but didn't have any luck. So I did indeed re-create the cluster from scratch (using mkcephfs), and what do you know -- everything worked. I got everything into a nice stable state, then decided to do a full cluster restart, just to be sure. Sure enough, one OSD failed to come up, and has the same stack trace. So I believe I have the log you want -- just from the OSD that failed, right? Question -- any feeling for what parts of the log you need? It's 688MB uncompressed (two hours!), so I'd like to be able to trim some off for you before making it available. Do you only need/want the part from after the OSD was restarted? Or perhaps the corruption happens on OSD shutdown and you need some from before that? If you are fine with that large a file, I can just make it available too. Let me know. - Travis On Mon, Apr 29, 2013 at 6:26 PM, Travis Rhoden trho...@gmail.com wrote: Hi Sam, No problem, I'll leave that debugging turned up high, do a mkcephfs from scratch, and see what happens. Not sure if it will happen again or not. =) Thanks again. - Travis On Mon, Apr 29, 2013 at 5:51 PM, Samuel Just sam.j...@inktank.com wrote: Hmm, I need logging from when the corruption happened. If this is reproducible, can you enable that logging on a clean osd (or better, a clean cluster) until the assert occurs? -Sam On Mon, Apr 29, 2013 at 2:45 PM, Travis Rhoden trho...@gmail.com wrote: Also, I can note that it does not take a full cluster restart to trigger this. If I just restart an OSD that was up/in previously, the same error can happen (though not every time). So restarting OSDs for me is a bit like Russian roulette. =) Even though restarting an OSD may not always result in the error, it seems that once it happens, that OSD is gone for good. No amount of restarting has brought any of the dead ones back. I'd really like to get to the bottom of it.
Let me know if I can do anything to help. I may also have to try completely wiping/rebuilding to see if I can make this thing usable. On Mon, Apr 29, 2013 at 2:38 PM, Travis Rhoden trho...@gmail.com wrote: Hi Sam, Thanks for being willing to take a look. I applied the debug settings on one host that had 3 out of 3
Re: [ceph-users] Failed assert when starting new OSDs in 0.60
Thanks Greg. I quit playing with it because every time I restarted the cluster (service ceph -a restart), I lost more OSDs. The first time it was 1, the 2nd 10, the 3rd time 13... All 13 down OSDs show the same stack trace. - Travis On Mon, Apr 29, 2013 at 11:56 AM, Gregory Farnum g...@inktank.com wrote: This sounds vaguely familiar to me, and I see http://tracker.ceph.com/issues/4052, which is marked as Can't reproduce — I think maybe this is fixed in next and master, but I'm not sure. For more than that I'd have to defer to Sage or Sam. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Sat, Apr 27, 2013 at 6:43 PM, Travis Rhoden trho...@gmail.com wrote: Hey folks, I'm helping put together a new test/experimental cluster, and hit this today when bringing the cluster up for the first time (using mkcephfs). After doing the normal service ceph -a start, I noticed one OSD was down, and a lot of PGs were stuck creating. I tried restarting the down OSD, but it wouldn't come up. It always had this error: -1 2013-04-27 18:11:56.179804 b6fcd000 2 osd.1 0 boot 0 2013-04-27 18:11:56.402161 b6fcd000 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, ceph::bufferlist*)' thread b6fcd000 time 2013-04-27 18:11:56.399089 osd/PG.cc: 2556: FAILED assert(values.size() == 1) ceph version 0.60-401-g17a3859 (17a38593d60f5f29b9b66c13c0aaa759762c6d04) 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, ceph::buffer::list*)+0x1ad) [0x2c3c0a] 2: (OSD::load_pgs()+0x357) [0x28cba0] 3: (OSD::init()+0x741) [0x290a16] 4: (main()+0x1427) [0x2155c0] 5: (__libc_start_main()+0x99) [0xb69bcf42] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. I then did a full cluster restart, and now I have ten OSDs down -- each showing the same exception/failed assert. Anybody seen this? I know I'm running a weird version -- it's compiled from source, and was provided to me.
The OSDs are all on ARM, and the mon is x86_64. Just looking to see if anyone has seen this particular stack trace of load_pgs()/peek_map_epoch() before. - Travis
Re: [ceph-users] Failed assert when starting new OSDs in 0.60
Hi Sam, Thanks for being willing to take a look. I applied the debug settings on one host that had 3 out of 3 OSDs with this problem, then tried to start them up. Here are the resulting logs: https://dl.dropboxusercontent.com/u/23122069/cephlogs.tgz - Travis On Mon, Apr 29, 2013 at 1:04 PM, Samuel Just sam.j...@inktank.com wrote: You appear to be missing pg metadata for some reason. If you can reproduce it with debug osd = 20, debug filestore = 20, and debug ms = 1 on all of the OSDs, I should be able to track it down. I created a bug: #4855. Thanks! -Sam On Mon, Apr 29, 2013 at 9:52 AM, Travis Rhoden trho...@gmail.com wrote: Thanks Greg. I quit playing with it because every time I restarted the cluster (service ceph -a restart), I lost more OSDs. The first time it was 1, the 2nd 10, the 3rd time 13... All 13 down OSDs show the same stack trace. - Travis On Mon, Apr 29, 2013 at 11:56 AM, Gregory Farnum g...@inktank.com wrote: This sounds vaguely familiar to me, and I see http://tracker.ceph.com/issues/4052, which is marked as Can't reproduce — I think maybe this is fixed in next and master, but I'm not sure. For more than that I'd have to defer to Sage or Sam. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Sat, Apr 27, 2013 at 6:43 PM, Travis Rhoden trho...@gmail.com wrote: Hey folks, I'm helping put together a new test/experimental cluster, and hit this today when bringing the cluster up for the first time (using mkcephfs). After doing the normal service ceph -a start, I noticed one OSD was down, and a lot of PGs were stuck creating. I tried restarting the down OSD, but it wouldn't come up.
It always had this error: -1 2013-04-27 18:11:56.179804 b6fcd000 2 osd.1 0 boot 0 2013-04-27 18:11:56.402161 b6fcd000 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, ceph::bufferlist*)' thread b6fcd000 time 2013-04-27 18:11:56.399089 osd/PG.cc: 2556: FAILED assert(values.size() == 1) ceph version 0.60-401-g17a3859 (17a38593d60f5f29b9b66c13c0aaa759762c6d04) 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, ceph::buffer::list*)+0x1ad) [0x2c3c0a] 2: (OSD::load_pgs()+0x357) [0x28cba0] 3: (OSD::init()+0x741) [0x290a16] 4: (main()+0x1427) [0x2155c0] 5: (__libc_start_main()+0x99) [0xb69bcf42] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. I then did a full cluster restart, and now I have ten OSDs down -- each showing the same exception/failed assert. Anybody seen this? I know I'm running a weird version -- it's compiled from source, and was provided to me. The OSDs are all on ARM, and the mon is x86_64. Just looking to see if anyone has seen this particular stack trace of load_pgs()/peek_map_epoch() before. - Travis
[ceph-users] how to get latest (non-point release) debs
Hey folks, There are some changes in Bobtail queued up for 0.56.4 that I am really anxious to get, but that build hasn't been released yet. Is there an apt repo I can point at that will get me the latest build off of the bobtail branch? Based on the docs [1], I tried this: deb http://gitbuilder.ceph.com/ceph-deb-main-x86_64/ref/bobtail precise main But that was not found. - Travis [1] http://ceph.com/docs/master/install/debian/#development-testing-packages
Re: [ceph-users] Live migration of VM using librbd and OpenStack
Just for posterity, my ultimate solution was to patch Nova on each compute host so that _check_shared_storage_test_file (in nova/virt/libvirt/driver.py) always returns True. This did make migration work with nova live-migration, with one caveat. Since Nova assumes that /var/lib/nova/instances is on shared storage (and since I hard-coded the check to say yes, it really is), it thinks the per-domain folder under /var/lib/nova/instances will exist at both source and destination, and makes no attempt to create it on the destination. So before I run live-migration, I pop over to the source host and rsync that folder to the destination. A little dirty, but it allows me to move running VMs around just fine in cases of maintenance on a host, which is exactly what I need. Thanks for everyone's feedback. - Travis On Tue, Mar 12, 2013 at 6:33 PM, Travis Rhoden trho...@gmail.com wrote: On Tue, Mar 12, 2013 at 5:06 PM, Josh Durgin josh.dur...@inktank.com wrote: On 03/12/2013 01:48 PM, Travis Rhoden wrote: Hi Josh, Thanks for the info. So if I want to do live migration with VMs that were launched with boot-from-volume, I'll need to use virsh to do the migration, rather than Nova. Okay, that should be doable. As an aside, I will probably want to look at the OpenStack DB and figure out how to tell it that the VM has moved to a different host. I'd rather there not be a disconnect between Nova and libvirt about where the VM lives. =) It's probably not too hard to edit nova to skip the checks when the instance is volume-backed, but if you don't want to do that, libvirt should be fine, and a bit more flexible. After messing with it for a few hours, I'm thinking about doing just that. The nova edits should be easy. It looks like it tests for shared storage by writing a file on the migration destination and trying to read it at the source. I should be able to just comment out the check entirely, or make the check always pass. The virsh migrate strategy has been surprisingly difficult.
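The patch described at the top of this message can be sketched as a monkeypatch. LibvirtDriver below is a hypothetical stand-in for the real class in nova/virt/libvirt/driver.py (the actual fix was an edit to the file itself), and the upstream check body is paraphrased from the description above, not copied from Nova:

```python
import os

class LibvirtDriver:
    """Hypothetical minimal stand-in for nova.virt.libvirt.driver.LibvirtDriver."""

    def _check_shared_storage_test_file(self, filename):
        # Paraphrased upstream logic: the destination host writes a temp
        # file under /var/lib/nova/instances; if the source host can see
        # it, the directory is considered shared storage.
        return os.path.exists(filename)

def _always_shared(self, filename):
    # The workaround from the message: claim shared storage
    # unconditionally, since the disks live in Ceph (RBD) and are
    # reachable from every compute host via librbd.
    return True

# Replace the check in place of editing driver.py directly.
LibvirtDriver._check_shared_storage_test_file = _always_shared

driver = LibvirtDriver()
print(driver._check_shared_storage_test_file("/var/lib/nova/instances/tmp123"))  # -> True
```

The trade-off, as noted above, is that Nova then assumes the per-domain instance folder exists on the destination, so it must be copied over by hand before migrating.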
Since I'm migrating a Nova VM, I had to do the following prerequisites (so far): define the /var/lib/nova/instances/<domain> dir on the destination, and define/migrate the Nova libvirt nwfilter for the specific VM. Then, when I try to do the actual migration, I always get (at the source): error: internal error Process exited while reading console log output: chardev: opening backend file failed: Permission denied So QEMU is bailing, saying it can't read the console.log file. When I go look at that file, it is created, but with owner root:root and perms 0600. However, libvirtd makes it libvirt-qemu:kvm, 0600 before KVM tries to actually start the VM. I've always found this dynamic file ownership bit in KVM/libvirt/qemu very confusing. Anyways, I tried a few different things, debug logging, etc. I even tried disabling AppArmor. Still get permission denied each time. The commands I'm running manually should be identical to what OpenStack is doing, so I can't figure out why their migrate works and mine doesn't. Oh well, I'll edit Nova and give that a shot. - Travis
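The pre-migration rsync step described earlier in this message can be sketched as a small helper. The directory layout and the choice of rsync archive mode are assumptions, and "instance-0001" is a made-up domain name for illustration:

```python
def build_rsync_cmd(instance_dir, dest_host, base="/var/lib/nova/instances"):
    """Build the command to pre-copy an instance directory to the
    destination host before running `nova live-migration`.

    Assumes rsync over ssh and the stock Nova instances path.
    """
    src = f"{base}/{instance_dir}/"
    dst = f"{dest_host}:{base}/{instance_dir}/"
    # -a preserves ownership and permissions, which matters given the
    # console.log ownership trouble described above.
    return ["rsync", "-a", src, dst]

print(build_rsync_cmd("instance-0001", "vmhost3"))
```

Running the returned command from the source host recreates the per-domain folder on the destination, which Nova otherwise assumes already exists on shared storage.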
Re: [ceph-users] Live migration of VM using librbd and OpenStack
Hi Josh, Thanks for the info. So if I want to do live migration with VMs that were launched with boot-from-volume, I'll need to use virsh to do the migration, rather than Nova. Okay, that should be doable. As an aside, I will probably want to look at the OpenStack DB and figure out how to tell it that the VM has moved to a different host. I'd rather there not be a disconnect between Nova and libvirt about where the VM lives. =) Additionally, thanks for confirming that the migration is safe with the RBD cache enabled. I was going to ask that as well. On Tue, Mar 12, 2013 at 4:38 PM, Josh Durgin josh.dur...@inktank.com wrote: On 03/12/2013 01:28 PM, Travis Rhoden wrote: Thanks for the response, Trevor. "The root disk (/var/lib/nova/instances) must be on shared storage to run the live migrate." I would argue that it is on shared storage. It is an RBD stored in Ceph, and that's available at each host via librbd. Agreed. "You should be able to run block migration (which is a different form of the live-migration) that does not require shared storage." I think block migration would not be correct in this instance. There is no file to copy (there is no disk file in /var/lib/nova/instances/<domain>). Where is it going to copy it from/to? It's already an RBD. I know this is supposed to work [1]. I'm just wondering if it requires disabling the true live migration in libvirt. I think Josh will know. Yes, it works with true live migration just fine (even with caching). You can use virsh migrate or even do it through the virt-manager GUI. Nova is just doing a check that doesn't make sense for volume-backed instances with live migration there. Unfortunately I haven't had the time to look at that problem in nova since that message, but I suspect the same issue is still there.
Josh [1] https://lists.launchpad.net/openstack/msg15074.html On Tue, Mar 12, 2013 at 4:13 PM, tra26 tr...@cs.drexel.edu wrote: Travis, The root disk (/var/lib/nova/instances) must be on shared storage to run the live migrate. You should be able to run block migration (which is a different form of the live-migration) that does not require shared storage. Take a look at: http://www.sebastien-han.fr/blog/2012/07/12/openstack-block-migration/ for information regarding the block-level migration. -Trevor On 2013-03-12 15:57, Travis Rhoden wrote: Hey folks, I'm wondering if the following is possible. I have OpenStack (Folsom) configured to boot VMs from volume using Ceph as a backend for Cinder and Glance. My setup pretty much follows the Ceph guides for this verbatim. I've been using this setup for a while now, and it's all been really smooth. However, if I try to do a live-migration, I get this: RemoteError: Remote error: RemoteError Remote error: InvalidSharedStorage_Remote vmhost3 is not on shared storage: Live migration can not be used without shared storage. One thing I am doing that may not be normal is that I am trying to do the true live migration in KVM/libvirt, having set this in my nova.conf: live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE Anyone know if this setup should work? Or if there is something I should tweak to make it work? I was thinking that having the RBD available via librbd at both the source and destination host makes that storage shared storage. Perhaps not, if I am trying to do live migration? If I do OpenStack's normal live migration, it will pause the VM and move it, which is less than ideal, but workable.
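For reference, here is the flag setting quoted above as it would sit in nova.conf (a sketch; the [DEFAULT] section placement for a Folsom-era setup and the comment are mine):

```ini
[DEFAULT]
# Ask libvirt for true live migration (peer-to-peer, undefine the
# source domain) instead of Nova's pause-and-move behavior.
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
```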
Thanks, - Travis