Re: [ceph-users] Ceph Day Germany 2018
On 01/16/2018 06:51 AM, Leonardo Vaz wrote:
> Hey Cephers! We are proud to announce our first Ceph Day of 2018, which takes place on February 7 at the Deutsche Telekom AG office in Darmstadt (25 km south of Frankfurt Airport). The conference schedule[1] is being finalized and registration is already open[2]. If you're in Europe, join us at Ceph Day Germany!
> Kind regards, Leo
> [1] https://ceph.com/cephdays/germany/
> [2] https://cephdaygermany.eventbrite.com/
> -- Leonardo Vaz Ceph Community Manager Open Source and Standards Team

Yes! Looking forward :-) I'll be there :)

Wido
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph Day Germany 2018
Hey Cephers! We are proud to announce our first Ceph Day of 2018, which takes place on February 7 at the Deutsche Telekom AG office in Darmstadt (25 km south of Frankfurt Airport). The conference schedule[1] is being finalized and registration is already open[2]. If you're in Europe, join us at Ceph Day Germany! Kind regards, Leo [1] https://ceph.com/cephdays/germany/ [2] https://cephdaygermany.eventbrite.com/ -- Leonardo Vaz Ceph Community Manager Open Source and Standards Team ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Future
Hi Massimiliano, On Thu, Jan 11, 2018 at 6:15 AM, Massimiliano Cuttini wrote: > Hi everybody, > > I'm always looking at Ceph for the future. > But I see several issues that are left unresolved and block near-future adoption. > I would like to know if there are already some answers: > > 1) Separation between client and server distribution. > At this time you always have to update client & server to match the same release of Ceph. > This is OK in the early releases, but in the future I would expect the ceph client to be ONE, not one per major version. > The client should be able to determine by itself which protocol version and which features can be enabled, and connect to at least the 3 or 5 most recent major versions of Ceph. > > 2) Kernel is old -> feature mismatch > OK, the kernel is old, so what? Just don't use it and turn to NBD. > And don't even tell me about it; just virtualize it under the hood. > > 3) Management complexity > Ceph is amazing, but it is just too big to have everything under control (too many services). > There is now a management console, but as far as I have read, it just shows basic data about performance. > So it doesn't manage at all... it's just a monitor... > > In the end you still have to manage everything from the command line. > To manage via the web, the following are mandatory: > > create, delete, enable, disable services > If I need to run a redundant iSCSI gateway, do I really need to cut and paste commands from your online docs? > Of course not. You can script it better than any admin could. > Just take a few arguments from an HTML form and that's all. > > create, delete, enable, disable users > I have to create users and keys for 24 servers. Do you really think it's possible to do that without some bad transcription or a badly pasted key somewhere across all those servers? > Everybody ends up just copying the admin key to all servers, giving very insecure full permissions to all clients.
> > create MAPS (server, datacenter, rack, node, osd). > This is mandatory to design how the data needs to be replicated. > It's not good to create this by script or shell; what's needed is a graph editor which can give you a perspective of what will be copied where. > > check hardware below the hood > What's missing is checking the health of the hardware underneath. > Ceph was born as storage software that ensures redundancy and protects you from single failures. > So WHY ignore checking the health of disks with SMART? > FreeNAS just does a better job on this, giving lots of tools to understand which disk is which and whether it will fail in the near future. > Of course Ceph could also forecast issues by itself, and it needs to start integrating with basic hardware I/O. > For example, it should be possible to enable/disable the UID light on the disks in order to know which one needs to be replaced.

As a technical note, we ran into this need with Storcium, and it is pretty easy to utilize UID indicators using both Areca and LSI/Avago HBAs. You will need the standard control tools available from their web sites, as well as hardware that supports SGPIO (most enterprise JBODs and drives do). There are likely similar options for other HBAs.

Areca:
UID on:
cli64 curctrl=1 set password=<password>
cli64 curctrl=<ctrl#> disk identify drv=<drive#>
UID off:
cli64 curctrl=1 set password=<password>
cli64 curctrl=<ctrl#> disk identify drv=0

LSI/Avago:
UID on: sas2ircu <controller#> locate <Enclosure:Bay> ON
UID off: sas2ircu <controller#> locate <Enclosure:Bay> OFF

HTH, Alex Gorbachev Storcium

> I guess this kind of feature is quite standard across all Linux > distributions. > > The management complexity could be completely overcome with a great web > manager. > A web manager, in the end, is just a wrapper for shell commands from the > CephAdminNode to the others. > If you think about it, such a wrapper is tons of times easier to develop than > what has already been developed. > I do really see that Ceph is the future of storage.
> But there is some easily avoidable complexity that needs to be reduced. > > If there are already plans for these issues, I really would like to know. > > Thanks, > Max > > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
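Max's point about SMART monitoring can be prototyped outside Ceph with very little code. A minimal sketch of the idea (the sample output, the watched-attribute list, and the "nonzero raw value is suspect" rule are all illustrative assumptions here, not anything Ceph or FreeNAS actually ships):

```python
# Toy SMART health check: parse pre-captured `smartctl -A` style output
# and flag attributes that commonly predict drive failure.
# Sample text and thresholds are illustrative assumptions.

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   0
"""

# Attributes whose raw value should stay at zero on a healthy drive.
WATCHED = {"Reallocated_Sector_Ct", "Reported_Uncorrect", "Current_Pending_Sector"}

def failing_attributes(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    bad = {}
    for line in smart_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():
            name, raw = fields[1], fields[-1]
            if name in WATCHED and raw.isdigit() and int(raw) > 0:
                bad[name] = int(raw)
    return bad

print(failing_attributes(SAMPLE))  # {'Reported_Uncorrect': 12}
```

In a real deployment the text would come from running smartctl per OSD device; the parsing and alerting logic stays this simple.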
Re: [ceph-users] Changing device-class using crushtool
Hi Wido, On Wed, Jan 10, 2018 at 11:09 AM, Wido den Hollander wrote: > Hi, > > Is there a way to easily modify the device-class of devices in an offline > CRUSHMap? > > I know I can decompile the CRUSHMap and do it, but that's a lot of work in a > large environment. > > In larger environments I'm a fan of downloading the CRUSHMap, modifying it > to my needs, testing it and injecting it at once into the cluster. > > crushtool can do a lot, and you can also run tests using device classes, but > there doesn't seem to be a way to modify the device-class using crushtool, > is that correct? This is how we do it in Storcium, based on http://docs.ceph.com/docs/master/rados/operations/crush-map/

ceph osd crush rm-device-class <osd-id>
ceph osd crush set-device-class <class> <osd-id>

-- Best regards, Alex Gorbachev Storcium > > Wido > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Safe to delete data, metadata pools?
Thanks John, I removed these pools on Friday and, as you suspected, there was no impact. Regards, Rich On 8 January 2018 at 23:15, John Spray wrote: > On Mon, Jan 8, 2018 at 2:55 AM, Richard Bade wrote: >> Hi Everyone, >> I've got a couple of pools that I don't believe are being used but >> have a reasonably large number of pg's (approx 50% of our total pg's). >> I'd like to delete them but as they were pre-existing when I inherited >> the cluster, I wanted to make sure they aren't needed for anything >> first. >> Here's the details:
>> POOLS:
>>     NAME     ID USED %USED MAX AVAIL OBJECTS
>>     data     0  0    0     88037G    0
>>     metadata 1  0    0     88037G    0
>> We don't run cephfs and I believe these are meant for that, but may >> have been created by default when the cluster was set up (back on >> dumpling or bobtail I think). >> As far as I can tell there is no data in them. Do they need to exist >> for some ceph function? >> The pool names worry me a little, as they sound important. > > The data and metadata pools were indeed created by default in older > versions of Ceph, for use by CephFS. Since you're not using CephFS, > and nobody is using the pools for anything else either (they're > empty), you can go ahead and delete them. > >> >> They have 3136 pg's each so I'd like to be rid of those so I can >> increase the number of pg's in my actual data pools without getting >> over the 300 pg's per osd.
>> Here's the osd dump: >> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash >> rjenkins pg_num 3136 pgp_num 3136 last_change 1 crash_replay_interval >> 45 min_read_recency_for_promote 1 min_write_recency_for_promote 1 >> stripe_width 0 >> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 >> object_hash rjenkins pg_num 3136 pgp_num 3136 last_change 1 >> min_read_recency_for_promote 1 min_write_recency_for_promote 1 >> stripe_width 0 >> >> Also, what performance impact am I likely to see when ceph removes the >> empty pg's considering it's approx 50% of my total pg's on my 180 >> osd's. > > Given that they're empty, I'd expect little if any noticeable impact. > > John > >> >> Thanks, >> Rich >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
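For context on the figures in this thread, the per-OSD PG count is simple arithmetic: each pool contributes pg_num × replica size PG instances spread over the OSDs. A quick back-of-envelope check using the numbers quoted above (the ~300 PGs/OSD figure is the warning guideline mentioned in the thread, not a hard limit):

```python
# PG-per-OSD arithmetic for the empty 'data' and 'metadata' pools above.
def pgs_per_osd(pools, num_osds):
    """pools: list of (pg_num, replica_size); returns mean PG instances per OSD."""
    total = sum(pg_num * size for pg_num, size in pools)
    return total / num_osds

# Two pools of 3136 PGs each, replicated size 2, across 180 OSDs.
empty_pools = [(3136, 2), (3136, 2)]
print(round(pgs_per_osd(empty_pools, 180), 1))  # ~69.7 PG instances per OSD freed
```

Deleting the two pools thus frees roughly 70 PG instances per OSD of headroom under the 300/OSD guideline, which is why the subsequent pg_num increase on the real pools becomes possible.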
Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"
On Tue, Jan 16, 2018 at 1:35 AM, Alexander Peters wrote: > i created the dump output but it looks very cryptic to me so i can't really > make much sense of it. is there anything to look for in particular? Yes, basically we are looking for any line that ends in "= 34". You might also find that piping it through c++filt helps. Something like... $ c++filt > i think i am going to read up on how to interpret ltrace output... > > BR > Alex > > - Original Message - > From: "Brad Hubbard" > To: "Alexander Peters" > CC: "Ceph Users" > Sent: Monday, 15 January 2018 03:09:53 > Subject: Re: [ceph-users] radosgw fails with "ERROR: failed to initialize > watch: (34) Numerical result out of range" > > On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard wrote: >> On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters >> wrote: >>> Thanks for the reply - unfortunately the link you sent is behind a paywall, so >>> at least for now i can't read it. >> >> That's why I provided the cause as laid out in that article (pgp num > pg >> num). >> >> Do you have any settings in ceph.conf related to pg_num or pgp_num? >> >> If not, please add your details to http://tracker.ceph.com/issues/22351 > > Rados can return ERANGE (34) in multiple places, so identifying where > might be a big step towards working this out. > > $ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph --name > client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d > > The objective is to find which function(s) return 34.
> >> >>> >>> output of ceph osd dump shows that pgp num == pg num: >>> >>> [root@ctrl01 ~]# ceph osd dump >>> epoch 142 >>> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10 >>> created 2017-12-20 23:04:59.781525 >>> modified 2018-01-14 21:30:57.528682 >>> flags sortbitwise,recovery_deletes,purged_snapdirs >>> crush_version 6 >>> full_ratio 0.95 >>> backfillfull_ratio 0.9 >>> nearfull_ratio 0.85 >>> require_min_compat_client jewel >>> min_compat_client jewel >>> require_osd_release luminous >>> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash >>> rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width >>> 0 application rbd >>> removed_snaps [1~3] >>> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash >>> rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool stripe_width >>> 0 application rbd >>> removed_snaps [1~3] >>> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash >>> rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool stripe_width >>> 0 application rbd >>> removed_snaps [1~3] >>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash >>> rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 flags >>> hashpspool stripe_width 0 application rgw >>> max_osd 3 >>> osd.0 up in weight 1 up_from 82 up_thru 140 down_at 79 >>> last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795 >>> 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up >>> abe33844-6d98-4ede-81a8-a8bdc92dada8 >>> osd.1 up in weight 1 up_from 73 up_thru 140 down_at 71 >>> last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756 >>> 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up >>> 0dab9372-6ffe-4a23-a8b7-4edca3745a2a >>> osd.2 up in weight 1 up_from 140 up_thru 140 down_at 133 >>> last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749 >>> 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up >>> 
220bba17-8119-4035-9e43-5b8eaa27562f >>> >>> >>> Am 15.01.2018 um 01:33 schrieb Brad Hubbard : >>> >>> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters >>> wrote: >>> >>> Hello >>> >>> I am currently experiencing a strange issue with my radosgw. It Fails to >>> start and all tit says is: >>> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name >>> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d >>> 2018-01-14 21:30:57.132007 7f44ddd18e00 0 deferred set uid:gid to 167:167 >>> (ceph:ceph) >>> 2018-01-14 21:30:57.132161 7f44ddd18e00 0 ceph version 12.2.2 >>> (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process >>> (unknown), pid 13928 >>> 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize >>> watch: (34) Numerical result out of range >>> 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider >>> (RADOS) >>> >>> (when started via systemctl it writes the same lines to the logfile) >>> >>> strange thing is that it is working on an other env that was installed with >>> the same set of ansible playbooks. >>> OS is CentOS Linux release 7.4.1708 (Core) >>> >>> Ceph is up and running ( I am currently using it for storing volumes and >>> images form Openstack ) >>> >>> Does anyone have an idea how to debug this? >>> >>> >>> According to
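Brad's suspected cause (pgp_num greater than pg_num) can be screened for in a captured `ceph osd dump` before reaching for ltrace. A small sketch that scans pool lines like the ones quoted above (the regex is an assumption about the dump text format shown in this thread, not a stable API):

```python
import re

# Two pool lines in the style of the `ceph osd dump` output quoted above.
DUMP = """\
pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width 0 application rbd
pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 94 flags hashpspool stripe_width 0 application rgw
"""

POOL_RE = re.compile(r"pool \d+ '(?P<name>[^']+)' .*?pg_num (?P<pg>\d+) pgp_num (?P<pgp>\d+)")

def suspicious_pools(dump_text):
    """Return names of pools where pgp_num > pg_num (a suspected ERANGE trigger)."""
    return [m.group("name")
            for m in POOL_RE.finditer(dump_text)
            if int(m.group("pgp")) > int(m.group("pg"))]

print(suspicious_pools(DUMP))  # [] -- the pools quoted in this thread are consistent
```

Since Alexander's dump shows pgp_num == pg_num everywhere, this check comes back clean, which is why the thread moves on to tracing where else rados can return 34.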
Re: [ceph-users] Removing cache tier for RBD pool
On Mon, Jan 8, 2018 at 6:08 AM, Jens-U. Mozdzen wrote: > Hi *, > > trying to remove a caching tier from a pool used for RBD / Openstack, we > followed the procedure from http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache and ran > into problems. > > The cluster is currently running Ceph 12.2.2; the caching tier was created > with an earlier release of Ceph. > > First of all, setting the cache-mode to "forward" is reported to be > unsafe, which is not mentioned in the documentation - if it's really meant > to be used in this case, the need for "--yes-i-really-mean-it" should be > documented. > > Unfortunately, using "rados -p hot-storage cache-flush-evict-all" not only > reported errors ("file not found") for many objects, but left us with quite > a number of objects in the pool and new ones being created, despite the > "forward" mode. Even after stopping all Openstack instances ("VMs"), we > could also see that the remaining objects in the pool were still locked. > Manually unlocking these via rados commands worked, but > "cache-flush-evict-all" then still reported those "file not found" errors > and 1070 objects remained in the pool, like before. We checked the > remaining objects via "rados stat" both in the hot-storage and the > cold-storage pool and could see that every hot-storage object had a > counterpart in cold-storage with identical stat info. We also compared > some of the objects (with size > 0) and found the hot-storage and > cold-storage entities to be identical. > > We aborted that attempt, reverted the mode to "writeback" and restarted > the Openstack cluster - everything was working fine again, of course still > using the cache tier. > > During a recent maintenance window, the Openstack cluster was shut down > again and we re-tried the procedure.
As there were no active users of the > images pool, we skipped the step of forcing the cache mode to forward and > immediately issued the "cache-flush-evict-all" command. Again 1070 objects > remained in the hot-storage pool (and gave "file not found" errors), but > unlike last time, none were locked. > > Out of curiosity we then issued loops of "rados -p hot-storage cache-flush > " and "rados -p hot-storage cache-evict " for all > objects in the hot-storage pool and surprisingly not only received no error > messages at all, but were left with an empty hot-storage pool! We then > proceeded with the further steps from the docs and were able to > successfully remove the cache tier. > > This leaves us with two questions: > > 1. Does setting the cache mode to "forward" lead to above situation of > remaining locks on hot-storage pool objects? Maybe the clients' unlock > requests are forwarded to the cold-storage pool, leaving the hot-storage > objects locked? If so, this should be documented and it'd seem impossible > to cleanly remove a cache tier during live operations. > > 2. What is the significant difference between "rados > cache-flush-evict-all" and separate "cache-flush" and "cache-evict" cycles? > Or is it some implementation error that leads to those "file not found" > errors with "cache-flush-evict-all", while the manual cycles work > successfully? > > Thank you for any insight you might be able to share. > > Regards, > Jens > i've removed a cache tier in environments a few times. the only locked files i ran into were the rbd_directory and rbd_header objects for each volume. the rbd_headers for each rbd volume are locked as long as the vm is running. every time i've tried to remove a cache tier, i shutdown all of the vms before starting the procedure and there wasn't any problem getting things flushed+evicted. so i can't really give any further insight into what might have happened other than it worked for me. 
i set the cache-mode to forward every time before flushing and evicting objects. i don't think there really is a significant technical difference between the cache-flush-evict-all command and doing separate cache-flush and cache-evict on individual objects. my understanding is cache-flush-evict-all is just a shortcut to getting everything in the cache flushed and evicted. did the cache-flush-evict-all error on some objects where the separate operations succeeded? your description doesn't say, but then you say you used both styles during your second attempt. there being objects left in the hot storage pool is something i've seen, even after it looks like everything has been flushed. when i dug deeper, it looked like all of the objects left in the pool were the hitset objects that the cache tier uses for tracking how frequently objects are used. those hitsets need to be persisted in case an osd restarts or the pg is migrated to another osd. the method it uses for that is just storing the hitset as another object, but one that is internal to ceph. since they're internal, the objects are hidden from some commands like "rados ls" but still get counted as
Re: [ceph-users] Adding a host node back to ceph cluster
Maybe for the future:

rpm {-V|--verify} [select-options] [verify-options]

Verifying a package compares information about the installed files in the package with information about the files taken from the package metadata stored in the rpm database. Among other things, verifying compares the size, digest, permissions, type, owner and group of each file. Any discrepancies are displayed. Files that were not installed from the package, for example, documentation files excluded on installation using the "--excludedocs" option, will be silently ignored.

-Original Message-
From: Geoffrey Rhodes [mailto:geoff...@rhodes.org.za]
Sent: Monday, 15 January 2018 16:39
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Adding a host node back to ceph cluster

Good day, I'm having an issue re-deploying a host back into my production ceph cluster. Due to some bad memory (picked up by a scrub), which has since been replaced, I felt the need to re-install the host to be sure no host files were damaged. Prior to decommissioning the host I set the crush weights on each osd to 0. Once the osd's had flushed all data I stopped the daemons. I then purged the osd's from the crush map with "ceph osd purge", followed by "ceph osd crush rm {host}" to remove the host bucket from the crush map. I also ran "ceph-deploy purge {host}" & "ceph-deploy purgedata {host}" from the management node. I then reinstalled the host and made the necessary config changes, followed by the appropriate ceph-deploy commands (ceph-deploy install..., ceph-deploy admin..., ceph-deploy osd create...) to bring the host & its osd's back into the cluster, same as I would when adding a new host node to the cluster. Running "ceph osd df tree" shows the osd's, however the host node is not displayed. Inspecting the crush map I see no host bucket has been created nor any of the host's osd's listed.
The osd's also did not start, which explains the weight being 0, but I presume the osd's not starting isn't the only issue, since the crush map lacks the newly installed host detail. Could anybody tell me where I've gone wrong? I'm also assuming there shouldn't be an issue using the same host name again; or do I manually add the host bucket and osd detail back into the crush map, or should ceph-deploy not take care of that? Thanks OS: Ubuntu 16.04.3 LTS Ceph version: 12.2.1 / 12.2.2 - Luminous Kind regards Geoffrey Rhodes ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Luminous RGW Metadata Search
Finally, the issue that has haunted me for quite some time turned out to be a ceph.conf issue. I had:

osd_pool_default_pg_num = 100
osd_pool_default_pgp_num = 100

Once I changed it to:

osd_pool_default_pg_num = 32
osd_pool_default_pgp_num = 32

there was no issue starting the second rgw process. No idea why 32 works but 100 doesn't. The debug output is useless, and the log files too. Just insane. Anyway, thanks. On Fri, Jan 12, 2018 at 7:25 PM, Yehuda Sadeh-Weinraub wrote: > The errors you're seeing there don't look related to > elasticsearch. It's a generic radosgw error that says that it > failed to reach the rados (ceph) backend. You can try bumping up the > messenger log (debug ms = 1) and see if there's any hint in there. > > Yehuda > > On Fri, Jan 12, 2018 at 12:54 PM, Youzhong Yang > wrote: > > So I did the exact same thing using Kraken and the same set of VMs, no > > issue. What is the magic to make it work in Luminous? Anyone lucky enough to > > have this RGW ElasticSearch working using Luminous? > > > > On Mon, Jan 8, 2018 at 10:26 AM, Youzhong Yang > wrote: > >> > >> Hi Yehuda, > >> > >> Thanks for replying. > >> > >> >radosgw failed to connect to your ceph cluster. Does the rados command > >> >with the same connection params work? > >> > >> I am not quite sure what to do by running the rados command to test. > >> > >> So I tried again; could you please take a look and check what could have > >> gone wrong? > >> > >> Here is what I did: > >> > >> On the ceph admin node, I removed the installation on ceph-rgw1 and > >> ceph-rgw2, reinstalled rgw on ceph-rgw1, stopped the rgw service, and removed all rgw > >> pools. Elasticsearch is running on the ceph-rgw2 node on port 9200.
> >> ceph-deploy purge ceph-rgw1
> >> ceph-deploy purge ceph-rgw2
> >> ceph-deploy purgedata ceph-rgw2
> >> ceph-deploy purgedata ceph-rgw1
> >> ceph-deploy install --release luminous ceph-rgw1
> >> ceph-deploy admin ceph-rgw1
> >> ceph-deploy rgw create ceph-rgw1
> >> ssh ceph-rgw1 sudo systemctl stop ceph-rado...@rgw.ceph-rgw1
> >> rados rmpool default.rgw.log default.rgw.log --yes-i-really-really-mean-it
> >> rados rmpool default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it
> >> rados rmpool default.rgw.control default.rgw.control --yes-i-really-really-mean-it
> >> rados rmpool .rgw.root .rgw.root --yes-i-really-really-mean-it
> >>
> >> On ceph-rgw1 node:
> >>
> >> export RGWHOST="ceph-rgw1"
> >> export ELASTICHOST="ceph-rgw2"
> >> export REALM="demo"
> >> export ZONEGRP="zone1"
> >> export ZONE1="zone1-a"
> >> export ZONE2="zone1-b"
> >> export SYNC_AKEY="$( cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 20 | head -n 1 )"
> >> export SYNC_SKEY="$( cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 40 | head -n 1 )"
> >>
> >> radosgw-admin realm create --rgw-realm=${REALM} --default
> >> radosgw-admin zonegroup create --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP} --endpoints=http://${RGWHOST}:8000 --master --default
> >> radosgw-admin zone create --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP} --rgw-zone=${ZONE1} --endpoints=http://${RGWHOST}:8000 --access-key=${SYNC_AKEY} --secret=${SYNC_SKEY} --master --default
> >> radosgw-admin user create --uid=sync --display-name="zone sync" --access-key=${SYNC_AKEY} --secret=${SYNC_SKEY} --system
> >> radosgw-admin period update --commit
> >> sudo systemctl start ceph-radosgw@rgw.${RGWHOST}
> >>
> >> radosgw-admin zone create --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP} --rgw-zone=${ZONE2} --access-key=${SYNC_AKEY} --secret=${SYNC_SKEY} --endpoints=http://${RGWHOST}:8002
> >> radosgw-admin zone modify --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP} --rgw-zone=${ZONE2} --tier-type=elasticsearch --tier-config=endpoint=http://${ELASTICHOST}:9200,num_replicas=1,num_shards=10
> >> radosgw-admin period update --commit
> >>
> >> sudo systemctl restart ceph-radosgw@rgw.${RGWHOST}
> >> sudo radosgw --keyring /etc/ceph/ceph.client.admin.keyring -f --rgw-zone=${ZONE2} --rgw-frontends="civetweb port=8002"
> >> 2018-01-08 00:21:54.389432 7f0fe9cd2e80 -1 Couldn't init storage provider (RADOS)
> >>
> >> As you can see, starting rgw on port 8002 failed, but rgw on port 8000 was started successfully.
> >> Here is some more info which may be useful for diagnosis:
> >>
> >> $ cat /etc/ceph/ceph.conf
> >> [global]
> >> fsid = 3e5a32d4-e45e-48dd-a3c5-f6f28fef8edf
> >> mon_initial_members = ceph-mon1, ceph-osd1, ceph-osd2, ceph-osd3
> >> mon_host = 172.30.212.226,172.30.212.227,172.30.212.228,172.30.212.250
> >> auth_cluster_required = cephx
> >> auth_service_required = cephx
> >> auth_client_required = cephx
> >> osd_pool_default_size = 2
> >> osd_pool_default_min_size = 2
> >> osd_pool_default_pg_num = 100
> >> osd_pool_default_pgp_num = 100
> >> bluestore_compression_algorithm = zlib
> >> bluestore_compression_mode =
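One plausible explanation for why 32 works while 100 fails, hedged since the truncated log above does not prove it: Luminous enforces a mon_max_pg_per_osd limit (default 200), and radosgw auto-creates a number of pools per zone at startup. The cluster size of 3 OSDs and the count of 8 rgw pools below are illustrative assumptions, not figures from the thread:

```python
# Rough check against Luminous' mon_max_pg_per_osd limit (default 200)
# for a small cluster where radosgw auto-creates pools.
NUM_OSDS = 3              # assumed small test cluster
SIZE = 2                  # osd_pool_default_size from the ceph.conf above
MON_MAX_PG_PER_OSD = 200  # Luminous default

def pgs_per_osd(num_pools, pg_num):
    """Mean PG instances per OSD once `num_pools` pools of `pg_num` PGs exist."""
    return num_pools * pg_num * SIZE / NUM_OSDS

print(round(pgs_per_osd(8, 100), 1))  # 533.3 -> well over the cap; pool creation refused
print(round(pgs_per_osd(8, 32), 1))   # 170.7 -> fits under the cap
```

With defaults of 100, the second zone's pool creations would push the cluster past the cap and fail, which matches the "Couldn't init storage provider (RADOS)" symptom better than the logs reveal.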
Re: [ceph-users] slow requests on a specific osd
Hi Wes, On 15-1-2018 20:57, Wes Dillingham wrote: > My understanding is that the exact same objects would move back to the OSD if the weight went 1 -> 0 -> 1 given the same cluster state and the same object names; CRUSH is deterministic, so that would be the almost certain result. Ok, thanks! So this would be a useless exercise. :-| Thanks very much for your feedback, Wes! MJ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] slow requests on a specific osd
My understanding is that the exact same objects would move back to the OSD if the weight went 1 -> 0 -> 1 given the same cluster state and the same object names; CRUSH is deterministic, so that would be the almost certain result. On Mon, Jan 15, 2018 at 2:46 PM, lists wrote: > Hi Wes, > > On 15-1-2018 20:32, Wes Dillingham wrote: > >> I don't hear a lot of people discuss using xfs_fsr on OSDs, and going over >> the mailing list history it seems to have been brought up very infrequently >> and never as a suggestion for regular maintenance. Perhaps it's not needed. >> > True, it's just something we've always done on all our xfs filesystems, to > keep them speedy and snappy. I've disabled it, and then it doesn't happen. > > Perhaps I'll keep it disabled. > > But on this last question, about data distribution across OSDs: > > In that case, how about reweighting that osd.10 to "0", wait until >> all data has moved off osd.10, and then setting it back to "1". >> Would this result in *exactly* the same situation as before, or >> would it at least cause the data to be spread better across >> the other OSDs? >> > > Would it work like that? Or would setting it back to "1" give me again the > same data on this OSD that we started with? > > Thanks for your comments, > > MJ > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Respectfully, Wes Dillingham wes_dilling...@harvard.edu Research Computing | Senior CyberInfrastructure Storage Engineer Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 204 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
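Wes's point, that CRUSH is deterministic and therefore a 1 -> 0 -> 1 reweight round-trip maps the same objects straight back, can be illustrated with a toy placement function (a hash-based stand-in for CRUSH, not the real straw2 algorithm):

```python
import hashlib

def place(obj_name, osd_weights):
    """Toy deterministic placement: highest hash draw among nonzero-weight OSDs.
    A stand-in for CRUSH straw2, not the real algorithm."""
    def draw(osd):
        h = hashlib.sha256(f"{obj_name}/{osd}".encode()).hexdigest()
        return int(h, 16) * osd_weights[osd]
    candidates = [o for o, w in osd_weights.items() if w > 0]
    return max(candidates, key=draw)

objects = [f"rbd_data.{i}" for i in range(100)]

before  = {o: place(o, {"osd.9": 1, "osd.10": 1, "osd.11": 1}) for o in objects}
drained = {o: place(o, {"osd.9": 1, "osd.10": 0, "osd.11": 1}) for o in objects}
after   = {o: place(o, {"osd.9": 1, "osd.10": 1, "osd.11": 1}) for o in objects}

# Weight 0 empties osd.10; restoring the weight reproduces the original map,
# so every object that left osd.10 comes straight back.
print(before == after)                               # True
print(any(v == "osd.10" for v in drained.values()))  # False
```

Placement depends only on the object name and the weight map, which is exactly why the reweight exercise cannot reshuffle data onto different OSDs.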
Re: [ceph-users] slow requests on a specific osd
Hi Wes, On 15-1-2018 20:32, Wes Dillingham wrote: > I don't hear a lot of people discuss using xfs_fsr on OSDs, and going over the mailing list history it seems to have been brought up very infrequently and never as a suggestion for regular maintenance. Perhaps it's not needed. True, it's just something we've always done on all our xfs filesystems, to keep them speedy and snappy. I've disabled it, and then it doesn't happen. Perhaps I'll keep it disabled. But on this last question, about data distribution across OSDs: > In that case, how about reweighting that osd.10 to "0", waiting until all data has moved off osd.10, and then setting it back to "1". Would this result in *exactly* the same situation as before, or would it at least cause the data to be spread better across the other OSDs? Would it work like that? Or would setting it back to "1" give me again the same data on this OSD that we started with? Thanks for your comments, MJ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] slow requests on a specific osd
I don't hear a lot of people discuss using xfs_fsr on OSDs, and going over the mailing list history it seems to have been brought up very infrequently and never as a suggestion for regular maintenance. Perhaps it's not needed.

One thing to consider trying, to rule out something funky with the XFS filesystem on that particular OSD/drive, would be to remove the OSD entirely from the cluster, reformat the disk, and then rebuild the OSD, putting a brand-new XFS on it.

On Mon, Jan 15, 2018 at 7:36 AM, lists wrote:
> Hi,
>
> On our three-node 24 OSDs ceph 10.2.10 cluster, we have started seeing
> slow requests on a specific OSD, during the two-hour nightly xfs_fsr
> run from 05:00 - 07:00. This started after we applied the meltdown patches.
>
> The specific osd.10 also has the highest space utilization of all OSDs
> cluster-wide, with 45%, while the others are mostly around 40%. All OSDs
> are the same 4TB platters with journal on ssd, all with weight 1.
>
> Smart info for osd.10 shows nothing interesting I think:
>
>> Current Drive Temperature: 27 C
>> Drive Trip Temperature: 60 C
>>
>> Manufactured in week 04 of year 2016
>> Specified cycle count over device lifetime: 1
>> Accumulated start-stop cycles: 53
>> Specified load-unload count over device lifetime: 30
>> Accumulated load-unload cycles: 697
>> Elements in grown defect list: 0
>>
>> Vendor (Seagate) cache information
>> Blocks sent to initiator = 1933129649
>> Blocks received from initiator = 869206640
>> Blocks read from cache and sent to initiator = 2149311508
>> Number of read and write commands whose size <= segment size = 676356809
>> Number of read and write commands whose size > segment size = 12734900
>>
>> Vendor (Seagate/Hitachi) factory information
>> number of hours powered up = 13625.88
>> number of minutes until next internal SMART test = 8
>
> Now my question:
> Could it be that osd.10 just happens to contain some data chunks that are
> heavily needed by the VMs around that time, and that the added load of an
> xfs_fsr is simply too much for it to handle?
>
> In that case, how about reweighting that osd.10 to "0", waiting until all
> data has moved off osd.10, and then setting it back to "1". Would this
> result in *exactly* the same situation as before, or would it at least
> cause the data to be spread more evenly across the other OSDs?
>
> (with the idea that a better data spread across OSDs also brings better
> distribution of load between the OSDs)
>
> Or other ideas to check out?
>
> MJ

--
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Senior CyberInfrastructure Storage Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 204

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
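A rough sketch of the remove-and-rebuild cycle suggested above, for a Jewel-era (10.2.x) filestore OSD. The OSD id (10) and the device names (/dev/sdf for the data disk, /dev/sdb1 for the SSD journal partition) are placeholders for illustration; adapt them to your layout and wait for the cluster to settle between the drain and removal steps.

```shell
# Drain the OSD and remove it from the cluster
ceph osd out 10
# ... wait for rebalancing to finish (watch "ceph -s") ...
systemctl stop ceph-osd@10
ceph osd crush remove osd.10
ceph auth del osd.10
ceph osd rm 10

# Wipe the drive and rebuild the OSD with a brand-new XFS
# (ceph-disk is the Jewel-era provisioning tool)
ceph-disk zap /dev/sdf
ceph-disk prepare --fs-type xfs /dev/sdf /dev/sdb1
```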
Re: [ceph-users] Switching a pool from EC to replicated online ?
You would need to create a new pool and migrate the data to it. A replicated pool fronting an EC pool for RBD is a known-bad workload: http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#a-word-of-caution - but others' mileage may vary, I suppose.

In order to migrate, you could do one RBD at a time. I would probably take a snapshot and then do an `rbd cp` operation from poolA/snap to poolB/image. If you are okay with the VMs being powered down you could consider `rbd mv`, though it doesn't support renames across pools; I would prefer the cp method anyway. You could also do a wholesale pool copy using `rados cppool`, see http://ceph.com/geen-categorie/ceph-pool-migration/

Best of luck.

On Sat, Jan 13, 2018 at 6:37 PM, moftah moftah wrote:
> Hi All,
> Is there a way to switch a pool that is set to be EC to being replicated,
> without the need to switch to a new pool and migrate data?
>
> I am getting poor results from EC and want to switch to replicated, but I
> already have customers on the system.
> I am using ceph 11.
> The EC pool already has a cache tier that is replicated.
>
> Thanks

--
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Senior CyberInfrastructure Storage Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 204
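The snapshot-then-copy approach described above can be sketched as follows. Pool and image names (poolA, poolB, image1) and the snapshot name are placeholders for this example; the destination would typically be the new replicated pool:

```shell
# Snapshot the source image to get a consistent point-in-time copy
rbd snap create poolA/image1@migrate

# Copy the snapshot into the new pool as a fresh image
rbd cp poolA/image1@migrate poolB/image1

# After verifying the copy, clean up the snapshot
# (and eventually the source image)
rbd snap rm poolA/image1@migrate
```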
[ceph-users] Bug in RadosGW resharding? Hangs again...
Hi!

After having a completely broken radosgw setup due to damaged buckets, I completely deleted all rgw pools and started from scratch. But my problem is reproducible: after pushing ca. 10 objects into a bucket, the resharding process appears to start, and the bucket becomes unresponsive. I just see lots of these messages in all rgw logs:

2018-01-15 16:57:45.108826 7fd1779b1700 0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-01-15 16:57:45.119184 7fd1779b1700 0 NOTICE: resharding operation on bucket index detected, blocking
2018-01-15 16:57:45.260751 7fd1120e6700 0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-01-15 16:57:45.280410 7fd1120e6700 0 NOTICE: resharding operation on bucket index detected, blocking
2018-01-15 16:57:45.300775 7fd15b979700 0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-01-15 16:57:45.300971 7fd15b979700 0 WARNING: set_req_state_err err_no=2300 resorting to 500
2018-01-15 16:57:45.301042 7fd15b979700 0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error

One radosgw process and two OSDs housing the bucket index/metadata are still busy, but it seems to be stuck again. How long is this resharding process supposed to take? I cannot believe that an application is supposed to block for more than half an hour...

I feel inclined to open a bug report, but I am as yet unsure where the problem lies. Some information:

* 3 RGW processes, 3 OSD hosts with 12 HDD OSDs and 6 SSD OSDs
* Ceph 12.2.2
* Auto-resharding on, bucket versioning & lifecycle rule enabled.

Thanks,
Martin
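For reference, on 12.2.x the dynamic resharding state can be inspected from the command line, and a stuck operation can be cancelled; the bucket name below is a placeholder:

```shell
# List buckets with pending or active resharding operations
radosgw-admin reshard list

# Show the resharding status for one bucket
radosgw-admin reshard status --bucket=mybucket

# Abort a resharding operation that appears to be stuck
radosgw-admin reshard cancel --bucket=mybucket
```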
[ceph-users] Adding a host node back to ceph cluster
Good day,

I'm having an issue re-deploying a host back into my production ceph cluster. Due to some bad memory (picked up by a scrub), which has since been replaced, I felt the need to reinstall the host to be sure no host files were damaged.

Prior to decommissioning the host, I set the crush weights on each OSD to 0. Once the OSDs had flushed all data, I stopped the daemons. I then purged the OSDs from the crush map with "ceph osd purge", followed by "ceph osd crush rm {host}" to remove the host bucket from the crush map. I also ran "ceph-deploy purge {host}" and "ceph-deploy purgedata {host}" from the management node.

I then reinstalled the host, made the necessary config changes, and ran the appropriate ceph-deploy commands (ceph-deploy install..., ceph-deploy admin..., ceph-deploy osd create...) to bring the host and its OSDs back into the cluster, the same as I would when adding a new host node.

Running "ceph osd df tree" shows the OSDs, however the host node is not displayed. Inspecting the crush map, I see no host bucket has been created, nor any of the host's OSDs listed. The OSDs also did not start, which explains the weight being 0, but I presume the OSDs not starting isn't the only issue, since the crush map lacks the newly installed host detail.

Could anybody tell me where I've gone wrong? I'm also assuming there shouldn't be an issue using the same host name again. Or do I manually add the host bucket and OSD detail back into the crush map, or should ceph-deploy not take care of that?

Thanks

OS: Ubuntu 16.04.3 LTS
Ceph version: 12.2.1 / 12.2.2 - Luminous

Kind regards
Geoffrey Rhodes
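If the host bucket really is missing from the CRUSH map, it can be recreated by hand without recompiling the map. The hostname (node01), OSD id and weight below are placeholders for illustration, assuming a standard single-root hierarchy:

```shell
# Recreate the host bucket and hang it under the default root
ceph osd crush add-bucket node01 host
ceph osd crush move node01 root=default

# Place each of the host's OSDs into the bucket with its CRUSH weight
# (weight is typically the disk capacity in TiB, e.g. 3.64 for a 4TB drive)
ceph osd crush add osd.12 3.64 host=node01
```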
Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"
I created the dump output, but it looks very cryptic to me, so I can't really make much sense of it. Is there anything to look for in particular? I think I am going to read up on how to interpret ltrace output...

BR
Alex

----- Original Message -----
From: "Brad Hubbard"
To: "Alexander Peters"
CC: "Ceph Users"
Sent: Monday, January 15, 2018 03:09:53
Subject: Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard wrote:
> On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters wrote:
>> Thanks for the reply - unfortunately the link you sent is behind a paywall,
>> so at least for now I can't read it.
>
> That's why I provided the cause as laid out in that article (pgp num > pg num).
>
> Do you have any settings in ceph.conf related to pg_num or pgp_num?
>
> If not, please add your details to http://tracker.ceph.com/issues/22351

Rados can return ERANGE (34) in multiple places, so identifying where might be a big step towards working this out.

$ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph --name client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d

The objective is to find which function(s) return 34.
>
>> output of ceph osd dump shows that pgp num == pg num:
>>
>> [root@ctrl01 ~]# ceph osd dump
>> epoch 142
>> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
>> created 2017-12-20 23:04:59.781525
>> modified 2018-01-14 21:30:57.528682
>> flags sortbitwise,recovery_deletes,purged_snapdirs
>> crush_version 6
>> full_ratio 0.95
>> backfillfull_ratio 0.9
>> nearfull_ratio 0.85
>> require_min_compat_client jewel
>> min_compat_client jewel
>> require_osd_release luminous
>> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width 0 application rbd
>> removed_snaps [1~3]
>> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool stripe_width 0 application rbd
>> removed_snaps [1~3]
>> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool stripe_width 0 application rbd
>> removed_snaps [1~3]
>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>> max_osd 3
>> osd.0 up in weight 1 up_from 82 up_thru 140 down_at 79 last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up abe33844-6d98-4ede-81a8-a8bdc92dada8
>> osd.1 up in weight 1 up_from 73 up_thru 140 down_at 71 last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
>> osd.2 up in weight 1 up_from 140 up_thru 140 down_at 133 last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up 220bba17-8119-4035-9e43-5b8eaa27562f
>>
>> On 15.01.2018 at 01:33, Brad Hubbard wrote:
>>
>> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters wrote:
>>
>> Hello
>>
>> I am currently experiencing a strange issue with my radosgw. It fails to
>> start, and all it says is:
>> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>> 2018-01-14 21:30:57.132007 7f44ddd18e00 0 deferred set uid:gid to 167:167 (ceph:ceph)
>> 2018-01-14 21:30:57.132161 7f44ddd18e00 0 ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process (unknown), pid 13928
>> 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize watch: (34) Numerical result out of range
>> 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider (RADOS)
>>
>> (when started via systemctl it writes the same lines to the logfile)
>>
>> The strange thing is that it is working on another env that was installed
>> with the same set of ansible playbooks.
>> OS is CentOS Linux release 7.4.1708 (Core)
>>
>> Ceph is up and running (I am currently using it for storing volumes and
>> images from OpenStack)
>>
>> Does anyone have an idea how to debug this?
>>
>>
>> According to https://access.redhat.com/solutions/2778161 this can
>> happen if your pgp num is higher than the pg num.
>>
>> Check "ceph osd dump" output for that possibility.
>>
>>
>> Best Regards
>> Alexander
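As a starting point for reading the trace, the output can be filtered for calls whose return value was 34 (or -34, since libraries often return negative errno values); the exact formatting of return values in ltrace output may vary, so treat this pattern as a first pass:

```shell
# Show library calls in the trace whose return value was 34 or -34 (ERANGE)
grep -nE '= -?34$' /tmp/ltrace.out | head -n 20
```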
[ceph-users] Error message in the logs: "meta sync: ERROR: failed to read mdlog info with (2) No such file or directory"
Hello,

We have a radosgw cluster (version 12.2.2) in multisite mode. Our cluster is formed by one master realm, with one master zonegroup and two zones (one of which is the master zone). We've followed the instructions of the Ceph documentation to install and configure our cluster.

The cluster works as expected: the objects and users are being replicated between the zones. But we always get this error message in our logs:

2018-01-15 12:25:00.119301 7f68868e5700 1 meta sync: ERROR: failed to read mdlog info with (2) No such file or directory

Some details about the error message(s):
- They are only printed in the non-master zone log;
- They are only printed when this "slave" zone tries to sync the metadata info;
- In each synchronization cycle of the metadata info, the number of these error messages equals the number of shards of the metadata log;
- When we run the command "radosgw-admin mdlog list", we get an empty array as output in both zones;
- The output of "radosgw-admin sync status" says everything is OK and synced, which is true, despite the mdlog error messages in the log.

Has anyone had this same problem? And how did you fix it? I've tried many times to fix it and failed.

--
Victor Flávio de Oliveira Santos
Fullstack Developer/DevOps
http://victorflavio.me
Twitter: @victorflavio
Skype: victorflavio.oliveira
Github: https://github.com/victorflavio
Telefone/Phone: +55 62 81616477
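The metadata sync path described above can be inspected a bit more closely with the stock 12.2 tooling, run on the non-master zone:

```shell
# Overall multisite sync state for this zone
radosgw-admin sync status

# Metadata sync state, including per-shard markers
radosgw-admin metadata sync status

# Raw metadata log entries for the current period
radosgw-admin mdlog list
```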
[ceph-users] slow requests on a specific osd
Hi,

On our three-node 24 OSDs ceph 10.2.10 cluster, we have started seeing slow requests on a specific OSD, during the two-hour nightly xfs_fsr run from 05:00 - 07:00. This started after we applied the meltdown patches.

The specific osd.10 also has the highest space utilization of all OSDs cluster-wide, with 45%, while the others are mostly around 40%. All OSDs are the same 4TB platters with journal on ssd, all with weight 1.

Smart info for osd.10 shows nothing interesting I think:

Current Drive Temperature: 27 C
Drive Trip Temperature: 60 C

Manufactured in week 04 of year 2016
Specified cycle count over device lifetime: 1
Accumulated start-stop cycles: 53
Specified load-unload count over device lifetime: 30
Accumulated load-unload cycles: 697
Elements in grown defect list: 0

Vendor (Seagate) cache information
Blocks sent to initiator = 1933129649
Blocks received from initiator = 869206640
Blocks read from cache and sent to initiator = 2149311508
Number of read and write commands whose size <= segment size = 676356809
Number of read and write commands whose size > segment size = 12734900

Vendor (Seagate/Hitachi) factory information
number of hours powered up = 13625.88
number of minutes until next internal SMART test = 8

Now my question:
Could it be that osd.10 just happens to contain some data chunks that are heavily needed by the VMs around that time, and that the added load of an xfs_fsr is simply too much for it to handle?

In that case, how about reweighting that osd.10 to "0", waiting until all data has moved off osd.10, and then setting it back to "1". Would this result in *exactly* the same situation as before, or would it at least cause the data to be spread more evenly across the other OSDs?

(with the idea that a better data spread across OSDs also brings better distribution of load between the OSDs)

Or other ideas to check out?

MJ
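The reweight experiment described above would look roughly like this. Note that CRUSH placement is deterministic for a given map, so restoring the original weight should map the data back to essentially the same layout rather than producing a new spread:

```shell
# Drain osd.10 by setting its CRUSH weight to 0
ceph osd crush reweight osd.10 0
# ... wait for rebalancing to complete (watch "ceph -s") ...

# Restore the original weight; CRUSH is deterministic, so the
# PG-to-OSD mapping should return to (essentially) its previous state
ceph osd crush reweight osd.10 1
```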