[ceph-users] New BlueJeans Meeting ID for Performance and Testing Meetings
Hey all, I have updated the testing and performance meetings with a new call number and meeting ID to help with future recordings. The events should have the information you need, but just in case:

US and Canada: 408-915-6466
See all numbers: https://www.redhat.com/en/conference-numbers
Enter Meeting ID: 908675367

Thanks! -- Mike Perez (thingee)
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] omap vs. xattr in librados
OK, that's good to know. I was planning on using an EC pool. Maybe I'll store some of the larger kv pairs in their own objects, or move the metadata into its own replicated pool entirely. If the storage mechanism is the same, is there a reason xattrs are supported and omap is not? (Or is there some hidden cost to storing kv pairs in an EC pool I'm unaware of, e.g., does the kv data get replicated across all OSDs being used for a PG or something?)

Thanks,
Ben

On Tue, Sep 11, 2018 at 1:46 PM Patrick Donnelly wrote:
> On Tue, Sep 11, 2018 at 12:43 PM, Benjamin Cherian wrote:
> > On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum wrote:
> >>
> >> In general, if the key-value storage is of unpredictable or non-trivial
> >> size, you should use omap.
> >>
> >> At the bottom layer where the data is actually stored, they're likely to
> >> be in the same places (if using BlueStore, they are the same — in FileStore,
> >> a rados xattr *might* be in the local FS xattrs, or it might not). It is
> >> somewhat more likely that something stored in an xattr will get pulled into
> >> memory at the same time as the object's internal metadata, but that only
> >> happens if it's quite small (think the xfs or ext4 xattr rules).
> >
> > Based on this description, if I'm planning on using Bluestore, there is no
> > particular reason to ever prefer using xattrs over omap (outside of ease of
> > use in the API), correct?
>
> You may prefer xattrs on bluestore if the metadata is small and you
> may need to store the xattrs on an EC pool. omap is not supported on
> EC pools.
>
> --
> Patrick Donnelly
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
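[Illustrative sketch, not part of the original thread: one way to realize the split the poster is considering, i.e. a small replicated pool for omap-heavy metadata next to an EC pool for the bulk data. Pool names, PG counts, and the EC profile are placeholders, not from the thread.]

    # replicated pool for per-object metadata (omap works here)
    ceph osd pool create app_meta 64 64 replicated
    # EC pool for the large binary blobs
    ceph osd erasure-code-profile set app_k4m2 k=4 m=2
    ceph osd pool create app_data 128 128 erasure app_k4m2

With this layout the kv pairs can live as omap on app_meta, while the multi-megabyte blobs go to app_data.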
Re: [ceph-users] Adding node efficient data move.
> When adding a node and I increment the crush weight like this, do I get
> the most efficient data transfer to the 4th node?
>
> sudo -u ceph ceph osd crush reweight osd.23 1
> sudo -u ceph ceph osd crush reweight osd.24 1
> sudo -u ceph ceph osd crush reweight osd.25 1
> sudo -u ceph ceph osd crush reweight osd.26 1
> sudo -u ceph ceph osd crush reweight osd.27 1
> sudo -u ceph ceph osd crush reweight osd.28 1
> sudo -u ceph ceph osd crush reweight osd.29 1
>
> And then after recovery
>
> sudo -u ceph ceph osd crush reweight osd.23 2

I'm not sure if you're asking for the most *efficient* way to add capacity, or the least *impactful*.

The most *efficient* would be to have the new OSDs start out at their full CRUSH weight. This way data only has to move once. However, the overhead of that much movement can be quite significant, especially if I correctly read that you're expanding the size of the cluster by 33%.

What I prefer to do (on replicated clusters) is to use this script:

https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight

I set the CRUSH weights to 0, then run the script like

ceph-gentle-reweight -o -b 10 -d 0.01 -t 3.48169 -i 10 -r | tee -a /var/tmp/upweight.log

Note that I disable measure_latency() out of paranoia.

This is less *efficient* in that some data ends up being moved more than once, and the elapsed time to complete is longer, but it has the advantage of less impact. It also allows one to quickly stop data movement if a drive/HBA/server/network issue causes difficulties. Small steps mean that each completes quickly.

I also set

osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
osd_recovery_max_single_start = 1
osd_scrub_during_recovery = false

to additionally limit the impact of data movement on client operations. YMMV.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
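[Illustrative sketch, not part of the original thread: the throttles listed above can also be pushed to running OSDs without a restart, e.g. via injectargs. The values are the ones from the post; adjust to taste.]

    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
    ceph tell osd.* injectargs '--osd_recovery_op_priority 1 --osd_recovery_max_single_start 1'
    ceph tell osd.* injectargs '--osd_scrub_during_recovery false'

Putting the same values in the [osd] section of ceph.conf keeps them across restarts.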
[ceph-users] Ceph Day at University of Santa Cruz - September 19
Hey all, Just a reminder that Ceph Day at the UCSC Silicon Valley campus is coming this September 19. This is a great opportunity to hear various use cases around Ceph and to have discussions with contributors in the community. Potentially we'll be hearing a presentation from the university that helped Sage with the research that started it all, and about how Ceph enables genomic research at the campus today!

Registration is up and the schedule is posted:

https://ceph.com/cephdays/ceph-day-silicon-valley-university-santa-cruz-silicon-valley-campus/

-- Mike Perez (thingee)
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph Day in Berlin - November 12 - CFP now open
Hey all, The call for presentations is now open for Ceph Day in Berlin on November 12!

https://ceph.formstack.com/forms/ceph_day_berlin_cfp

The deadline is October 5th, 11:59 UTC.

Registration and other information for the event:

https://ceph.com/cephdays/ceph-day-berlin/

We are happy to be working with the OpenStack Foundation in providing a Ceph Day at the same venue where the OpenStack Summit will be taking place. Note that you do need to purchase a Ceph Day pass even if you have a full-access pass for the OpenStack Summit.

-- Mike Perez (thingee)
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Bluestore DB size and onode count
On 9/10/2018 11:39 PM, Nick Fisk wrote:

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Nelson
Sent: 10 September 2018 18:27
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore DB size and onode count

On 09/10/2018 12:22 PM, Igor Fedotov wrote:

Hi Nick.

On 9/10/2018 1:30 PM, Nick Fisk wrote:

If anybody has 5 minutes could they just clarify a couple of things for me:
1. onode count, should this be equal to the number of objects stored on the OSD? Through reading several posts, there seems to be a general indication that this is the case, but looking at my OSDs the maths don't work.

onode_count is the number of onodes in the cache, not the total number of onodes at an OSD. Hence the difference...

Ok, thanks, that makes sense. I assume there isn't actually a counter which gives you the total number of objects on an OSD then?

IIRC "bin/ceph daemon osd.1 calc_objectstore_db_histogram" might report what you need; see the "num_onodes" field in the report.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
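[Illustrative sketch, not part of the original thread: two admin-socket queries that relate to the numbers discussed above, run on the host where the OSD daemon lives. The OSD id is just an example.]

    # onodes currently held in the BlueStore cache (what onode_count reflects)
    ceph daemon osd.1 perf dump | grep -i onode
    # per-OSD histogram, including the "num_onodes" total mentioned above
    ceph daemon osd.1 calc_objectstore_db_histogram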
Re: [ceph-users] omap vs. xattr in librados
On Tue, Sep 11, 2018 at 12:43 PM, Benjamin Cherian wrote: > On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum wrote: >> >> >> In general, if the key-value storage is of unpredictable or non-trivial >> size, you should use omap. >> >> At the bottom layer where the data is actually stored, they're likely to >> be in the same places (if using BlueStore, they are the same — in FileStore, >> a rados xattr *might* be in the local FS xattrs, or it might not). It is >> somewhat more likely that something stored in an xattr will get pulled into >> memory at the same time as the object's internal metadata, but that only >> happens if it's quite small (think the xfs or ext4 xattr rules). > > > Based on this description, if I'm planning on using Bluestore, there is no > particular reason to ever prefer using xattrs over omap (outside of ease of > use in the API), correct? You may prefer xattrs on bluestore if the metadata is small and you may need to store the xattrs on an EC pool. omap is not supported on ecpools. -- Patrick Donnelly ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] omap vs. xattr in librados
Yeah that’s basically the case. RADOS support for xattrs significantly precedes the introduction of the omap concept and is perfectly acceptable to use but also kind of vestigial at this point. :) On Tue, Sep 11, 2018 at 12:43 PM Benjamin Cherian < benjamin.cher...@gmail.com> wrote: > On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum > wrote: > >> >> In general, if the key-value storage is of unpredictable or non-trivial >> size, you should use omap. >> >> At the bottom layer where the data is actually stored, they're likely to >> be in the same places (if using BlueStore, they are the same — in >> FileStore, a rados xattr *might* be in the local FS xattrs, or it might >> not). It is somewhat more likely that something stored in an xattr will get >> pulled into memory at the same time as the object's internal metadata, but >> that only happens if it's quite small (think the xfs or ext4 xattr rules). >> > > Based on this description, if I'm planning on using Bluestore, there is no > particular reason to ever prefer using xattrs over omap (outside of ease of > use in the API), correct? > > Thanks, > Ben > > On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum > wrote: > >> On Tue, Sep 11, 2018 at 7:48 AM Benjamin Cherian < >> benjamin.cher...@gmail.com> wrote: >> >>> Hi, >>> >>> I'm interested in writing a relatively simple application that would use >>> librados for storage. Are there recommendations for when to use the omap as >>> opposed to an xattr? In theory, you could use either a set of xattrs or an >>> omap as a kv store associated with a specific object. Are there >>> recommendations for what kind of data xattrs and omaps are intended to >>> store? >>> >> >> In general, if the key-value storage is of unpredictable or non-trivial >> size, you should use omap. >> >> At the bottom layer where the data is actually stored, they're likely to >> be in the same places (if using BlueStore, they are the same — in >> FileStore, a rados xattr *might* be in the local FS xattrs, or it might >> not). It is somewhat more likely that something stored in an xattr will get >> pulled into memory at the same time as the object's internal metadata, but >> that only happens if it's quite small (think the xfs or ext4 xattr rules). >> >> If you have 250KB of key-value data, omap is definitely the place for it. >> -Greg >> >> >>> Just for background, I have some metadata i'd like to associate with >>> each object (total size of all kv pairs in object metadata is ~250k, some >>> values a few bytes, while others are 10-20k.) The object will store actual >>> data (a relatively large FP array) as a binary blob (~3-5 MB). >>> >>> >>> Thanks, >>> Ben >>> -- >>> Regards, >>> >>> Benjamin Cherian >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] omap vs. xattr in librados
On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum wrote: > > In general, if the key-value storage is of unpredictable or non-trivial > size, you should use omap. > > At the bottom layer where the data is actually stored, they're likely to > be in the same places (if using BlueStore, they are the same — in > FileStore, a rados xattr *might* be in the local FS xattrs, or it might > not). It is somewhat more likely that something stored in an xattr will get > pulled into memory at the same time as the object's internal metadata, but > that only happens if it's quite small (think the xfs or ext4 xattr rules). > Based on this description, if I'm planning on using Bluestore, there is no particular reason to ever prefer using xattrs over omap (outside of ease of use in the API), correct? Thanks, Ben On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum wrote: > On Tue, Sep 11, 2018 at 7:48 AM Benjamin Cherian < > benjamin.cher...@gmail.com> wrote: > >> Hi, >> >> I'm interested in writing a relatively simple application that would use >> librados for storage. Are there recommendations for when to use the omap as >> opposed to an xattr? In theory, you could use either a set of xattrs or an >> omap as a kv store associated with a specific object. Are there >> recommendations for what kind of data xattrs and omaps are intended to >> store? >> > > In general, if the key-value storage is of unpredictable or non-trivial > size, you should use omap. > > At the bottom layer where the data is actually stored, they're likely to > be in the same places (if using BlueStore, they are the same — in > FileStore, a rados xattr *might* be in the local FS xattrs, or it might > not). It is somewhat more likely that something stored in an xattr will get > pulled into memory at the same time as the object's internal metadata, but > that only happens if it's quite small (think the xfs or ext4 xattr rules). > > If you have 250KB of key-value data, omap is definitely the place for it. > -Greg > > >> Just for background, I have some metadata i'd like to associate with each >> object (total size of all kv pairs in object metadata is ~250k, some values >> a few bytes, while others are 10-20k.) The object will store actual data (a >> relatively large FP array) as a binary blob (~3-5 MB). >> >> >> Thanks, >> Ben >> -- >> Regards, >> >> Benjamin Cherian >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] MDS does not always failover to hot standby
The monitors can't replace an MDS if they're trying to agree amongst themselves. Note the repeated monitor elections happening at the same time. A monitor election is *often* transparent to clients, since they and the OSDs only need the monitors when something happens in the cluster. But when you collide losing an MDS and losing monitors at the same time, the clients can't do their MDS requests but the monitors can't do a fast failover of the MDS because the monitors are trying to establish a quorum. (There's also the time sync warnings going on; those may or may not be causing issues here but certainly aren't helping anything!)
-Greg

On Fri, Sep 7, 2018 at 7:24 PM Bryan Henderson wrote:
>
> > It's mds_beacon_grace. Set that on the monitor to control the replacement of
> > laggy MDS daemons,
>
> Sounds like William's issue is something else. William shuts down MDS 2 and
> MON 4 simultaneously. The log shows that some time later (we don't know how
> long), MON 3 detects that MDS 2 is gone ("MDS_ALL_DOWN"), but does nothing
> about it until 30 seconds later, which happens to be when MDS 2 and MON 4 come
> back. At that point, MON 3 reports that the rank has been reassigned to MDS 1.
>
> 'mds_beacon_grace' determines when a monitor declares MDS_ALL_DOWN, right?
>
> I think if things are working as designed, the log should show MON 3
> reassigning the rank to MDS 1 immediately after it reports MDS 2 is gone.
>
> From the original post:
>
> 2018-08-25 03:30:02.936554 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 55 : cluster [ERR] Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
> 2018-08-25 03:30:04.235703 mon.dub-sitv-ceph-05 mon.2 10.18.186.208:6789/0 226 : cluster [INF] mon.dub-sitv-ceph-05 calling monitor election
> 2018-08-25 03:30:04.238672 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 56 : cluster [INF] mon.dub-sitv-ceph-03 calling monitor election
> 2018-08-25 03:30:09.242595 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 57 : cluster [INF] mon.dub-sitv-ceph-03 is new leader, mons dub-sitv-ceph-03,dub-sitv-ceph-05 in quorum (ranks 0,2)
> 2018-08-25 03:30:09.252804 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 62 : cluster [WRN] Health check failed: 1/3 mons down, quorum dub-sitv-ceph-03,dub-sitv-ceph-05 (MON_DOWN)
> 2018-08-25 03:30:09.258693 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 63 : cluster [WRN] overall HEALTH_WARN 2 osds down; 2 hosts (2 osds) down; 1/3 mons down, quorum dub-sitv-ceph-03,dub-sitv-ceph-05
> 2018-08-25 03:30:10.254162 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 64 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs inactive, 115 pgs peering (PG_AVAILABILITY)
> 2018-08-25 03:30:12.429145 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 66 : cluster [WRN] Health check failed: Degraded data redundancy: 712/2504 objects degraded (28.435%), 86 pgs degraded (PG_DEGRADED)
> 2018-08-25 03:30:16.137408 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 67 : cluster [WRN] Health check update: Reduced data availability: 1 pg inactive, 69 pgs peering (PG_AVAILABILITY)
> 2018-08-25 03:30:17.193322 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 68 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive, 69 pgs peering)
> 2018-08-25 03:30:18.432043 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 69 : cluster [WRN] Health check update: Degraded data redundancy: 1286/2572 objects degraded (50.000%), 166 pgs degraded (PG_DEGRADED)
> 2018-08-25 03:30:26.139491 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 71 : cluster [WRN] Health check update: Degraded data redundancy: 1292/2584 objects degraded (50.000%), 166 pgs degraded (PG_DEGRADED)
> 2018-08-25 03:30:31.355321 mon.dub-sitv-ceph-04 mon.1 10.18.53.155:6789/0 1 : cluster [INF] mon.dub-sitv-ceph-04 calling monitor election
> 2018-08-25 03:30:31.371519 mon.dub-sitv-ceph-04 mon.1 10.18.53.155:6789/0 2 : cluster [WRN] message from mon.0 was stamped 0.817433s in the future, clocks not synchronized
> 2018-08-25 03:30:32.175677 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 72 : cluster [INF] mon.dub-sitv-ceph-03 calling monitor election
> 2018-08-25 03:30:32.175864 mon.dub-sitv-ceph-05 mon.2 10.18.186.208:6789/0 227 : cluster [INF] mon.dub-sitv-ceph-05 calling monitor election
> 2018-08-25 03:30:32.180615 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 73 : cluster [INF] mon.dub-sitv-ceph-03 is new leader, mons dub-sitv-ceph-03,dub-sitv-ceph-04,dub-sitv-ceph-05 in quorum (ranks 0,1,2)
> 2018-08-25 03:30:32.189593 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 78 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum dub-sitv-ceph-03,dub-sitv-ceph-05)
> 2018-08-25 03:30:32.190820 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 79 : cluster [WRN] mon.1 10.18.53.155:6789/0 clock skew 0.811318s > max 0.05s
> 2018-08-25 03:30:32.194280 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 80 : cl
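[Illustrative sketch, not part of the original thread: the two knobs mentioned above are ordinary config options. A hedged example of where they would be set; the values shown are the defaults, not a recommendation.]

    [mon]
    mds_beacon_grace = 15          # seconds without an MDS beacon before the rank is considered laggy
    mon_clock_drift_allowed = 0.05 # the 0.05s limit behind the clock-skew warnings in the log above

Fixing the NTP/chrony setup on the monitor hosts is the real cure for the skew warnings.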
Re: [ceph-users] omap vs. xattr in librados
On Tue, Sep 11, 2018 at 7:48 AM Benjamin Cherian wrote: > Hi, > > I'm interested in writing a relatively simple application that would use > librados for storage. Are there recommendations for when to use the omap as > opposed to an xattr? In theory, you could use either a set of xattrs or an > omap as a kv store associated with a specific object. Are there > recommendations for what kind of data xattrs and omaps are intended to > store? > In general, if the key-value storage is of unpredictable or non-trivial size, you should use omap. At the bottom layer where the data is actually stored, they're likely to be in the same places (if using BlueStore, they are the same — in FileStore, a rados xattr *might* be in the local FS xattrs, or it might not). It is somewhat more likely that something stored in an xattr will get pulled into memory at the same time as the object's internal metadata, but that only happens if it's quite small (think the xfs or ext4 xattr rules). If you have 250KB of key-value data, omap is definitely the place for it. -Greg > Just for background, I have some metadata i'd like to associate with each > object (total size of all kv pairs in object metadata is ~250k, some values > a few bytes, while others are 10-20k.) The object will store actual data (a > relatively large FP array) as a binary blob (~3-5 MB). > > > Thanks, > Ben > -- > Regards, > > Benjamin Cherian > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
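[Illustrative sketch, not part of the original thread: the two interfaces discussed above as they appear in the Python rados bindings. Pool and object names are placeholders and error handling is omitted; treat it as a rough outline under those assumptions, not a reference implementation.]

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('app_pool')   # placeholder pool name

    # Bulk data (the 3-5 MB FP array) goes in the object body.
    ioctx.write_full('fp_0001', b'\x00' * (3 * 1024 * 1024))

    # Small, fixed-size metadata can sit in an xattr.
    ioctx.set_xattr('fp_0001', 'schema_version', b'1')

    # Larger / unpredictable key-value metadata belongs in omap.
    keys = ('created_by', 'params_json')
    vals = (b'ben', b'{"bins": 4096}')
    with rados.WriteOpCtx() as op:
        ioctx.set_omap(op, keys, vals)
        ioctx.operate_write_op(op, 'fp_0001')

    # Read the omap back.
    with rados.ReadOpCtx() as op:
        it, rc = ioctx.get_omap_vals(op, "", "", 100)
        ioctx.operate_read_op(op, 'fp_0001')
        for k, v in it:
            print(k, v)

    ioctx.close()
    cluster.shutdown()

Equivalent calls exist in the C and C++ librados APIs; the split shown (body for the blob, omap for the bulky kv pairs) follows the recommendation above.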
Re: [ceph-users] Ceph balancer "Error EAGAIN: compat weight-set not available"
ceph balancer status
ceph config-key dump | grep balancer
ceph osd dump | grep min_compat_client
ceph osd crush dump | grep straw
ceph osd crush dump | grep profile
ceph features

You didn't mention it, but based on your error and my experiences over the last week getting the balancer working, you're trying to use crush-compat. Running all of those commands should give you the information you need to fix everything up for the balancer to work. With the first 2, you need to make sure that you have your mode set properly as well as double check any other settings you're going for with the balancer. Everything else stems off of a requirement of having your buckets be straw2 instead of straw for the balancer to work.

I'm sure you'll notice that your cluster has older compatibility requirements and an older crush profile than hammer, and that your buckets are using the straw algorithm instead of straw2. Running these commands [1] will fix up your cluster so that you are using straw2 and have your minimum required clients and profile at hammer, which is the ceph release that introduced straw2.

Before running these commands, make sure that the output of `ceph features` does not show any firefly clients connected to your cluster. If you do have any, it is likely due to outdated kernels or clients installed without the upstream ceph repo, just using the version of ceph in the canonical repos or similar for your distribution. If you do happen to have any firefly, or older, clients connected to your cluster, then you need to update those clients before running the commands.

There will be some data movement, but I didn't see more than ~5% data movement on any of the 8 clusters I ran them on. That data movement will be higher if you do not have a standard size of OSD drive in your cluster; mixing some 2TB disks and some 8TB disks will probably cause more data movement than I saw, but it should still be within reason. This data movement is because straw2 can handle that situation better than straw did and will allow your cluster to better balance itself even without the balancer module.

If you don't have any hammer clients either, then go ahead and set the min-compat-client to jewel as well as the crush tunables to jewel. Setting them to jewel will cause a bit more data movement, but again for good reasons.

The tl;dr of your error is that your cluster has been running since at least hammer, which started with older default settings than are required by the balancer module. As you've updated your cluster you didn't allow it to utilize new features in the backend, by leaving your crush tunables alone during all of the upgrades to new versions. To learn more about the changes to the crush tunables you can check out the ceph docs [2].

[1]
ceph osd set-require-min-compat-client hammer
ceph osd crush set-all-straw-buckets-to-straw2
ceph osd crush tunables hammer

[2] http://docs.ceph.com/docs/master/rados/operations/crush-map/

On Tue, Sep 11, 2018 at 6:24 AM Marc Roos wrote:
>
> I am new to using the balancer. I think this should generate a plan,
> no? I don't get what this error is about.
>
> [@c01 ~]# ceph balancer optimize balancer-test.plan
> Error EAGAIN: compat weight-set not available
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
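[Illustrative sketch, not part of the original thread: once the straw2/tunables changes above are in place, a typical crush-compat balancer run looks roughly like this. The plan name is a placeholder.]

    ceph balancer mode crush-compat
    ceph balancer eval                 # current score, lower is better
    ceph balancer optimize my-plan
    ceph balancer show my-plan         # inspect before applying
    ceph balancer execute my-plan
    # or let it run continuously:
    ceph balancer on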
[ceph-users] omap vs. xattr in librados
Hi, I'm interested in writing a relatively simple application that would use librados for storage. Are there recommendations for when to use the omap as opposed to an xattr? In theory, you could use either a set of xattrs or an omap as a kv store associated with a specific object. Are there recommendations for what kind of data xattrs and omaps are intended to store?

Just for background, I have some metadata I'd like to associate with each object (total size of all kv pairs in object metadata is ~250k; some values are a few bytes, while others are 10-20k). The object will store the actual data (a relatively large FP array) as a binary blob (~3-5 MB).

Thanks, Ben
--
Regards,

Benjamin Cherian
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mds_cache_memory_limit
We set it to 50% -- there seems to be some mystery inflation and possibly a small leak (in luminous, at least). -- dan On Tue, Sep 11, 2018 at 4:04 PM marc-antoine desrochers wrote: > > Hi, > > > > Is there any recommendation for the mds_cache_memory_limit ? Like a % of the > total ram or something ? > > > > Thanks. > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
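[Illustrative sketch, not part of the original thread: mds_cache_memory_limit takes a byte count. Following the 50%-of-RAM rule of thumb above, an MDS host with 32 GiB of RAM would get roughly 16 GiB; the value is an example, not a recommendation.]

    [mds]
    mds_cache_memory_limit = 17179869184   # 16 GiB

The same value can be applied to a running MDS, e.g. with: ceph tell mds.<name> injectargs '--mds_cache_memory_limit 17179869184'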
[ceph-users] mds_cache_memory_limit
Hi, Is there any recommendation for the mds_cache_memory_limit? Like a % of the total RAM or something?

Thanks.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Get supported features of all connected clients
On Tue, Sep 11, 2018 at 1:00 PM Tobias Florek wrote: > > Hi! > > I have a cluster serving RBDs and CephFS that has a big number of > clients I don't control. I want to know what feature flags I can safely > set without locking out clients. Is there a command analogous to `ceph > versions` that shows the connected clients and their feature support? Yes, "ceph features". https://ceph.com/community/new-luminous-upgrade-complete/ Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Get supported features of all connected clients
$ sudo ceph tell mds.0 client ls ?

On 11/09/2018 at 13:00, Tobias Florek wrote:

Hi! I have a cluster serving RBDs and CephFS that has a big number of clients I don't control. I want to know what feature flags I can safely set without locking out clients. Is there a command analogous to `ceph versions` that shows the connected clients and their feature support? If so, I would simply monitor this output for a few days and would have a pretty good estimate on how Ceph is used. If there is no such command, can I get that information from the Mon's, OSD's, or MDS's logs?

Cheers, Tobias Florek
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Get supported features of all connected clients
Hi! Answering my own question: there is `ceph features`, but I am still missing something. I have (on my Mimic-only test cluster):

# ceph features
{
    "mon": [
        { "features": "0x3ffddff8ffa4fffb", "release": "luminous", "num": 3 }
    ],
    "mds": [
        { "features": "0x3ffddff8ffa4fffb", "release": "luminous", "num": 3 }
    ],
    "osd": [
        { "features": "0x3ffddff8ffa4fffb", "release": "luminous", "num": 10 }
    ],
    "client": [
        { "features": "0x7018fb86aa42ada", "release": "jewel", "num": 21 },
        { "features": "0x3ffddff8ffa4fffb", "release": "luminous", "num": 4 }
    ],
    "mgr": [
        { "features": "0x3ffddff8ffa4fffb", "release": "luminous", "num": 3 }
    ]
}

This does not make sense to me. Is there no Mimic client connected even though I have e.g. an active Mimic ceph-fuse mount?

Cheers, Tobias Florek
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Get supported features of all connected clients
Hi! I have a cluster serving RBDs and CephFS that has a big number of clients I don't control. I want to know what feature flags I can safely set without locking out clients. Is there a command analogous to `ceph versions` that shows the connected clients and their feature support? If so, I would simply monitor this output for a few days and would have a pretty good estimate on how Ceph is used. If there is no such command, can I get that information from the Mon's, OSD's, or MDS's logs? Cheers, Tobias Florek ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph balancer "Error EAGAIN: compat weight-set not available"
I am new to using the balancer. I think this should generate a plan, no? I don't get what this error is about.

[@c01 ~]# ceph balancer optimize balancer-test.plan
Error EAGAIN: compat weight-set not available
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v12.2.8 Luminous released
This is a friendly reminder that multi-active MDS clusters must be reduced to only 1 active during upgrades [1]. In the case of v12.2.8, the CEPH_MDS_PROTOCOL version has changed, so if you try to upgrade one MDS it will get stuck in the resolve state, logging:

conn(0x55e3d9671000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect protocol version mismatch, my 31 != 30

Cheers, Dan

[1] http://docs.ceph.com/docs/luminous/cephfs/upgrading/

On Wed, Sep 5, 2018 at 4:20 PM Dan van der Ster wrote:
>
> Thanks for the release!
>
> We've updated some test clusters (rbd, cephfs) and it looks good so far.
>
> -- dan
>
> On Tue, Sep 4, 2018 at 6:30 PM Abhishek Lekshmanan wrote:
> >
> > We're glad to announce the next point release in the Luminous v12.2.X
> > stable release series. This release contains a range of bugfixes and
> > stability improvements across all the components of ceph. For detailed
> > release notes with links to tracker issues and pull requests, refer to
> > the blog post at http://ceph.com/releases/v12-2-8-released/
> >
> > Upgrade Notes from previous luminous releases
> > ---------------------------------------------
> >
> > When upgrading from v12.2.5 or v12.2.6 please note that upgrade caveats from
> > 12.2.5 will apply to any _newer_ luminous version including 12.2.8. Please read
> > the notes at https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6
> >
> > For the cluster that installed the broken 12.2.6 release, 12.2.7 fixed the
> > regression and introduced a workaround option `osd distrust data digest = true`,
> > but 12.2.7 clusters still generated health warnings like:
> >
> > [ERR] 11.288 shard 207: soid 11:1155c332:::rbd_data.207dce238e1f29.0527:head data_digest 0xc8997a5b != data_digest 0x2ca15853
> >
> > 12.2.8 improves the deep scrub code to automatically repair these
> > inconsistencies. Once the entire cluster has been upgraded and then fully deep
> > scrubbed, and all such inconsistencies are resolved, it will be safe to disable
> > the `osd distrust data digest = true` workaround option.
> >
> > Changelog
> > ---------
> > * bluestore: set correctly shard for existed Collection (issue#24761, pr#22860, Jianpeng Ma)
> > * build/ops: Boost system library is no longer required to compile and link example librados program (issue#25054, pr#23202, Nathan Cutler)
> > * build/ops: Bring back diff -y for non-FreeBSD (issue#24396, issue#21664, pr#22848, Sage Weil, David Zafman)
> > * build/ops: install-deps.sh fails on newest openSUSE Leap (issue#25064, pr#23179, Kyr Shatskyy)
> > * build/ops: Mimic build fails with -DWITH_RADOSGW=0 (issue#24437, pr#22864, Dan Mick)
> > * build/ops: order rbdmap.service before remote-fs-pre.target (issue#24713, pr#22844, Ilya Dryomov)
> > * build/ops: rpm: silence osd block chown (issue#25152, pr#23313, Dan van der Ster)
> > * cephfs-journal-tool: Fix purging when importing an zero-length journal (issue#24239, pr#22980, yupeng chen, zhongyan gu)
> > * cephfs: MDSMonitor: uncommitted state exposed to clients/mdss (issue#23768, pr#23013, Patrick Donnelly)
> > * ceph-fuse mount failed because no mds (issue#22205, pr#22895, liyan)
> > * ceph-volume add a __release__ string, to help version-conditional calls (issue#25170, pr#23331, Alfredo Deza)
> > * ceph-volume: adds test for `ceph-volume lvm list /dev/sda` (issue#24784, issue#24957, pr#23350, Andrew Schoen)
> > * ceph-volume: do not use stdin in luminous (issue#25173, issue#23260, pr#23367, Alfredo Deza)
> > * ceph-volume enable the ceph-osd during lvm activation (issue#24152, pr#23394, Dan van der Ster, Alfredo Deza)
> > * ceph-volume expand on the LVM API to create multiple LVs at different sizes (issue#24020, pr#23395, Alfredo Deza)
> > * ceph-volume lvm.activate conditional mon-config on prime-osd-dir (issue#25216, pr#23397, Alfredo Deza)
> > * ceph-volume lvm.batch remove non-existent sys_api property (issue#34310, pr#23811, Alfredo Deza)
> > * ceph-volume lvm.listing only include devices if they exist (issue#24952, pr#23150, Alfredo Deza)
> > * ceph-volume: process.call with stdin in Python 3 fix (issue#24993, pr#23238, Alfredo Deza)
> > * ceph-volume: PVolumes.get() should return one PV when using name or uuid (issue#24784, pr#23329, Andrew Schoen)
> > * ceph-volume: refuse to zap mapper devices (issue#24504, pr#23374, Andrew Schoen)
> > * ceph-volume: tests.functional inherit SSH_ARGS from ansible (issue#34311, pr#23813, Alfredo Deza)
> > * ceph-volume tests/functional run lvm list after OSD provisioning (issue#24961, pr#23147, Alfredo Deza)
> > * ceph-volume: unmount lvs correctly before zapping (issue#24796, pr#23128, Andrew Schoen)
> > * ceph-volume: update batch documentation to explain filestore strategies (issue#34309, pr#23825, Alfredo Deza)
> > * c
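[Illustrative sketch, not part of the original thread: the usual luminous-era sequence for dropping to a single active MDS before upgrading, per the upgrade doc referenced as [1] above. The filesystem name is a placeholder.]

    ceph fs set myfs max_mds 1
    ceph mds deactivate myfs:1   # repeat for every rank > 0
    ceph status                  # wait until only rank 0 remains active, then upgrade the MDS daemons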