Re: [ceph-users] krbd / kcephfs - jewel client features question
Hello Ilya and Paul,

Thanks for your reply. You are right, 0x7fddff8ee8cbffb did not come from the kernel upgrade; it is reported by a docker container (digitalocean/ceph_exporter) used for ceph monitoring. Upmap mode is now enabled, and the client features are:

"client": {
    "group": {
        "features": "0x7010fb86aa42ada",
        "release": "jewel",
        "num": 6
    },
    "group": {
        "features": "0x3ffddff8eea4fffb",
        "release": "luminous",
        "num": 6
    }
}

Thanks again.

Ilya Dryomov wrote on Mon, Oct 21, 2019 at 6:38 PM:
> On Sat, Oct 19, 2019 at 2:00 PM Lei Liu wrote:
> >
> > Hello Ilya,
> >
> > After updating the client kernel to 3.10.0-862, ceph features shows:
> >
> > "client": {
> >     "group": {
> >         "features": "0x7010fb86aa42ada",
> >         "release": "jewel",
> >         "num": 5
> >     },
> >     "group": {
> >         "features": "0x7fddff8ee8cbffb",
> >         "release": "jewel",
> >         "num": 1
> >     },
> >     "group": {
> >         "features": "0x3ffddff8eea4fffb",
> >         "release": "luminous",
> >         "num": 6
> >     },
> >     "group": {
> >         "features": "0x3ffddff8eeacfffb",
> >         "release": "luminous",
> >         "num": 1
> >     }
> > }
> >
> > Both 0x7fddff8ee8cbffb and 0x7010fb86aa42ada are reported by the new
> > kernel client.
> >
> > Is it now possible to force set-require-min-compat-client to luminous,
> > and if not, how can I fix it?
>
> No, you haven't upgraded the one with features 0x7fddff8ee8cbffb (or
> rather it looks like you have upgraded it from 0x7fddff8ee84bffb, but
> to a version that is still too old).
>
> What exactly did you do on that machine? That change doesn't look like
> it came from a kernel upgrade. What is the output of "uname -a" there?
>
> Thanks,
>
>                 Ilya

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
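The feature values above are 64-bit masks whose bit meanings are defined in Ceph's include/ceph_features.h. As a minimal sketch (nothing here beyond plain bitwise arithmetic; the hex values are the masks quoted in this thread), you can diff an old client's mask against an upgraded one to see which bits it still lacks:

```python
# Sketch: diff two Ceph client feature masks to find the bits the
# older client lacks. Mapping bit positions to feature names is left
# to include/ceph_features.h; this only does the arithmetic.

def missing_bits(client_features: int, reference_features: int) -> int:
    """Return the bits set in reference_features but absent in client_features."""
    return reference_features & ~client_features

jewel_client = 0x7fddff8ee84bffb      # mask reported by the old kernel client
luminous_client = 0x3ffddff8eea4fffb  # mask reported by an upgraded client

diff = missing_bits(jewel_client, luminous_client)
print(hex(diff))  # bits present on the luminous client but not the jewel one
```

A zero result means the client advertises every bit the reference does; any nonzero bit is a feature the monitor may refuse to require.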
Re: [ceph-users] krbd / kcephfs - jewel client features question
Hello Ilya,

After updating the client kernel to 3.10.0-862, ceph features shows:

"client": {
    "group": {
        "features": "0x7010fb86aa42ada",
        "release": "jewel",
        "num": 5
    },
    "group": {
        "features": "0x7fddff8ee8cbffb",
        "release": "jewel",
        "num": 1
    },
    "group": {
        "features": "0x3ffddff8eea4fffb",
        "release": "luminous",
        "num": 6
    },
    "group": {
        "features": "0x3ffddff8eeacfffb",
        "release": "luminous",
        "num": 1
    }
}

Both 0x7fddff8ee8cbffb and 0x7010fb86aa42ada are reported by the new kernel client.

Is it now possible to force set-require-min-compat-client to luminous, and if not, how can I fix it?

Thanks

Ilya Dryomov wrote on Thu, Oct 17, 2019 at 9:45 PM:
> On Thu, Oct 17, 2019 at 3:38 PM Lei Liu wrote:
> >
> > Hi Cephers,
> >
> > We have some ceph clusters running 12.2.x. Now we want to use the upmap
> > balancer, but setting set-require-min-compat-client to luminous fails:
> >
> > # ceph osd set-require-min-compat-client luminous
> > Error EPERM: cannot set require_min_compat_client to luminous: 6
> > connected client(s) look like jewel (missing 0xa20); 1 connected
> > client(s) look like jewel (missing 0x800); 1 connected client(s) look
> > like jewel (missing 0x820); add --yes-i-really-mean-it to do it anyway
> >
> > ceph features
> >
> > "client": {
> >     "group": {
> >         "features": "0x40106b84a842a52",
> >         "release": "jewel",
> >         "num": 6
> >     },
> >     "group": {
> >         "features": "0x7010fb86aa42ada",
> >         "release": "jewel",
> >         "num": 1
> >     },
> >     "group": {
> >         "features": "0x7fddff8ee84bffb",
> >         "release": "jewel",
> >         "num": 1
> >     },
> >     "group": {
> >         "features": "0x3ffddff8eea4fffb",
> >         "release": "luminous",
> >         "num": 7
> >     }
> > }
> >
> > and sessions
> >
> > "MonSession(unknown.0 10.10.100.6:0/1603916368 is open allow *,
> > features 0x40106b84a842a52 (jewel))",
> > "MonSession(unknown.0 10.10.100.2:0/2484488531 is open allow *,
> > features 0x40106b84a842a52 (jewel))",
> > "MonSession(client.? 10.10.100.6:0/657483412 is open allow *,
> > features 0x7fddff8ee84bffb (jewel))",
> > "MonSession(unknown.0 10.10.14.67:0/500706582 is open allow *,
> > features 0x7010fb86aa42ada (jewel))"
> >
> > Can I use --yes-i-really-mean-it to force enable it?
>
> No. 0x40106b84a842a52 and 0x7fddff8ee84bffb are too old.
>
> Thanks,
>
>                 Ilya
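To map each feature mask back to a concrete connection, the MonSession lines above can be picked apart mechanically. A small sketch (the regex is an assumption based purely on the line format quoted in this thread, not a stable interface of the `ceph daemon mon.<id> sessions` output):

```python
import re

# Sketch: extract address, feature mask, and release name from the
# MonSession strings quoted above. The pattern mirrors the format
# shown in this thread; treat it as an assumption elsewhere.
SESSION_RE = re.compile(
    r"MonSession\(\S+ (?P<addr>\S+) is open .*?"
    r"features (?P<features>0x[0-9a-f]+) \((?P<release>\w+)\)"
)

def parse_sessions(lines):
    out = []
    for line in lines:
        m = SESSION_RE.search(line)
        if m:
            out.append((m.group("addr"),
                        int(m.group("features"), 16),
                        m.group("release")))
    return out

sessions = parse_sessions([
    'MonSession(unknown.0 10.10.100.6:0/1603916368 is open allow *, '
    'features 0x40106b84a842a52 (jewel))',
    'MonSession(client.? 10.10.100.6:0/657483412 is open allow *, '
    'features 0x7fddff8ee84bffb (jewel))',
])
for addr, feat, release in sessions:
    print(addr, hex(feat), release)
```

This makes it easy to see which host still holds the too-old session before deciding what to upgrade.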
Re: [ceph-users] kernel cephfs - too many caps used by client
Only the OSDs are still on v12.2.8; all of the MDS and MON daemons run v12.2.12.

# ceph versions
{
    "mon": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 4
    },
    "osd": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 24,
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 203
    },
    "mds": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 5
    },
    "rgw": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 1
    },
    "overall": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 37,
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 203
    }
}

Lei Liu wrote on Sat, Oct 19, 2019 at 10:09 AM:
> Thanks for your reply.
>
> Yes, I have already set it:
>
> [mds]
> mds_max_caps_per_client = 10485760  # default is 1048576
>
> I think the current configuration is big enough per client. Do I need
> to continue increasing this value?
>
> Thanks.
>
> Patrick Donnelly wrote on Sat, Oct 19, 2019 at 6:30 AM:
>> Hello Lei,
>>
>> On Thu, Oct 17, 2019 at 8:43 PM Lei Liu wrote:
>> >
>> > Hi cephers,
>> >
>> > We have some ceph clusters using cephfs in production (mounted with
>> > the kernel cephfs client), but several clients often keep a lot of
>> > caps (millions) unreleased.
>> > I know this is due to the client's inability to complete the cache
>> > release; errors might have been encountered, but there are no logs.
>> >
>> > client kernel version is 3.10.0-957.21.3.el7.x86_64
>> > ceph version is mostly v12.2.8
>> >
>> > ceph status shows:
>> >
>> > x clients failing to respond to cache pressure
>> >
>> > client kernel debug shows:
>> >
>> > # cat /sys/kernel/debug/ceph/a00cc99c-f9f9-4dd9-9281-43cd12310e41.client11291811/caps
>> > total 23801585
>> > avail 1074
>> > used 23800511
>> > reserved 0
>> > min 1024
>> >
>> > mds config:
>> > [mds]
>> > mds_max_caps_per_client = 10485760
>> > # 50G
>> > mds_cache_memory_limit = 53687091200
>> >
>> > I want to know if some ceph configuration can solve this problem?
>>
>> mds_max_caps_per_client is new in Luminous 12.2.12. See [1]. You need
>> to upgrade.
>>
>> [1] https://tracker.ceph.com/issues/38130
>>
>> --
>> Patrick Donnelly, Ph.D.
>> He / Him / His
>> Senior Software Engineer
>> Red Hat Sunnyvale, CA
>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Re: [ceph-users] kernel cephfs - too many caps used by client
Thanks for your reply.

Yes, I have already set it:

[mds]
mds_max_caps_per_client = 10485760  # default is 1048576

I think the current configuration is big enough per client. Do I need to continue increasing this value?

Thanks.

Patrick Donnelly wrote on Sat, Oct 19, 2019 at 6:30 AM:
> Hello Lei,
>
> On Thu, Oct 17, 2019 at 8:43 PM Lei Liu wrote:
> >
> > Hi cephers,
> >
> > We have some ceph clusters using cephfs in production (mounted with
> > the kernel cephfs client), but several clients often keep a lot of
> > caps (millions) unreleased.
> > I know this is due to the client's inability to complete the cache
> > release; errors might have been encountered, but there are no logs.
> >
> > client kernel version is 3.10.0-957.21.3.el7.x86_64
> > ceph version is mostly v12.2.8
> >
> > ceph status shows:
> >
> > x clients failing to respond to cache pressure
> >
> > client kernel debug shows:
> >
> > # cat /sys/kernel/debug/ceph/a00cc99c-f9f9-4dd9-9281-43cd12310e41.client11291811/caps
> > total 23801585
> > avail 1074
> > used 23800511
> > reserved 0
> > min 1024
> >
> > mds config:
> > [mds]
> > mds_max_caps_per_client = 10485760
> > # 50G
> > mds_cache_memory_limit = 53687091200
> >
> > I want to know if some ceph configuration can solve this problem?
>
> mds_max_caps_per_client is new in Luminous 12.2.12. See [1]. You need
> to upgrade.
>
> [1] https://tracker.ceph.com/issues/38130
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Senior Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
[ceph-users] kernel cephfs - too many caps used by client
Hi cephers,

We have some ceph clusters using cephfs in production (mounted with the kernel cephfs client), but several clients often keep a lot of caps (millions) unreleased. I know this is due to the client's inability to complete the cache release; errors might have been encountered, but there are no logs.

client kernel version is 3.10.0-957.21.3.el7.x86_64
ceph version is mostly v12.2.8

ceph status shows:

x clients failing to respond to cache pressure

client kernel debug shows:

# cat /sys/kernel/debug/ceph/a00cc99c-f9f9-4dd9-9281-43cd12310e41.client11291811/caps
total 23801585
avail 1074
used 23800511
reserved 0
min 1024

mds config:

[mds]
mds_max_caps_per_client = 10485760
# 50G
mds_cache_memory_limit = 53687091200

I want to know if some ceph configuration can solve this problem? Any suggestions?

Thanks.
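The debugfs counters above are simple "name value" pairs, so whether a client has blown past the MDS cap limit can be checked mechanically. A minimal sketch, using the exact output quoted in this thread (the two-column layout is taken from that output; treat it as an assumption for other kernel versions):

```python
# Sketch: parse the kernel client's debugfs caps counters
# (/sys/kernel/debug/ceph/<fsid>.client<id>/caps) and compare "used"
# against an mds_max_caps_per_client setting.

def parse_caps(text: str) -> dict:
    counters = {}
    for line in text.strip().splitlines():
        name, value = line.split()
        counters[name] = int(value)
    return counters

caps_output = """\
total 23801585
avail 1074
used 23800511
reserved 0
min 1024
"""

counters = parse_caps(caps_output)
mds_max_caps_per_client = 10485760  # the value set in this thread

# True here: the client holds far more caps than the per-client limit,
# which is the symptom behind "failing to respond to cache pressure".
print(counters["used"] > mds_max_caps_per_client)
```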
Re: [ceph-users] krbd / kcephfs - jewel client features question
Well, I know kernel 4.13 fully supports ceph luminous, and I know that Redhat backports a lot of things to the 3.10 kernel. So which is the matching version, 3.10-862 or above?

Thanks

Sent from my iPhone

-------- Original --------
From: 刘磊
Date: Thu, Oct 17, 2019 9:38 PM
To: ceph-users
Subject: Re: krbd / kcephfs - jewel client features question

Hi Cephers,

We have some ceph clusters running 12.2.x. Now we want to use the upmap balancer, but setting set-require-min-compat-client to luminous fails:

# ceph osd set-require-min-compat-client luminous
Error EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look like jewel (missing 0xa20); 1 connected client(s) look like jewel (missing 0x800); 1 connected client(s) look like jewel (missing 0x820); add --yes-i-really-mean-it to do it anyway

ceph features

"client": {
    "group": {
        "features": "0x40106b84a842a52",
        "release": "jewel",
        "num": 6
    },
    "group": {
        "features": "0x7010fb86aa42ada",
        "release": "jewel",
        "num": 1
    },
    "group": {
        "features": "0x7fddff8ee84bffb",
        "release": "jewel",
        "num": 1
    },
    "group": {
        "features": "0x3ffddff8eea4fffb",
        "release": "luminous",
        "num": 7
    }
}

and sessions

"MonSession(unknown.0 10.10.100.6:0/1603916368 is open allow *, features 0x40106b84a842a52 (jewel))",
"MonSession(unknown.0 10.10.100.2:0/2484488531 is open allow *, features 0x40106b84a842a52 (jewel))",
"MonSession(client.? 10.10.100.6:0/657483412 is open allow *, features 0x7fddff8ee84bffb (jewel))",
"MonSession(unknown.0 10.10.14.67:0/500706582 is open allow *, features 0x7010fb86aa42ada (jewel))"

Can I use --yes-i-really-mean-it to force enable it?

I know kernel 4.13 fully supports ceph luminous, and I know that Redhat backports a lot of things to the 3.10 kernel. So which is the matching version?
[ceph-users] krbd / kcephfs - jewel client features question
Hi Cephers,

We have some ceph clusters running 12.2.x. Now we want to use the upmap balancer, but setting set-require-min-compat-client to luminous fails:

# ceph osd set-require-min-compat-client luminous
Error EPERM: cannot set require_min_compat_client to luminous: 6 connected client(s) look like jewel (missing 0xa20); 1 connected client(s) look like jewel (missing 0x800); 1 connected client(s) look like jewel (missing 0x820); add --yes-i-really-mean-it to do it anyway

ceph features

"client": {
    "group": {
        "features": "0x40106b84a842a52",
        "release": "jewel",
        "num": 6
    },
    "group": {
        "features": "0x7010fb86aa42ada",
        "release": "jewel",
        "num": 1
    },
    "group": {
        "features": "0x7fddff8ee84bffb",
        "release": "jewel",
        "num": 1
    },
    "group": {
        "features": "0x3ffddff8eea4fffb",
        "release": "luminous",
        "num": 7
    }
}

and sessions

"MonSession(unknown.0 10.10.100.6:0/1603916368 is open allow *, features 0x40106b84a842a52 (jewel))",
"MonSession(unknown.0 10.10.100.2:0/2484488531 is open allow *, features 0x40106b84a842a52 (jewel))",
"MonSession(client.? 10.10.100.6:0/657483412 is open allow *, features 0x7fddff8ee84bffb (jewel))",
"MonSession(unknown.0 10.10.14.67:0/500706582 is open allow *, features 0x7010fb86aa42ada (jewel))"

Can I use --yes-i-really-mean-it to force enable it?

I know kernel 4.13 fully supports ceph luminous, and I know that Redhat backports a lot of things to the 3.10 kernel. So which is the matching version?
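The monitor's error message adds up exactly to the jewel groups in the `ceph features` output (6 + 1 + 1 clients). A minimal sketch of that tally, with the groups transcribed into a plain list (the pretty-printed form above repeats the "group" key and is not itself valid JSON):

```python
# Sketch: count connected clients that still report a pre-luminous
# release, using the client groups quoted in this thread.

client_groups = [
    {"features": 0x40106b84a842a52, "release": "jewel", "num": 6},
    {"features": 0x7010fb86aa42ada, "release": "jewel", "num": 1},
    {"features": 0x7fddff8ee84bffb, "release": "jewel", "num": 1},
    {"features": 0x3ffddff8eea4fffb, "release": "luminous", "num": 7},
]

pre_luminous = sum(g["num"] for g in client_groups
                   if g["release"] != "luminous")
# These are the clients blocking set-require-min-compat-client.
print(pre_luminous)
```

Until that count reaches zero (by upgrading or disconnecting those clients), forcing the flag with --yes-i-really-mean-it risks leaving old clients unable to talk to the cluster once upmap entries exist.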
Re: [ceph-users] What's the best practice for Erasure Coding
Hi Frank,

Thanks for sharing your valuable experience.

Frank Schilder wrote on Mon, Jul 8, 2019 at 4:36 PM:
> Hi David,
>
> I'm running a cluster with bluestore on raw devices (no lvm) and all
> journals collocated on the same disk with the data. Disks are spinning
> NL-SAS. Our goal was to build storage at lowest cost, therefore all data
> is on HDD only. I got a few SSDs that I'm using for FS and RBD metadata.
> All large pools are EC on spinning disk.
>
> I spent at least one month running detailed benchmarks (rbd bench)
> varying EC profile, object size, write size, etc. Results varied a lot.
> My advice would be to run benchmarks with your hardware. If there were
> a single perfect choice, there wouldn't be so many options. For example,
> my tests will not be valid when using separate fast disks for WAL and DB.
>
> There are some results though that might be valid in general:
>
> 1) EC pools have high throughput but low IOP/s compared with replicated
> pools.
>
> I see single-thread write speeds of up to 1.2GB (gigabyte) per second,
> which is probably the network limit and not the disk limit. IOP/s get
> better with more disks, but are way lower than what replicated pools can
> provide. On a cephfs with an EC data pool, small-file IO will be
> comparably slow and eat a lot of resources.
>
> 2) I observe massive network traffic amplification on small IO sizes,
> which is due to the way EC overwrites are handled. This is one bottleneck
> for IOP/s. We have 10G infrastructure and use 2x10G client and 4x10G OSD
> network. OSD bandwidth should be at least 2x the client network, better
> 4x or more.
>
> 3) k should only have small prime factors, a power of 2 if possible.
>
> I tested k=5,6,8,10,12. Best results in decreasing order: k=8, k=6. All
> other choices were poor. The value of m seems not relevant for
> performance. Larger k will require more failure domains (more hardware).
>
> 4) object size matters
>
> The best throughput (1M write size) I see with object sizes of 4MB or
> 8MB, with IOP/s getting somewhat better with smaller object sizes but
> throughput dropping fast. I use the default of 4MB in production. Works
> well for us.
>
> 5) jerasure is quite good and seems most flexible
>
> jerasure is quite CPU efficient and can handle smaller chunk sizes than
> other plugins, which is preferable for IOP/s. However, CPU usage can
> become a problem, and a plugin optimized for specific values of k and m
> might help here. Under usual circumstances I see very low load on all
> OSD hosts, even under rebalancing. However, I remember that once I
> needed to rebuild something on all OSDs (I don't remember what it was,
> sorry). In this situation, CPU load went up to 30-50% (meaning up to
> half the cores were at 100%), which is really high considering that each
> server has only 16 disks at the moment and is sized to handle up to 100.
> CPU power could become a bottleneck for us in the future.
>
> These are some general observations and do not replace benchmarks for
> specific use cases. I was hunting for a specific performance pattern,
> which might not be what you want to optimize for. I would recommend
> running extensive benchmarks if you have to live with a configuration
> for a long time - EC profiles cannot be changed.
>
> We settled on 8+2 and 6+2 pools with jerasure and object size 4M. We
> also use bluestore compression. All metadata pools are on SSD; only very
> little SSD space is required. This choice works well for the majority of
> our use cases. We can still build small expensive pools to accommodate
> special performance requests.
>
> Best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: ceph-users on behalf of David <xiaomajia...@gmail.com>
> Sent: 07 July 2019 20:01:18
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] What's the best practice for Erasure Coding
>
> Hi Ceph-Users,
>
> I'm working with a Ceph cluster (about 50TB, 28 OSDs, all Bluestore on
> lvm). Recently, I'm trying to use the Erasure Code pool.
> My question is: what's the best practice for using EC pools?
> More specifically, which plugin (jerasure, isa, lrc, shec or clay)
> should I adopt, and how should I choose the combination of (k,m)
> (e.g. (k=3,m=2), (k=6,m=3))?
>
> Does anyone share some experience?
>
> Thanks for any help.
>
> Regards,
> David
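Beyond the benchmark-driven points above, the space and durability trade-offs of the candidate (k,m) choices follow directly from arithmetic: an EC pool stores (k+m)/k raw bytes per usable byte, tolerates the loss of up to m chunks, and needs at least k+m failure domains. A minimal sketch comparing the profiles discussed in this thread:

```python
# Sketch: raw-space overhead and failure tolerance for a few EC
# profiles. Overhead is (k+m)/k; an EC pool survives the loss of up
# to m chunks and needs at least k+m failure domains to place them.

def ec_profile(k: int, m: int) -> dict:
    return {
        "k": k,
        "m": m,
        "overhead": (k + m) / k,          # raw bytes per usable byte
        "tolerates_failures": m,
        "min_failure_domains": k + m,
    }

for k, m in [(8, 2), (6, 2), (6, 3)]:
    p = ec_profile(k, m)
    print(f'{k}+{m}: {p["overhead"]:.2f}x raw space, '
          f'survives {p["tolerates_failures"]} failures, '
          f'needs {p["min_failure_domains"]} failure domains')
```

For example, 8+2 costs only 1.25x raw space versus 3x for triple replication, which is why the thread settles on large-k profiles for bulk HDD pools despite the IOP/s penalty.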