[ceph-users] /var/lib/ceph/osd/ceph-xxx/current/meta shows "Structure needs cleaning"
Hi all,

Every time after we activate an OSD, we get "Structure needs cleaning" in /var/lib/ceph/osd/ceph-xxx/current/meta:

/var/lib/ceph/osd/ceph-xxx/current/meta # ls -l
ls: reading directory .: Structure needs cleaning
total 0

Could anyone say something about this error? Thank you!
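"Structure needs cleaning" is not a Ceph message but the strerror text for EUCLEAN, which XFS returns when it detects on-disk metadata corruption, so the usual next step is an offline filesystem check rather than anything at the Ceph layer. A minimal recovery sketch, assuming the affected OSD is id 12 and its data partition is the hypothetical /dev/sdb1:

  systemctl stop ceph-osd@12            # stop the daemon holding the mount
  umount /var/lib/ceph/osd/ceph-12      # xfs_repair needs the fs unmounted
  xfs_repair -n /dev/sdb1               # dry run: only report the damage
  xfs_repair /dev/sdb1                  # repair the metadata for real
  mount /dev/sdb1 /var/lib/ceph/osd/ceph-12
  systemctl start ceph-osd@12

If the corruption comes back after every activation, it is worth checking dmesg and SMART for the underlying device before trusting that OSD again.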
Re: [ceph-users] Multipart Upload - POST fails
No-one?

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ingo Reimann
Sent: Friday, March 2, 2018 14:15
To: ceph-users
Subject: [ceph-users] Multipart Upload - POST fails

Hi,

we discovered a problem with our installation - multipart upload is not working. What we did:
* tried upload with Cyberduck as well as with the script from http://tracker.ceph.com/issues/12790
* tried against jewel gateways and luminous gateways from the old cluster
* tried against a 12.2.4 gateway with a jewel-era cluster

Surprisingly this is not a signature problem as in the issue above; instead I get the following in the logs:

2018-03-02 13:59:04.927353 7fe2053ca700 1 == starting new request req=0x7fe2053c42c0 =
2018-03-02 13:59:04.927383 7fe2053ca700 2 req 61:0.30::POST /luminous-12-2-4/Data128MB::initializing for trans_id = tx0003d-005a994a98-10c84997-default
2018-03-02 13:59:04.927396 7fe2053ca700 10 rgw api priority: s3=5 s3website=4
2018-03-02 13:59:04.927399 7fe2053ca700 10 host=cephrgw01.dunkel.de
2018-03-02 13:59:04.927422 7fe2053ca700 20 subdomain= domain=cephrgw01.dunkel.de in_hosted_domain=1 in_hosted_domain_s3website=0
2018-03-02 13:59:04.927427 7fe2053ca700 20 final domain/bucket subdomain= domain=cephrgw01.dunkel.de in_hosted_domain=1 in_hosted_domain_s3website=0 s->info.domain=cephrgw01.dunkel.de s->info.request_uri=/luminous-12-2-4/Data128MB
2018-03-02 13:59:04.927447 7fe2053ca700 10 meta>> HTTP_X_AMZ_CONTENT_SHA256
2018-03-02 13:59:04.927454 7fe2053ca700 10 meta>> HTTP_X_AMZ_DATE
2018-03-02 13:59:04.927459 7fe2053ca700 10 x>> x-amz-content-sha256:254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917
2018-03-02 13:59:04.927464 7fe2053ca700 10 x>> x-amz-date:20180302T125904Z
2018-03-02 13:59:04.927493 7fe2053ca700 20 get_handler handler=22RGWHandler_REST_Obj_S3
2018-03-02 13:59:04.927500 7fe2053ca700 10 handler=22RGWHandler_REST_Obj_S3
2018-03-02 13:59:04.927505 7fe2053ca700 2 req 61:0.000152:s3:POST /luminous-12-2-4/Data128MB::getting op 4
2018-03-02 13:59:04.927512 7fe2053ca700 10 op=28RGWInitMultipart_ObjStore_S3
2018-03-02 13:59:04.927514 7fe2053ca700 2 req 61:0.000161:s3:POST /luminous-12-2-4/Data128MB:init_multipart:verifying requester
2018-03-02 13:59:04.927519 7fe2053ca700 20 rgw::auth::StrategyRegistry::s3_main_strategy_t: trying rgw::auth::s3::AWSAuthStrategy
2018-03-02 13:59:04.927524 7fe2053ca700 20 rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::S3AnonymousEngine
2018-03-02 13:59:04.927531 7fe2053ca700 20 rgw::auth::s3::S3AnonymousEngine denied with reason=-1
2018-03-02 13:59:04.927533 7fe2053ca700 20 rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::LocalEngine
2018-03-02 13:59:04.927569 7fe2053ca700 10 v4 signature format = 48cc8c61a70dde17932d925f65f843116199c1ca10094db83e7de05bfbd57dc4
2018-03-02 13:59:04.927584 7fe2053ca700 10 v4 credential format = 8DGDGA57XL9YPM8DGEQQ/20180302/us-east-1/s3/aws4_request
2018-03-02 13:59:04.927587 7fe2053ca700 10 access key id = 8DGDGA57XL9YPM8DGEQQ
2018-03-02 13:59:04.927589 7fe2053ca700 10 credential scope = 20180302/us-east-1/s3/aws4_request
2018-03-02 13:59:04.927620 7fe2053ca700 10 canonical headers format = content-type:application/octet-stream date:Fri, 02 Mar 2018 12:59:04 GMT host:cephrgw01.dunkel.de x-amz-content-sha256:254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917 x-amz-date:20180302T125904Z
2018-03-02 13:59:04.927634 7fe2053ca700 10 payload request hash = 254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917
2018-03-02 13:59:04.927690 7fe2053ca700 10 canonical request = POST /luminous-12-2-4/Data128MB uploads= content-type:application/octet-stream date:Fri, 02 Mar 2018 12:59:04 GMT host:cephrgw01.dunkel.de x-amz-content-sha256:254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917 x-amz-date:20180302T125904Z content-type;date;host;x-amz-content-sha256;x-amz-date 254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917
2018-03-02 13:59:04.927696 7fe2053ca700 10 canonical request hash = 54e9858263535b46a3c4e51b2ae5c1d0bf5e7a7690c5bba722eea749e7b936c4
2018-03-02 13:59:04.927716 7fe2053ca700 10 string to sign = AWS4-HMAC-SHA256 20180302T125904Z 20180302/us-east-1/s3/aws4_request 54e9858263535b46a3c4e51b2ae5c1d0bf5e7a7690c5bba722eea749e7b936c4
2018-03-02 13:59:04.927920 7fe2053ca700 10 date_k = dcef1f3be70873f1cb3240f7a56320e3c6763e7cf4bfae0e3182d2f9525292cd
2018-03-02 13:59:04.927954 7fe2053ca700 10 region_k = 3d83dd9161cf7ba15e6c8c28d264f6cfce9b848e927359f34364a6c8c98209b7
2018-03-02 13:59:04.927963 7fe2053ca700 10 service_k = e0708e00dc6b52aa1d889f45cd1dcced2bb1b2eee1b62e94ad9813c555e8eda9
2018-03-02 13:59:04.927972 7fe2053ca700 10 signing_k = 1ae362c4b2f1666786404fdb56c62d4f393635b2ce76d46ba325097fd3aa645e
2018-03-02 13:59:04.928021 7fe2053ca700 10 generated signature = 48cc8c61a70dde17932d925f65f843116199c1ca10094db83e7de05bfbd57dc4
2018-03-02 13:59:04.928031 7fe2053ca700 15
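Two things stand out in this log: the client-supplied signature ("v4 signature format") and the gateway's own computation ("generated signature") are identical, confirming Ingo's point that auth is not the failure, and the request does reach RGWInitMultipart before the trace cuts off. To reproduce without Cyberduck, the multipart POST can be issued directly with the AWS CLI; a minimal sketch, assuming a hypothetical profile rgwtest configured with the same access keys:

  aws --profile rgwtest --endpoint-url http://cephrgw01.dunkel.de \
      s3api create-multipart-upload --bucket luminous-12-2-4 --key Data128MB

If this also fails, the HTTP status and error code it prints can be matched against the corresponding "req 61" lines in the gateway log to see where init_multipart aborts.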
Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock
Hi David,

Thanks for the info. Could I assume that if we use active/passive multipath with RBD exclusive lock, then all targets which support RBD (via the kernel block device) are safe?

2018-03-08

shadow_lin

From: David Disseldorp
Sent: 2018-03-08 08:47
Subject: Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock
To: "shadow_lin"
Cc: "Mike Christie", "Lazuardi Nasution", "Ceph Users"

Hi shadowlin,

On Wed, 7 Mar 2018 23:24:42 +0800, shadow_lin wrote:
> Is it safe to use active/active multipath if using a SUSE kernel with
> target_core_rbd?
> Thanks.

A cross-gateway failover race condition similar to what Mike described is currently possible with active/active target_core_rbd. It's a corner case that depends on a client assuming that unacknowledged I/O has been implicitly terminated and can be resumed via an alternate path, while the original gateway at the same time issues the original request such that it reaches the Ceph cluster after differing I/O to the same region via the alternate path. It's not something that we've observed in the wild, but it is nevertheless a bug that is being worked on, with a resolution that should also be usable for active/active tcmu-runner.

Cheers, David
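On the initiator side, active/passive means the multipath layer must be told to keep only one path live. A minimal multipath.conf sketch for such a setup (the LIO-ORG vendor string is illustrative and should match whatever the gateway actually reports):

  # /etc/multipath.conf -- failover (active/passive) pathing for iSCSI LUNs
  devices {
      device {
          vendor "LIO-ORG"
          product ".*"
          path_grouping_policy failover   # one active path; others are standby
          path_checker tur                # probe paths with TEST UNIT READY
          failback immediate              # return to the preferred path when it recovers
          no_path_retry queue             # queue I/O instead of erroring during failover
      }
  }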
Re: [ceph-users] improve single job sequencial read performance.
On Wed, Mar 7, 2018 at 8:37 PM, Alex Gorbachev wrote:
> On Wed, Mar 7, 2018 at 9:43 AM, Cassiano Pilipavicius wrote:
>> Hi all, this issue has already been discussed in older threads and I've
>> already tried most of the solutions proposed there.
>>
>> I have a small and old ceph cluster (started in hammer and upgraded until
>> luminous 12.2.2), connected through a single shared 1 GbE link (I know this
>> is not optimal, but for my workload it is handling the load reasonably
>> well). I use RBD for small VMs in libvirt/qemu.
>>
>> My problem is: if I need to copy a large file (cp, dd, tar), the read
>> speed is very low (15 MB/s). I've tested the write speed of a single job
>> with dd zero (direct) > file and the speed is good enough for my
>> environment (80 MB/s).
>>
>> If I run parallel jobs, I can saturate the network connection; the speed
>> scales with the number of jobs. I've tried setting read-ahead in ceph.conf
>> and in the guest OS.
>>
>> I've never heard any report of a cluster using a single 1 GbE link; maybe
>> this speed is what I should expect? Next week I will be upgrading the
>> network to 2 x 10 GbE (private and public), but I would like to know if I
>> have any issue that I need to address before then, as the problem could be
>> masked by the network upgrade.
>>
>> If anyone can throw some light, point me in any direction, or tell me
>> this is what you should expect, I'd really appreciate it. If anyone needs
>> more info please let me know.
>
> Workarounds I have heard of or used:
>
> 1. Use fancy striping and parallelize that way
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-April/017744.html
>
> 2. Use LVM and set up a striped volume over multiple RBDs
>
> 3. Weird, but we had seen improvement in sequential speeds with larger
> object size (16 MB) in the past
>
> 4. Caching solutions may help smooth out peaks and valleys of IO -
> bcache, flashcache, and we have successfully used EnhanceIO with
> writethrough mode
>
> 5. Better SSD journals help if using filestore
>
> 6. Caching controllers, e.g. Areca
>
> --
> Alex Gorbachev
> Storcium
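Options 1 and 2 above can be tried per image; a sketch, with pool, image, and volume group names hypothetical:

  # option 1: fancy striping -- spread sequential I/O across 16 objects at a time
  rbd create rbd/vmdisk1 --size 102400 \
      --stripe-unit 65536 --stripe-count 16

  # option 2: an LVM stripe over four separately mapped RBDs
  lvcreate -i 4 -I 64 -L 100G -n striped_lv vg_rbd   # 4 stripes, 64 KB stripe size

Either way, a single sequential reader ends up touching several objects (and therefore several OSDs) concurrently instead of one at a time, which is exactly the parallelism the parallel-jobs test was providing by hand.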
Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock
Hi shadowlin,

On Wed, 7 Mar 2018 23:24:42 +0800, shadow_lin wrote:
> Is it safe to use active/active multipath if using a SUSE kernel with
> target_core_rbd?
> Thanks.

A cross-gateway failover race condition similar to what Mike described is currently possible with active/active target_core_rbd. It's a corner case that depends on a client assuming that unacknowledged I/O has been implicitly terminated and can be resumed via an alternate path, while the original gateway at the same time issues the original request such that it reaches the Ceph cluster after differing I/O to the same region via the alternate path. It's not something that we've observed in the wild, but it is nevertheless a bug that is being worked on, with a resolution that should also be usable for active/active tcmu-runner.

Cheers, David
Re: [ceph-users] Don't use ceph mds set max_mds
On Wed, Mar 7, 2018 at 5:29 AM, John Spray wrote:
> On Wed, Mar 7, 2018 at 10:11 AM, Dan van der Ster wrote:
>> Hi all,
>>
>> What is the purpose of
>>
>>    ceph mds set max_mds
>>
>> ?
>>
>> We just used that by mistake on a cephfs cluster when attempting to
>> decrease from 2 to 1 active mds's.
>>
>> The correct command to do this is of course
>>
>>    ceph fs set max_mds
>>
>> So, is `ceph mds set max_mds` useful for something? If not, should it
>> be removed from the CLI?
>
> It's the legacy version of the command from before we had multiple
> filesystems. Those commands are marked as obsolete internally so that
> they're not included in the --help output, but they're still handled
> (applied to the "default" filesystem) if called.
>
> The multi-fs stuff went in for Jewel, so maybe we should think about
> removing the old commands in Mimic: any thoughts Patrick?

These commands have already been removed (obsoleted) in master/Mimic. You can no longer use them. In Luminous, the commands are deprecated (basically, omitted from --help).

See also: https://tracker.ceph.com/issues/20596

--
Patrick Donnelly
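For Luminous operators, the supported way to go from two active MDS daemons to one is the fs-scoped command followed by deactivating the surplus rank; a sketch, assuming the filesystem is named cephfs:

  ceph fs set cephfs max_mds 1         # lower the target count first
  ceph mds deactivate cephfs:1         # then stop rank 1
  ceph fs get cephfs | grep max_mds    # confirm the new setting

Deactivating a rank while max_mds still permits it (the mistake described later in this thread) is what should be avoided.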
Re: [ceph-users] pg inconsistent
On Thu, Mar 8, 2018 at 1:22 AM, Harald Staub wrote:
> "ceph pg repair" leads to:
>    5.7bd repair 2 errors, 0 fixed
>
> Only an empty list from:
>    rados list-inconsistent-obj 5.7bd --format=json-pretty
>
> Inspired by http://tracker.ceph.com/issues/12577 , I tried again with more
> verbose logging and searched the osd logs e.g. for "!=", "mismatch", but
> could not find anything interesting. Oh well, these are several million
> lines ...
>
> Any hint what I could look for?

Try searching for "scrub_compare_maps" and looking for "5.7bd" in that context.

> The 3 OSDs involved are running on 12.2.4, one of them is on BlueStore.
>
> Cheers
> Harry

--
Cheers,
Brad
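A quick way to pull that context out of a multi-million-line log; the OSD id and log path are hypothetical:

  # scrub map comparisons, with surrounding lines, for the primary of pg 5.7bd
  grep -n -A20 'scrub_compare_maps' /var/log/ceph/ceph-osd.3.log | grep -B2 -A18 '5\.7bd'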
Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock
Hi Christie,

Is it safe to use active/passive multipath with krbd with exclusive lock for lio/tgt/scst/tcmu?
Is it safe to use active/active multipath if using a SUSE kernel with target_core_rbd?
Thanks.

2018-03-07

shadowlin

From: Mike Christie
Sent: 2018-03-07 03:51
Subject: Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock
To: "Lazuardi Nasution", "Ceph Users"
Cc:

On 03/06/2018 01:17 PM, Lazuardi Nasution wrote:
> Hi,
>
> I want to do load balanced multipathing (multiple iSCSI gateway/exporter
> nodes) of iSCSI backed with RBD images. Should I disable exclusive lock
> feature? What if I don't disable that feature? I'm using TGT (manual
> way) since I get so many CPU stuck error messages when I was using LIO.

You are using LIO/TGT with krbd right? You cannot, or shouldn't, do active/active multipathing. If you have the lock enabled then it bounces between paths for each IO and will be slow. If you do not have it enabled then you can end up with stale IO overwriting current data.
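Whether an image carries the exclusive-lock feature can be checked and changed at runtime; a sketch with a hypothetical pool/image name:

  rbd info rbd/iscsi-lun0 | grep features            # see which features are enabled
  # features that depend on the lock (object-map, fast-diff, journaling)
  # must be disabled before the lock itself can be:
  rbd feature disable rbd/iscsi-lun0 exclusive-lock
  rbd feature enable rbd/iscsi-lun0 exclusive-lock

As Mike's answer implies, neither setting makes TGT/LIO-over-krbd active/active safe; the feature only changes which failure mode you get.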
[ceph-users] pg inconsistent
"ceph pg repair" leads to: 5.7bd repair 2 errors, 0 fixed Only an empty list from: rados list-inconsistent-obj 5.7bd --format=json-pretty Inspired by http://tracker.ceph.com/issues/12577 , I tried again with more verbose logging and searched the osd logs e.g. for "!=", "mismatch", could not find anything interesting. Oh well, these are several millions of lines ... Any hint what I could look for? The 3 OSDs involved are running on 12.2.4, one of them is on BlueStore. Cheers Harry ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS Client Capabilities questions
On Wed, Mar 7, 2018 at 2:45 PM, Kenneth Waegeman wrote:
> Hi all,
>
> I am playing with limiting client access to certain subdirectories of
> cephfs, running latest 12.2.4 and the latest CentOS 7.4 kernel, both using
> kernel client and fuse.
>
> I am following http://docs.ceph.com/docs/luminous/cephfs/client-auth/:
>
>    To completely restrict the client to the bar directory, omit the root
>    directory:
>
>    ceph fs authorize cephfs client.foo /bar rw
>
> When I mount this directory with fuse, this works. When I try to mount the
> subdirectory directly with the kernel client, I get:
>
>    mount error 13 = Permission denied
>
> This only seems to work when the root is readable.
>
> --> Is there a way to mount a subdirectory with the kernel client when the
> parent in cephfs is not readable?

The latest CentOS kernel isn't necessarily very recent: it sounds like the version in use there is a little older (at one point the subdir mount support had this quirk with the kclient that required the root to be readable).

> Then I checked the data pool with rados, but I can list/get/... every
> object in the data pool using the client.foo key.
>
> I saw in the docs of master
> (http://docs.ceph.com/docs/master/cephfs/client-auth/) that you can add a
> tag cephfs, but if I add this I can't write anything to cephfs anymore, so
> I guess this is not yet supported in luminous.
>
> --> Is there a way to limit the cephfs user to his data only (through
> cephfs) instead of being able to do everything on the pool, without needing
> a pool for every single cephfs client?

Yes. You can do this with namespaces: set the ceph.dir.layout.pool_namespace on the restricted subdir (before any files are written in there), and then restrict the client's OSD caps to that namespace within the pool, with a cap like "allow rw pool=foo namespace=baz".

John

> Thanks!!
>
> Kenneth
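A sketch of the namespace-based restriction John describes; the data pool, namespace, and mount point names are hypothetical:

  # tag the subdirectory with a pool namespace (before any file data lands there)
  setfattr -n ceph.dir.layout.pool_namespace -v foons /mnt/cephfs/bar

  # then scope the client's OSD caps to that namespace only
  ceph auth caps client.foo \
      mon 'allow r' \
      mds 'allow rw path=/bar' \
      osd 'allow rw pool=cephfs_data namespace=foons'

With those caps, rados-level access with the client.foo key is limited to objects in the foons namespace instead of the whole data pool.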
[ceph-users] CephFS Client Capabilities questions
Hi all,

I am playing with limiting client access to certain subdirectories of cephfs, running latest 12.2.4 and the latest CentOS 7.4 kernel, both using kernel client and fuse.

I am following http://docs.ceph.com/docs/luminous/cephfs/client-auth/:

   To completely restrict the client to the bar directory, omit the root
   directory:

   ceph fs authorize cephfs client.foo /bar rw

When I mount this directory with fuse, this works. When I try to mount the subdirectory directly with the kernel client, I get:

   mount error 13 = Permission denied

This only seems to work when the root is readable.

--> Is there a way to mount a subdirectory with the kernel client when the parent in cephfs is not readable?

Then I checked the data pool with rados, but I can list/get/... every object in the data pool using the client.foo key.

I saw in the docs of master (http://docs.ceph.com/docs/master/cephfs/client-auth/) that you can add a tag cephfs, but if I add this I can't write anything to cephfs anymore, so I guess this is not yet supported in luminous.

--> Is there a way to limit the cephfs user to his data only (through cephfs) instead of being able to do everything on the pool, without needing a pool for every single cephfs client?

Thanks!!

Kenneth
[ceph-users] improve single job sequencial read performance.
Hi all, this issue has already been discussed in older threads and I've already tried most of the solutions proposed there.

I have a small and old ceph cluster (started in hammer and upgraded until luminous 12.2.2), connected through a single shared 1 GbE link (I know this is not optimal, but for my workload it is handling the load reasonably well). I use RBD for small VMs in libvirt/qemu.

My problem is: if I need to copy a large file (cp, dd, tar), the read speed is very low (15 MB/s). I've tested the write speed of a single job with dd zero (direct) > file and the speed is good enough for my environment (80 MB/s).

If I run parallel jobs, I can saturate the network connection; the speed scales with the number of jobs. I've tried setting read-ahead in ceph.conf and in the guest OS.

I've never heard any report of a cluster using a single 1 GbE link; maybe this speed is what I should expect? Next week I will be upgrading the network to 2 x 10 GbE (private and public), but I would like to know if I have any issue that I need to address before then, as the problem could be masked by the network upgrade.

If anyone can throw some light, point me in any direction, or tell me this is what you should expect, I'd really appreciate it. If anyone needs more info please let me know.
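Read-ahead can be raised at two layers for a librbd-backed VM; a sketch, with the device name and values illustrative only:

  # inside the guest: raise block-layer read-ahead on the virtio disk
  echo 8192 > /sys/block/vdb/queue/read_ahead_kb

  # on the hypervisor, in the [client] section of ceph.conf for librbd:
  #   rbd readahead max bytes = 4194304
  #   rbd readahead disable after bytes = 0   # don't switch read-ahead off after the first 50 MB

Larger read-ahead lets a single sequential reader keep requests for several 4 MB objects in flight at once, which is the same effect the parallel-jobs test achieves.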
Re: [ceph-users] Don't use ceph mds set max_mds
On Wed, Mar 7, 2018 at 2:02 PM, Dan van der Ster wrote:
> On Wed, Mar 7, 2018 at 2:29 PM, John Spray wrote:
>> On Wed, Mar 7, 2018 at 10:11 AM, Dan van der Ster wrote:
>>> Hi all,
>>>
>>> What is the purpose of
>>>
>>>    ceph mds set max_mds
>>>
>>> ?
>>>
>>> We just used that by mistake on a cephfs cluster when attempting to
>>> decrease from 2 to 1 active mds's.
>>>
>>> The correct command to do this is of course
>>>
>>>    ceph fs set max_mds
>>>
>>> So, is `ceph mds set max_mds` useful for something? If not, should it
>>> be removed from the CLI?
>>
>> It's the legacy version of the command from before we had multiple
>> filesystems. Those commands are marked as obsolete internally so that
>> they're not included in the --help output,
>
> Ahhh! It is indeed omitted from --help, but I hadn't noticed because it
> is still rather helpful if you go ahead and run the command:
>
> # ceph mds set
> Invalid command: missing required parameter var(max_mds|max_file_size|allow_new_snaps|inline_data|allow_multimds|allow_dirfrags)
> mds set max_mds|max_file_size|allow_new_snaps|inline_data|allow_multimds|allow_dirfrags <val> {<confirm>} : set mds parameter <var> to <val>
> Error EINVAL: invalid command
>
> I suppose we just need a new generation of operators that would never
> even try these old deprecated commands ;)
>
>> but they're still handled
>> (applied to the "default" filesystem) if called.
>
> Hmm... does it apply if we never set the default fs (though we only have one)?
> (How do we even see/get the default fs?)

It'll automatically be set to the first filesystem created. Now that I go look for the setting, I remember it's actually got the slightly esoteric internal name of "legacy_client_fscid" (because it's the filesystem ID that will get mounted by a legacy client that doesn't know which filesystem it wants).

You set it with "ceph fs set-default", but it looks like it got left out of FSMap::dump, so there's no easy way to peek at it. Created https://github.com/ceph/ceph/pull/20780

John

> What happened in our case is that I did `ceph mds set max_mds 1` then
> deactivated rank 2. This caused some sort of outage which deadlocked
> the mds's (they recovered after restarting). I assume the outage
> happened because I deactivated rank 2 while we still had max_mds=2 at
> the fs scope (and we had no standbys -- due to the v12.2.2->4 upgrade
> breakage).
>
> Thanks John!
>
> Dan
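The default filesystem John refers to can be set explicitly; a one-line sketch, with the filesystem name hypothetical:

  ceph fs set-default cephfs   # legacy (pre-multi-fs) commands and clients now target this fs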
Re: [ceph-users] Don't use ceph mds set max_mds
On Wed, Mar 7, 2018 at 2:29 PM, John Spray wrote:
> On Wed, Mar 7, 2018 at 10:11 AM, Dan van der Ster wrote:
>> Hi all,
>>
>> What is the purpose of
>>
>>    ceph mds set max_mds
>>
>> ?
>>
>> We just used that by mistake on a cephfs cluster when attempting to
>> decrease from 2 to 1 active mds's.
>>
>> The correct command to do this is of course
>>
>>    ceph fs set max_mds
>>
>> So, is `ceph mds set max_mds` useful for something? If not, should it
>> be removed from the CLI?
>
> It's the legacy version of the command from before we had multiple
> filesystems. Those commands are marked as obsolete internally so that
> they're not included in the --help output,

Ahhh! It is indeed omitted from --help, but I hadn't noticed because it is still rather helpful if you go ahead and run the command:

# ceph mds set
Invalid command: missing required parameter var(max_mds|max_file_size|allow_new_snaps|inline_data|allow_multimds|allow_dirfrags)
mds set max_mds|max_file_size|allow_new_snaps|inline_data|allow_multimds|allow_dirfrags <val> {<confirm>} : set mds parameter <var> to <val>
Error EINVAL: invalid command

I suppose we just need a new generation of operators that would never even try these old deprecated commands ;)

> but they're still handled
> (applied to the "default" filesystem) if called.

Hmm... does it apply if we never set the default fs (though we only have one)? (How do we even see/get the default fs?)

What happened in our case is that I did `ceph mds set max_mds 1` then deactivated rank 2. This caused some sort of outage which deadlocked the mds's (they recovered after restarting). I assume the outage happened because I deactivated rank 2 while we still had max_mds=2 at the fs scope (and we had no standbys -- due to the v12.2.2->4 upgrade breakage).

Thanks John!

Dan

> The multi-fs stuff went in for Jewel, so maybe we should think about
> removing the old commands in Mimic: any thoughts Patrick?
>
> John
Re: [ceph-users] Don't use ceph mds set max_mds
On Wed, Mar 7, 2018 at 10:11 AM, Dan van der Ster wrote:
> Hi all,
>
> What is the purpose of
>
>    ceph mds set max_mds
>
> ?
>
> We just used that by mistake on a cephfs cluster when attempting to
> decrease from 2 to 1 active mds's.
>
> The correct command to do this is of course
>
>    ceph fs set max_mds
>
> So, is `ceph mds set max_mds` useful for something? If not, should it
> be removed from the CLI?

It's the legacy version of the command from before we had multiple filesystems. Those commands are marked as obsolete internally so that they're not included in the --help output, but they're still handled (applied to the "default" filesystem) if called.

The multi-fs stuff went in for Jewel, so maybe we should think about removing the old commands in Mimic: any thoughts Patrick?

John

> Cheers, Dan
[ceph-users] Uneven pg distribution cause high fs_apply_latency on osds with more pgs
Hi list,

Ceph version is jewel 10.2.10 and all OSDs are using filestore. The cluster has 96 OSDs and one pool with size=2 replication and 4096 PGs (based on the PG calculation method from the Ceph docs, targeting 100 PGs per OSD).

The OSD with the most PGs has 104, and there are 6 OSDs with more than 100 PGs. Most of the OSDs have around 70-90 PGs. The OSD with the fewest has 58 PGs.

During the write test, some of the OSDs have very high fs_apply_latency, around 1000-4000 ms, while the normal ones are around 100-600 ms. The OSDs with high latency are always the ones with more PGs on them. iostat on the high-latency OSDs shows the HDDs at about 95-96% %util, while the normal ones are at 40-60% %util.

I think the reason is that the OSDs with more PGs have to handle more write requests. Is this right? But even though the PG distribution is uneven, the variation is not that large. How can performance be so sensitive to it?

Is there anything I can do to improve performance and reduce the latency? How can I make the PG distribution more even?

Thanks

2018-03-07

shadowlin
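Jewel can rebalance PG counts without hand-editing weights; a sketch (the pool name is hypothetical, and 110 means "treat OSDs more than 10% over the mean as overloaded"):

  # dry run: report which OSD reweights reweight-by-pg would apply
  ceph osd test-reweight-by-pg 110 rbd
  # apply once the projected changes look sane
  ceph osd reweight-by-pg 110 rbd

Since the pool is size=2, it's worth remembering that any reweighting triggers backfill, so this is best done outside the test window.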
[ceph-users] Journaling feature causes cluster to have slow requests and inconsistent PG
First noticed this problem in our ESXi/iSCSI cluster, but now I can replicate it in the lab with just Ubuntu:

1. Create an image with the journaling (and required exclusive-lock) feature.

2. Mount the image, make a fs and write a large file to it:

rbd-nbd map matte/scuttle2 /dev/nbd0
mkfs.xfs /dev/nbd0
mount -t xfs /dev/nbd0 /srv/exports/sclun69
xfs_io -c "extsize 256M" /srv/exports/sclun69

root@lumd1:/var/log# dd if=/dev/zero of=/srv/exports/sclun69/junk bs=1M count=2800000
2800000+0 records in
2800000+0 records out
2936012800000 bytes (2.9 TB, 2.7 TiB) copied, 35199.2 s, 83.4 MB/s

3. At some point, slow requests begin.

2018-03-06 22:00:00.000175 mon.lumc1 [INF] overall HEALTH_OK
2018-03-06 22:27:27.945814 mon.lumc1 [WRN] Health check failed: 1 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-03-06 22:27:34.406352 mon.lumc1 [WRN] Health check update: 10 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-03-06 22:27:38.496184 mon.lumc1 [INF] Health check cleared: REQUEST_SLOW (was: 10 slow requests are blocked > 32 sec)
2018-03-06 22:27:38.496215 mon.lumc1 [INF] Cluster is now healthy
2018-03-06 23:00:00.000196 mon.lumc1 [INF] overall HEALTH_OK
2018-03-06 23:29:45.538387 osd.4 [ERR] 12.308 shard 17: soid 12:10dbc229:::rbd_data.39e1022ae8944a.000cd96d:head candidate had a read error
2018-03-06 23:29:56.937346 mon.lumc1 [ERR] Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
2018-03-06 23:29:56.937415 mon.lumc1 [ERR] Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2018-03-06 23:29:54.835693 osd.4 [ERR] 12.308 deep-scrub 0 missing, 1 inconsistent objects
2018-03-06 23:29:54.835703 osd.4 [ERR] 12.308 deep-scrub 1 errors
2018-03-07 00:00:00.000155 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
2018-03-07 01:00:00.000201 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
2018-03-07 02:00:00.000179 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
2018-03-07 03:00:00.000235 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent

ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)

--
Alex Gorbachev
Storcium
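The journal state of the image can be watched while the dd runs; a sketch against the image named above:

  rbd journal info --pool matte --image scuttle2    # journal id, object size, splay width
  rbd journal status --pool matte --image scuttle2  # minimum/active set and registered clients

  # workaround while diagnosing, if journaling is not otherwise needed:
  # rbd feature disable matte/scuttle2 journaling

A steadily growing gap between the minimum and active journal sets would indicate the journal is being appended faster than it is trimmed, which would line up with slow requests appearing only under sustained sequential writes.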
Re: [ceph-users] No more Luminous packages for Debian Jessie ??
On Wed, Mar 07, 2018 at 02:04:52PM +0100, Fabian Grünbichler wrote:
> On Wed, Feb 28, 2018 at 10:24:50AM +0100, Florent B wrote:
>> Hi,
>>
>> Since yesterday, the "ceph-luminous" repository does not contain any
>> package for Debian Jessie.
>>
>> Is it expected ?
>
> AFAICT the packages are all there[2], but the Packages file only
> references the ceph-deploy package so apt does not find the rest.
>
> IMHO this looks like something went wrong when generating the repository
> metadata files - so maybe it's just a question of getting the people who
> maintain the repository to notice this thread ;)

and as alfredo just pointed out on IRC, it has already been fixed!
Re: [ceph-users] No more Luminous packages for Debian Jessie ??
On Wed, Feb 28, 2018 at 10:24:50AM +0100, Florent B wrote:
> Hi,
>
> Since yesterday, the "ceph-luminous" repository does not contain any
> package for Debian Jessie.
>
> Is it expected ?

AFAICT the packages are all there[2], but the Packages file only references the ceph-deploy package so apt does not find the rest.

IMHO this looks like something went wrong when generating the repository metadata files - so maybe it's just a question of getting the people who maintain the repository to notice this thread ;)

2: http://download.ceph.com/debian-luminous/pool/main/c/ceph/
Re: [ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems
On 6.3.2018 22:28, Gregory Farnum wrote:

On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic wrote:

Hi all,

I have a few problems on my cluster that are maybe linked together and have now caused an OSD to go down during pg repair.

First, a few notes about my cluster: 4 nodes, 15 OSDs, installed on Luminous (no upgrade). Replicated pools, with one pool (pool 6) cached by SSD disks. I don't detect any hardware failures (disk I/O errors, restarts, corrupted data, etc). I'm running RBDs using libvirt on Debian wheezy and jessie (stable and oldstable). I'm snapshotting RBDs using a Luminous client on Debian Jessie only.

When you say "cached by", do you mean there's a cache pool? Or are you using bcache or something underneath?

I mean a cache pool.

Now the problems, from light to severe:

1) Almost every day I notice some health problems after deep scrub - 1-2 inconsistent PGs with "read_error" on some OSDs. When I don't repair it, it disappears after a few days (? another deep scrub). There is no read_error on the disks (disk check OK, no errors logged in syslog).

2) I noticed on my pool 6 (the cached pool) that scrub reports some objects that shouldn't be there:

2018-02-27 23:43:06.490152 7f4b3820e700 -1 osd.1 pg_epoch: 8712 pg[6.20( v 8712'771984 (8712'770478,8712'771984] local-lis/les=8710/8711 n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771984 lcod 8712'771983 mlcod 8712'771983 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head for 6:07ffbc7b:::rbd_data.967992ae8944a.00061cb8:c2 (have MIN)

I think that means an orphaned snap object without its head replica. Maybe snaptrim left it there? Why? Maybe an error during snaptrim? Or did fstrim/discard remove the "head" object (this is, I hope, nonsense)?

3) I ended up with one object (probably a snap object) that has only 1 replica (out of size 3), and when I try to repair it, my OSD crashes with

/build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

I guess that it detected the orphaned snap object I noticed in 2) and doesn't repair it, just asserts and stops the OSD. Am I right? I noticed the comment "// hmm, should we warn?" next to that assert in the Ceph source. So should someone remove that assert?

There's a ticket https://tracker.ceph.com/issues/23030, which links to a much longer discussion on this mailing list between Sage and Stefan which discusses this particular assert. I'm not entirely clear from the rest of your story (and the long history in that thread) if there are other potential causes, or if your story might help diagnose it. But I'd start there since AFAIK it's still a mystery that looks serious but has only a very small number of incidences. :/
-Greg

Thank you, I will go through it, but it looks not to be related to my issue. Now I added a new disk to the cluster, upgraded to 12.2.4 on some nodes, and so far no scrub errors.

If there is no clear answer to my OSD crash, I will try to wipe the OSD containing the problematic object (the one that causes my primary OSD to fail) and rebuild it from the other copies. Hopefully the crash will not appear without that object.

If you think that ceph-objectstore-tool is safe for fuse-mounting and deleting the object by hand, I can try it. But I'm looking for a tool to do that "online" on all PG copies and with checks (that the snap object is not referenced somewhere).

With regards
Jan Pekar

And my questions are: How can I fix the issue with the crashing OSD? How can I safely remove those objects with a missing head? Is there any tool, or a force-snaptrim on non-existent snapshots?

It is a prod cluster so I want to be careful. I have no problems now with data availability. My last idea is to move the RBDs to another pool, but I don't have enough space to do that (as far as I know RBDs can only be copied, not moved), so I'm looking for another clean solution.

And a last question - how can I find what is causing the read_errors and snap object leftovers? Should I paste my whole log? It is bigger than the allowed post size. Pasting the most important events:

-23> 2018-02-27 23:43:07.903368 7f4b3820e700 2 osd.1 pg_epoch: 8712 pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711 n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985 mlcod 8712'771985 active+clean+scrubbing+deep+inconsistent+repair] 6.20 repair 1 missing, 0 inconsistent objects
-22> 2018-02-27 23:43:07.903410 7f4b3820e700 -1 log_channel(cluster) log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
-21> 2018-02-27 23:43:07.903446 7f4b3820e700 -1 log_channel(cluster) log [ERR] : 6.20 repair 3 errors, 2 fixed
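For the "wipe and rebuild" route, ceph-objectstore-tool can remove a single object replica offline; a heavily hedged sketch (the OSD id is illustrative, and the JSON object spec must be copied verbatim from the tool's own list output, not typed from memory):

  systemctl stop ceph-osd@1
  # keep a way back: export the whole pg before touching anything
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --pgid 6.20 --op export --file /root/pg6.20.export
  # list objects in the pg to obtain the exact JSON spec of the stray clone
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --pgid 6.20 --op list
  # remove the one object by its listed spec (the spec below is a made-up example)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --pgid 6.20 \
      '["6.20",{"oid":"rbd_data.967992ae8944a.00061cb8","key":"","snapid":194,"hash":0,"max":0,"pool":6,"namespace":""}]' \
      remove
  systemctl start ceph-osd@1

This operates on one replica at a time, so it does not provide the "online, all copies, with checks" semantics Jan is asking for; it is the blunt instrument, with the export as the undo path.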
Re: [ceph-users] No more Luminous packages for Debian Jessie ??
On Wed, 7 Mar 2018, Wei Jin said:
> Same issue here.
> Will Ceph community support Debian Jessie in the future?

Seems odd to stop it right in the middle of minor point releases. Maybe it was an oversight? Jessie's still supported in Debian as oldstable and not even in LTS yet.

Sean

> On Mon, Mar 5, 2018 at 6:33 PM, Florent B wrote:
>> Jessie is no more supported ??
>> https://download.ceph.com/debian-luminous/dists/jessie/main/binary-amd64/Packages
>> only contains ceph-deploy package !
>>
>> On 28/02/2018 10:24, Florent B wrote:
>>> Hi,
>>>
>>> Since yesterday, the "ceph-luminous" repository does not contain any
>>> package for Debian Jessie.
>>>
>>> Is it expected ?
>>>
>>> Thank you.
>>>
>>> Florent
Re: [ceph-users] No more Luminous packages for Debian Jessie ??
Same issue here.
Will Ceph community support Debian Jessie in the future?

On Mon, Mar 5, 2018 at 6:33 PM, Florent B wrote:
> Jessie is no more supported ??
> https://download.ceph.com/debian-luminous/dists/jessie/main/binary-amd64/Packages
> only contains ceph-deploy package !
>
> On 28/02/2018 10:24, Florent B wrote:
>> Hi,
>>
>> Since yesterday, the "ceph-luminous" repository does not contain any
>> package for Debian Jessie.
>>
>> Is it expected ?
>>
>> Thank you.
>>
>> Florent
[ceph-users] Don't use ceph mds set max_mds
Hi all,

What is the purpose of

   ceph mds set max_mds

?

We just used that by mistake on a cephfs cluster when attempting to decrease from 2 to 1 active mds's.

The correct command to do this is of course

   ceph fs set max_mds

So, is `ceph mds set max_mds` useful for something? If not, should it be removed from the CLI?

Cheers, Dan
Re: [ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?
What you said makes sense. I have encountered a few hardware-related issues where one OSD behaved abnormally and blocked all I/O of the whole cluster (all OSDs are in one pool), which makes me wonder how to avoid this situation.

2018-03-07

shadow_lin

From: David Turner
Sent: 2018-03-07 13:51
Subject: Re: Re: [ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?
To: "shadow_lin"
Cc: "ceph-users"

Marking OSDs down is not without risks. You are taking away one of the copies of data for every PG on that OSD. You are also causing every PG on that OSD to peer. If that OSD comes back up, every PG on it again needs to peer, and then they need to recover. That is a lot of load, and a lot of risk, to automate into the system.

Now let's take into consideration other causes of slow requests, like having more IO load than your spindle can handle, backfill settings that are too aggressive (related to the first option), or networking problems. If the mon were detecting slow requests on OSDs and marking them down, you could end up marking half of your cluster down or corrupting data by flapping OSDs.

The mon will mark OSDs down if those settings I mentioned are met. If the OSD isn't unresponsive enough to stop responding to other OSDs or the mons, then there really isn't much that Ceph can do to automate this safely. There are just so many variables. If Ceph were a closed system on specific hardware, it could certainly be monitoring that hardware closely for early warning signs... but people run Ceph on everything they can compile it for, including Raspberry Pis.

The cluster admin, however, should be able to add their own early detection for failures. You can monitor a lot about disks, including things such as average await on a host, to see if the disks are taking longer than normal to respond. That particular check led us to find that we had several storage nodes with bad cache batteries on the controllers. Finding that explained some slowness we had noticed in the cluster. It also led us to a better method to catch that scenario sooner.

On Tue, Mar 6, 2018, 11:22 PM shadow_lin wrote:

Hi Turner,

Thanks for your insight. I am wondering: if the mon can detect slow/blocked requests from a certain OSD, why can't the mon mark an OSD with blocked requests down if the requests have been blocked for a certain time?

2018-03-07

shadow_lin

From: David Turner
Sent: 2018-03-06 23:56
Subject: Re: [ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?
To: "shadow_lin"
Cc: "ceph-users"

There are multiple settings that affect this. osd_heartbeat_grace is probably the most apt. If an OSD is not getting a response from another OSD for more than the heartbeat_grace period, then it will tell the mons that the OSD is down. Once mon_osd_min_down_reporters have told the mons that an OSD is down, then the OSD will be marked down by the cluster. If the OSD does not then talk to the mons directly to say that it is up, it will be marked out after mon_osd_down_out_interval is reached. If it does talk to the mons to say that it is up, then it should be responding again and be fine.

In your case where the OSD is half up, half down... I believe all you can really do is monitor your cluster and troubleshoot OSDs causing problems like this. Basically every storage solution is vulnerable to this. Sometimes an OSD just needs to be restarted due to being in a bad state somehow, or simply removed from the cluster because the disk is going bad.

On Sun, Mar 4, 2018 at 2:28 AM shadow_lin wrote:

Hi list,

During my tests of Ceph, I found that sometimes the whole Ceph cluster gets blocked and the reason is one unfunctional OSD. Ceph can heal itself if an OSD is down, but it seems that if an OSD is half dead (has a heartbeat but can't handle requests), then all the requests directed to that OSD are blocked. If all OSDs are in one pool, the whole cluster is blocked because of that one hung OSD.

I think this is because Ceph distributes requests across all OSDs, and if one of the OSDs won't confirm that a request is done, then everything is blocked.

Is there a way to let Ceph mark the crippled OSD down if the requests directed to it are blocked for more than a certain time, to avoid blocking the whole cluster?

2018-03-04

shadow_lin
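The await-based early detection David describes can be scripted from iostat; a sketch, where the 200 ms threshold is illustrative and the await column position ($10 on recent sysstat versions) may need adjusting:

  # flag any disk whose average await over a 10-second sample exceeds 200 ms
  iostat -x -d 10 2 | awk '/^sd/ && $10+0 > 200 {print $1, "await:", $10, "ms"}'

Feeding an alert like this into monitoring can catch a "half dead" OSD disk long before heartbeat-based detection would, since heartbeats can keep succeeding while data I/O crawls.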
Re: [ceph-users] Delete a Pool - how hard should be?
On 06/03/2018 16:23, David Turner wrote:

   That said, I do like the idea of being able to disable buckets, RBDs, pools, etc. so that no client could access them. That is useful for much more than just data deletion and won't prevent people from deleting data prematurely.

To me, if nobody can access the data for 30 days and the customer didn't call me within those days, it's OK to definitively delete the data. Which is the way it should be: make it easy for the admin to delete data when he really wants to, and make it possible for the user to stay some days without their data until that data is obsolete and useless. The autopurge of the trash of your mailbox works the same way, and it seems to me a reasonable way to handle precious data such as personal emails.

   It could be added as a requisite step to deleting a pool, rbd, etc. The process would need to be refactored, as adding another step isn't viable. This feature is much more complicated than it may seem on the surface. For pools, you could utilize cephx, except not everyone uses that... so maybe logic added to the osd map. Buckets would have to be completely in rgw. RBDs would probably have to be in the osd map as well. This is not a trivial change.

Mine was just a "nice-to-have" proposal. There is no hurry to implement a secondary feature such as this one. About the logic, would it be possible to use something like this:

* snapshot the pool with a special pool name
* remove the original pool
* give the possibility to restore the snapshot with its original name

I think this should immediately stop all connections to the original pool but leave all the data intact. Maybe.
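Two guard rails along these lines already exist in Luminous; a sketch, with the pool name hypothetical:

  # per-pool flag: the cluster refuses to delete this pool until the flag is cleared
  ceph osd pool set rbd nodelete true

  # cluster-wide: mons reject pool deletion entirely unless explicitly allowed
  ceph tell mon.\* injectargs '--mon-allow-pool-delete=false'

Neither gives the 30-day trash semantics proposed above, but together they make an accidental "delete the wrong pool" require two deliberate steps.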