Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())
Excuse the top-posting. When looking at the logs it helps to filter by the actual thread that crashed.

$ grep 7f08af3b6700 ceph-osd.27.log.last.error.txt | tail -15
 -1001> 2019-10-30 12:55:41.498823 7f08af3b6700  1 -- 129.20.199.93:6803/977508 --> 129.20.199.7:0/2975967502 -- osd_op_reply(283046730 rbd_data.384d296b8b4567.0f99 [set-alloc-hint object_size 4194304 write_size 4194304,write 3145728~4096] v194345'6696469 uv6696469 ondisk = 0) v8 -- 0x5598ed521440 con 0
  -651> 2019-10-30 12:55:42.211634 7f08af3b6700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 4294967295'18446744073709551615, trimmed: , trimmed_dups: , clear_divergent_priors: 0
  -565> 2019-10-30 12:55:42.775786 7f08af3b6700  1 -- 129.20.177.3:6802/977508 --> 129.20.177.2:6823/3002168 -- MOSDScrubReserve(5.2d8 REJECT e194345) v1 -- 0x5598ed7e4000 con 0
  -457> 2019-10-30 12:55:43.390134 7f08af3b6700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 194345'4406723, trimmed: , trimmed_dups: , clear_divergent_priors: 0
  -435> 2019-10-30 12:55:43.850768 7f08af3b6700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 194345'1735861, trimmed: , trimmed_dups: , clear_divergent_priors: 0
  -335> 2019-10-30 12:55:44.637635 7f08af3b6700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 194345'7602452, trimmed: , trimmed_dups: , clear_divergent_priors: 0
  -325> 2019-10-30 12:55:44.682357 7f08af3b6700  1 -- 129.20.177.3:6802/977508 --> 129.20.177.1:6802/3802 -- osd_repop(client.108792126.1:283046901 6.369 e194345/194339 6:96f81e66:::rbd_data.384d296b8b4567.0f99:head v 194345'6696470) v2 -- 0x5598ee591600 con 0
  -324> 2019-10-30 12:55:44.682450 7f08af3b6700  1 -- 129.20.177.3:6802/977508 --> 129.20.177.2:6821/6004637 -- osd_repop(client.108792126.1:283046901 6.369 e194345/194339 6:96f81e66:::rbd_data.384d296b8b4567.0f99:head v 194345'6696470) v2 -- 0x5598cf2ad600 con 0
  -323> 2019-10-30 12:55:44.682510 7f08af3b6700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 194345'6696470, trimmed: , trimmed_dups: , clear_divergent_priors: 0
   -20> 2019-10-30 12:55:46.366704 7f08af3b6700  1 -- 129.20.177.3:6802/977508 --> 129.20.177.2:6806/1848108 -- pg_scan(digest 2.1d9 2:9b97b661:::rb.0.a7bb39.238e1f29.00107c9b:head-MAX e 194345/194345) v2 -- 0x5598efc0bb80 con 0
     0> 2019-10-30 12:55:46.496423 7f08af3b6700 -1 /build/ceph-12.2.12/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f08af3b6700 time 2019-10-30 12:55:46.487842
2019-10-30 12:55:46.557930 7f08af3b6700 -1 *** Caught signal (Aborted) ** in thread 7f08af3b6700 thread_name:tp_osd_tp
     0> 2019-10-30 12:55:46.557930 7f08af3b6700 -1 *** Caught signal (Aborted) ** in thread 7f08af3b6700 thread_name:tp_osd_tp

Since PrimaryLogPG::on_local_recover() prints the object id when the function is entered at debug level 10, I'd suggest gathering a log at a higher 'debug_osd' level (I'd suggest 20) to be sure about what object is causing the issue.
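For reference, a minimal sketch of raising the log level live on the affected daemon (osd.27 here is taken from the log filename above; injectargs takes effect immediately and is reversible, and level 20 is very verbose, so revert once the crash is captured):

```
ceph tell osd.27 injectargs '--debug_osd 20'
# reproduce the crash, collect /var/log/ceph/ceph-osd.27.log, then revert:
ceph tell osd.27 injectargs '--debug_osd 1/5'
```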
void PrimaryLogPG::on_local_recover(
  const hobject_t &hoid,
  const ObjectRecoveryInfo &_recovery_info,
  ObjectContextRef obc,
  bool is_delete,
  ObjectStore::Transaction *t
  )
{
  dout(10) << __func__ << ": " << hoid << dendl;

On Wed, Oct 30, 2019 at 11:43 PM Jérémy Gardais wrote:
>
> The "best" health I was able to get was :
> HEALTH_ERR norecover flag(s) set; 1733/37482459 objects misplaced (0.005%); 5 scrub errors; Possible data damage: 2 pgs inconsistent; Degraded data redundancy: 7461/37482459 objects degraded (0.020%), 24 pgs degraded, 2 pgs undersized
> OSDMAP_FLAGS norecover flag(s) set
> OBJECT_MISPLACED 1733/37482459 objects misplaced (0.005%)
> OSD_SCRUB_ERRORS 5 scrub errors
> PG_DAMAGED Possible data damage: 2 pgs inconsistent
>     pg 2.2ba is active+clean+inconsistent, acting [42,29,30]
>     pg 2.2bb is active+clean+inconsistent, acting [25,42,18]
> PG_DEGRADED Degraded data redundancy: 7461/37482459 objects degraded (0.020%), 24 pgs degraded, 2 pgs undersized
>     pg 2.3e is active+recovery_wait+degraded, acting [27,31,5]
>     pg 2.9d is active+recovery_wait+degraded, acting [27,22,37]
>     pg 2.a3 is active+recovery_wait+degraded, acting [27,30,35]
>     pg 2.136 is active+recovery_wait+degraded, acting [27,18,22]
>     pg 2.150 is active+recovery_wait+degraded, acting [27,19,35]
>     pg 2.15e is active+recovery_wait+degraded, acting [27,11,36]
>     pg 2.1d9 is stuck undersized for 14023.243179, current state active+undersized+degraded+remapped+backfill_wait, last acting [25,30]
>     pg 2.20f is active+recovery_wait+degraded, acting [27,30,2]
Re: [ceph-users] cephfs 1 large omap objects
On Wed, Oct 30, 2019 at 9:28 AM Jake Grimmett wrote:
>
> Hi Zheng,
>
> Many thanks for your helpful post, I've done the following:
>
> 1) set the threshold to 1024 * 1024:
>
> # ceph config set osd \
>     osd_deep_scrub_large_omap_object_key_threshold 1048576
>
> 2) deep scrubbed all of the pgs on the two OSDs that reported "Large omap
> object found." - these were all in pool 1, which has just four OSDs.
>
> Result: After 30 minutes, all deep-scrubs completed, and all "large omap
> objects" warnings disappeared.
>
> ...should we be worried about the size of these OMAP objects?

No. There are only a few of these objects and it's not caused problems up to now in any other cluster.

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Re: [ceph-users] Ceph pg in inactive state
Thanks 潘东元 for the response.

The creation of a new pool works, and all the PGs corresponding to that pool have the active+clean state.

When I initially set up the 3-node ceph cluster using juju charms (replication count per object was set to 3), there were issues with the ceph-osd services, so I had to delete the units and re-add them (I did all of them together, which must have created issues with rebalancing). I assume that the PGs in the inactive state point to the 3 old OSDs which were deleted.

I assume I will have to create all the pools again, but my concern is about the default pools:

---
pool 1 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 15 flags hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 19 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 23 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 27 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 31 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.intent-log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 35 flags hashpspool stripe_width 0 application rgw
pool 7 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 39 flags hashpspool stripe_width 0 application rgw
pool 8 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 43 flags hashpspool stripe_width 0 application rgw
pool 9 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 47 flags hashpspool stripe_width 0 application rgw
pool 10 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 51 flags hashpspool stripe_width 0 application rgw
pool 11 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 55 flags hashpspool stripe_width 0 application rgw
pool 12 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 59 flags hashpspool stripe_width 0 application rgw
pool 13 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 63 flags hashpspool stripe_width 0 application rgw
pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 67 flags hashpspool stripe_width 0 application rgw
---

Can you please confirm whether recreating them using the rados CLI will break anything?

On Wed, Oct 30, 2019 at 4:56 PM 潘东元 wrote:
> Your pg acting set is empty, and the cluster reporting "i don't have pgid" indicates the pg does not have a primary osd.
> What was your cluster status when you created the pool?
>
> Wido den Hollander wrote on Wed, Oct 30, 2019 at 1:30 PM:
> >
> > On 10/30/19 3:04 AM, soumya tr wrote:
> > > Hi all,
> > >
> > > I have a 3 node ceph cluster setup using juju charms. ceph health shows
> > > having inactive pgs.
> > >
> > > ---
> > > # ceph status
> > >   cluster:
> > >     id: 0e36956e-ef64-11e9-b472-00163e6e01e8
> > >     health: HEALTH_WARN
> > >             Reduced data availability: 114 pgs inactive
> > >
> > >   services:
> > >     mon: 3 daemons, quorum juju-06c3e9-0-lxd-0,juju-06c3e9-2-lxd-0,juju-06c3e9-1-lxd-0
> > >     mgr: juju-06c3e9-0-lxd-0(active), standbys: juju-06c3e9-1-lxd-0, juju-06c3e9-2-lxd-0
> > >     osd: 3 osds: 3 up, 3 in
> > >
> > >   data:
> > >     pools: 18 pools, 114 pgs
> > >     objects: 0 objects, 0 B
> > >     usage: 3.0 GiB used, 34 TiB / 34 TiB avail
> > >     pgs: 100.000% pgs unknown
> > >          114 unknown
> > > ---
> > >
> > > *PG health as well shows the PGs are in inactive state*
> > >
> > > ---
> > > # ceph health detail
> > > HEALTH_WARN Reduced data availability: 114 pgs inactive
> > > PG_AVAILABILITY Reduced data availability: 114 pgs inactive
> > >     pg 1.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > >     pg 1.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > >     pg 1.2 is stuck inactive for 1454.593774, current state unknown, last acting []
> > >     pg 1.3 is stuck inactive for 1454.593774, current state unknown, last acting []
Re: [ceph-users] Ceph pg in inactive state
Thanks, Wido, for the update. Yeah, I have already tried restarting ceph-mgr, but it didn't help.

On Wed, Oct 30, 2019 at 4:30 PM Wido den Hollander wrote:
>
> On 10/30/19 3:04 AM, soumya tr wrote:
> > Hi all,
> >
> > I have a 3 node ceph cluster setup using juju charms. ceph health shows
> > having inactive pgs.
> >
> > ---
> > # ceph status
> >   cluster:
> >     id: 0e36956e-ef64-11e9-b472-00163e6e01e8
> >     health: HEALTH_WARN
> >             Reduced data availability: 114 pgs inactive
> >
> >   services:
> >     mon: 3 daemons, quorum juju-06c3e9-0-lxd-0,juju-06c3e9-2-lxd-0,juju-06c3e9-1-lxd-0
> >     mgr: juju-06c3e9-0-lxd-0(active), standbys: juju-06c3e9-1-lxd-0, juju-06c3e9-2-lxd-0
> >     osd: 3 osds: 3 up, 3 in
> >
> >   data:
> >     pools: 18 pools, 114 pgs
> >     objects: 0 objects, 0 B
> >     usage: 3.0 GiB used, 34 TiB / 34 TiB avail
> >     pgs: 100.000% pgs unknown
> >          114 unknown
> > ---
> >
> > *PG health as well shows the PGs are in inactive state*
> >
> > ---
> > # ceph health detail
> > HEALTH_WARN Reduced data availability: 114 pgs inactive
> > PG_AVAILABILITY Reduced data availability: 114 pgs inactive
> >     pg 1.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 1.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 1.2 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 1.3 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 1.4 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 1.5 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 1.6 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 1.7 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 1.8 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 1.9 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 1.a is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 2.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 2.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 3.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 3.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 4.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 4.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 5.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 5.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 6.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 6.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 7.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 7.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 8.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 8.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 9.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 9.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 10.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 11.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.10 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.11 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.12 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.13 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.14 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.15 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.16 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.17 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.18 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.19 is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 17.1a is stuck inactive for 1454.593774, current state unknown, last acting []
> >     pg 18.10 is stuck inactive for 1454.593774, current state unknown, last acting []
[ceph-users] Using multisite to migrate data between bucket data pools.
This is a tangent on Paul Emmerich's response to "[ceph-users] Correct Migration Workflow Replicated -> Erasure Code".

I've tried Paul's method before to migrate between 2 data pools. However, I ran into some issues.

The first issue seems like a bug in RGW, where the RGW for the new zone was able to pull data directly from the data pool of the original zone after the metadata had been sync'd. The metadata seemed to realize the file actually exists, so it went ahead and grabbed it from the pool backing the other zone. I worked around that somewhat by using cephx to restrict which pools each RGW user could access, but that gives a permission denied error instead of a file not found error. This happens on buckets that are set not to replicate as well as buckets that failed to sync properly. Seems like a bit of a security threat, but not a super common situation at all.

The second issue I think has to do with corrupt index files in my index pool. Some of the buckets I don't need any more, so I went to delete them for simplicity, but the command failed to delete them. I just set them aside for now and can set the ones I don't need any more to not replicate at the bucket level. That works for most things, but there are a few buckets that I do need to migrate, and when I set them to start replicating, the data sync between zones gets stuck. Does anyone have any ideas on how to clean up the bucket indexes to make these operations possible? At this point I've disabled multisite and cleared up the new zone so I can run operations on these buckets without dealing with multisite and replication. I've tried a few things and can get some additional information on my specific errors tomorrow at work.

-- Forwarded message ---------
From: Paul Emmerich
Date: Wed, Oct 30, 2019 at 4:32 AM
Subject: [ceph-users] Re: Correct Migration Workflow Replicated -> Erasure Code
To: Konstantin Shalygin
Cc: Mac Wynkoop, ceph-users

We've solved this off-list (because I already got access to the cluster).

For the list: Copying at the rados level is possible, but requires shutting down radosgw to get a consistent copy. This wasn't feasible here due to the size and performance requirements.

We've instead added a second zone, whose placement maps to an EC pool, to the zonegroup, and it's currently copying over data. We'll then make the second zone master and default, and ultimately delete the first one. This allows for a migration without downtime.

Another possibility would be using a Transition lifecycle rule, but that's not ideal because it doesn't actually change the bucket.

I don't think it would be too complicated to add a native bucket migration mechanism that works similar to "bucket rewrite" (which is intended for something similar but different).

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
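For readers wanting to try the second-zone approach Paul describes, a rough sketch of the radosgw-admin side (the zone name, endpoint, and EC pool name below are placeholders, not taken from the thread):

```
# add a second zone to the existing zonegroup
radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=ec-zone \
    --endpoints=http://rgw-ec:8080
# point the new zone's default placement at an EC data pool before starting its RGW
radosgw-admin zone placement modify --rgw-zone=ec-zone \
    --placement-id=default-placement --data-pool=default.rgw.buckets.ec
radosgw-admin period update --commit
# once data sync has caught up, promote the new zone and retire the old one
radosgw-admin zone modify --rgw-zone=ec-zone --master --default
radosgw-admin period update --commit
```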
Re: [ceph-users] cephfs 1 large omap objects
Hi Zheng,

Many thanks for your helpful post, I've done the following:

1) set the threshold to 1024 * 1024:

# ceph config set osd \
    osd_deep_scrub_large_omap_object_key_threshold 1048576

2) deep scrubbed all of the pgs on the two OSDs that reported "Large omap object found." - these were all in pool 1, which has just four OSDs.

Result: After 30 minutes, all deep-scrubs completed, and all "large omap objects" warnings disappeared.

...should we be worried about the size of these OMAP objects?

again many thanks,

Jake

On 10/30/19 3:15 AM, Yan, Zheng wrote:
> see https://tracker.ceph.com/issues/42515. just ignore the warning for now
>
> On Mon, Oct 7, 2019 at 7:50 AM Nigel Williams wrote:
>>
>> Out of the blue this popped up (on an otherwise healthy cluster):
>>
>> HEALTH_WARN 1 large omap objects
>> LARGE_OMAP_OBJECTS 1 large omap objects
>>     1 large objects found in pool 'cephfs_metadata'
>>     Search the cluster log for 'Large omap object found' for more details.
>>
>> "Search the cluster log" is somewhat opaque, there are logs for many
>> daemons, what is a "cluster" log? In the ML history some found it in the OSD
>> logs?
>>
>> Another post suggested removing lost+found, but using cephfs-shell I don't
>> see one at the top-level, is there another way to disable this "feature"?
>>
>> thanks.

--
Jake Grimmett
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge CB2 0QH, UK.
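In case it helps anyone repeating step 2, a minimal sketch of scripting the deep scrubs (the OSD ids are placeholders since the thread doesn't name them; `ceph pg ls-by-osd` and `ceph pg deep-scrub` are the relevant commands):

```
# deep-scrub every PG hosted on the OSDs that logged the warning
for osd in 0 1; do
    ceph pg ls-by-osd "$osd" | awk '$1 ~ /^[0-9]+\./ {print $1}' | \
    while read -r pg; do
        ceph pg deep-scrub "$pg"
    done
done
```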
Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())
The "best" health i was able to get was : HEALTH_ERR norecover flag(s) set; 1733/37482459 objects misplaced (0.005%); 5 scrub errors; Possible data damage: 2 pgs inconsistent; Degraded data redundancy: 7461/37482459 objects degraded (0.020%), 24 pgs degraded, 2 pgs undersized OSDMAP_FLAGS norecover flag(s) set OBJECT_MISPLACED 1733/37482459 objects misplaced (0.005%) OSD_SCRUB_ERRORS 5 scrub errors PG_DAMAGED Possible data damage: 2 pgs inconsistent pg 2.2ba is active+clean+inconsistent, acting [42,29,30] pg 2.2bb is active+clean+inconsistent, acting [25,42,18] PG_DEGRADED Degraded data redundancy: 7461/37482459 objects degraded (0.020%), 24 pgs degraded, 2 pgs undersized pg 2.3e is active+recovery_wait+degraded, acting [27,31,5] pg 2.9d is active+recovery_wait+degraded, acting [27,22,37] pg 2.a3 is active+recovery_wait+degraded, acting [27,30,35] pg 2.136 is active+recovery_wait+degraded, acting [27,18,22] pg 2.150 is active+recovery_wait+degraded, acting [27,19,35] pg 2.15e is active+recovery_wait+degraded, acting [27,11,36] pg 2.1d9 is stuck undersized for 14023.243179, current state active+undersized+degraded+remapped+backfill_wait, last acting [25,30] pg 2.20f is active+recovery_wait+degraded, acting [27,30,2] pg 2.2a1 is active+recovery_wait+degraded, acting [27,18,35] pg 2.2b7 is active+recovery_wait+degraded, acting [27,18,36] pg 2.386 is active+recovery_wait+degraded, acting [27,42,17] pg 2.391 is active+recovery_wait+degraded, acting [27,15,36] pg 2.448 is stuck undersized for 51520.798900, current state active+recovery_wait+undersized+degraded+remapped, last acting [27,38] pg 2.456 is active+recovery_wait+degraded, acting [27,5,43] pg 2.45a is active+recovery_wait+degraded, acting [27,43,36] pg 2.45f is active+recovery_wait+degraded, acting [27,16,36] pg 2.46c is active+recovery_wait+degraded, acting [27,30,38] pg 2.4bf is active+recovery_wait+degraded, acting [27,39,18] pg 2.522 is active+recovery_wait+degraded, acting [27,17,3] pg 2.535 is active+recovery_wait+degraded, acting [27,29,36] pg 2.55a is active+recovery_wait+degraded, acting [27,29,18] pg 5.23f is active+recovery_wait+degraded, acting [27,39,18] pg 5.356 is active+recovery_wait+degraded, acting [27,36,15] pg 5.4a6 is active+recovery_wait+degraded, acting [29,40,30] After that, the flapping started again : 2019-10-30 12:55:46.772593 mon.r730xd1 [INF] osd.38 failed (root=default,datacenter=IPR,room=11B,rack=baie2,host=r740xd1) (connection refused reported by osd.22) 2019-10-30 12:55:46.850239 mon.r730xd1 [INF] osd.27 failed (root=default,datacenter=IPR,room=11B,rack=baie2,host=r730xd3) (connection refused reported by osd.19) 2019-10-30 12:55:56.714029 mon.r730xd1 [WRN] Health check update: 2 osds down (OSD_DOWN) Setting "norecover" flag allow these 2 OSDs to recover up state and limit the flapping states and many backfills. In both osd.27 and osd.38 logs, i still find these logs before one FAILED assert : -2> 2019-10-30 12:52:31.999571 7fb5c125b700 5 -- 129.20.177.3:6802/870834 >> 129.20.177.3:6808/810999 conn(0x564c31d31000 :6802 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=97 cs=7 l=0). 
rx osd.25 seq 2600 0x564c3a02e1c0 MOSDPGPush(2.1d9 194334/194298 [PushOp(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926, version: 127481'7241006, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 1, recovery_info: ObjectRecoveryInfo(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926@127481'7241006, size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v3 -1> 2019-10-30 12:52:31.999633 7fb5c125b700 1 -- 129.20.177.3:6802/870834 <== osd.25 129.20.177.3:6808/810999 2600 MOSDPGPush(2.1d9 194334/194298 [PushOp(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926, version: 127481'7241006, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 1, recovery_info: ObjectRecoveryInfo(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926@127481'7241006, size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v3 909+0+0 (4137937376 0 0) 0x564c3a02e1c0 con 0x564c31d31000 0> 2019-10-30 12:52:32.008605 7fb59b03d700 -1 /build/ceph-12.2.12/src/osd/PrimaryLogPG.cc: In function 'virtual void
Re: [ceph-users] ceph-ansible / block-db block-wal
I don't use ansible anymore. But this was my config for the host onode1: ./host_vars/onode2.yml:

lvm_volumes:
  - data: /dev/sdb
    db: '1'
    db_vg: host-2-db
  - data: /dev/sdc
    db: '2'
    db_vg: host-2-db
  - data: /dev/sde
    db: '3'
    db_vg: host-2-db
  - data: /dev/sdf
    db: '4'
    db_vg: host-2-db

… one config file per host.

The LVs were created by hand on a PV over RAID1 over two SSDs. The hosts had empty slots for HDDs to be bought later, so I had to "partition" the PV by hand, because ansible uses the whole RAID1 for only the HDDs present. It is said that only certain sizes of DB & WAL partitions are sensible; I now use 58GiB LVs. The remaining space in the RAID1 is used for a faster OSD.

Lars

Wed, 30 Oct 2019 10:02:23 CUZA Frédéric ==> "ceph-users@lists.ceph.com":
> Hi Everyone,
>
> Does anyone know how to indicate block-db and block-wal to device on ansible ?
> In ceph-deploy it is quite easy :
> ceph-deploy osd create osd_host08 --data /dev/sdl --block-db /dev/sdm12 \
>     --block-wal /dev/sdn12 --bluestore
>
> On my data nodes I have 12 HDDs and 2 SSDs. I use those SSDs for block-db and
> block-wal. How to indicate for each osd which partition to use ?
>
> And finally, how do you handle the deployment if you have multiple data nodes
> setup ? SSDs on sdm and sdn on one host and SSDs on sda and sdb on another ?
>
> Thank you for your help.
>
> Regards,

--
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23, 10117 Berlin
Tel.: +49 30 20370-352
http://www.bbaw.de
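For completeness, a sketch of how the DB LVs referenced in the config above could be pre-created by hand (the RAID1 device name /dev/md0 is an assumption; the VG and LV names match the lvm_volumes entries):

```
# VG on the RAID1 built from the two SSDs (/dev/md0 is assumed)
vgcreate host-2-db /dev/md0
# one 58GiB DB LV per data disk, named '1'..'4' to match db: in lvm_volumes
for i in 1 2 3 4; do
    lvcreate -L 58G -n "$i" host-2-db
done
```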
Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())
Thus spake Brad Hubbard (bhubb...@redhat.com) on Wednesday, 30 October 2019 at 12:50:50:
> Maybe you should set nodown and noout while you do these maneuvers?
> That will minimise peering and recovery (data movement).

As the commands don't take too long, I just had a few slow requests before the osd was back online. Thanks for the nodown|noout tip.

> > snapid 22772 from osd.29 and osd.42 :
> > ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-29/ '["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]' remove
> > ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-42/ '["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]' remove
>
> That looks right.

Done, preceded by some dump, get-attrs, … commands. Yeah, not sure about the real benefit, but just to be cautious ^^

The PG still looks inconsistent. I asked for a deep-scrub of 2.2ba, still waiting. `list-inconsistent-obj` and `list-inconsistent-snapset` return "No scrub information available for pg 2.2ba" for the moment.

I also tried to manage pg 2.371 with:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-27/ '["2.371",{"oid":"rbd_data.0c16b76b8b4567.000420bb","key":"","snapid":22822,"hash":3394498417,"max":0,"pool":2,"namespace":"","max":0}]' remove

This one doesn't look inconsistent anymore, but I also asked for a deep-scrub.

> You should probably try and work out what caused the issue and take
> steps to minimise the likelihood of a recurrence. This is not expected
> behaviour in a correctly configured and stable environment.

Yes… I'll wait a little bit to see what happens with these commands first and keep an eye on the cluster health and logs…

--
Gardais Jérémy
Institut de Physique de Rennes
Université Rennes 1
Téléphone: 02-23-23-68-60
Mail & bonnes pratiques: http://fr.wikipedia.org/wiki/Nétiquette
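Pulling the thread's commands together, a sketch of the whole procedure with Brad's flag tip folded in (osd.29 and the object spec are taken verbatim from above; adjust the OSD id and data path for each replica):

```
ceph osd set nodown && ceph osd set noout   # limit peering/recovery churn
systemctl stop ceph-osd@29
ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-29/ \
    '["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]' \
    remove
systemctl start ceph-osd@29
ceph osd unset nodown && ceph osd unset noout
ceph pg deep-scrub 2.2ba                    # then recheck list-inconsistent-obj
```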
[ceph-users] ceph-ansible / block-db block-wal
Hi Everyone,

Does anyone know how to indicate block-db and block-wal to device on ansible ? In ceph-deploy it is quite easy :

ceph-deploy osd create osd_host08 --data /dev/sdl --block-db /dev/sdm12 \
    --block-wal /dev/sdn12 --bluestore

On my data nodes I have 12 HDDs and 2 SSDs. I use those SSDs for block-db and block-wal. How to indicate for each osd which partition to use ?

And finally, how do you handle the deployment if you have multiple data nodes setup ? SSDs on sdm and sdn on one host and SSDs on sda and sdb on another ?

Thank you for your help.

Regards,
Re: [ceph-users] CephFS client hanging and cache issues
Thanks a lot, and sorry for the spam, I should have checked! We are on 18.04; the kernel is currently upgrading, so if you don't hear back from me then it is fixed. Thanks for the amazing support!

On Wed, 30 Oct 2019, 09:54 Lars Täuber wrote:
> Hi.
>
> Sounds like you use kernel clients with kernels from canonical/ubuntu.
> Two kernels have a bug:
> 4.15.0-66
> and
> 5.0.0-32
>
> Updated kernels are said to have fixes.
> Older kernels also work:
> 4.15.0-65
> and
> 5.0.0-31
>
> Lars
>
> Wed, 30 Oct 2019 09:42:16 Bob Farrell ==> ceph-users:
> > Hi. We are experiencing a CephFS client issue on one of our servers.
> >
> > ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
> >
> > Trying to access, `umount`, or `umount -f` a mounted CephFS volume causes
> > my shell to hang indefinitely.
> >
> > After a reboot I can remount the volumes cleanly but they drop out after <
> > 1 hour of use.
> >
> > I see this log entry multiple times when I reboot the server:
> > ```
> > cache_from_obj: Wrong slab cache. inode_cache but object is from
> > ceph_inode_info
> > ```
> > The machine then reboots after approx. 30 minutes.
> >
> > All other Ceph/CephFS clients and servers seem perfectly happy. CephFS
> > cluster is HEALTH_OK.
> >
> > Any help appreciated. If I can provide any further details please let me
> > know.
> >
> > Thanks in advance,
Re: [ceph-users] CephFS client hanging and cache issues
Hi.

Sounds like you use kernel clients with kernels from canonical/ubuntu.
Two kernels have a bug:
4.15.0-66
and
5.0.0-32

Updated kernels are said to have fixes.
Older kernels also work:
4.15.0-65
and
5.0.0-31

Lars

Wed, 30 Oct 2019 09:42:16 Bob Farrell ==> ceph-users:
> Hi. We are experiencing a CephFS client issue on one of our servers.
>
> ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
>
> Trying to access, `umount`, or `umount -f` a mounted CephFS volume causes
> my shell to hang indefinitely.
>
> After a reboot I can remount the volumes cleanly but they drop out after <
> 1 hour of use.
>
> I see this log entry multiple times when I reboot the server:
> ```
> cache_from_obj: Wrong slab cache. inode_cache but object is from
> ceph_inode_info
> ```
> The machine then reboots after approx. 30 minutes.
>
> All other Ceph/CephFS clients and servers seem perfectly happy. CephFS
> cluster is HEALTH_OK.
>
> Any help appreciated. If I can provide any further details please let me
> know.
>
> Thanks in advance,
Re: [ceph-users] CephFS client hanging and cache issues
Kernel bug due to a bad backport, see recent posts here.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Oct 30, 2019 at 10:42 AM Bob Farrell wrote:
>
> Hi. We are experiencing a CephFS client issue on one of our servers.
>
> ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
>
> Trying to access, `umount`, or `umount -f` a mounted CephFS volume causes my shell to hang indefinitely.
>
> After a reboot I can remount the volumes cleanly but they drop out after < 1 hour of use.
>
> I see this log entry multiple times when I reboot the server:
> ```
> cache_from_obj: Wrong slab cache. inode_cache but object is from ceph_inode_info
> ```
> The machine then reboots after approx. 30 minutes.
>
> All other Ceph/CephFS clients and servers seem perfectly happy. CephFS cluster is HEALTH_OK.
>
> Any help appreciated. If I can provide any further details please let me know.
>
> Thanks in advance,
[ceph-users] CephFS client hanging and cache issues
Hi. We are experiencing a CephFS client issue on one of our servers.

ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)

Trying to access, `umount`, or `umount -f` a mounted CephFS volume causes my shell to hang indefinitely.

After a reboot I can remount the volumes cleanly but they drop out after < 1 hour of use.

I see this log entry multiple times when I reboot the server:
```
cache_from_obj: Wrong slab cache. inode_cache but object is from ceph_inode_info
```
The machine then reboots after approx. 30 minutes.

All other Ceph/CephFS clients and servers seem perfectly happy. CephFS cluster is HEALTH_OK.

Any help appreciated. If I can provide any further details please let me know.

Thanks in advance,
Re: [ceph-users] changing set-require-min-compat-client will cause hiccup?
> Hi,
> I need to change set-require-min-compat-client to use upmap mode for the PG balancer. Will this cause a disconnect of all clients? We're talking cephfs and RBD images for VMs. Or is it safe to switch that live?

Is safe.

k
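For reference, the usual enablement sequence (a sketch; `ceph features` first confirms no pre-luminous clients would be locked out):

```
ceph features                                    # verify all clients report luminous or newer
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
```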
[ceph-users] changing set-require-min-compat-client will cause hiccup?
Hi,

I need to change set-require-min-compat-client to use upmap mode for the PG balancer. Will this cause a disconnect of all clients? We're talking cephfs and RBD images for VMs. Or is it safe to switch that live?

thanks
[ceph-users] very high ram usage by OSDs on Nautilus
Yes, you were right: somehow there was an unusually high memory target set, not sure where this came from. I set it back to normal now; that should fix it, I guess.

Thanks
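For anyone hitting the same thing, a quick sketch of checking and resetting the target (the 4294967296 value is the 4 GiB Nautilus default, assumed appropriate here):

```
ceph config get osd osd_memory_target            # see what is currently set cluster-wide
ceph config set osd osd_memory_target 4294967296 # back to the 4 GiB default
ceph config dump | grep memory_target            # confirm no per-OSD overrides linger
```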