Re: [ceph-users] Ceph nautilus upgrade problem
On 2-4-2019 at 12:16, Stefan Kooman wrote:
> Quoting Stadsnet (jwil...@stads.net):
>> On 26-3-2019 16:39, Ashley Merrick wrote:
>>> Have you upgraded any OSD's?
>> No, didn't go through with the OSDs.
>
> Just checking here: are you sure all PGs have been scrubbed while running
> Luminous? The release notes [1] mention this:
>
>   "If you are unsure whether or not your Luminous cluster has completed a
>   full scrub of all PGs, you can check your cluster's state by running:
>
>     # ceph osd dump | grep ^flags
>
>   In order to be able to proceed to Nautilus, your OSD map must include the
>   recovery_deletes and purged_snapdirs flags."

Yes, I did check that.

Everything went fine, exactly as Ashley predicted: "On a test cluster I saw the same, and as I upgraded / restarted the OSDs the PGs started to show online until it was at 100%." So I upgraded the first OSD, and exactly that percentage of the PGs became active. With every server the same percentage was added, and finally, with the last one, I got to 100% active.

So it went without problems, but it looked a bit ugly; that's why I asked. And the new Nautilus version is really a big plus in almost every way.

Sorry for not getting back on how it went; I was not sure if I should bother the mailing list. Thanks for your time.

> Gr. Stefan
>
> [1]: http://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous
>
> P.s. I expect most users upgrade to Mimic first, then go to Nautilus. It
> might be a better tested upgrade path ...
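For reference, on a Luminous cluster that has completed a full scrub of all PGs, the flags check quoted above looks roughly like this. The exact flag list varies per cluster and release; what matters is that recovery_deletes and purged_snapdirs are both present before moving on to Nautilus:

  # ceph osd dump | grep ^flags
  flags sortbitwise,recovery_deletes,purged_snapdirs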
[ceph-users] S3 objects deleted but storage doesn't free space
Hi there all,

Perhaps someone can help. We tried to free some storage, so we deleted a lot of S3 objects. The bucket also holds valuable data, so we can't delete the whole bucket. Everything went fine, but the used storage space doesn't get any less. We are expecting several TB of data to be freed.

We then learned of garbage collection, so we thought: let's wait. But even days later there was no real change. We started "radosgw-admin gc process", which never finished, nor displayed any error or anything. We could not find anything like a --verbose or debug option for this command, or a place with logs to see what is going on while radosgw-admin is working.

We tried to change the default settings, which we got from an old posting. We put them in [global] and also tried them in [client.rgw..]:

  rgw_gc_max_objs = 7877  (but also rgw_gc_max_objs = 200 and rgw_gc_max_objs = 1000)
  rgw_lc_max_objs = 7877
  rgw_gc_obj_min_wait = 300
  rgw_gc_processor_period = 600
  rgw_gc_processor_max_time = 600

We restarted the ceph-radosgw daemons several times, and the computers, all over a period of days, and tried "radosgw-admin gc process" a few more times. We did not find any references in the radosgw logs like "gc:: delete", but we don't know what to look for.

The system is well, no errors or warnings. But the system is in use (we are loading up data) -> will GC only run when idle?

When we count the entries with "radosgw-admin gc list | grep oid | wc -l" we get:

  11:00  18.086.665 objects
  13:00  18.086.665 objects
  15:00  18.086.665 objects

so no change in objects after hours. When we list them with "radosgw-admin gc list" we get entries like:

  radosgw-admin gc list | more
  [
      {
          "tag": "b5687590-473f-4386-903f-d91a77b8d5cd.7354141.21122\u",
          "time": "2017-12-06 11:04:56.0.459704s",
          "objs": [
              {
                  "pool": "default.rgw.buckets.data",
                  "oid": "b5687590-473f-4386-903f-d91a77b8d5cd.44121.4__shadow_.5OtA02n_GU8TkP08We_SLrT5GL1ihuS_1",
                  "key": "",
                  "instance": ""
              },
              {
                  "pool": "default.rgw.buckets.data",
                  "oid": "b5687590-473f-4386-903f-d91a77b8d5cd.44121.4__shadow_.5OtA02n_GU8TkP08We_SLrT5GL1ihuS_2",
                  "key": "",
                  "instance": ""
              },
              {
                  "pool": "default.rgw.buckets.data",
                  "oid": "b5687590-473f-4386-903f-d91a77b8d5cd.44121.4__shadow_.5OtA02n_GU8TkP08We_SLrT5GL1ihuS_3",
                  "key": "",
                  "instance": ""
              },

A few questions:

-> Who purges the gc list? Is it done on the radosgw machines, or distributed on the OSDs?
-> Where do I have to change the default "rgw_gc_max_objs = 1000"? We tried everywhere. We have used "tell" to change it on the OSD and MON systems, and also on the RGW endpoints, which we restarted.
-> We have two radosgw endpoints. Is there a lock so that only one will act, or will they both try to delete? Can we free / display such a lock?
-> How can I debug the radosgw-admin application? Which log files should I look in, and what would an example message look like?
-> If I know an oid like the ones above, can I manually delete such an oid?
-> Suppose we would delete the complete bucket with "radosgw-admin bucket rm --bucket=mybucket --purge-objects --inconsistent-index", would that also get rid of the GC entries that are already there?

Thanks ahead for your time,

JW Michels
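Not a definitive answer, but a few commands that may help narrow this down. The rgw_gc_* options are read by the radosgw process itself, so the sketch below checks what the running daemon actually picked up, turns up debug output for a manual gc run, and removes a single leftover RADOS object by hand. The daemon name client.rgw.myhost is only an example; the pool and oid are copied from the gc list above:

  # Ask the running radosgw which rgw_gc_* values it actually loaded
  # (run on the RGW host; "client.rgw.myhost" is an example daemon name):
  ceph daemon client.rgw.myhost config show | grep rgw_gc

  # radosgw-admin should accept the usual ceph debug overrides, which makes
  # a manual gc run a lot more talkative:
  radosgw-admin gc process --debug-rgw=20 --debug-ms=1 2>&1 | tee gc-debug.log

  # A single object from the gc list can be removed by hand with rados,
  # provided nothing references it anymore:
  rados -p default.rgw.buckets.data rm \
    'b5687590-473f-4386-903f-d91a77b8d5cd.44121.4__shadow_.5OtA02n_GU8TkP08We_SLrT5GL1ihuS_1'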
Re: [ceph-users] [Solved] Oeps: lost cluster with: ceph osd require-osd-release luminous
On 9/12/17 9:13 PM, Josh Durgin wrote:
> Could you post your crushmap? PGs mapping to no OSDs is a symptom of
> something wrong there.
>
> You can stop the osds from changing position at startup with
> 'osd crush update on start = false'.

Yes, I had found that, thanks. It seems to be by design, which we didn't understand. We will try device classes: http://docs.ceph.com/docs/master/rados/operations/crush-map/#crush-location (a rough sketch of the device-class commands is included after the quoted message below).

My "big" problem turned out to be a cosmetic problem. The whole thing looks quite ugly: every metric is 0 wherever you look, and you can't really use any Ceph management anymore. But the whole system kept functioning; since it was a remote test site, I didn't notice that earlier.

So the whole problem was that the MGR servers were up, but a firewall prevented contact. At the moment "ceph osd require-osd-release luminous" was set, the old compatible metrics path stopped working and the cluster switched to the now-mandatory MGR servers, and then you get these kinds of all-zero messages.

So even without visible management, and with an administrator thinking it was dead, Ceph kept running. One could say that Ceph also managed to create a successful upgrade path to 12.2. Well done. Thanks for your time.

The only minor problem left is a scrub error where pg repair does nothing, and because of BlueStore there is no easy access to the files:

  rados list-inconsistent-pg default.rgw.buckets.data
  ["15.720"]

  rados list-inconsistent-obj 15.720 --format=json-pretty
  No scrub information available for pg 15.720
  error 2: (2) No such file or directory

Other people seem to have this problem as well: http://tracker.ceph.com/issues/15781
I've read that perhaps a better pg repair will be built. Will wait for that.

Sent from Nine <http://www.9folders.com/>

*From:* Jan-Willem Michels <jwil...@stads.net>
*Sent:* Sep 11, 2017 23:50
*To:* ceph-users@lists.ceph.com
*Subject:* [ceph-users] Oeps: lost cluster with: ceph osd require-osd-release luminous

We have a Kraken cluster, at the time newly built, with BlueStore enabled. It is 8 systems, each with 10 disks of 10 TB, and each computer has 1 NVMe 2 TB disk, plus 3 monitors etc. About 700 TB in total, 300 TB used, mainly an S3 object store.

Of course there is more to the story: we have one strange thing in our cluster. We tried to create two pools of storage, default and ssd, and created a new crush rule. That worked without problems for months. But when we restart a computer / NVMe OSD, it would "forget" that the NVMe should be connected to the SSD pool (for that particular computer). Since we don't restart systems, we didn't notice that. The NVMe would appear back in the default pool. When we re-apply the same crush rule, it would go back to the SSD pool, all while data kept working on the NVMe disks. Clearly something is not ideal there.

Luminous has a different approach to separating SSD from HDD, so we thought: first go to Luminous 12.2.0 and later see how we fix this. We did the upgrade to Luminous and that went well. That requires a restart of the OSDs, so all NVMe devices were back at default; reapplying the crush rule brought them back to the SSD pool. Also, while doing the upgrade, we switched off in ceph.conf the line

  # enable experimental unrecoverable data corrupting features = bluestore

since in Luminous that was no problem. Everything was working fine.
In "ceph -s" we had this health warning:

  all OSDs are running luminous or later but require_osd_release < luminous

So I thought I would set the minimum OSD version to Luminous with:

  ceph osd require-osd-release luminous

To us that seemed nothing more than a minimum software version required to connect to the cluster. The system answered back "recovery_deletes is set", and that was it; the same second, ceph -s went to "0":

  ceph -s
    cluster:
      id:     5bafad08-31b2-4716-be77-07ad2e2647eb
      health: HEALTH_WARN
              noout flag(s) set
              Reduced data availability: 3248 pgs inactive
              Degraded data redundancy: 3248 pgs unclean

    services:
      mon: 3 daemons, quorum Ceph-Mon1,Ceph-Mon2,Ceph-Mon3
      mgr: Ceph-Mon2(active), standbys: Ceph-Mon3, Ceph-Mon1
      osd: 88 osds: 88 up, 88 in; 297 remapped pgs
           flags noout

    data:
      pools:   26 pools, 3248 pgs
      objects: 0 objects, 0 bytes
      usage:   0 kB used, 0 kB / 0 kB avail
      pgs:     100.000% pgs unknown
               3248 unknown

And before that it was something like this. The errors (apart from the scrub error) you see would be from the upgrade / restarting, and I would expect them to go away very fast.

  ceph -s
    cluster:
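For readers landing here, a rough sketch of the device-class approach mentioned in the follow-up above, as it works on Luminous. The OSD id, rule name and pool name are only examples; existing pools need their crush_rule switched to a class-aware rule for this to take effect:

  # Pin OSDs so they do not move in the crush map at restart (ceph.conf, [osd] section):
  #   osd crush update on start = false

  # Or use Luminous device classes: tag the NVMe OSD and build a class-aware rule.
  ceph osd crush rm-device-class osd.80             # clear any auto-detected class first
  ceph osd crush set-device-class ssd osd.80        # osd.80 is an example id
  ceph osd crush rule create-replicated ssd-rule default host ssd
  ceph osd pool set my-ssd-pool crush_rule ssd-rule # my-ssd-pool is an example pool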
[ceph-users] Oeps: lost cluster with: ceph osd require-osd-release luminous
We have a Kraken cluster, at the time newly built, with BlueStore enabled. It is 8 systems, each with 10 disks of 10 TB, and each computer has 1 NVMe 2 TB disk, plus 3 monitors etc. About 700 TB in total, 300 TB used, mainly an S3 object store.

Of course there is more to the story: we have one strange thing in our cluster. We tried to create two pools of storage, default and ssd, and created a new crush rule. That worked without problems for months. But when we restart a computer / NVMe OSD, it would "forget" that the NVMe should be connected to the SSD pool (for that particular computer). Since we don't restart systems, we didn't notice that. The NVMe would appear back in the default pool. When we re-apply the same crush rule, it would go back to the SSD pool, all while data kept working on the NVMe disks. Clearly something is not ideal there.

Luminous has a different approach to separating SSD from HDD, so we thought: first go to Luminous 12.2.0 and later see how we fix this. We did the upgrade to Luminous and that went well. That requires a restart of the OSDs, so all NVMe devices were back at default; reapplying the crush rule brought them back to the SSD pool. Also, while doing the upgrade, we switched off in ceph.conf the line

  # enable experimental unrecoverable data corrupting features = bluestore

since in Luminous that was no problem. Everything was working fine.

In "ceph -s" we had this health warning:

  all OSDs are running luminous or later but require_osd_release < luminous

So I thought I would set the minimum OSD version to Luminous with:

  ceph osd require-osd-release luminous

To us that seemed nothing more than a minimum software version required to connect to the cluster. The system answered back "recovery_deletes is set", and that was it; the same second, ceph -s went to "0":

  ceph -s
    cluster:
      id:     5bafad08-31b2-4716-be77-07ad2e2647eb
      health: HEALTH_WARN
              noout flag(s) set
              Reduced data availability: 3248 pgs inactive
              Degraded data redundancy: 3248 pgs unclean

    services:
      mon: 3 daemons, quorum Ceph-Mon1,Ceph-Mon2,Ceph-Mon3
      mgr: Ceph-Mon2(active), standbys: Ceph-Mon3, Ceph-Mon1
      osd: 88 osds: 88 up, 88 in; 297 remapped pgs
           flags noout

    data:
      pools:   26 pools, 3248 pgs
      objects: 0 objects, 0 bytes
      usage:   0 kB used, 0 kB / 0 kB avail
      pgs:     100.000% pgs unknown
               3248 unknown

And before that it was something like this. The errors (apart from the scrub error) you see would be from the upgrade / restarting, and I would expect them to go away very fast.
  ceph -s
    cluster:
      id:     5bafad08-31b2-4716-be77-07ad2e2647eb
      health: HEALTH_ERR
              385 pgs backfill_wait
              5 pgs backfilling
              135 pgs degraded
              1 pgs inconsistent
              1 pgs peering
              4 pgs recovering
              131 pgs recovery_wait
              98 pgs stuck degraded
              525 pgs stuck unclean
              recovery 119/612465488 objects degraded (0.000%)
              recovery 24/612465488 objects misplaced (0.000%)
              1 scrub errors
              noout flag(s) set
              all OSDs are running luminous or later but require_osd_release < luminous

    services:
      mon: 3 daemons, quorum Ceph-Mon1,Ceph-Mon2,Ceph-Mon3
      mgr: Ceph-Mon2(active), standbys: Ceph-Mon1, Ceph-Mon3
      osd: 88 osds: 88 up, 88 in; 387 remapped pgs
           flags noout

    data:
      pools:   26 pools, 3248 pgs
      objects: 87862k objects, 288 TB
      usage:   442 TB used, 300 TB / 742 TB avail
      pgs:     0.031% pgs not active
               119/612465488 objects degraded (0.000%)
               24/612465488 objects misplaced (0.000%)
               2720 active+clean
               385  active+remapped+backfill_wait
               131  active+recovery_wait+degraded
               5    active+remapped+backfilling
               4    active+recovering+degraded
               1    active+clean+inconsistent
               1    peering
               1    active+clean+scrubbing+deep

    io:
      client:   34264 B/s rd, 2091 kB/s wr, 38 op/s rd, 48 op/s wr
      recovery: 4235 kB/s, 6 objects/s

The current ceph health detail:

  HEALTH_WARN noout flag(s) set; Reduced data availability: 3248 pgs inactive; Degraded data redundancy: 3248 pgs unclean
  OSDMAP_FLAGS noout flag(s) set
  PG_AVAILABILITY Reduced data availability: 3248 pgs inactive
      pg 15.7cd is stuck inactive for 24780.157341, current state unknown, last acting []
      pg 15.7ce is stuck inactive for 24780.157341, current state unknown, last acting []
      pg 15.7cf is stuck inactive for 24780.157341, current state unknown, last acting []
      ..
      pg 15.7ff is stuck inactive for 24728.059692, current state unknown, last acting []
  PG_DEGRADED Degraded data redundancy: 3248 pgs unclean
      pg 15.7cd is stuck unclean for 24728.059692, current state unknown, last acting []
      pg 15.7ce is stuck unclean for 24728.059692, current state unknown, last
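Given the eventual diagnosis in the [Solved] follow-up above (a firewall blocking the active ceph-mgr, which becomes the mandatory source of PG statistics once "ceph osd require-osd-release luminous" is set), a quick sanity check when all metrics suddenly read zero is to verify the mgr is reachable. The host name and port below are only examples, and the field names are from a Luminous-era mgr map; ceph-mgr binds to a port in the 6800-7300 range by default:

  ceph mgr dump | grep -E 'active_name|active_addr'   # which mgr is active, and on which address
  ceph -s | grep mgr                                   # active / standby mgr daemons
  nc -zv Ceph-Mon2 6800                                # test the mgr port from a client or mon host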