[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?
I am just creating a bucket with a lot of files to test it. Who would have thought that uploading a million 1k files would take days?

Am Di., 9. Nov. 2021 um 00:50 Uhr schrieb prosergey07 :
> When resharding is performed I believe it is considered a bucket operation
> and goes through updating the bucket stats. A new bucket shard is created
> and it may increase the number of objects within the bucket stats.
> If it was broken during resharding, you could check the current bucket id
> from:
> radosgw-admin metadata get "bucket:BUCKET_NAME"
>
> That would give an idea which bucket index objects to keep.
>
> Then you could remove the corrupted bucket shards (not the ones with the
> bucket id from the previous command), i.e. the .dir.corrupted_bucket_index.SHARD_NUM
> objects from the bucket.index pool:
>
> rados -p bucket.index rm .dir.corrupted_bucket_index.SHARD_NUM
>
> where SHARD_NUM is the shard number you want to delete.
>
> Then run "radosgw-admin bucket check --fix --bucket=BUCKET_NAME".
>
> That should resolve your issue with the number of objects.
>
> As for slow object deletion: do you run your metadata pools for rgw on
> NVMe drives? Specifically the bucket.index pool. The problem is that you have
> a lot of objects and probably not enough shards. Radosgw retrieves the list
> of objects from bucket.index and, if I remember correctly, it retrieves them as
> an ordered list, which is a very expensive operation. So a fair amount of time
> might be spent just on getting the object list.
>
> We get 1000 objects per second deleted inside our storage.
>
> I would not recommend using "--inconsistent-index", to avoid more
> consistency issues.
>
> Надіслано з пристрою Galaxy
>
> ---- Оригінальне повідомлення ----
> Від: mhnx
> Дата: 08.11.21 13:28 (GMT+02:00)
> Кому: Сергей Процун
> Копія: "Szabo, Istvan (Agoda)" , Boris Behrens < b...@kervyn.de>, Ceph Users
> Тема: Re: [ceph-users] Re: large bucket index in multisite environement
> (how to deal with large omap objects warning)?
>
> (There should not be any issues using rgw for other buckets while
> re-sharding.)
> If that is the case, then disabling the bucket access will work, right? Also sync
> should be disabled.
>
> Yes, after the manual reshard it should clear the leftovers, but in my
> situation resharding failed and I got double entries for that bucket.
> I didn't push further; instead I divided the bucket into new buckets and
> reduced the object count with a new bucket tree. I copied all of the objects with
> rclone and started the bucket removal with "radosgw-admin bucket rm --bucket=mybucket
> --bypass-gc --purge-objects --max-concurrent-ios=128". It has been running for a
> very long time (started at Sep 08) and it is still working. There were 250M
> objects in that bucket, and after the manual reshard failure I got a 500M object
> count when checking bucket stats num_objects. Now I have:
> "size_kb": 10648067645,
> "num_objects": 132270190
>
> The remove speed is 50-60 objects per second. It's not because of the cluster
> speed; the cluster is fine.
> I have space, so I let it go. When I see a stable object count I will stop
> the remove process and start again with the "--inconsistent-index" parameter.
> I wonder, is it safe to use that parameter with referenced objects? I want
> to learn how "--inconsistent-index" works and what it does.
>
> Сергей Процун , 5 Kas 2021 Cum, 17:46 tarihinde şunu yazdı:
>
>> There should not be any issues using rgw for other buckets while
>> re-sharding.
>>
>> As for the doubled number of objects after the reshard, that is an interesting
>> situation.
>> After the manual reshard is done, there might be leftovers from
>> the old bucket index, as during the reshard new .dir.new_bucket_index objects
>> are created. They contain all data related to the objects which are stored
>> in the buckets.data pool. Just wondering if the issue with the doubled number
>> of objects was related to the old bucket index. If so, it is safe to delete the
>> old bucket index.
>>
>> In a perfect world, it would be ideal to know the eventual number of
>> objects inside the bucket and set the number of shards to the corresponding
>> setting initially.
>>
>> In the real world, when the client re-purposes the usage of the bucket, we
>> have to deal with reshards.
>>
>> пт, 5 лист. 2021, 14:43 користувач mhnx пише:
>>
>>> I also use this method and I hate it.
>>>
>>> Stopping all of the RGW clients is never an option! It shouldn't be.
>>> Sharding is hell. I had 250M objects in a bucket and the reshard failed
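For illustration, a minimal sketch of the stale-shard cleanup described above. The bucket name, marker and pool name are placeholders; on many deployments the index pool is called <zone>.rgw.buckets.index rather than plain bucket.index, so check your pool names first. Only shard objects whose marker does not match the current bucket_id should ever be touched:

  radosgw-admin metadata get bucket:BUCKET_NAME            # shows the current bucket_id / marker
  rados -p default.rgw.buckets.index ls | grep '^\.dir\.'  # list all bucket index shard objects
  rados -p default.rgw.buckets.index rm .dir.OLD_MARKER.SHARD_NUM   # remove one stale shard object
  radosgw-admin bucket check --fix --bucket=BUCKET_NAME    # rebuild the bucket stats afterwards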
[ceph-users] Re: Question if WAL/block.db partition will benefit us
Hi,
we use enterprise SSDs like SAMSUNG MZ7KM1T9. They work very well for our block storage. Some NVMe would be a lot nicer, but we have had good experience with them.
One SSD failure taking down 10 OSDs might sound harsh, but this would be an okayish risk. Most of the tunables are default in our setup, and it looks like the PGs have a failure domain of host.
I restart the systems on a regular basis for kernel updates. Also, disk IO checked with dstat seems to be rather low on the SSDs (below 1k IOPS):

root@s3db18:~# dstat --disk --io -T -D sdd
--dsk/sdd-- ---io/sdd-- --epoch---
 read  writ| read  writ|  epoch
 214k 1656k|7.21  126 |1636536603
 144k 1176k|2.00  200 |1636536604
 128k 1400k|2.00  230 |1636536605

Normally I would now try this configuration: 1 SSD / 10 OSDs - having 150GB of block.db and block.wal, both on the same partition as someone stated before, and 200GB extra to move all pools except the .data pool to SSDs.
But thinking about 10 downed OSDs if one SSD fails lets me wonder how to recover from that. IIRC the configuration per OSD is in the LVM tags:

root@s3db18:~# lvs -o lv_tags
LV Tags
ceph.block_device=...,ceph.db_device=/dev/sdd8,ceph.db_uuid=011275a3-4201-8840-a678-c2e23d38bfd6,...

When the SSD fails, can I just remove the tags and restart the OSD with ceph-volume lvm activate --all? And after replacing the failed SSD, re-add the tags with the correct IDs? Do I need to do anything else to prepare a block.db partition?

Cheers
Boris

Am Di., 9. Nov. 2021 um 22:15 Uhr schrieb prosergey07 :
> Not sure how much it would help the performance with OSDs backed by SSD
> db and wal devices. Even if you go this route with one SSD per 10 HDDs, you
> might want to set the failure domain per host in the crush rules in case an SSD is
> out of service.
>
> But in practice the SSD will not help too much to boost the performance,
> especially when sharing it between 10 HDDs.
>
> We use NVMe db+wal per OSD and separate NVMe specifically for the metadata
> pools. There will be a lot of I/O on the bucket.index pool and the rgw pool which
> stores user and bucket metadata. So you might want to put them onto separate
> fast storage.
>
> Also, if there will not be too many objects - like huge objects, but not
> tens to hundreds of millions of them - then the bucket index will have less
> pressure and SSD might be okay for the metadata pools in that case.
>
> Надіслано з пристрою Galaxy
>
> Оригінальне повідомлення
> Від: Boris Behrens
> Дата: 08.11.21 13:08 (GMT+02:00)
> Кому: ceph-users@ceph.io
> Тема: [ceph-users] Question if WAL/block.db partition will benefit us
>
> Hi,
> we run a larger octopus s3 cluster with only rotating disks.
> 1.3 PiB with 177 OSDs, some with a SSD block.db and some without.
>
> We have a ton of spare 2TB disks and we just wondered if we can bring them
> to good use.
> For every 10 spinning disks we could add one 2TB SSD and we would create
> two partitions per OSD (130GB for block.db and 20GB for block.wal). This
> would leave some empty space on the SSD for wear leveling.
>
> The question now is: would we benefit from this? Most of the data that is
> written to the cluster is very large (50GB and above). This would take a
> lot of work to restructure the cluster and also two other clusters.
>
> And does it make a difference to have only a block.db partition or a
> block.db and a block.wal partition?
> > Cheers > Boris > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
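For reference, a rough sketch of how the per-OSD device mapping can be inspected and edited via the LVM tags. Device paths, VG/LV names and UUIDs are placeholders, and editing tags only helps when the DB contents themselves are still readable (e.g. the device merely changed its path); if the block.db device is really lost, the OSD normally has to be redeployed:

  ceph-volume lvm list                                 # shows block/db/wal devices per OSD
  lvs -o lv_name,lv_tags | grep ceph.db_device         # find which LV points at the failed device
  lvchange --deltag "ceph.db_device=/dev/sdd8" VG/LV   # drop the stale tag
  lvchange --addtag "ceph.db_device=/dev/sde8" VG/LV   # point it at the replacement partition
  ceph-volume lvm activate --all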
[ceph-users] Re: Question if WAL/block.db partition will benefit us
> That does not seem like a lot. Having SSD based metadata pools might > reduce latency though. > So block.db and block.wal doesn't make sense? I would like to have a consistent cluster. In either case I would need to remove or add SSDs, because we currently have this mixed. It does waste a lot of space. But might be worth it if performance > improves a lot. You might also be able to separate small objects from > large objects based on placement targets / storage classes [1]. This > would allow you to store small objects on SSD. Those might be more > latency sensitive than large objects anyway? > > Gr. Stefan > > [1]: https://docs.ceph.com/en/latest/radosgw/placement/ > Puh, large topic. Would removing the smaller files from the spinning disks release enough pressure from the flying heads to speed up large file uploads? Could be a test, but I don't know if this would work as expected. I can imagine that this leads to larger problems, when the SSD OSDs run out of space. Also I would rather add more spinning disks because we also need a lot of space. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
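To illustrate the placement-target / storage-class idea from [1], a sketch of how an SSD-backed storage class could be added to the existing placement target. The class, zonegroup and pool names here are made up; in a multisite setup the period has to be committed afterwards, and clients then select the class per object (e.g. s3cmd --storage-class=SMALL_SSD):

  radosgw-admin zonegroup placement add --rgw-zonegroup ZONEGROUP \
      --placement-id default-placement --storage-class SMALL_SSD
  radosgw-admin zone placement add --rgw-zone eu-central-1 \
      --placement-id default-placement --storage-class SMALL_SSD \
      --data-pool eu-central-1.rgw.buckets.data.ssd
  radosgw-admin period update --commit   # only needed on multisite setups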
[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?
yday and delete it at the end of the week or >>> month >>> then you should definitely use a temp bucket. No versioning, No >>> multisite, >>> No index if it's possible. >>> >>> >>> >>> Szabo, Istvan (Agoda) , 5 Kas 2021 Cum, 12:30 >>> tarihinde şunu yazdı: >>> >>> > You mean prepare or reshard? >>> > Prepare: >>> > I collect as much information for the users before onboarding so I can >>> > prepare for their use case in the future and set things up. >>> > >>> > Preshard: >>> > After created the bucket: >>> > radosgw-admin bucket reshard --bucket=ex-bucket --num-shards=101 >>> > >>> > Also when you shard the buckets, you need to use prime numbers. >>> > >>> > Istvan Szabo >>> > Senior Infrastructure Engineer >>> > --- >>> > Agoda Services Co., Ltd. >>> > e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com> >>> > --- >>> > >>> > From: Boris Behrens >>> > Sent: Friday, November 5, 2021 4:22 PM >>> > To: Szabo, Istvan (Agoda) ; ceph-users@ceph.io >>> > Subject: Re: [ceph-users] large bucket index in multisite environement >>> > (how to deal with large omap objects warning)? >>> > >>> > Email received from the internet. If in doubt, don't click any link nor >>> > open any attachment ! >>> > >>> > Cheers Istvan, >>> > >>> > how do you do this? >>> > >>> > Am Do., 4. Nov. 2021 um 19:45 Uhr schrieb Szabo, Istvan (Agoda) < >>> > istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>>: >>> > This one you need to prepare, you beed to preshard the bucket which you >>> > know that will hold more than millions of objects. >>> > >>> > I have a bucket where we store 1.2 billions of objects with 24xxx >>> shard. >>> > No omap issue. >>> > Istvan Szabo >>> > Senior Infrastructure Engineer >>> > --- >>> > Agoda Services Co., Ltd. >>> > e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com> >>> > --- >>> > >>> > >>> > >>> > -- >>> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend >>> im >>> > groüen Saal. >>> > ___ >>> > ceph-users mailing list -- ceph-users@ceph.io >>> > To unsubscribe send an email to ceph-users-le...@ceph.io >>> > >>> ___ >>> ceph-users mailing list -- ceph-users@ceph.io >>> To unsubscribe send an email to ceph-users-le...@ceph.io >>> >> -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
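As a rough illustration of the presharding arithmetic: dynamic resharding and "bucket limit check" aim for roughly rgw_max_objs_per_shard objects per shard (100k by default), so a bucket expected to grow to 50 million objects needs at least 500 shards, which one would round up to a prime such as 503 (the bucket name is an example):

  radosgw-admin bucket reshard --bucket=ex-bucket --num-shards=503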
[ceph-users] Re: Question if WAL/block.db partition will benefit us
Hi Stefan,
for a 6:1 or 3:1 ratio we do not have enough slots (I think).
There is some read traffic, but I don't know if this counts as a lot:
client: 27 MiB/s rd, 289 MiB/s wr, 1.07k op/s rd, 261 op/s wr

Putting them to use for some special rgw pools also came to my mind. But would this make a lot of difference?

POOL                              ID   PGS   STORED   OBJECTS   USED     %USED  MAX AVAIL
.rgw.root                          1    64   150 KiB      142    26 MiB      0     42 TiB
eu-central-1.rgw.control           2    64       0 B        8       0 B      0     42 TiB
eu-central-1.rgw.data.root         3    64   1.2 MiB    3.96k   743 MiB      0     42 TiB
eu-central-1.rgw.gc                4    64   329 MiB      128   998 MiB      0     42 TiB
eu-central-1.rgw.log               5    64   939 KiB      370   3.1 MiB      0     42 TiB
eu-central-1.rgw.users.uid         6    64    12 MiB    7.10k   1.2 GiB      0     42 TiB
eu-central-1.rgw.users.keys        7    64   297 KiB    7.40k   1.4 GiB      0     42 TiB
eu-central-1.rgw.meta              8    64   392 KiB       1k   191 MiB      0     42 TiB
eu-central-1.rgw.users.email       9    64      40 B        1   192 KiB      0     42 TiB
eu-central-1.rgw.buckets.index    10    64    22 GiB    2.55k    67 GiB   0.05     42 TiB
eu-central-1.rgw.buckets.data     11  2048   318 TiB  132.31M   961 TiB  88.38     42 TiB
eu-central-1.rgw.buckets.non-ec   12    64   467 MiB   13.28k   2.4 GiB      0     42 TiB
eu-central-1.rgw.usage            13    64   767 MiB       32   2.2 GiB      0     42 TiB

I would have put the rgw.buckets.index and maybe the rgw.meta pools on it, but it looks like a waste of space. Having a 2TB OSD in every chassis that only handles 23GB of data.

Am Mo., 8. Nov. 2021 um 12:30 Uhr schrieb Stefan Kooman :
> On 11/8/21 12:07, Boris Behrens wrote:
> > Hi,
> > we run a larger octopus s3 cluster with only rotating disks.
> > 1.3 PiB with 177 OSDs, some with a SSD block.db and some without.
> >
> > We have a ton of spare 2TB disks and we just wondered if we can bring them
> > to good use.
> > For every 10 spinning disks we could add one 2TB SSD and we would create
> > two partitions per OSD (130GB for block.db and 20GB for block.wal). This
> > would leave some empty space on the SSD for wear leveling.
>
> A 10:1 ratio looks rather high. Discussions on this list indicate this
> ratio normally is in the 3:1 up to 6:1 range (for high end NVMe / SSD).
>
> > The question now is: would we benefit from this? Most of the data that is
> > written to the cluster is very large (50GB and above). This would take a
> > lot of work to restructure the cluster and also two other clusters.
> >
> > And does it make a difference to have only a block.db partition or a
> > block.db and a block.wal partition?
>
> Does this cluster also get a lot of reads? I wonder if using the SSD
> drives for S3 metadata pools would make more sense. And also be a lot
> less work.
>
> Gr. Stefan
>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
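If the index/meta pools were moved to SSD-backed OSDs, a minimal sketch of how that is usually done via device classes (assuming the SSD OSDs report the "ssd" device class and the pools are replicated; changing a pool's crush_rule triggers data migration):

  ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
  ceph osd pool set eu-central-1.rgw.buckets.index crush_rule rgw-meta-ssd
  ceph osd pool set eu-central-1.rgw.meta crush_rule rgw-meta-ssd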
[ceph-users] Question if WAL/block.db partition will benefit us
Hi,
we run a larger octopus s3 cluster with only rotating disks. 1.3 PiB with 177 OSDs, some with a SSD block.db and some without.

We have a ton of spare 2TB disks and we just wondered if we can bring them to good use.
For every 10 spinning disks we could add one 2TB SSD and we would create two partitions per OSD (130GB for block.db and 20GB for block.wal). This would leave some empty space on the SSD for wear leveling.

The question now is: would we benefit from this? Most of the data that is written to the cluster is very large (50GB and above). This would take a lot of work to restructure the cluster and also two other clusters.

And does it make a difference to have only a block.db partition or a block.db and a block.wal partition?

Cheers
Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?
Cheers Istvan,

how do you do this?

Am Do., 4. Nov. 2021 um 19:45 Uhr schrieb Szabo, Istvan (Agoda) < istvan.sz...@agoda.com>:
> This one you need to prepare: you need to preshard the bucket which you
> know will hold millions of objects.
>
> I have a bucket where we store 1.2 billion objects with 24xxx shards.
> No omap issue.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?
Hi Teoman,

I don't sync the bucket content. It's just the metadata that gets synced.
But turning off access to our s3 is not an option, because our customers rely on it (they make backups and serve objects for their web applications through it).

Am Do., 4. Nov. 2021 um 18:20 Uhr schrieb Teoman Onay :
> AFAIK dynamic resharding is not supported for multisite setups, but you can
> reshard manually.
> Note that this is a very expensive process which requires you to:
>
> - disable the sync of the bucket you want to reshard.
> - Stop all the RGWs (no more access to your Ceph cluster)
> - On a node of the master zone, reshard the bucket
> - On the secondary zone, purge the bucket
> - Restart the RGW(s)
> - re-enable sync of the bucket.
>
> 4m objects/bucket is way too much...
>
> Regards
>
> Teoman
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
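As a sketch, the manual multisite reshard procedure above roughly maps to these commands (bucket name and shard count are placeholders; the exact steps should be checked against the documentation for your release):

  radosgw-admin bucket sync disable --bucket=BUCKET             # on the zone that owns the bucket
  # stop all radosgw instances in the zonegroup
  radosgw-admin bucket reshard --bucket=BUCKET --num-shards=N   # on a node of the master zone
  radosgw-admin bucket rm --purge-objects --bucket=BUCKET       # on the secondary zone(s)
  # restart the radosgw instances
  radosgw-admin bucket sync enable --bucket=BUCKET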
[ceph-users] large bucket index in multisite environement (how to deal with large omap objects warning)?
Hi everybody,

we maintain three ceph clusters (2x octopus, 1x nautilus) that use three zonegroups to sync metadata, without syncing the actual data (only one zone per zonegroup).
A customer got buckets with >4m objects in our largest cluster (the other two are very fresh with close to 0 data in them).

How do I handle that in regards to the "Large OMAP objects" warning?
- Sharding is not an option, because it is a multisite environment (at least that's what I read everywhere)
- Limiting the customers is not a great option, because they already have that huge amount of files in their buckets
- Disabling the warning / increasing the threshold is IMHO a bad option (people might have put some thought into that limit, and having 40x the limit is far off the "just roll with it" threshold)

I really hope that someone does have an answer, or maybe there is some roadmap which addresses this issue.

Cheers
Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
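To see which buckets are close to (or over) the per-shard limit, and which object actually triggered the warning, something like the following is usually enough (the threshold itself is the OSD option osd_deep_scrub_large_omap_object_key_threshold):

  ceph health detail | grep -i 'large omap'   # names the pool/PG; the object name contains the bucket marker and shard number
  radosgw-admin bucket limit check            # per-bucket objects_per_shard and fill_status
  radosgw-admin bucket stats --bucket=BUCKET  # current object counts for one bucket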
[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)
Hi guys, we just updated the cluster to latest octopus, but we still can not list multipart uploads if there are more than 2k multiparts. Is there any way to show the multiparts and maybe cancel them? Am Mo., 25. Okt. 2021 um 16:23 Uhr schrieb Boris Behrens : > Hi Casey, > > thanks a lot for that hint. That sound a lot like this is the problem. > Is there a way to show incomplete multipart uploads via radosgw-admin? > > So I would be able to cancel it. > > Upgrading to octopus might take a TON of time, as we have 1.1 PiB in 160 > OSDs rotational disks. :) > > Am Mo., 25. Okt. 2021 um 16:19 Uhr schrieb Casey Bodley < > cbod...@redhat.com>: > >> hi Boris, this sounds a lot like >> https://tracker.ceph.com/issues/49206, which says "When deleting a >> bucket with an incomplete multipart upload that has about 2000 parts >> uploaded, we noticed an infinite loop, which stopped s3cmd from >> deleting the bucket forever." >> >> i'm afraid this fix was merged after nautilus went end-of-life, so >> you'd need to upgrade to octopus for it >> >> On Mon, Oct 25, 2021 at 9:52 AM Boris Behrens wrote: >> > >> > Good day everybody, >> > >> > I just came across very strange behavior. I have two buckets where s3cmd >> > hangs when I try to show current multipart uploads. >> > >> > When I use --debug I see that it loops over the same response. >> > What I tried to fix it on one bucket: >> > * radosgw-admin bucket check --bucket=BUCKETNAME >> > * radosgw-admin bucket check --check-objects --fix --bucket=BUCKETNAME >> > >> > The check command now reports an empty array [], but I still can't show >> the >> > multiparts. I can interact very normal with the bucket (list/put/get >> > objects). >> > >> > The debug output shows always the same data and >> > DEBUG: Listing continues after 'FILENAME' >> > >> > Did someone already came across this error? >> > ___ >> > ceph-users mailing list -- ceph-users@ceph.io >> > To unsubscribe send an email to ceph-users-le...@ceph.io >> > >> >> > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
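For listing and aborting stuck multipart uploads from the client side, a sketch (bucket, key and upload id are placeholders; note that the listing goes through the same RGW "list multipart uploads" call, so on an affected bucket it may run into the same loop):

  s3cmd multipart s3://BUCKET               # list in-progress multipart uploads
  s3cmd abortmp s3://BUCKET/KEY UPLOAD_ID   # abort one upload
  # or with the aws cli against the RGW endpoint:
  aws --endpoint-url https://s3.example.com s3api list-multipart-uploads --bucket BUCKET
  aws --endpoint-url https://s3.example.com s3api abort-multipart-upload --bucket BUCKET --key KEY --upload-id UPLOAD_ID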
[ceph-users] Re: upgrade OSDs before mon
Hi Yury,

unfortunately not. It's a package installation and there are no nautilus packages in ubuntu 20.04 (just realised this).
Now the question: downgrade ubuntu to 18.04 and start over, or keep the octopus OSDs in a nautilus cluster?
It would be cool if the latter works properly.

Am Di., 26. Okt. 2021 um 15:47 Uhr schrieb Yury Kirsanov < y.kirsa...@gmail.com>:
> You can downgrade any CEPH packages if you want to. Just specify the
> number you'd like to go to.
>
> On Wed, Oct 27, 2021 at 12:36 AM Boris Behrens wrote:
>
>> Hi,
>> I just added new storage to our s3 cluster and saw that ubuntu didn't
>> prioritize the nautilus package over the octopus package.
>>
>> Now I have 10 OSDs with octopus in a pure nautilus cluster.
>>
>> Can I leave it this way, or should I remove the OSDs and first upgrade the
>> mons?
>>
>> Cheers
>> Boris
>>
>> --
>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
>> groüen Saal.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] upgrade OSDs before mon
Hi,
I just added new storage to our s3 cluster and saw that ubuntu didn't prioritize the nautilus package over the octopus package.

Now I have 10 OSDs with octopus in a pure nautilus cluster.

Can I leave it this way, or should I remove the OSDs and first upgrade the mons?

Cheers
Boris

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)
Hi Casey, thanks a lot for that hint. That sound a lot like this is the problem. Is there a way to show incomplete multipart uploads via radosgw-admin? So I would be able to cancel it. Upgrading to octopus might take a TON of time, as we have 1.1 PiB in 160 OSDs rotational disks. :) Am Mo., 25. Okt. 2021 um 16:19 Uhr schrieb Casey Bodley : > hi Boris, this sounds a lot like > https://tracker.ceph.com/issues/49206, which says "When deleting a > bucket with an incomplete multipart upload that has about 2000 parts > uploaded, we noticed an infinite loop, which stopped s3cmd from > deleting the bucket forever." > > i'm afraid this fix was merged after nautilus went end-of-life, so > you'd need to upgrade to octopus for it > > On Mon, Oct 25, 2021 at 9:52 AM Boris Behrens wrote: > > > > Good day everybody, > > > > I just came across very strange behavior. I have two buckets where s3cmd > > hangs when I try to show current multipart uploads. > > > > When I use --debug I see that it loops over the same response. > > What I tried to fix it on one bucket: > > * radosgw-admin bucket check --bucket=BUCKETNAME > > * radosgw-admin bucket check --check-objects --fix --bucket=BUCKETNAME > > > > The check command now reports an empty array [], but I still can't show > the > > multiparts. I can interact very normal with the bucket (list/put/get > > objects). > > > > The debug output shows always the same data and > > DEBUG: Listing continues after 'FILENAME' > > > > Did someone already came across this error? > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)
Good day everybody,

I just came across very strange behavior. I have two buckets where s3cmd hangs when I try to show current multipart uploads.

When I use --debug I see that it loops over the same response.
What I tried to fix it on one bucket:
* radosgw-admin bucket check --bucket=BUCKETNAME
* radosgw-admin bucket check --check-objects --fix --bucket=BUCKETNAME

The check command now reports an empty array [], but I still can't show the multiparts. I can interact normally with the bucket (list/put/get objects).

The debug output always shows the same data and
DEBUG: Listing continues after 'FILENAME'

Did someone already come across this error?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: recreate a period in radosgw
What I've tested so far in a test cluster:
1. create a new realm with the same name (just swap two letters)
2. remove the realm
3. get the periods file from the .rgw.root pool
4. correct the name at the end and switch out the two realm IDs
5. upload the file again
6. change the period for the realm with realm set
7. period update; period update --commit

This looks like it is correct, but I am not sure if this is the right way.
Does someone know another way to do this?

Am Do., 14. Okt. 2021 um 15:44 Uhr schrieb Boris Behrens :
> Hi,
> is there a way to restore a deleted period?
>
> The realm, zonegroup and zone are still there, but I can't apply any
> changes, because the period is missing.
>
> Cheers
> Boris
>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
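For anyone trying the same, a sketch of the rados-level part (period ID, epoch and file names are placeholders; the object names below are how they appear on a nautilus test cluster and may differ between releases):

  rados -p .rgw.root ls | grep -i period                      # find the period objects
  rados -p .rgw.root get periods.PERIOD_ID.EPOCH period.json
  # edit period.json, then write it back:
  rados -p .rgw.root put periods.PERIOD_ID.EPOCH period.json
  radosgw-admin period update --commit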
[ceph-users] recreate a period in radosgw
Hi, is there a way to restore a deleted period? The realm, zonegroup and zone are still there, but I can't apply any changes, because the period is missing. Cheers Boris ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] shards falling behind on multisite metadata sync
Hi,
does someone have a quick fix for shards falling behind in the metadata sync?

I can do a radosgw-admin metadata sync init and restart the rgw daemons to get a full sync, but after a day the first shards fall behind, and after two days I also get the message with "oldest incremental change not applied ...":

[root@3cecef5afc28 ~]# radosgw-admin sync status
          realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (company)
      zonegroup f6f3f550-89f0-4c0d-b9b0-301a06c52c16 (bc01)
           zone a7edb6fe-737f-4a1c-a333-0ba0566bb3dd (bc01)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 6 shards
                behind shards: [13,31,35,36,46,60]
                oldest incremental change not applied: 2021-09-30 17:54:22.0.270207s [35]

I've tried to check the sync errors, but there I get a lot of "failed to read remote metadata entry: (5) Input/output error" and trimming them does not seem to work.

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
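For completeness, a few commands that are typically useful here (options and output can differ slightly between releases):

  radosgw-admin metadata sync status   # per-shard markers of the metadata sync
  radosgw-admin sync error list        # the entries behind the Input/output errors
  radosgw-admin sync error trim        # clear old entries once they are resolved
  radosgw-admin metadata sync run      # run the sync in the foreground; add --debug-rgw=20 for more detail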
[ceph-users] Re: debugging radosgw sync errors
Ah found it. It was a SSL certificate that was invalid (some PoC which started to mold). Now the sync is running fine, but there is one bucket that got a ton of data in the mdlog. [root@s3db16 ~]# radosgw-admin mdlog list | grep temonitor | wc -l No --period given, using current period=e8fc96f1-ae86-4dc1-b432-470b0772fded 284760 [root@s3db16 ~]# radosgw-admin mdlog list | grep name | wc -l No --period given, using current period=e8fc96f1-ae86-4dc1-b432-470b0772fded 343078 Is it safe to clear the mdlog? Am Mo., 20. Sept. 2021 um 01:00 Uhr schrieb Boris Behrens : > I just deleted the rados object from .rgw.data.root and this removed the > bucket.instance, but this did not solve the problem. > > It looks like there is some access error when I try to radosgw-admin > metadata sync init. > The 403 http response code on the post to the /admin/realm/period endpoint. > > I checked the system_key and added a new system user and set the keys with > zone modify and period update --commit on both sides. > This also did not help. > > After a weekend digging through the mailing list and trying to fix it, I > am totally stuck. > I hope that someone of you people can help me. > > > > > Am Fr., 17. Sept. 2021 um 17:54 Uhr schrieb Boris Behrens : > >> While searching for other things I came across this: >> [root ~]# radosgw-admin metadata list bucket | grep www1 >> "www1", >> [root ~]# radosgw-admin metadata list bucket.instance | grep www1 >> "www1:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.31103", >> "www1.company.dev", >> [root ~]# radosgw-admin bucket list | grep www1 >> "www1", >> [root ~]# radosgw-admin metadata rm bucket.instance:www1.company.dev >> ERROR: can't remove key: (22) Invalid argument >> >> Maybe this is part of the problem. >> >> Did somebody saw this and know what to do? >> -- >> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im >> groüen Saal. >> > > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: debugging radosgw sync errors
I just deleted the rados object from .rgw.data.root and this removed the bucket.instance, but this did not solve the problem. It looks like there is some access error when I try to radosgw-admin metadata sync init. The 403 http response code on the post to the /admin/realm/period endpoint. I checked the system_key and added a new system user and set the keys with zone modify and period update --commit on both sides. This also did not help. After a weekend digging through the mailing list and trying to fix it, I am totally stuck. I hope that someone of you people can help me. Am Fr., 17. Sept. 2021 um 17:54 Uhr schrieb Boris Behrens : > While searching for other things I came across this: > [root ~]# radosgw-admin metadata list bucket | grep www1 > "www1", > [root ~]# radosgw-admin metadata list bucket.instance | grep www1 > "www1:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.31103", > "www1.company.dev", > [root ~]# radosgw-admin bucket list | grep www1 > "www1", > [root ~]# radosgw-admin metadata rm bucket.instance:www1.company.dev > ERROR: can't remove key: (22) Invalid argument > > Maybe this is part of the problem. > > Did somebody saw this and know what to do? > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: debugging radosgw sync errors
While searching for other things I came across this:

[root ~]# radosgw-admin metadata list bucket | grep www1
    "www1",
[root ~]# radosgw-admin metadata list bucket.instance | grep www1
    "www1:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.31103",
    "www1.company.dev",
[root ~]# radosgw-admin bucket list | grep www1
    "www1",
[root ~]# radosgw-admin metadata rm bucket.instance:www1.company.dev
ERROR: can't remove key: (22) Invalid argument

Maybe this is part of the problem.

Has somebody seen this and knows what to do?

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: radosgw find buckets which use the s3website feature
Found it: for bucket in `radosgw-admin metadata list bucket.instance | jq .[] | cut -f2 -d\"`; do if radosgw-admin metadata get --metadata-key=bucket.instance:$bucket | grep --silent website_conf; then echo $bucket fi done Am Do., 16. Sept. 2021 um 09:49 Uhr schrieb Boris Behrens : > Hi people, > > is there a way to find bucket that use the s3website feature? > > Cheers > Boris > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] debugging radosgw sync errors
Hello again, as my tests with some fresh clusters answerd most of my config questions, I now wanted to start with our production cluster and the basic setup looks good, but the sync does not work: [root@3cecef5afb05 ~]# radosgw-admin sync status realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (company) zonegroup f6f3f550-89f0-4c0d-b9b0-301a06c52c16 (bc01) zone a7edb6fe-737f-4a1c-a333-0ba0566bb3dd (bc01) metadata sync preparing for full sync full sync: 64/64 shards full sync: 0 entries to sync failed to fetch master sync status: (5) Input/output error [root@3cecef5afb05 ~]# radosgw-admin metadata sync run 2021-09-17 16:23:08.346 7f6c83c63840 0 meta sync: ERROR: failed to fetch metadata sections ERROR: sync.run() returned ret=-5 2021-09-17 16:23:08.474 7f6c83c63840 0 RGW-SYNC:meta: ERROR: failed to fetch all metadata keys (r=-5) And when I check "radosgw-admin period get", the sync_status is just an array of empty strings: [root@3cecef5afb05 ~]# radosgw-admin period get { "id": "e8fc96f1-ae86-4dc1-b432-470b0772fded", "epoch": 71, "predecessor_uuid": "5349ac85-3d6d-4088-993f-7a1d4be3835a", "sync_status": [ "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", How can I debug what is going wrong? I triet to dig into the logs and see a lot of these messages: 2021-09-17 14:06:04.144 7f755b4e7700 1 civetweb: 0x5641a22b33a8: IPV6_OF_OUR_HAPROXY - - [17/Sep/2021:14:06:04 +] "GET /admin/log/?type=metadata&status&rgwx-zonegroup=da651dc1-2663-4e1b-af2e-ac4454f24c9d HTTP/1.1" 403 439 - - 2021-09-17 14:06:11.646 7f755f4ef700 1 civetweb: 0x5641a22ae4e8: IPV6_OF_OUR_HAPROXY - - [17/Sep/2021:14:06:11 +] "POST /admin/realm/period?period=e8fc96f1-ae86-4dc1-b432-470b0772fded&epoch=71&rgwx-zonegroup=da651dc1-2663-4e1b-af2e-ac4454f24c9d HTTP/1.1" 403 439 - - The 403 status makes me think I might have an access problem, but pulling the realm/period from the master was successful. Also the period commit from the new cluster worked fine. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
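A 403 on /admin/realm/period usually points at the system user credentials, so a sketch of what to compare on both clusters (zone name and uid are placeholders taken from this thread):

  radosgw-admin zone get --rgw-zone=bc01 | grep -A3 system_key   # keys stored in the zone
  radosgw-admin user info --uid=SYNC_USER                        # the system user they should belong to
  # if they do not match, set them and commit:
  radosgw-admin zone modify --rgw-zone=bc01 --access-key=KEY --secret=SECRET
  radosgw-admin period update --commit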
[ceph-users] radosgw find buckets which use the s3website feature
Hi people,

is there a way to find buckets that use the s3website feature?

Cheers
Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Questions about multiple zonegroups (was Problem with multi zonegroup configuration)
Ok, I think I found the basic problem. I used to talk to the endpoint that is also the domain for the s3websites. After switching the domains around, everything worked fine. :partyemote:

I have written down how I think things work together (here, if you are interested: https://pastebin.com/6Gj9Q5hJ), and I got three additional questions:
* how do I "pull" a zone to another storage cluster?
* how do I make the syncing user more secure?
* how do I limit users to a specific zone or zonegroup?

cheers :)

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
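On the first question (pulling a zone/realm into another cluster), the usual sequence is roughly the following sketch (endpoint, names and keys are placeholders; the access/secret key belong to the system user on the master zone):

  radosgw-admin realm pull --url=https://master-endpoint --access-key=KEY --secret=SECRET
  radosgw-admin period pull --url=https://master-endpoint --access-key=KEY --secret=SECRET
  radosgw-admin zone create --rgw-zonegroup=ZONEGROUP --rgw-zone=NEW_ZONE \
      --endpoints=https://new-endpoint --access-key=KEY --secret=SECRET
  radosgw-admin period update --commit
  # then point the local radosgw instances at NEW_ZONE (rgw_zone) and restart them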
[ceph-users] Re: Problem with multi zonegroup configuration
Someone got any ideas? I am not even sure if I am thinking it correctly. I only want to have users and bucketname synced, so they are unique. But not the data. I don't want to have redundance. The documentation reads like I need multiple zonegroups with a single zone each. Am Mo., 13. Sept. 2021 um 11:47 Uhr schrieb Boris Behrens : > Dear ceph community, > > I am still stuck with the multi zonegroup configuration. I did these steps: > 1. Create realm (company), zonegroup(eu), zone(eu-central-1), sync user on > the site fra1 > 2. Pulled the realm and the period in fra2 > 3. Creted the zonegroup(eu-central-2), zone (eu-central-2), modified zone > (eu-centrla-2) >with the credentials of the sunc user on the site fra2. > 4. Did a 'period update --commit' and 'metadata sync init; metadata sync > run' on the site fra2. > > Syncing now seem to work. If I create a user it will be synced. If the > user creates a bucket, > this also gets synced, without data (I don't want to sync data. Only > metadata). > > But I still have some issues with working with these clusters. I am not > able to upload any data. > If I try to list bucket, I receive "NoSuchBucket". > > I currently think it is a configuration problem with mit period and > ceph.conf > > Down below: > * The output from s3cmd > * my s3cmd config > * radosgw-admin period get > * ceph.conf (fra1/fra2) > > ## > [workstation]# s3cmd --config ~/.s3cfg_testing_fra1 la > ERROR: Error parsing xml: no element found: line 9, column 0 > ERROR: b'\n 404 Not Found\n \n > 404 Not Found\n \n Code: NoSuchBucket\n > RequestId: tx0130d0071-00613f1c58-69a6e-eu-central-1\n > HostId: 69a6e-eu-central-1-eu\n' > ERROR: S3 error: 404 (Not Found) > > ## > [workstation]# cat ~/.s3cfg_testing_fra1 > [default] > access_key = > bucket_location = eu-central-1 > host_base = eu-central-1.company.dev > host_bucket = %(bucket)s.eu-central-1.company.dev > secret_key = Y > website_endpoint = https://%(bucket)s.eu-central-1.company.dev > > ## > [fra1]# radosgw-admin period get > { > "id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5", > "epoch": 42, > "predecessor_uuid": "c748ead2-424a-4209-b183-b0989c8bda0c", > "sync_status": [], > "period_map": { > "id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5", > "zonegroups": [ > { > "id": "61dfe354-bf61-4a08-9e4d-e7a2228cc651", > "name": "eu-central-2", > "api_name": "eu-central-2", > "is_master": "false", > "endpoints": [ > "https://eu-central-2.company.dev"; > ], > "hostnames": [ > "eu-central-2.company.dev" > ], > "hostnames_s3website": [ > "eu-central-2.company.dev" > ], > "master_zone": "aafa8c61-84f0-48f0-a4f1-110306f83bce", > "zones": [ > { > "id": "aafa8c61-84f0-48f0-a4f1-110306f83bce", > "name": "eu-central-2", > "endpoints": [ > "https://eu-central-2.company.dev"; > ], > "log_meta": "false", > "log_data": "false", > "bucket_index_max_shards": 11, > "read_only": "false", > "tier_type": "", > "sync_from_all": "true", > "sync_from": [], > "redirect_zone": "" > } > ], > "placement_targets": [ > { > "name": "default-placement", > "tags": [], > "storage_classes": [ > "STANDARD" > ] > } > ], > "
[ceph-users] Re: [Suspicious newsletter] Problem with multi zonegroup configuration
I don't want to sync data between zones. I only want to sync the metadata. This is meant to have users and buckets unique over multiple datacenter, but not build a mirror for data. Am Mo., 13. Sept. 2021 um 13:14 Uhr schrieb Szabo, Istvan (Agoda) < istvan.sz...@agoda.com>: > I don't see any sync rule like you want to do directional sync between 2 > zones, no pipe and no flow also. > > Istvan Szabo > Senior Infrastructure Engineer > --- > Agoda Services Co., Ltd. > e: istvan.sz...@agoda.com > --- > > -Original Message- > From: Boris Behrens > Sent: Monday, September 13, 2021 4:48 PM > To: ceph-users@ceph.io > Subject: [Suspicious newsletter] [ceph-users] Problem with multi zonegroup > configuration > > Email received from the internet. If in doubt, don't click any link nor > open any attachment ! > > > Dear ceph community, > > I am still stuck with the multi zonegroup configuration. I did these steps: > 1. Create realm (company), zonegroup(eu), zone(eu-central-1), sync user on > the site fra1 2. Pulled the realm and the period in fra2 3. Creted the > zonegroup(eu-central-2), zone (eu-central-2), modified zone > (eu-centrla-2) >with the credentials of the sunc user on the site fra2. > 4. Did a 'period update --commit' and 'metadata sync init; metadata sync > run' on the site fra2. > > Syncing now seem to work. If I create a user it will be synced. If the > user creates a bucket, this also gets synced, without data (I don't want to > sync data. Only metadata). > > But I still have some issues with working with these clusters. I am not > able to upload any data. > If I try to list bucket, I receive "NoSuchBucket". > > I currently think it is a configuration problem with mit period and > ceph.conf > > Down below: > * The output from s3cmd > * my s3cmd config > * radosgw-admin period get > * ceph.conf (fra1/fra2) > > ## > [workstation]# s3cmd --config ~/.s3cfg_testing_fra1 la > ERROR: Error parsing xml: no element found: line 9, column 0 > ERROR: b'\n 404 Not Found\n \n > 404 Not Found\n \n Code: NoSuchBucket\n > RequestId: tx0130d0071-00613f1c58-69a6e-eu-central-1\n > HostId: 69a6e-eu-central-1-eu\n' > ERROR: S3 error: 404 (Not Found) > > ## > [workstation]# cat ~/.s3cfg_testing_fra1 [default] access_key = > bucket_location = eu-central-1 host_base = > eu-central-1.company.dev host_bucket = %(bucket)s.eu-central-1.company.dev > secret_key = Y > website_endpoint = https://%(bucket)s.eu-central-1.company.dev > > ## > [fra1]# radosgw-admin period get > { > "id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5", > "epoch": 42, > "predecessor_uuid": "c748ead2-424a-4209-b183-b0989c8bda0c", > "sync_status": [], > "period_map": { > "id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5", > "zonegroups": [ > { > "id": "61dfe354-bf61-4a08-9e4d-e7a2228cc651", > "name": "eu-central-2", > "api_name": "eu-central-2", > "is_master": "false", > "endpoints": [ > "https://eu-central-2.company.dev"; > ], > "hostnames": [ > "eu-central-2.company.dev" > ], > "hostnames_s3website": [ > "eu-central-2.company.dev" > ], > "master_zone": "aafa8c61-84f0-48f0-a4f1-110306f83bce", > "zones": [ > { > "id": "aafa8c61-84f0-48f0-a4f1-110306f83bce", > "name": "eu-central-2", > "endpoints": [ > "https://eu-central-2.company.dev"; > ], > "log_meta": "false", > "log_data": "false", > "bucket_index_max_shards": 11, > "read_only": "false", > "tier_type": "", > "sync_fr
[ceph-users] Problem with multi zonegroup configuration
Dear ceph community, I am still stuck with the multi zonegroup configuration. I did these steps: 1. Create realm (company), zonegroup(eu), zone(eu-central-1), sync user on the site fra1 2. Pulled the realm and the period in fra2 3. Creted the zonegroup(eu-central-2), zone (eu-central-2), modified zone (eu-centrla-2) with the credentials of the sunc user on the site fra2. 4. Did a 'period update --commit' and 'metadata sync init; metadata sync run' on the site fra2. Syncing now seem to work. If I create a user it will be synced. If the user creates a bucket, this also gets synced, without data (I don't want to sync data. Only metadata). But I still have some issues with working with these clusters. I am not able to upload any data. If I try to list bucket, I receive "NoSuchBucket". I currently think it is a configuration problem with mit period and ceph.conf Down below: * The output from s3cmd * my s3cmd config * radosgw-admin period get * ceph.conf (fra1/fra2) ## [workstation]# s3cmd --config ~/.s3cfg_testing_fra1 la ERROR: Error parsing xml: no element found: line 9, column 0 ERROR: b'\n 404 Not Found\n \n 404 Not Found\n \n Code: NoSuchBucket\n RequestId: tx0130d0071-00613f1c58-69a6e-eu-central-1\n HostId: 69a6e-eu-central-1-eu\n' ERROR: S3 error: 404 (Not Found) ## [workstation]# cat ~/.s3cfg_testing_fra1 [default] access_key = bucket_location = eu-central-1 host_base = eu-central-1.company.dev host_bucket = %(bucket)s.eu-central-1.company.dev secret_key = Y website_endpoint = https://%(bucket)s.eu-central-1.company.dev ## [fra1]# radosgw-admin period get { "id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5", "epoch": 42, "predecessor_uuid": "c748ead2-424a-4209-b183-b0989c8bda0c", "sync_status": [], "period_map": { "id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5", "zonegroups": [ { "id": "61dfe354-bf61-4a08-9e4d-e7a2228cc651", "name": "eu-central-2", "api_name": "eu-central-2", "is_master": "false", "endpoints": [ "https://eu-central-2.company.dev"; ], "hostnames": [ "eu-central-2.company.dev" ], "hostnames_s3website": [ "eu-central-2.company.dev" ], "master_zone": "aafa8c61-84f0-48f0-a4f1-110306f83bce", "zones": [ { "id": "aafa8c61-84f0-48f0-a4f1-110306f83bce", "name": "eu-central-2", "endpoints": [ "https://eu-central-2.company.dev"; ], "log_meta": "false", "log_data": "false", "bucket_index_max_shards": 11, "read_only": "false", "tier_type": "", "sync_from_all": "true", "sync_from": [], "redirect_zone": "" } ], "placement_targets": [ { "name": "default-placement", "tags": [], "storage_classes": [ "STANDARD" ] } ], "default_placement": "default-placement", "realm_id": "be137deb-1072-447c-bd96-def84626872f" }, { "id": "b65bbdfd-0555-43eb-9365-8bc72df2efd5", "name": "eu", "api_name": "eu", "is_master": "true", "endpoints": [ "https://eu-central-1.company.dev"; ], "hostnames": [ "eu-central-1.company.dev" ], "hostnames_s3website": [ "eu-central-1.company.dev" ], "master_zone": "6afad715-c0e1-4100-9db2-98ed31de0123", "zones": [ { "id": "6afad715-c0e1-4100-9db2-98ed31de0123", "name": "eu-central-1", "endpoints": [ "https://eu-central-1.company.dev"; ], "log_meta": "false", "log_data": "false", "bucket_index_max_shards": 0, "read_only": "false", "tier_type": "", "sync_from_all": "true", "sync_from": [], "redirect_zone": "" } ], "placement_targets": [ {
[ceph-users] Re: [Suspicious newsletter] Re: create a Multi-zone-group sync setup
Yes, I want to open up a new DC where people can store their objects, but I want the bucket names and users unique over both DC. After some reading I found that I need one realm with multiple zonegroups, each containing only one zone. No sync of actual user data, but metadata like users or used bucket names. So I created a test setup which contains three servers on each side, each server is used for mon,mgr,osd,radosgw. One is a nautilus installation (the master) and the other is a octopus installation. I've set up realm,first zonegroup with the zone and a sync user in the master setup, and commited. Then I've pulled the periode on the 2nd setup and added a 2nd zonegroup with a zone and commited. Now I can create users in the master setup, but not in the 2nd (as it doesn't sync back). But I am not able to create a bucket or so with the credentials of the users I created. Am Mi., 18. Aug. 2021 um 06:08 Uhr schrieb Szabo, Istvan (Agoda) < istvan.sz...@agoda.com>: > Hi, > > " but have a global namespace where all buckets and users are uniqe." > > You mean manage multiple cluster from 1 "master" cluster but ono sync? So > 1 realm, multiple dc BUT no sync? > > Istvan Szabo > Senior Infrastructure Engineer > --- > Agoda Services Co., Ltd. > e: istvan.sz...@agoda.com > ------- > > -Original Message- > From: Boris Behrens > Sent: Tuesday, August 17, 2021 8:51 PM > To: ceph-users@ceph.io > Subject: [Suspicious newsletter] [ceph-users] Re: create a > Multi-zone-group sync setup > > Email received from the internet. If in doubt, don't click any link nor > open any attachment ! > > > Hi, after some trial and error I got it working, so users will get synced. > > However, If I try to create a bucket via s3cmd I receive the following > error: > s3cmd --access_key=XX --secret_key=YY --host=HOST mb s3://test > ERROR: S3 error: 403 (InvalidAccessKeyId) > > When I try the same with ls I just get an empty response (because there > are no buckets to list). > > I get this against both radosgw locations. > I have an nginx in between the internet and radosgw that will just proxy > pass every address and sets host and x-forwarded-for header. > > > Am Fr., 30. Juli 2021 um 16:46 Uhr schrieb Boris Behrens : > > > Hi people, > > > > I try to create a Multi-zone-group setup (like it is described here: > > https://docs.ceph.com/en/latest/radosgw/multisite/) > > > > But I simply fail. > > > > I just created a testcluster to mess with it, and no matter how I try to. > > > > Is there a howto avaialable? > > > > I don't want to get a multi-zone setup, where I sync the actual zone > > data, but have a global namespace where all buckets and users are uniqe. > > > > Cheers > > Boris > > > > -- > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > > im groüen Saal. > > > > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. > ___ > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an > email to ceph-users-le...@ceph.io > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: create a Multi-zone-group sync setup
Hi, after some trial and error I got it working, so users will get synced. However, If I try to create a bucket via s3cmd I receive the following error: s3cmd --access_key=XX --secret_key=YY --host=HOST mb s3://test ERROR: S3 error: 403 (InvalidAccessKeyId) When I try the same with ls I just get an empty response (because there are no buckets to list). I get this against both radosgw locations. I have an nginx in between the internet and radosgw that will just proxy pass every address and sets host and x-forwarded-for header. Am Fr., 30. Juli 2021 um 16:46 Uhr schrieb Boris Behrens : > Hi people, > > I try to create a Multi-zone-group setup (like it is described here: > https://docs.ceph.com/en/latest/radosgw/multisite/) > > But I simply fail. > > I just created a testcluster to mess with it, and no matter how I try to. > > Is there a howto avaialable? > > I don't want to get a multi-zone setup, where I sync the actual zone data, > but have a global namespace where all buckets and users are uniqe. > > Cheers > Boris > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Discard / Trim does not shrink rbd image size when disk is partitioned
Hi Janne,
thanks for the hint. I was aware of that, but it is good to add that knowledge to the question for future Google searchers.

Hi Ilya,
that fixed it. Do we know why the discard does not work when the partition table is not aligned?
We provide OS templates to our customers, but they can also create and attach an empty block device, and they will certainly not check if the partitions are aligned correctly.

Cheers
Boris

Am Fr., 13. Aug. 2021 um 08:44 Uhr schrieb Janne Johansson < icepic...@gmail.com>:
> Den tors 12 aug. 2021 kl 17:04 skrev Boris Behrens :
> > Hi everybody,
> > we just stumbled over a problem where the rbd image does not shrink, when
> > files are removed.
> > This only happens when the rbd image is partitioned.
> >
> > * We tested it with centos8/ubuntu20.04 with ext4 and a gpt partition table
> > (/boot and /)
> > * the kvm device is virtio-scsi-pci with krbd
> > * Mount option discard is set
> > * command to create large file: dd if=/dev/zero of=testfile bs=64M
> > count=1000
> > * the image grows in the size we expect
> > * when we remove the testfile the rbd image stays at that size
> > * when we recreate the deleted file with the command the rbd image grows
> > further
>
> Just a small nit on this single point, regardless of if trim/discard
> works or not:
> There is no guarantee that writing a file, removing it and then
> re-writing a file will ever end up in the same spot again. In fact, most
> modern filesystems will probably make sure to NOT place things at the
> same spot again.
> Since the second write ends up in a different place, it will once again
> expand your sparse/thin image by the amount of written bytes, this is
> very much to be expected.
>
> I'm sorry if you already knew this and I am just stating the obvious to
> you, but your text came over as if you expected the second write to not
> increase the image since that "space" was already blown up on the first
> write.
>
> Trim/discard should still be investigated so you can make it shrink back
> again somehow, just wanted to point this out for the records.
>
> --
> May the most significant bit of your life be positive.

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Discard / Trim does not shrink rbd image size when disk is partitioned
Hi everybody,

we just stumbled over a problem where the rbd image does not shrink when files are removed.
This only happens when the rbd image is partitioned.

* We tested it with centos8/ubuntu20.04 with ext4 and a gpt partition table (/boot and /)
* the kvm device is virtio-scsi-pci with krbd
* Mount option discard is set
* command to create a large file: dd if=/dev/zero of=testfile bs=64M count=1000
* the image grows by the size we expect
* when we remove the testfile the rbd image stays at that size
* when we recreate the deleted file with the command the rbd image grows further
* using fstrim does not work
* adding a new disk and initializing the ext4 directly on the disk (without partitioning), the trim does work and the rbd image shrinks back to a couple GB
* we use ceph 14.2.21

Does anybody experience the same issue and maybe know how to solve the problem?

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
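To narrow down where the discard gets lost, a few checks can help (device, mountpoint and image names are examples); if DISC-GRAN/DISC-MAX show up as 0 inside the guest, discards never reach the rbd image at all:

  lsblk --discard /dev/sdb   # inside the guest: discard granularity/max per device and partition
  fstrim -v /mnt/data        # inside the guest: prints how many bytes were actually trimmed
  rbd du pool/image          # on the ceph side: provisioned vs. used size of the image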
[ceph-users] create a Multi-zone-group sync setup
Hi people,

I try to create a Multi-zone-group setup (like it is described here: https://docs.ceph.com/en/latest/radosgw/multisite/)

But I simply fail. I just created a testcluster to mess with it, and no matter what I try it does not work.

Is there a howto available?

I don't want to get a multi-zone setup where I sync the actual zone data, but a global namespace where all buckets and users are unique.

Cheers
Boris

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] understanding multisite radosgw syncing
Hi, I wanted to set up a multisite radosgw environment where only bucketnames and userinfo should get synced. Basically I don't want that user data is synced but buckets and userids are still uniqe inside the zonegroup. For this I've gone though this howto ( https://docs.ceph.com/en/latest/radosgw/multisite/) and have set the my zonegroup config to this: "zones": [ { "id": "07cdb1c7-8c8e-4a23-ab1e-fcfb88982f38", "name": "eu-central-2", "endpoints": [ "https://gw2/"; ], "log_meta": "false", "log_data": "false", "bucket_index_max_shards": 11, "read_only": "false", "tier_type": "", "sync_from_all": "true", "sync_from": [], "redirect_zone": "" }, { "id": "ff7a8b0c-07e6-463a-861b-78f0adeba8ad", "name": "eu-central-1", "endpoints": [ "https://gw1/"; ], "log_meta": "true", "log_data": "false", "bucket_index_max_shards": 0, "read_only": "false", "tier_type": "", "sync_from_all": "true", "sync_from": [], "redirect_zone": "" } but the sync status on both systems looks like it wants to replicate data: gw2 realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (world) zonegroup da651dc1-2663-4e1b-af2e-ac4454f24c9d (eu) zone 07cdb1c7-8c8e-4a23-ab1e-fcfb88982f38 (eu-central-2) metadata sync preparing for full sync full sync: 64/64 shards full sync: 0 entries to sync failed to fetch master sync status: (5) Input/output error 2021-07-27T11:24:06.772+ 7f4638c23b40 0 data sync zone:ff7a8b0c ERROR: failed to fetch datalog info data sync source: ff7a8b0c-07e6-463a-861b-78f0adeba8ad (eu-central-1) init full sync: 128/128 shards full sync: 0 buckets to sync incremental sync: 0/128 shards gw1 realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (world) zonegroup da651dc1-2663-4e1b-af2e-ac4454f24c9d (eu) zone ff7a8b0c-07e6-463a-861b-78f0adeba8ad (eu-central-1) metadata sync no sync (zone is master) 2021-07-27 11:24:24.645 7fe30fc07840 0 data sync zone:07cdb1c7 ERROR: failed to fetch datalog info data sync source: 07cdb1c7-8c8e-4a23-ab1e-fcfb88982f38 (eu-central-2) failed to retrieve sync info: (13) Permission denied The gw2 is not set up yet, so the sync from gw1 will not happen (and I am still figuring out why gw2 can not pull from gw1, but this is something I worry later). Do I need to change the zonegroup config to not have the data synced? Cheers Boris -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Deleting large objects via s3 API leads to orphan objects
Hello my dear ceph community, I am now dealing with a lot of orphan objects and today I got the time to dig into it. What I basically found is that large objects get removed from radosgw, but not from rados. This leads to a huge amount of orphan objects. I've found this RH bug from last year (https://bugzilla.redhat.com/show_bug.cgi?id=1844720) and would like to know if there is any workaround. Together with this bug (https://tracker.ceph.com/issues/50293) it makes it very tedious to search for those objects, because we have a nearly constant ingress of data in some of those buckets. We are currently running 14.2.21 across the board. Cheers Boris -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
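For completeness, the orphan scan that ships with Nautilus can at least enumerate candidates while the tracker issues above are open; a sketch, with the pool name taken from the other threads (the scan only lists objects, it does not delete anything):

  $ radosgw-admin orphans find --pool=eu-central-1.rgw.buckets.data --job-id=orphans-1
  $ radosgw-admin orphans list-jobs
  # review the reported leaked objects before removing anything, then clean up the scan metadata:
  $ radosgw-admin orphans finish --job-id=orphans-1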
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Hi Dan, hi Rafael, we found the issue. It was a cleanup script that didn't work correctly. Basically it removed files via rados and the bucket index didn't update. Thank you a lot for your help. (will also close the bug on the ceph tracker) Am Fr., 23. Juli 2021 um 01:16 Uhr schrieb Rafael Lopez : > > Thanks for further clarification Dan. > > Boris, if you have a test/QA environment on the same code as production, you > can confirm if the problem is as above. Do NOT do this in production - if the > problem exists it might result in losing production data. > > 1. Upload large S3 object that would take 10+ seconds to download (several GB) > 2. Download object to ensure it is working > 3. Set "rgw_gc_obj_min_wait" to very low value (2-3 seconds) > 4. Download object > > Step (4) may succeed, but run this: > `radosgw-admin gc list` > > And check for shadow objects associated with the S3 object. > > Once the garbage collection completes, you will get the 404 NoSuchKey return > when you try to download the S3 object, although it will still be listed as > an object in the bucket. > Also recommend setting the "rgw_gc_obj_min_wait" back to a high value after > you finish testing. > > On Thu, 22 Jul 2021 at 19:45, Dan van der Ster wrote: >> >> Boris, >> >> To check if your issue is related to Rafael's, could you check your >> access logs for requests on the missing objects which lasted longer >> than one hour? >> >> I ask because Nautilus also has rgw_gc_obj_min_wait (2hr by default), >> which is the main config option related to >> https://tracker.ceph.com/issues/47866 >> >> >> -- Dan >> >> On Thu, Jul 22, 2021 at 11:12 AM Dan van der Ster >> wrote: >> > >> > Hi Rafael, >> > >> > AFAIU, that gc issue was not relevant for N -- the bug is in the new >> > rgw_gc code which landed in Octopus and was not backported to N. >> > >> > Well, RHCEPH had the new rgw_gc cls backported to it, and RHCEPH has >> > the bugfix you refer to: >> > * Wed Dec 02 2020 Ceph Jenkins 2:14.2.11-86 >> > - rgw: during GC defer, prevent new GC enqueue (rhbz#1892644) >> > https://bugzilla.redhat.com/show_bug.cgi?id=1892644 >> > >> > But still, I think it shouldn't apply to the upstream community >> > Nautilus that we run. >> > >> > That said, this indeed looks really similar so perhaps Nautilus has >> > similar faulty gc logic. >> > >> > Cheers, Dan >> > >> > On Thu, Jul 22, 2021 at 6:47 AM Rafael Lopez >> > wrote: >> > > >> > > hi boris, >> > > >> > > We hit an issue late last year that sounds similar to what you are >> > > experiencing. I am not sure if the fix was backported to nautilus, I >> > > can't see any reference to a nautilus backport so it's possible it was >> > > only backported to octopus (15.x), exception being red hat ceph nautilus. >> > > >> > > https://tracker.ceph.com/issues/47866?next_issue_id=48255#note-59 >> > > https://www.mail-archive.com/ceph-users@ceph.io/msg05312.html >> > > >> > > Basically, a read request on a s3/swift object that took a very long >> > > time to complete would cause the associated rados data objects to be put >> > > in the GC queue, but the head object would still be present. So the s3 >> > > object would still show as present, `rados bi list` would show it (since >> > > head object was present) but the data objects would be gone, resulting >> > > in 404 NoSuchKey when retrieving the object. 
>> > > >> > > raf >> > > >> > > On Wed, 21 Jul 2021 at 18:12, Boris Behrens wrote: >> > >> >> > >> Good morning everybody, >> > >> >> > >> we've dug further into it but still don't know how this could happen. >> > >> What we ruled out for now: >> > >> * Orphan objects cleanup process. >> > >> ** There is only one bucket with missing data (I checked all other >> > >> buckets yesterday) >> > >> ** The "keep this files" list is generated by radosgw-admin bukcet >> > >> rados list. I would doubt that there were files listed, that are >> > >> accessible via radosgw >> > >> ** The deleted files are somewhat random, but always with their >> > >> corresponding counterparts (per folder there are 2-3 files tha
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Good morning everybody, we've dug further into it but still don't know how this could happen. What we ruled out for now: * Orphan objects cleanup process. ** There is only one bucket with missing data (I checked all other buckets yesterday) ** The "keep these files" list is generated by radosgw-admin bucket radoslist. I would doubt that there were files listed that are accessible via radosgw ** The deleted files are somewhat random, but always with their corresponding counterparts (per folder there are 2-3 files that belong together) * Customer removed his data, but radosgw didn't clean up the bucket index ** there are no delete requests in the bucket's usage log ** the customer told us that they do not have a delete job for this bucket So I am out of ideas of what else to check, and hope that you people might be able to help with further ideas. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Hi Dan, unfortunally there are no versioned objects in the bucket. All objects are type plain. I will create a bug ticket. Am Mo., 19. Juli 2021 um 18:35 Uhr schrieb Dan van der Ster : > > Here's a recipe, from the when I had the same question: > > > "[ceph-users] Re: rgw index shard much larger than others - ceph-users - > lists.ceph.io" > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/MO7IHRGJ7TGPKT3GXCKMFLR674G3YGUX/ > > On Mon, 19 Jul 2021, 18:00 Boris Behrens, wrote: >> >> Hi Dan, >> how do I find out if a bucket got versioning enabled? >> >> Am Mo., 19. Juli 2021 um 17:00 Uhr schrieb Dan van der Ster >> : >> > >> > Hi Boris, >> > >> > Does the bucket have object versioning enabled? >> > We saw something like this once a while ago: `s3cmd ls` showed an >> > entry for an object, but when we tried to get it we had 404. >> > We didn't find a good explanation in the end -- our user was able to >> > re-upload the object and it didn't recur so we didn't debug further. >> > >> > I suggest you open a ticket in the tracker with all the evidence so >> > the developers can help diagnose the problem. >> > >> > Cheers, Dan >> > >> > >> > On Fri, Jul 16, 2021 at 6:45 PM Boris Behrens wrote: >> > > >> > > Hi Jean-Sebastien, >> > > >> > > I have the exact opposite. Files can be listed (the are in the bucket >> > > index), but are not available anymore. >> > > >> > > Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry >> > > : >> > > > >> > > > Hi Boris, I don't have any answer for you, but I have situation similar >> > > > to yours. >> > > > >> > > > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/ >> > > > >> > > > I didn't try radoslist, I should have. >> > > > >> > > > Is this new, or it just that the client realised this lately? >> > > > All the data seems missing or just some paths? >> > > > Did you reshard lately? >> > > > Did you test using client programs like s3cmd & rclone...? >> > > > >> > > > I didn't have time to work on that this week, but I have to find a >> > > > solution too. >> > > > Meanwhile, I run with a lower shard number and my customer can access >> > > > all his data. >> > > > Cheers! >> > > > >> > > > On 7/16/21 11:36 AM, Boris Behrens wrote: >> > > > > [Externe UL*] >> > > > > >> > > > > Hi everybody, >> > > > > a customer mentioned that he got problems in accessing hist rgw data. >> > > > > I checked the bucket index and the file should be available. Then I >> > > > > pulled a list with radosgw-admin radoslist --bucket BUCKET and it >> > > > > seems that the file is gone. >> > > > > >> > > > > beside the "yaiks, is there a way the file might be somewhere else in >> > > > > ceph?" how can this happen? >> > > > > >> > > > > We do occational orphan objects cleanups but this does not pull the >> > > > > bucket index into account. >> > > > > >> > > > > It is a large bucket with 2.1m files in it and with 34 shards. >> > > > > >> > > > > Cheers and happy weekend >> > > > > Boris >> > > > > >> > > > > -- >> > > > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend >> > > > > im groüen Saal. >> > > > > ___ >> > > > > ceph-users mailing list -- ceph-users@ceph.io >> > > > > To unsubscribe send an email to ceph-users-le...@ceph.io >> > > > > *ATTENTION : L’émetteur de ce courriel est externe à l’Université >> > > > > Laval. 
>> > > > > Évitez de cliquer sur un hyperlien, d’ouvrir une pièce jointe ou de >> > > > > transmettre des informations si vous ne connaissez pas l’expéditeur >> > > > > du courriel. En cas de doute, contactez l’équipe de soutien >> > > > > informatique de votre unité ou hameconn...@ulaval.ca. >> > > > > >> > > > > >> > > > ___ >> > > > ceph-users mailing list -- ceph-users@ceph.io >> > > > To unsubscribe send an email to ceph-users-le...@ceph.io >> > > >> > > >> > > >> > > -- >> > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend >> > > im groüen Saal. >> > > ___ >> > > ceph-users mailing list -- ceph-users@ceph.io >> > > To unsubscribe send an email to ceph-users-le...@ceph.io >> >> >> >> -- >> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend >> im groüen Saal. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Hi Dan, how do I find out if a bucket got versioning enabled? Am Mo., 19. Juli 2021 um 17:00 Uhr schrieb Dan van der Ster : > > Hi Boris, > > Does the bucket have object versioning enabled? > We saw something like this once a while ago: `s3cmd ls` showed an > entry for an object, but when we tried to get it we had 404. > We didn't find a good explanation in the end -- our user was able to > re-upload the object and it didn't recur so we didn't debug further. > > I suggest you open a ticket in the tracker with all the evidence so > the developers can help diagnose the problem. > > Cheers, Dan > > > On Fri, Jul 16, 2021 at 6:45 PM Boris Behrens wrote: > > > > Hi Jean-Sebastien, > > > > I have the exact opposite. Files can be listed (the are in the bucket > > index), but are not available anymore. > > > > Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry > > : > > > > > > Hi Boris, I don't have any answer for you, but I have situation similar > > > to yours. > > > > > > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/ > > > > > > I didn't try radoslist, I should have. > > > > > > Is this new, or it just that the client realised this lately? > > > All the data seems missing or just some paths? > > > Did you reshard lately? > > > Did you test using client programs like s3cmd & rclone...? > > > > > > I didn't have time to work on that this week, but I have to find a > > > solution too. > > > Meanwhile, I run with a lower shard number and my customer can access > > > all his data. > > > Cheers! > > > > > > On 7/16/21 11:36 AM, Boris Behrens wrote: > > > > [Externe UL*] > > > > > > > > Hi everybody, > > > > a customer mentioned that he got problems in accessing hist rgw data. > > > > I checked the bucket index and the file should be available. Then I > > > > pulled a list with radosgw-admin radoslist --bucket BUCKET and it > > > > seems that the file is gone. > > > > > > > > beside the "yaiks, is there a way the file might be somewhere else in > > > > ceph?" how can this happen? > > > > > > > > We do occational orphan objects cleanups but this does not pull the > > > > bucket index into account. > > > > > > > > It is a large bucket with 2.1m files in it and with 34 shards. > > > > > > > > Cheers and happy weekend > > > > Boris > > > > > > > > -- > > > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > > > > im groüen Saal. > > > > ___ > > > > ceph-users mailing list -- ceph-users@ceph.io > > > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > *ATTENTION : L’émetteur de ce courriel est externe à l’Université Laval. > > > > Évitez de cliquer sur un hyperlien, d’ouvrir une pièce jointe ou de > > > > transmettre des informations si vous ne connaissez pas l’expéditeur du > > > > courriel. En cas de doute, contactez l’équipe de soutien informatique > > > > de votre unité ou hameconn...@ulaval.ca. > > > > > > > > > > > ___ > > > ceph-users mailing list -- ceph-users@ceph.io > > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > > > > > -- > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > > im groüen Saal. > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
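One way to check it from the client side, assuming awscli can talk to the gateway (endpoint and bucket name are placeholders):

  $ aws s3api get-bucket-versioning --endpoint-url https://gw1/ --bucket BUCKET
  # an empty response (no "Status" field) means versioning was never enabled;
  # "Status": "Enabled" or "Suspended" means the bucket has been versioned at some point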
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Does someone have an idea how this could happen? * The files are present in the output of "radosgw-admin bi list --bucket BUCKET" * The files are missing in the output of "radosgw-admin bucket radoslist --bucket BUCKET" * I have strange shadow objects that don't seem to have a filename (_shadow_.Sxj4BEhZS6PZg1HhsvSeqJM4Y0wRCto_4) It doesn't seem to be a careless "rados -p POOL rm OBJECT", because then the object would still be in the "radosgw-admin bucket radoslist --bucket BUCKET" output (I just tested that on a test bucket). Am Fr., 16. Juli 2021 um 17:36 Uhr schrieb Boris Behrens : > > Hi everybody, > a customer mentioned that he got problems in accessing hist rgw data. > I checked the bucket index and the file should be available. Then I > pulled a list with radosgw-admin radoslist --bucket BUCKET and it > seems that the file is gone. > > beside the "yaiks, is there a way the file might be somewhere else in > ceph?" how can this happen? > > We do occational orphan objects cleanups but this does not pull the > bucket index into account. > > It is a large bucket with 2.1m files in it and with 34 shards. > > Cheers and happy weekend > Boris > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > im groüen Saal. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
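A rough way to list exactly which keys are still indexed but have lost their head object, assuming jq is available; the jq path into the bi list output and the MARKER_KEY naming of head objects are assumptions based on how the listings usually look:

  $ MARKER=$(radosgw-admin bucket stats --bucket=BUCKET | jq -r .marker)
  $ radosgw-admin bi list --bucket=BUCKET | jq -r '.[].entry.name' | sort -u > indexed.txt
  $ rados -p eu-central-1.rgw.buckets.data ls | grep "^${MARKER}_" | sed "s/^${MARKER}_//" | sort -u > heads.txt
  $ comm -23 indexed.txt heads.txt      # indexed keys whose head object no longer exists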
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Just digging: I have a ton of iles in the radosgw-admin bucket radoslist output that looks like ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.LRSp5qOg4cDn2ImWxeXtJlRvfLNZ-8R_1 ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_1 ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_2 ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_3 ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_4 ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_5 What are those files? o0 Am Sa., 17. Juli 2021 um 22:54 Uhr schrieb Boris Behrens : > > Hi k, > > all systems run 14.2.21 > > Cheers > Boris > > Am Sa., 17. Juli 2021 um 22:12 Uhr schrieb Konstantin Shalygin > : > > > > Boris, what is your Ceph version? > > > > > > k > > > > On 17 Jul 2021, at 11:04, Boris Behrens wrote: > > > > I really need help with this issue. > > > > > > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > im groüen Saal. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
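Those __shadow_ objects are the rados tail chunks that radosgw writes for S3 objects larger than the head chunk size; the head object keeps a manifest that points at them, which is why they show up in radoslist without a recognisable filename. To see which S3 object a given tail belongs to, the manifest of a suspected head object can be dumped; bucket and key below are placeholders:

  $ radosgw-admin object stat --bucket=BUCKET --object=KEY
  # the "manifest" section shows the prefix that the corresponding _shadow_ tail objects share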
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Hi k, all systems run 14.2.21 Cheers Boris Am Sa., 17. Juli 2021 um 22:12 Uhr schrieb Konstantin Shalygin : > > Boris, what is your Ceph version? > > > k > > On 17 Jul 2021, at 11:04, Boris Behrens wrote: > > I really need help with this issue. > > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Is it possible to not complete a file upload so the actual file is not there, but it is listed in the bucket index? I really need help with this issue. Am Fr., 16. Juli 2021 um 19:35 Uhr schrieb Boris Behrens : > > exactly. > rados rm wouldn't remove it from the "radosgw-admin bucket radoslist" > list, correct? > > our usage statistics are not really usable because it fluctuates in a > 200tb range. > > I also hope that I find the files in the "rados ls" list, but I don't > have much hope. > > For me it is key to understand how this happeneds and how I can verify > the integrity of all buckets. > Dataloss is the worst kind of problem for me. > > Am Fr., 16. Juli 2021 um 19:21 Uhr schrieb Jean-Sebastien Landry > : > > > > Ok, so everything looks normal from the sysadmin "bi list" & the > > customer "s3cmd ls" views, except that the GET give a 404 NoSuchKey? > > > > > Is there way to remove a file from a bucket without removing it from > > the bucketindex? > > > > using rados rm probably, but from the customer ends, I hope not. > > > > Do you have any usage stats that can confirm that the data has been > > deleted and/or are still there. (at the pool level maybe?) > > Hopping for you that it's just a data/index/shard mismatch... > > > > > > On 7/16/21 12:44 PM, Boris Behrens wrote: > > > [Externe UL*] > > > > > > Hi Jean-Sebastien, > > > > > > I have the exact opposite. Files can be listed (the are in the bucket > > > index), but are not available anymore. > > > > > > Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry > > > : > > >> Hi Boris, I don't have any answer for you, but I have situation similar > > >> to yours. > > >> > > >> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/ > > >> > > >> I didn't try radoslist, I should have. > > >> > > >> Is this new, or it just that the client realised this lately? > > >> All the data seems missing or just some paths? > > >> Did you reshard lately? > > >> Did you test using client programs like s3cmd & rclone...? > > >> > > >> I didn't have time to work on that this week, but I have to find a > > >> solution too. > > >> Meanwhile, I run with a lower shard number and my customer can access > > >> all his data. > > >> Cheers! > > >> > > >> On 7/16/21 11:36 AM, Boris Behrens wrote: > > >>> [Externe UL*] > > >>> > > >>> Hi everybody, > > >>> a customer mentioned that he got problems in accessing hist rgw data. > > >>> I checked the bucket index and the file should be available. Then I > > >>> pulled a list with radosgw-admin radoslist --bucket BUCKET and it > > >>> seems that the file is gone. > > >>> > > >>> beside the "yaiks, is there a way the file might be somewhere else in > > >>> ceph?" how can this happen? > > >>> > > >>> We do occational orphan objects cleanups but this does not pull the > > >>> bucket index into account. > > >>> > > >>> It is a large bucket with 2.1m files in it and with 34 shards. > > >>> > > >>> Cheers and happy weekend > > >>>Boris > > >>> > > >>> -- > > >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > > >>> im groüen Saal. > > >>> ___ > > >>> ceph-users mailing list -- ceph-users@ceph.io > > >>> To unsubscribe send an email to ceph-users-le...@ceph.io > > >>> > > >>> > > >> ___ > > >> ceph-users mailing list -- ceph-users@ceph.io > > >> To unsubscribe send an email to ceph-users-le...@ceph.io > > > > > > > > > -- > > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > > > im groüen Saal. 
> > > ___ > > > ceph-users mailing list -- ceph-users@ceph.io > > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > > > > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > im groüen Saal. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Am Fr., 16. Juli 2021 um 19:35 Uhr schrieb Boris Behrens : > > exactly. > rados rm wouldn't remove it from the "radosgw-admin bucket radoslist" > list, correct? > > our usage statistics are not really usable because it fluctuates in a > 200tb range. > > I also hope that I find the files in the "rados ls" list, but I don't > have much hope. > > For me it is key to understand how this happeneds and how I can verify > the integrity of all buckets. > Dataloss is the worst kind of problem for me. > > Am Fr., 16. Juli 2021 um 19:21 Uhr schrieb Jean-Sebastien Landry > : > > > > Ok, so everything looks normal from the sysadmin "bi list" & the > > customer "s3cmd ls" views, except that the GET give a 404 NoSuchKey? > > > > > Is there way to remove a file from a bucket without removing it from > > the bucketindex? > > > > using rados rm probably, but from the customer ends, I hope not. > > > > Do you have any usage stats that can confirm that the data has been > > deleted and/or are still there. (at the pool level maybe?) > > Hopping for you that it's just a data/index/shard mismatch... > > > > > > On 7/16/21 12:44 PM, Boris Behrens wrote: > > > [Externe UL*] > > > > > > Hi Jean-Sebastien, > > > > > > I have the exact opposite. Files can be listed (the are in the bucket > > > index), but are not available anymore. > > > > > > Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry > > > : > > >> Hi Boris, I don't have any answer for you, but I have situation similar > > >> to yours. > > >> > > >> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/ > > >> > > >> I didn't try radoslist, I should have. > > >> > > >> Is this new, or it just that the client realised this lately? > > >> All the data seems missing or just some paths? > > >> Did you reshard lately? > > >> Did you test using client programs like s3cmd & rclone...? > > >> > > >> I didn't have time to work on that this week, but I have to find a > > >> solution too. > > >> Meanwhile, I run with a lower shard number and my customer can access > > >> all his data. > > >> Cheers! > > >> > > >> On 7/16/21 11:36 AM, Boris Behrens wrote: > > >>> [Externe UL*] > > >>> > > >>> Hi everybody, > > >>> a customer mentioned that he got problems in accessing hist rgw data. > > >>> I checked the bucket index and the file should be available. Then I > > >>> pulled a list with radosgw-admin radoslist --bucket BUCKET and it > > >>> seems that the file is gone. > > >>> > > >>> beside the "yaiks, is there a way the file might be somewhere else in > > >>> ceph?" how can this happen? > > >>> > > >>> We do occational orphan objects cleanups but this does not pull the > > >>> bucket index into account. > > >>> > > >>> It is a large bucket with 2.1m files in it and with 34 shards. > > >>> > > >>> Cheers and happy weekend > > >>>Boris > > >>> > > >>> -- > > >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > > >>> im groüen Saal. > > >>> ___ > > >>> ceph-users mailing list -- ceph-users@ceph.io > > >>> To unsubscribe send an email to ceph-users-le...@ceph.io > > >>> > > >>> > > >> ___ > > >> ceph-users mailing list -- ceph-users@ceph.io > > >> To unsubscribe send an email to ceph-users-le...@ceph.io > > > > > > > > > -- > > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > > > im groüen Saal. 
> > > ___ > > > ceph-users mailing list -- ceph-users@ceph.io > > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > > > > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > im groüen Saal. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: difference between rados ls and radosgw-admin bucket radoslist
I thought that too, but for the orphan objects handling you diff the lists from rados ls and radosgw-admin bucket radoslist. So there mus be some kind of difference. I really need this info to debug a problem we have. Am Fr., 16. Juli 2021 um 20:10 Uhr schrieb Jean-Sebastien Landry : > > My understanding is that radoslist is the same (or "very like") as rados > ls, except that it limit the scope to the given bucket. > > to be confirmed, I don't want to spread false information, but when you > do a > radosgw-admin bucket check --check-objects --fix, > it rebuild the "bi" from the pool level (rados ls), so I'm not sure the > bucketindex is "that" much important, knowing that you can rebuilt it > from the pool. (?) > > > > > On 7/16/21 1:47 PM, Boris Behrens wrote: > > [Externe UL*] > > > > Hi, > > is there a difference between those two? > > I always thought that radosgw-admin radoslist only shows the objects > > that are somehow associated with a bucket. But if the bucketindex is > > broken, would this reflect in the output? > > > > -- > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > > im groüen Saal. > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
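The practical difference only shows up when the two lists are diffed; a rough sketch of that comparison (pool name from this thread, everything else is an assumption, and on a busy cluster the lists will drift while they are being collected):

  # everything that physically exists in the data pool
  $ rados -p eu-central-1.rgw.buckets.data ls | sort -u > in_pool.txt
  # everything any bucket still references
  $ for b in $(radosgw-admin bucket list | jq -r '.[]'); do radosgw-admin bucket radoslist --bucket="$b"; done | sort -u > referenced.txt
  # in the pool, but referenced by no bucket -> orphan candidates
  $ comm -23 in_pool.txt referenced.txt > orphan_candidates.txt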
[ceph-users] difference between rados ls and radosgw-admin bucket radoslist
Hi, is there a difference between those two? I always thought that radosgw-admin bucket radoslist only shows the objects that are somehow associated with a bucket. But if the bucket index is broken, would this be reflected in the output? -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
exactly. rados rm wouldn't remove it from the "radosgw-admin bucket radoslist" list, correct? our usage statistics are not really usable because it fluctuates in a 200tb range. I also hope that I find the files in the "rados ls" list, but I don't have much hope. For me it is key to understand how this happeneds and how I can verify the integrity of all buckets. Dataloss is the worst kind of problem for me. Am Fr., 16. Juli 2021 um 19:21 Uhr schrieb Jean-Sebastien Landry : > > Ok, so everything looks normal from the sysadmin "bi list" & the > customer "s3cmd ls" views, except that the GET give a 404 NoSuchKey? > > > Is there way to remove a file from a bucket without removing it from > the bucketindex? > > using rados rm probably, but from the customer ends, I hope not. > > Do you have any usage stats that can confirm that the data has been > deleted and/or are still there. (at the pool level maybe?) > Hopping for you that it's just a data/index/shard mismatch... > > > On 7/16/21 12:44 PM, Boris Behrens wrote: > > [Externe UL*] > > > > Hi Jean-Sebastien, > > > > I have the exact opposite. Files can be listed (the are in the bucket > > index), but are not available anymore. > > > > Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry > > : > >> Hi Boris, I don't have any answer for you, but I have situation similar > >> to yours. > >> > >> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/ > >> > >> I didn't try radoslist, I should have. > >> > >> Is this new, or it just that the client realised this lately? > >> All the data seems missing or just some paths? > >> Did you reshard lately? > >> Did you test using client programs like s3cmd & rclone...? > >> > >> I didn't have time to work on that this week, but I have to find a > >> solution too. > >> Meanwhile, I run with a lower shard number and my customer can access > >> all his data. > >> Cheers! > >> > >> On 7/16/21 11:36 AM, Boris Behrens wrote: > >>> [Externe UL*] > >>> > >>> Hi everybody, > >>> a customer mentioned that he got problems in accessing hist rgw data. > >>> I checked the bucket index and the file should be available. Then I > >>> pulled a list with radosgw-admin radoslist --bucket BUCKET and it > >>> seems that the file is gone. > >>> > >>> beside the "yaiks, is there a way the file might be somewhere else in > >>> ceph?" how can this happen? > >>> > >>> We do occational orphan objects cleanups but this does not pull the > >>> bucket index into account. > >>> > >>> It is a large bucket with 2.1m files in it and with 34 shards. > >>> > >>> Cheers and happy weekend > >>>Boris > >>> > >>> -- > >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > >>> im groüen Saal. > >>> ___ > >>> ceph-users mailing list -- ceph-users@ceph.io > >>> To unsubscribe send an email to ceph-users-le...@ceph.io > >>> > >>> > >> ___ > >> ceph-users mailing list -- ceph-users@ceph.io > >> To unsubscribe send an email to ceph-users-le...@ceph.io > > > > > > -- > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > > im groüen Saal. > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. 
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Hi Jean-Sebastien, I have the exact opposite. Files can be listed (the are in the bucket index), but are not available anymore. Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry : > > Hi Boris, I don't have any answer for you, but I have situation similar > to yours. > > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/ > > I didn't try radoslist, I should have. > > Is this new, or it just that the client realised this lately? > All the data seems missing or just some paths? > Did you reshard lately? > Did you test using client programs like s3cmd & rclone...? > > I didn't have time to work on that this week, but I have to find a > solution too. > Meanwhile, I run with a lower shard number and my customer can access > all his data. > Cheers! > > On 7/16/21 11:36 AM, Boris Behrens wrote: > > [Externe UL*] > > > > Hi everybody, > > a customer mentioned that he got problems in accessing hist rgw data. > > I checked the bucket index and the file should be available. Then I > > pulled a list with radosgw-admin radoslist --bucket BUCKET and it > > seems that the file is gone. > > > > beside the "yaiks, is there a way the file might be somewhere else in > > ceph?" how can this happen? > > > > We do occational orphan objects cleanups but this does not pull the > > bucket index into account. > > > > It is a large bucket with 2.1m files in it and with 34 shards. > > > > Cheers and happy weekend > > Boris > > > > -- > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > > im groüen Saal. > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > *ATTENTION : L’émetteur de ce courriel est externe à l’Université Laval. > > Évitez de cliquer sur un hyperlien, d’ouvrir une pièce jointe ou de > > transmettre des informations si vous ne connaissez pas l’expéditeur du > > courriel. En cas de doute, contactez l’équipe de soutien informatique de > > votre unité ou hameconn...@ulaval.ca. > > > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Files listed in radosgw BI but is not available in ceph
Is there way to remove a file from a bucket without removing it from the bucketindex? Am Fr., 16. Juli 2021 um 17:36 Uhr schrieb Boris Behrens : > > Hi everybody, > a customer mentioned that he got problems in accessing hist rgw data. > I checked the bucket index and the file should be available. Then I > pulled a list with radosgw-admin radoslist --bucket BUCKET and it > seems that the file is gone. > > beside the "yaiks, is there a way the file might be somewhere else in > ceph?" how can this happen? > > We do occational orphan objects cleanups but this does not pull the > bucket index into account. > > It is a large bucket with 2.1m files in it and with 34 shards. > > Cheers and happy weekend > Boris > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > im groüen Saal. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Files listed in radosgw BI but is not available in ceph
Hi everybody, a customer mentioned that he has problems accessing his rgw data. I checked the bucket index and the file should be available. Then I pulled a list with radosgw-admin bucket radoslist --bucket BUCKET and it seems that the file is gone. Besides the "yikes, is there a way the file might be somewhere else in ceph?", how can this happen? We do occasional orphan object cleanups, but these do not take the bucket index into account. It is a large bucket with 2.1m files in it and 34 shards. Cheers and happy weekend Boris -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: best practice balance mode in HAproxy in front of RGW?
Am Do., 27. Mai 2021 um 07:47 Uhr schrieb Janne Johansson : > > Den ons 26 maj 2021 kl 16:33 skrev Boris Behrens : > > > > Hi Janne, > > do you know if there can be data duplication which leads to orphan objects? > > > > I am currently huntin strange errors (there is a lot more data in the > > pool, than accessible via rgw) and want to be sure it doesn't come > > from the HAproxy. > > No, I don't think the HAProxy (or any other load balancing setup) in > itself would > cause a lot of orphans. Or in reverse, the multipart stateless way S3 > acts always > allows for half-uploads and broken connections which would leave orphans even > if you did not have HAProxy in between, and in both cases you should > periodically > run the orphan finding commands and trim usage logs you no longer > require and so on. > > > -- > May the most significant bit of your life be positive. Well, this drops a lot of pressure from my shoulders. Is there a way to reduce the probability of creating orphan objects? We use s3 for rbd backups (create snapshot, compress it and then copy it to s3 via s3cmd) and we created 25m orphan objects in 4 weeks. If there is any option / best practive I can do, I will happily use it :) ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
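One thing that often helps with backup-style workloads is making sure aborted multipart uploads do not linger: either abort them explicitly after a failed run, or attach a lifecycle rule that does it automatically (lifecycle support for AbortIncompleteMultipartUpload should be verified on the Nautilus gateways first). A hedged sketch with awscli; endpoint and bucket name are placeholders:

  $ aws s3api list-multipart-uploads --endpoint-url https://s3.example.com --bucket BACKUPS
  $ cat lifecycle.json
  {"Rules":[{"ID":"abort-stale-multipart","Status":"Enabled","Filter":{"Prefix":""},"AbortIncompleteMultipartUpload":{"DaysAfterInitiation":3}}]}
  $ aws s3api put-bucket-lifecycle-configuration --endpoint-url https://s3.example.com --bucket BACKUPS --lifecycle-configuration file://lifecycle.json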
[ceph-users] Re: best practice balance mode in HAproxy in front of RGW?
Hi Janne, do you know if there can be data duplication which leads to orphan objects? I am currently hunting strange errors (there is a lot more data in the pool than is accessible via rgw) and want to be sure it doesn't come from the HAproxy. Am Mi., 26. Mai 2021 um 13:12 Uhr schrieb Janne Johansson : > > I guess normal round robin should work out fine too, regardless of if > there are few clients making several separate connections or many > clients making a few. > > Den ons 26 maj 2021 kl 12:32 skrev Boris Behrens : > > > > Hello togehter, > > > > is there any best practive on the balance mode when I have a HAproxy > > in front of my rgw_frontend? > > > > currently we use "balance leastconn". > > > > Cheers > > Boris > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > -- > May the most significant bit of your life be positive. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] best practice balance mode in HAproxy in front of RGW?
Hello everyone, is there any best practice for the balance mode when I have a HAproxy in front of my rgw_frontend? Currently we use "balance leastconn". Cheers Boris ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
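For what it's worth, a typical backend definition for RGW behind HAProxy looks roughly like the snippet below; leastconn and roundrobin both behave fine for stateless S3 traffic, so this is a sketch rather than a recommendation, and the health-check path is an assumption (use whatever your gateways answer 200 on):

  backend rgw
      balance leastconn
      option httpchk HEAD /
      server gw1 10.0.0.1:8080 check
      server gw2 10.0.0.2:8080 check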
[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up
The more files I delete, the more space is used. How can this be? Am Di., 25. Mai 2021 um 14:41 Uhr schrieb Boris Behrens : > > Am Di., 25. Mai 2021 um 09:23 Uhr schrieb Boris Behrens : > > > > Hi, > > I am still searching for a reason why these two values differ so much. > > > > I am currently deleting a giant amount of orphan objects (43mio, most > > of them under 64kb), but the difference get larger instead of smaller. > > > > This was the state two days ago: > > > > > > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | > > > awk '{ print $2 }' | tr -d , | paste -sd+ - | bc > > > 175977343264 > > > > > > [root@s3db1 ~]# rados df > > > POOL_NAME USED OBJECTS CLONESCOPIES > > > MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RDWR_OPS WR > > > USED COMPR UNDER COMPR > > > ... > > > eu-central-1.rgw.buckets.data 766 TiB 134632397 0 403897191 > > > 0 00 1076480853 45 TiB 532045864 551 TiB > > > 0 B 0 B > > > ... > > > total_objects135866676 > > > > > > [root@s3db1 ~]# ceph df... > > > eu-central-1.rgw.buckets.data 11 2048 253 TiB > > > 134.63M 766 TiB 90.3227 TiB > > > > And this is todays state: > > > > > > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | > > > awk '{ print $2 }' | tr -d , | paste -sd+ - | bc > > > 177144806812 > > > > > > [root@s3db1 ~]# rados df > > > ... > > > eu-central-1.rgw.buckets.data 786 TiB 120025590 0 360076770 > > > ... > > > total_objects121261889 > > > > > > [root@s3db1 ~]# ceph df > > > ... > > > eu-central-1.rgw.buckets.data 11 2048 260 TiB > > > 120.02M 786 TiB 92.5921 TiB > > > > I would love to free up the missing 80TB :) > > Any suggestions? > > As Konstatin mentioned, maybe it was the GC, but I just processes all > objects (with --include-all), but the situation did not change. > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > im groüen Saal. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up
Am Di., 25. Mai 2021 um 09:23 Uhr schrieb Boris Behrens : > > Hi, > I am still searching for a reason why these two values differ so much. > > I am currently deleting a giant amount of orphan objects (43mio, most > of them under 64kb), but the difference get larger instead of smaller. > > This was the state two days ago: > > > > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk > > '{ print $2 }' | tr -d , | paste -sd+ - | bc > > 175977343264 > > > > [root@s3db1 ~]# rados df > > POOL_NAME USED OBJECTS CLONESCOPIES > > MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RDWR_OPS WR > > USED COMPR UNDER COMPR > > ... > > eu-central-1.rgw.buckets.data 766 TiB 134632397 0 403897191 > > 0 00 1076480853 45 TiB 532045864 551 TiB0 B > >0 B > > ... > > total_objects135866676 > > > > [root@s3db1 ~]# ceph df... > > eu-central-1.rgw.buckets.data 11 2048 253 TiB 134.63M > > 766 TiB 90.3227 TiB > > And this is todays state: > > > > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk > > '{ print $2 }' | tr -d , | paste -sd+ - | bc > > 177144806812 > > > > [root@s3db1 ~]# rados df > > ... > > eu-central-1.rgw.buckets.data 786 TiB 120025590 0 360076770 > > ... > > total_objects121261889 > > > > [root@s3db1 ~]# ceph df > > ... > > eu-central-1.rgw.buckets.data 11 2048 260 TiB 120.02M > > 786 TiB 92.5921 TiB > > I would love to free up the missing 80TB :) > Any suggestions? As Konstatin mentioned, maybe it was the GC, but I just processes all objects (with --include-all), but the situation did not change. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up
Am Di., 25. Mai 2021 um 09:39 Uhr schrieb Konstantin Shalygin : > > Hi, > > On 25 May 2021, at 10:23, Boris Behrens wrote: > > I am still searching for a reason why these two values differ so much. > > I am currently deleting a giant amount of orphan objects (43mio, most > of them under 64kb), but the difference get larger instead of smaller. > > > When user trough API make a delete, objects just marks as deleted, then > ceph-radosgw gc perform actual delete, you can see queue via `radosgw-admin > gc list` > I think you can speedup process via rgw_gc_ options. > > > Cheers, > k Hi K, I thought about the GC, but it doesn't look like this is the issue: > > [root@s3db1 ~]# radosgw-admin gc list --include-all | grep oid | wc -l > 563598 > [root@s3db1 ~]# radosgw-admin gc list | grep oid | wc -l > 43768 -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
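If the queue itself is the bottleneck, the rgw_gc_* options can be raised and a full pass forced; the values below are only examples and the running radosgw instances may need a restart to pick them up:

  $ ceph config set client.rgw rgw_gc_max_concurrent_io 20
  $ ceph config set client.rgw rgw_gc_max_trim_chunk 64
  $ radosgw-admin gc process --include-all   # process everything in the queue, including not-yet-expired entries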
[ceph-users] summarized radosgw size_kb_actual vs pool stored value doesn't add up
Hi, I am still searching for a reason why these two values differ so much. I am currently deleting a giant amount of orphan objects (43mio, most of them under 64kb), but the difference get larger instead of smaller. This was the state two days ago: > > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk '{ > print $2 }' | tr -d , | paste -sd+ - | bc > 175977343264 > > [root@s3db1 ~]# rados df > POOL_NAME USED OBJECTS CLONESCOPIES > MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RDWR_OPS WR USED > COMPR UNDER COMPR > ... > eu-central-1.rgw.buckets.data 766 TiB 134632397 0 403897191 > 0 00 1076480853 45 TiB 532045864 551 TiB0 B >0 B > ... > total_objects135866676 > > [root@s3db1 ~]# ceph df... > eu-central-1.rgw.buckets.data 11 2048 253 TiB 134.63M > 766 TiB 90.3227 TiB And this is todays state: > > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk '{ > print $2 }' | tr -d , | paste -sd+ - | bc > 177144806812 > > [root@s3db1 ~]# rados df > ... > eu-central-1.rgw.buckets.data 786 TiB 120025590 0 360076770 > ... > total_objects121261889 > > [root@s3db1 ~]# ceph df > ... > eu-central-1.rgw.buckets.data 11 2048 260 TiB 120.02M > 786 TiB 92.5921 TiB I would love to free up the missing 80TB :) Any suggestions? -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] question regarding markers in radosgw
Hello everybody, it seems that I have a metric ton of orphan objects in my s3 cluster. They look like this: $ rados -p eu-central-1.rgw.buckets.data stat ff7a8b0c-07e6-463a-861b-78f0adeba8ad.811806.9_1063978/features/2018-02-23.json eu-central-1.rgw.buckets.data/ff7a8b0c-07e6-463a-861b-78f0adeba8ad.811806.9_1063978/features/2018-02-23.json mtime 2018-02-23 20:59:32.00, size 608 Now I would imagine that 811806.9 is the marker, but when I do a simple radosgw-admin bucket stats | grep -F 811806.9 I get back no results. Can I just delete these files? And if I can delete these files, how can I delete them fast? Cheers Boris -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
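Two sketches that may help here: checking whether any bucket instance (including ones left over from resharding or bucket deletion) still references the marker, and removing an already reviewed list of orphan objects with a bit of parallelism. The pool and marker are taken from the message above, the rest is an assumption:

  # does any bucket instance still reference this marker?
  $ radosgw-admin metadata list bucket.instance | grep 811806.9

  # bulk-remove a reviewed list of orphan rados objects (orphans.txt = one object name per line)
  $ xargs -a orphans.txt -P 8 -n 1 rados -p eu-central-1.rgw.buckets.data rm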
[ceph-users] Re: "radosgw-admin bucket radoslist" loops when a multipart upload is happening
Reading through the bugtracker: https://tracker.ceph.com/issues/50293 Thanks for your patience. Am Do., 20. Mai 2021 um 15:10 Uhr schrieb Boris Behrens : > I try to bump it once more, because it makes finding orphan objects nearly > impossible. > > Am Di., 11. Mai 2021 um 13:03 Uhr schrieb Boris Behrens : > >> Hi together, >> >> I still search for orphan objects and came across a strange bug: >> There is a huge multipart upload happening (around 4TB), and listing the >> rados objects in the bucket loops over the multipart upload. >> >> >> >> -- >> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im >> groüen Saal. >> > > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: "radosgw-admin bucket radoslist" loops when a multipart upload is happening
I try to bump it once more, because it makes finding orphan objects nearly impossible. Am Di., 11. Mai 2021 um 13:03 Uhr schrieb Boris Behrens : > Hi together, > > I still search for orphan objects and came across a strange bug: > There is a huge multipart upload happening (around 4TB), and listing the > rados objects in the bucket loops over the multipart upload. > > > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Process for adding a separate block.db to an osd
This helped: https://tracker.ceph.com/issues/44509 $ systemctl stop ceph-osd@68 $ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 --devs-source /var/lib/ceph/osd/ceph-68/block --dev-target /var/lib/ceph/osd/ceph-68/block.db bluefs-bdev-migrate $ systemctl start ceph-osd@68 Thanks a lot for your support Igor <3 Am Di., 18. Mai 2021 um 09:54 Uhr schrieb Boris Behrens : > One more question: > How do I get rid of the bluestore spillover message? > osd.68 spilled over 64 KiB metadata from 'db' device (13 GiB used of > 50 GiB) to slow device > > I tried an offline compactation, which did not help. > > Am Mo., 17. Mai 2021 um 15:56 Uhr schrieb Boris Behrens : > >> I have no idea why, but it worked. >> >> As the fsck went well, I just re did the bluefs-bdev-new-db and now the >> OSD is back up, with a block.db device. >> >> Thanks a lot >> >> Am Mo., 17. Mai 2021 um 15:28 Uhr schrieb Igor Fedotov > >: >> >>> If you haven't had successful OSD.68 starts with standalone DB I think >>> it's safe to revert previous DB adding and just retry it. >>> >>> At first suggest to run bluefs-bdev-new-db command only and then do fsck >>> again. If it's OK - proceed with bluefs migrate followed by another >>> fsck. And then finalize with adding lvm tags and OSD activation. >>> >>> >>> Thanks, >>> >>> Igor >>> >>> On 5/17/2021 3:47 PM, Boris Behrens wrote: >>> > The FSCK looks good: >>> > >>> > [root@s3db10 export-bluefs2]# ceph-bluestore-tool --path >>> > /var/lib/ceph/osd/ceph-68 fsck >>> > fsck success >>> > >>> > Am Mo., 17. Mai 2021 um 14:39 Uhr schrieb Boris Behrens >> >: >>> > >>> >> Here is the new output. I kept both for now. >>> >> >>> >> [root@s3db10 export-bluefs2]# ls * >>> >> db: >>> >> 018215.sst 018444.sst 018839.sst 019074.sst 019210.sst 019381.sst >>> >> 019560.sst 019755.sst 019849.sst 019888.sst 019958.sst >>> 019995.sst >>> >> 020007.sst 020042.sst 020067.sst 020098.sst 020115.sst >>> >> 018216.sst 018445.sst 018840.sst 019075.sst 019211.sst 019382.sst >>> >> 019670.sst 019756.sst 019877.sst 019889.sst 019959.sst >>> 019996.sst >>> >> 020008.sst 020043.sst 020068.sst 020104.sst CURRENT >>> >> 018273.sst 018446.sst 018876.sst 019076.sst 019256.sst 019383.sst >>> >> 019671.sst 019757.sst 019878.sst 019890.sst 019960.sst >>> 019997.sst >>> >> 020030.sst 020055.sst 020069.sst 020105.sst IDENTITY >>> >> 018300.sst 018447.sst 018877.sst 019081.sst 019257.sst 019395.sst >>> >> 019672.sst 019762.sst 019879.sst 019918.sst 019961.sst >>> 019998.sst >>> >> 020031.sst 020056.sst 020070.sst 020106.sst LOCK >>> >> 018301.sst 018448.sst 018904.sst 019082.sst 019344.sst 019396.sst >>> >> 019673.sst 019763.sst 019880.sst 019919.sst 019962.sst >>> 01.sst >>> >> 020032.sst 020057.sst 020071.sst 020107.sst MANIFEST-020084 >>> >> 018326.sst 018449.sst 018950.sst 019083.sst 019345.sst 019400.sst >>> >> 019674.sst 019764.sst 019881.sst 019920.sst 019963.sst >>> 02.sst >>> >> 020035.sst 020058.sst 020072.sst 020108.sst OPTIONS-020084 >>> >> 018327.sst 018540.sst 018952.sst 019126.sst 019346.sst 019470.sst >>> >> 019675.sst 019765.sst 019882.sst 019921.sst 019964.sst >>> 020001.sst >>> >> 020036.sst 020059.sst 020073.sst 020109.sst OPTIONS-020087 >>> >> 018328.sst 018541.sst 018953.sst 019127.sst 019370.sst 019471.sst >>> >> 019676.sst 019766.sst 019883.sst 019922.sst 019965.sst >>> 020002.sst >>> >> 020037.sst 020060.sst 020074.sst 020110.sst >>> >> 018329.sst 018590.sst 018954.sst 019128.sst 019371.sst 019472.sst >>> >> 019677.sst 019845.sst 019884.sst 019923.sst 019989.sst >>> 020003.sst >>> >> 020038.sst 
020061.sst 020075.sst 020111.sst >>> >> 018406.sst 018591.sst 018995.sst 019174.sst 019372.sst 019473.sst >>> >> 019678.sst 019846.sst 019885.sst 019950.sst 019992.sst >>> 020004.sst >>> >> 020039.sst 020062.sst 020094.sst 020112.sst >>> >> 018407.sst 018727.sst 018996.sst 019175.sst 019373.sst 019474.ss
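After a migration like this, the spillover warning and the DB usage can be double-checked per OSD; a small sketch (the perf counter names are from memory and may differ slightly between releases):

  $ ceph health detail | grep -i spillover
  $ ceph daemon osd.68 perf dump bluefs | egrep 'db_used_bytes|slow_used_bytes'
  $ ceph tell osd.68 compact   # trigger an online compaction if leftover metadata is still reported on the slow device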
[ceph-users] Re: Process for adding a separate block.db to an osd
One more question: How do I get rid of the bluestore spillover message? osd.68 spilled over 64 KiB metadata from 'db' device (13 GiB used of 50 GiB) to slow device I tried an offline compactation, which did not help. Am Mo., 17. Mai 2021 um 15:56 Uhr schrieb Boris Behrens : > I have no idea why, but it worked. > > As the fsck went well, I just re did the bluefs-bdev-new-db and now the > OSD is back up, with a block.db device. > > Thanks a lot > > Am Mo., 17. Mai 2021 um 15:28 Uhr schrieb Igor Fedotov : > >> If you haven't had successful OSD.68 starts with standalone DB I think >> it's safe to revert previous DB adding and just retry it. >> >> At first suggest to run bluefs-bdev-new-db command only and then do fsck >> again. If it's OK - proceed with bluefs migrate followed by another >> fsck. And then finalize with adding lvm tags and OSD activation. >> >> >> Thanks, >> >> Igor >> >> On 5/17/2021 3:47 PM, Boris Behrens wrote: >> > The FSCK looks good: >> > >> > [root@s3db10 export-bluefs2]# ceph-bluestore-tool --path >> > /var/lib/ceph/osd/ceph-68 fsck >> > fsck success >> > >> > Am Mo., 17. Mai 2021 um 14:39 Uhr schrieb Boris Behrens : >> > >> >> Here is the new output. I kept both for now. >> >> >> >> [root@s3db10 export-bluefs2]# ls * >> >> db: >> >> 018215.sst 018444.sst 018839.sst 019074.sst 019210.sst 019381.sst >> >> 019560.sst 019755.sst 019849.sst 019888.sst 019958.sst >> 019995.sst >> >> 020007.sst 020042.sst 020067.sst 020098.sst 020115.sst >> >> 018216.sst 018445.sst 018840.sst 019075.sst 019211.sst 019382.sst >> >> 019670.sst 019756.sst 019877.sst 019889.sst 019959.sst >> 019996.sst >> >> 020008.sst 020043.sst 020068.sst 020104.sst CURRENT >> >> 018273.sst 018446.sst 018876.sst 019076.sst 019256.sst 019383.sst >> >> 019671.sst 019757.sst 019878.sst 019890.sst 019960.sst >> 019997.sst >> >> 020030.sst 020055.sst 020069.sst 020105.sst IDENTITY >> >> 018300.sst 018447.sst 018877.sst 019081.sst 019257.sst 019395.sst >> >> 019672.sst 019762.sst 019879.sst 019918.sst 019961.sst >> 019998.sst >> >> 020031.sst 020056.sst 020070.sst 020106.sst LOCK >> >> 018301.sst 018448.sst 018904.sst 019082.sst 019344.sst 019396.sst >> >> 019673.sst 019763.sst 019880.sst 019919.sst 019962.sst >> 01.sst >> >> 020032.sst 020057.sst 020071.sst 020107.sst MANIFEST-020084 >> >> 018326.sst 018449.sst 018950.sst 019083.sst 019345.sst 019400.sst >> >> 019674.sst 019764.sst 019881.sst 019920.sst 019963.sst >> 02.sst >> >> 020035.sst 020058.sst 020072.sst 020108.sst OPTIONS-020084 >> >> 018327.sst 018540.sst 018952.sst 019126.sst 019346.sst 019470.sst >> >> 019675.sst 019765.sst 019882.sst 019921.sst 019964.sst >> 020001.sst >> >> 020036.sst 020059.sst 020073.sst 020109.sst OPTIONS-020087 >> >> 018328.sst 018541.sst 018953.sst 019127.sst 019370.sst 019471.sst >> >> 019676.sst 019766.sst 019883.sst 019922.sst 019965.sst >> 020002.sst >> >> 020037.sst 020060.sst 020074.sst 020110.sst >> >> 018329.sst 018590.sst 018954.sst 019128.sst 019371.sst 019472.sst >> >> 019677.sst 019845.sst 019884.sst 019923.sst 019989.sst >> 020003.sst >> >> 020038.sst 020061.sst 020075.sst 020111.sst >> >> 018406.sst 018591.sst 018995.sst 019174.sst 019372.sst 019473.sst >> >> 019678.sst 019846.sst 019885.sst 019950.sst 019992.sst >> 020004.sst >> >> 020039.sst 020062.sst 020094.sst 020112.sst >> >> 018407.sst 018727.sst 018996.sst 019175.sst 019373.sst 019474.sst >> >> 019753.sst 019847.sst 019886.sst 019955.sst 019993.sst >> 020005.sst >> >> 020040.sst 020063.sst 020095.sst 020113.sst >> >> 018443.sst 018728.sst 019073.sst 
019176.sst 019380.sst 019475.sst >> >> 019754.sst 019848.sst 019887.sst 019956.sst 019994.sst >> 020006.sst >> >> 020041.sst 020064.sst 020096.sst 020114.sst >> >> >> >> db.slow: >> >> >> >> db.wal: >> >> 020085.log 020088.log >> >> [root@s3db10 export-bluefs2]# du -hs >> >> 12G . >> >> [root@s3db10 export-bluefs2]# cat db/CURRENT >> >> MANIFEST-020084 >> >>
[ceph-users] Re: Process for adding a separate block.db to an osd
I have no idea why, but it worked. As the fsck went well, I just re did the bluefs-bdev-new-db and now the OSD is back up, with a block.db device. Thanks a lot Am Mo., 17. Mai 2021 um 15:28 Uhr schrieb Igor Fedotov : > If you haven't had successful OSD.68 starts with standalone DB I think > it's safe to revert previous DB adding and just retry it. > > At first suggest to run bluefs-bdev-new-db command only and then do fsck > again. If it's OK - proceed with bluefs migrate followed by another > fsck. And then finalize with adding lvm tags and OSD activation. > > > Thanks, > > Igor > > On 5/17/2021 3:47 PM, Boris Behrens wrote: > > The FSCK looks good: > > > > [root@s3db10 export-bluefs2]# ceph-bluestore-tool --path > > /var/lib/ceph/osd/ceph-68 fsck > > fsck success > > > > Am Mo., 17. Mai 2021 um 14:39 Uhr schrieb Boris Behrens : > > > >> Here is the new output. I kept both for now. > >> > >> [root@s3db10 export-bluefs2]# ls * > >> db: > >> 018215.sst 018444.sst 018839.sst 019074.sst 019210.sst 019381.sst > >> 019560.sst 019755.sst 019849.sst 019888.sst 019958.sst 019995.sst > >> 020007.sst 020042.sst 020067.sst 020098.sst 020115.sst > >> 018216.sst 018445.sst 018840.sst 019075.sst 019211.sst 019382.sst > >> 019670.sst 019756.sst 019877.sst 019889.sst 019959.sst 019996.sst > >> 020008.sst 020043.sst 020068.sst 020104.sst CURRENT > >> 018273.sst 018446.sst 018876.sst 019076.sst 019256.sst 019383.sst > >> 019671.sst 019757.sst 019878.sst 019890.sst 019960.sst 019997.sst > >> 020030.sst 020055.sst 020069.sst 020105.sst IDENTITY > >> 018300.sst 018447.sst 018877.sst 019081.sst 019257.sst 019395.sst > >> 019672.sst 019762.sst 019879.sst 019918.sst 019961.sst 019998.sst > >> 020031.sst 020056.sst 020070.sst 020106.sst LOCK > >> 018301.sst 018448.sst 018904.sst 019082.sst 019344.sst 019396.sst > >> 019673.sst 019763.sst 019880.sst 019919.sst 019962.sst 01.sst > >> 020032.sst 020057.sst 020071.sst 020107.sst MANIFEST-020084 > >> 018326.sst 018449.sst 018950.sst 019083.sst 019345.sst 019400.sst > >> 019674.sst 019764.sst 019881.sst 019920.sst 019963.sst 02.sst > >> 020035.sst 020058.sst 020072.sst 020108.sst OPTIONS-020084 > >> 018327.sst 018540.sst 018952.sst 019126.sst 019346.sst 019470.sst > >> 019675.sst 019765.sst 019882.sst 019921.sst 019964.sst 020001.sst > >> 020036.sst 020059.sst 020073.sst 020109.sst OPTIONS-020087 > >> 018328.sst 018541.sst 018953.sst 019127.sst 019370.sst 019471.sst > >> 019676.sst 019766.sst 019883.sst 019922.sst 019965.sst 020002.sst > >> 020037.sst 020060.sst 020074.sst 020110.sst > >> 018329.sst 018590.sst 018954.sst 019128.sst 019371.sst 019472.sst > >> 019677.sst 019845.sst 019884.sst 019923.sst 019989.sst 020003.sst > >> 020038.sst 020061.sst 020075.sst 020111.sst > >> 018406.sst 018591.sst 018995.sst 019174.sst 019372.sst 019473.sst > >> 019678.sst 019846.sst 019885.sst 019950.sst 019992.sst 020004.sst > >> 020039.sst 020062.sst 020094.sst 020112.sst > >> 018407.sst 018727.sst 018996.sst 019175.sst 019373.sst 019474.sst > >> 019753.sst 019847.sst 019886.sst 019955.sst 019993.sst 020005.sst > >> 020040.sst 020063.sst 020095.sst 020113.sst > >> 018443.sst 018728.sst 019073.sst 019176.sst 019380.sst 019475.sst > >> 019754.sst 019848.sst 019887.sst 019956.sst 019994.sst 020006.sst > >> 020041.sst 020064.sst 020096.sst 020114.sst > >> > >> db.slow: > >> > >> db.wal: > >> 020085.log 020088.log > >> [root@s3db10 export-bluefs2]# du -hs > >> 12G . > >> [root@s3db10 export-bluefs2]# cat db/CURRENT > >> MANIFEST-020084 > >> > >> Am Mo., 17. 
Mai 2021 um 14:28 Uhr schrieb Igor Fedotov < > ifedo...@suse.de>: > >> > >>> On 5/17/2021 2:53 PM, Boris Behrens wrote: > >>>> Like this? > >>> Yeah. > >>> > >>> so DB dir structure is more or less O but db/CURRENT looks corrupted. > It > >>> should contain something like: MANIFEST-020081 > >>> > >>> Could you please remove (or even just rename) block.db symlink and do > >>> the export again? Preferably to preserve the results for the first > export. > >>> > >>> if export reveals proper CURRENT content - you
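For reference, a rough sketch of the full sequence Igor describes (new-db, fsck, migrate, fsck, LVM tags, activation), assembled from the commands used elsewhere in this thread; osd.68, /dev/sdj1, the DB size and the LV path are placeholders/examples, not a verified recipe:

# 1) attach a new, empty DB volume to the OSD
CEPH_ARGS="--bluestore-block-db-size 53687091200 --bluestore_block_db_create=true" \
  ceph-bluestore-tool bluefs-bdev-new-db \
  --path /var/lib/ceph/osd/ceph-68 --dev-target /dev/sdj1

# 2) check consistency before moving anything
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 fsck

# 3) migrate the existing RocksDB data from the main device to the new DB device
ceph-bluestore-tool bluefs-bdev-migrate \
  --path /var/lib/ceph/osd/ceph-68 \
  --devs-source /var/lib/ceph/osd/ceph-68/block \
  --dev-target /var/lib/ceph/osd/ceph-68/block.db

# 4) fsck again, then record the DB device in the LVM tags and reactivate
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 fsck
chown -h ceph:ceph /var/lib/ceph/osd/ceph-68/block.db
lvchange --addtag ceph.db_device=/dev/sdj1 /dev/ceph-<vg>/osd-block-<uuid>
lvchange --addtag ceph.db_uuid=<uuid-of-sdj1> /dev/ceph-<vg>/osd-block-<uuid>
ceph-volume lvm activate --all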
[ceph-users] Re: Process for adding a separate block.db to an osd
See my last mail :) Am Mo., 17. Mai 2021 um 14:52 Uhr schrieb Igor Fedotov : > Would you try fsck without standalone DB? > > On 5/17/2021 3:39 PM, Boris Behrens wrote: > > Here is the new output. I kept both for now. > > > > [root@s3db10 export-bluefs2]# ls * > > db: > > 018215.sst 018444.sst 018839.sst 019074.sst 019210.sst 019381.sst > > 019560.sst 019755.sst 019849.sst 019888.sst 019958.sst 019995.sst > > 020007.sst 020042.sst 020067.sst 020098.sst 020115.sst > > 018216.sst 018445.sst 018840.sst 019075.sst 019211.sst 019382.sst > > 019670.sst 019756.sst 019877.sst 019889.sst 019959.sst 019996.sst > > 020008.sst 020043.sst 020068.sst 020104.sst CURRENT > > 018273.sst 018446.sst 018876.sst 019076.sst 019256.sst 019383.sst > > 019671.sst 019757.sst 019878.sst 019890.sst 019960.sst 019997.sst > > 020030.sst 020055.sst 020069.sst 020105.sst IDENTITY > > 018300.sst 018447.sst 018877.sst 019081.sst 019257.sst 019395.sst > > 019672.sst 019762.sst 019879.sst 019918.sst 019961.sst 019998.sst > > 020031.sst 020056.sst 020070.sst 020106.sst LOCK > > 018301.sst 018448.sst 018904.sst 019082.sst 019344.sst 019396.sst > > 019673.sst 019763.sst 019880.sst 019919.sst 019962.sst 01.sst > > 020032.sst 020057.sst 020071.sst 020107.sst MANIFEST-020084 > > 018326.sst 018449.sst 018950.sst 019083.sst 019345.sst 019400.sst > > 019674.sst 019764.sst 019881.sst 019920.sst 019963.sst 02.sst > > 020035.sst 020058.sst 020072.sst 020108.sst OPTIONS-020084 > > 018327.sst 018540.sst 018952.sst 019126.sst 019346.sst 019470.sst > > 019675.sst 019765.sst 019882.sst 019921.sst 019964.sst 020001.sst > > 020036.sst 020059.sst 020073.sst 020109.sst OPTIONS-020087 > > 018328.sst 018541.sst 018953.sst 019127.sst 019370.sst 019471.sst > > 019676.sst 019766.sst 019883.sst 019922.sst 019965.sst 020002.sst > > 020037.sst 020060.sst 020074.sst 020110.sst > > 018329.sst 018590.sst 018954.sst 019128.sst 019371.sst 019472.sst > > 019677.sst 019845.sst 019884.sst 019923.sst 019989.sst 020003.sst > > 020038.sst 020061.sst 020075.sst 020111.sst > > 018406.sst 018591.sst 018995.sst 019174.sst 019372.sst 019473.sst > > 019678.sst 019846.sst 019885.sst 019950.sst 019992.sst 020004.sst > > 020039.sst 020062.sst 020094.sst 020112.sst > > 018407.sst 018727.sst 018996.sst 019175.sst 019373.sst 019474.sst > > 019753.sst 019847.sst 019886.sst 019955.sst 019993.sst 020005.sst > > 020040.sst 020063.sst 020095.sst 020113.sst > > 018443.sst 018728.sst 019073.sst 019176.sst 019380.sst 019475.sst > > 019754.sst 019848.sst 019887.sst 019956.sst 019994.sst 020006.sst > > 020041.sst 020064.sst 020096.sst 020114.sst > > > > db.slow: > > > > db.wal: > > 020085.log 020088.log > > [root@s3db10 export-bluefs2]# du -hs > > 12G . > > [root@s3db10 export-bluefs2]# cat db/CURRENT > > MANIFEST-020084 > > > > Am Mo., 17. Mai 2021 um 14:28 Uhr schrieb Igor Fedotov >: > > > >> On 5/17/2021 2:53 PM, Boris Behrens wrote: > >>> Like this? > >> Yeah. > >> > >> so DB dir structure is more or less O but db/CURRENT looks corrupted. It > >> should contain something like: MANIFEST-020081 > >> > >> Could you please remove (or even just rename) block.db symlink and do > the > >> export again? Preferably to preserve the results for the first export. > >> > >> if export reveals proper CURRENT content - you might want to run fsck on > >> the OSD... 
> >> > >>> [root@s3db10 export-bluefs]# ls * > >>> db: > >>> 018215.sst 018444.sst 018839.sst 019074.sst 019174.sst 019372.sst > >>>019470.sst 019675.sst 019765.sst 019882.sst 019918.sst > 019961.sst > >>>019997.sst 020022.sst 020042.sst 020061.sst 020073.sst > >>> 018216.sst 018445.sst 018840.sst 019075.sst 019175.sst 019373.sst > >>>019471.sst 019676.sst 019766.sst 019883.sst 019919.sst > 019962.sst > >>>019998.sst 020023.sst 020043.sst 020062.sst 020074.sst > >>> 018273.sst 018446.sst 018876.sst 019076.sst 019176.sst 019380.sst > >>>019472.sst 019677.sst 019845.sst 019884.sst 019920.sst > 019963.sst > >>>01.sst 020030.sst 020049.sst 020063.sst 020075.sst > >>> 018300.sst 018447.sst 018877.sst 019077.sst 019210.sst 019381.sst > >>>019473.sst 019678.sst 0
[ceph-users] Re: Process for adding a separate block.db to an osd
The FSCK looks good: [root@s3db10 export-bluefs2]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 fsck fsck success Am Mo., 17. Mai 2021 um 14:39 Uhr schrieb Boris Behrens : > Here is the new output. I kept both for now. > > [root@s3db10 export-bluefs2]# ls * > db: > 018215.sst 018444.sst 018839.sst 019074.sst 019210.sst 019381.sst > 019560.sst 019755.sst 019849.sst 019888.sst 019958.sst 019995.sst > 020007.sst 020042.sst 020067.sst 020098.sst 020115.sst > 018216.sst 018445.sst 018840.sst 019075.sst 019211.sst 019382.sst > 019670.sst 019756.sst 019877.sst 019889.sst 019959.sst 019996.sst > 020008.sst 020043.sst 020068.sst 020104.sst CURRENT > 018273.sst 018446.sst 018876.sst 019076.sst 019256.sst 019383.sst > 019671.sst 019757.sst 019878.sst 019890.sst 019960.sst 019997.sst > 020030.sst 020055.sst 020069.sst 020105.sst IDENTITY > 018300.sst 018447.sst 018877.sst 019081.sst 019257.sst 019395.sst > 019672.sst 019762.sst 019879.sst 019918.sst 019961.sst 019998.sst > 020031.sst 020056.sst 020070.sst 020106.sst LOCK > 018301.sst 018448.sst 018904.sst 019082.sst 019344.sst 019396.sst > 019673.sst 019763.sst 019880.sst 019919.sst 019962.sst 01.sst > 020032.sst 020057.sst 020071.sst 020107.sst MANIFEST-020084 > 018326.sst 018449.sst 018950.sst 019083.sst 019345.sst 019400.sst > 019674.sst 019764.sst 019881.sst 019920.sst 019963.sst 02.sst > 020035.sst 020058.sst 020072.sst 020108.sst OPTIONS-020084 > 018327.sst 018540.sst 018952.sst 019126.sst 019346.sst 019470.sst > 019675.sst 019765.sst 019882.sst 019921.sst 019964.sst 020001.sst > 020036.sst 020059.sst 020073.sst 020109.sst OPTIONS-020087 > 018328.sst 018541.sst 018953.sst 019127.sst 019370.sst 019471.sst > 019676.sst 019766.sst 019883.sst 019922.sst 019965.sst 020002.sst > 020037.sst 020060.sst 020074.sst 020110.sst > 018329.sst 018590.sst 018954.sst 019128.sst 019371.sst 019472.sst > 019677.sst 019845.sst 019884.sst 019923.sst 019989.sst 020003.sst > 020038.sst 020061.sst 020075.sst 020111.sst > 018406.sst 018591.sst 018995.sst 019174.sst 019372.sst 019473.sst > 019678.sst 019846.sst 019885.sst 019950.sst 019992.sst 020004.sst > 020039.sst 020062.sst 020094.sst 020112.sst > 018407.sst 018727.sst 018996.sst 019175.sst 019373.sst 019474.sst > 019753.sst 019847.sst 019886.sst 019955.sst 019993.sst 020005.sst > 020040.sst 020063.sst 020095.sst 020113.sst > 018443.sst 018728.sst 019073.sst 019176.sst 019380.sst 019475.sst > 019754.sst 019848.sst 019887.sst 019956.sst 019994.sst 020006.sst > 020041.sst 020064.sst 020096.sst 020114.sst > > db.slow: > > db.wal: > 020085.log 020088.log > [root@s3db10 export-bluefs2]# du -hs > 12G . > [root@s3db10 export-bluefs2]# cat db/CURRENT > MANIFEST-020084 > > Am Mo., 17. Mai 2021 um 14:28 Uhr schrieb Igor Fedotov : > >> On 5/17/2021 2:53 PM, Boris Behrens wrote: >> > Like this? >> >> Yeah. >> >> so DB dir structure is more or less O but db/CURRENT looks corrupted. It >> should contain something like: MANIFEST-020081 >> >> Could you please remove (or even just rename) block.db symlink and do >> the export again? Preferably to preserve the results for the first export. >> >> if export reveals proper CURRENT content - you might want to run fsck on >> the OSD... 
>> >> > >> > [root@s3db10 export-bluefs]# ls * >> > db: >> > 018215.sst 018444.sst 018839.sst 019074.sst 019174.sst 019372.sst >> > 019470.sst 019675.sst 019765.sst 019882.sst 019918.sst 019961.sst >> > 019997.sst 020022.sst 020042.sst 020061.sst 020073.sst >> > 018216.sst 018445.sst 018840.sst 019075.sst 019175.sst 019373.sst >> > 019471.sst 019676.sst 019766.sst 019883.sst 019919.sst 019962.sst >> > 019998.sst 020023.sst 020043.sst 020062.sst 020074.sst >> > 018273.sst 018446.sst 018876.sst 019076.sst 019176.sst 019380.sst >> > 019472.sst 019677.sst 019845.sst 019884.sst 019920.sst 019963.sst >> > 01.sst 020030.sst 020049.sst 020063.sst 020075.sst >> > 018300.sst 018447.sst 018877.sst 019077.sst 019210.sst 019381.sst >> > 019473.sst 019678.sst 019846.sst 019885.sst 019921.sst 019964.sst >> > 02.sst 020031.sst 020051.sst 020064.sst 020077.sst >> > 018301.sst 018448.sst 018904.sst 019081.sst 019211.sst 019382.sst >> > 019474.sst 019753.sst 019847.sst 019886.sst 019922.sst 019965.sst >> > 020001.sst 020032.sst 020052.sst 020065.sst 020080.sst >> > 018326.sst 018
[ceph-users] Re: Process for adding a separate block.db to an osd
Here is the new output. I kept both for now. [root@s3db10 export-bluefs2]# ls * db: 018215.sst 018444.sst 018839.sst 019074.sst 019210.sst 019381.sst 019560.sst 019755.sst 019849.sst 019888.sst 019958.sst 019995.sst 020007.sst 020042.sst 020067.sst 020098.sst 020115.sst 018216.sst 018445.sst 018840.sst 019075.sst 019211.sst 019382.sst 019670.sst 019756.sst 019877.sst 019889.sst 019959.sst 019996.sst 020008.sst 020043.sst 020068.sst 020104.sst CURRENT 018273.sst 018446.sst 018876.sst 019076.sst 019256.sst 019383.sst 019671.sst 019757.sst 019878.sst 019890.sst 019960.sst 019997.sst 020030.sst 020055.sst 020069.sst 020105.sst IDENTITY 018300.sst 018447.sst 018877.sst 019081.sst 019257.sst 019395.sst 019672.sst 019762.sst 019879.sst 019918.sst 019961.sst 019998.sst 020031.sst 020056.sst 020070.sst 020106.sst LOCK 018301.sst 018448.sst 018904.sst 019082.sst 019344.sst 019396.sst 019673.sst 019763.sst 019880.sst 019919.sst 019962.sst 01.sst 020032.sst 020057.sst 020071.sst 020107.sst MANIFEST-020084 018326.sst 018449.sst 018950.sst 019083.sst 019345.sst 019400.sst 019674.sst 019764.sst 019881.sst 019920.sst 019963.sst 02.sst 020035.sst 020058.sst 020072.sst 020108.sst OPTIONS-020084 018327.sst 018540.sst 018952.sst 019126.sst 019346.sst 019470.sst 019675.sst 019765.sst 019882.sst 019921.sst 019964.sst 020001.sst 020036.sst 020059.sst 020073.sst 020109.sst OPTIONS-020087 018328.sst 018541.sst 018953.sst 019127.sst 019370.sst 019471.sst 019676.sst 019766.sst 019883.sst 019922.sst 019965.sst 020002.sst 020037.sst 020060.sst 020074.sst 020110.sst 018329.sst 018590.sst 018954.sst 019128.sst 019371.sst 019472.sst 019677.sst 019845.sst 019884.sst 019923.sst 019989.sst 020003.sst 020038.sst 020061.sst 020075.sst 020111.sst 018406.sst 018591.sst 018995.sst 019174.sst 019372.sst 019473.sst 019678.sst 019846.sst 019885.sst 019950.sst 019992.sst 020004.sst 020039.sst 020062.sst 020094.sst 020112.sst 018407.sst 018727.sst 018996.sst 019175.sst 019373.sst 019474.sst 019753.sst 019847.sst 019886.sst 019955.sst 019993.sst 020005.sst 020040.sst 020063.sst 020095.sst 020113.sst 018443.sst 018728.sst 019073.sst 019176.sst 019380.sst 019475.sst 019754.sst 019848.sst 019887.sst 019956.sst 019994.sst 020006.sst 020041.sst 020064.sst 020096.sst 020114.sst db.slow: db.wal: 020085.log 020088.log [root@s3db10 export-bluefs2]# du -hs 12G . [root@s3db10 export-bluefs2]# cat db/CURRENT MANIFEST-020084 Am Mo., 17. Mai 2021 um 14:28 Uhr schrieb Igor Fedotov : > On 5/17/2021 2:53 PM, Boris Behrens wrote: > > Like this? > > Yeah. > > so DB dir structure is more or less O but db/CURRENT looks corrupted. It > should contain something like: MANIFEST-020081 > > Could you please remove (or even just rename) block.db symlink and do the > export again? Preferably to preserve the results for the first export. > > if export reveals proper CURRENT content - you might want to run fsck on > the OSD... 
> > > > > [root@s3db10 export-bluefs]# ls * > > db: > > 018215.sst 018444.sst 018839.sst 019074.sst 019174.sst 019372.sst > > 019470.sst 019675.sst 019765.sst 019882.sst 019918.sst 019961.sst > > 019997.sst 020022.sst 020042.sst 020061.sst 020073.sst > > 018216.sst 018445.sst 018840.sst 019075.sst 019175.sst 019373.sst > > 019471.sst 019676.sst 019766.sst 019883.sst 019919.sst 019962.sst > > 019998.sst 020023.sst 020043.sst 020062.sst 020074.sst > > 018273.sst 018446.sst 018876.sst 019076.sst 019176.sst 019380.sst > > 019472.sst 019677.sst 019845.sst 019884.sst 019920.sst 019963.sst > > 01.sst 020030.sst 020049.sst 020063.sst 020075.sst > > 018300.sst 018447.sst 018877.sst 019077.sst 019210.sst 019381.sst > > 019473.sst 019678.sst 019846.sst 019885.sst 019921.sst 019964.sst > > 02.sst 020031.sst 020051.sst 020064.sst 020077.sst > > 018301.sst 018448.sst 018904.sst 019081.sst 019211.sst 019382.sst > > 019474.sst 019753.sst 019847.sst 019886.sst 019922.sst 019965.sst > > 020001.sst 020032.sst 020052.sst 020065.sst 020080.sst > > 018326.sst 018449.sst 018950.sst 019082.sst 019256.sst 019383.sst > > 019475.sst 019754.sst 019848.sst 019887.sst 019923.sst 019986.sst > > 020002.sst 020035.sst 020053.sst 020066.sst CURRENT > > 018327.sst 018540.sst 018952.sst 019083.sst 019257.sst 019395.sst > > 019560.sst 019755.sst 019849.sst 019888.sst 019950.sst 019989.sst > > 020003.sst 020036.sst 020055.sst 020067.sst IDENTITY > > 018328.sst 018541.sst 018953.sst 019124.sst 019344.sst 019396.sst > > 019670.sst 019756.sst 019877.sst 019889.s
[ceph-users] Re: Process for adding a separate block.db to an osd
Like this? [root@s3db10 export-bluefs]# ls * db: 018215.sst 018444.sst 018839.sst 019074.sst 019174.sst 019372.sst 019470.sst 019675.sst 019765.sst 019882.sst 019918.sst 019961.sst 019997.sst 020022.sst 020042.sst 020061.sst 020073.sst 018216.sst 018445.sst 018840.sst 019075.sst 019175.sst 019373.sst 019471.sst 019676.sst 019766.sst 019883.sst 019919.sst 019962.sst 019998.sst 020023.sst 020043.sst 020062.sst 020074.sst 018273.sst 018446.sst 018876.sst 019076.sst 019176.sst 019380.sst 019472.sst 019677.sst 019845.sst 019884.sst 019920.sst 019963.sst 01.sst 020030.sst 020049.sst 020063.sst 020075.sst 018300.sst 018447.sst 018877.sst 019077.sst 019210.sst 019381.sst 019473.sst 019678.sst 019846.sst 019885.sst 019921.sst 019964.sst 02.sst 020031.sst 020051.sst 020064.sst 020077.sst 018301.sst 018448.sst 018904.sst 019081.sst 019211.sst 019382.sst 019474.sst 019753.sst 019847.sst 019886.sst 019922.sst 019965.sst 020001.sst 020032.sst 020052.sst 020065.sst 020080.sst 018326.sst 018449.sst 018950.sst 019082.sst 019256.sst 019383.sst 019475.sst 019754.sst 019848.sst 019887.sst 019923.sst 019986.sst 020002.sst 020035.sst 020053.sst 020066.sst CURRENT 018327.sst 018540.sst 018952.sst 019083.sst 019257.sst 019395.sst 019560.sst 019755.sst 019849.sst 019888.sst 019950.sst 019989.sst 020003.sst 020036.sst 020055.sst 020067.sst IDENTITY 018328.sst 018541.sst 018953.sst 019124.sst 019344.sst 019396.sst 019670.sst 019756.sst 019877.sst 019889.sst 019955.sst 019992.sst 020004.sst 020037.sst 020056.sst 020068.sst LOCK 018329.sst 018590.sst 018954.sst 019125.sst 019345.sst 019400.sst 019671.sst 019757.sst 019878.sst 019890.sst 019956.sst 019993.sst 020005.sst 020038.sst 020057.sst 020069.sst MANIFEST-020081 018406.sst 018591.sst 018995.sst 019126.sst 019346.sst 019467.sst 019672.sst 019762.sst 019879.sst 019915.sst 019958.sst 019994.sst 020006.sst 020039.sst 020058.sst 020070.sst OPTIONS-020081 018407.sst 018727.sst 018996.sst 019127.sst 019370.sst 019468.sst 019673.sst 019763.sst 019880.sst 019916.sst 019959.sst 019995.sst 020007.sst 020040.sst 020059.sst 020071.sst OPTIONS-020084 018443.sst 018728.sst 019073.sst 019128.sst 019371.sst 019469.sst 019674.sst 019764.sst 019881.sst 019917.sst 019960.sst 019996.sst 020008.sst 020041.sst 020060.sst 020072.sst db.slow: db.wal: 020082.log [root@s3db10 export-bluefs]# du -hs 12G . [root@s3db10 export-bluefs]# cat db/CURRENT �g�U uN�[�+p[root@s3db10 export-bluefs]# Am Mo., 17. Mai 2021 um 13:45 Uhr schrieb Igor Fedotov : > You might want to check file structure at new DB using bluestore-tools's > bluefs-export command: > > ceph-bluestore-tool --path --command bluefs-export --out > > > needs to have enough free space enough to fit DB data. > > Once exported - does contain valid BlueFS directory > structure - multiple .sst files, CURRENT and IDENTITY files etc? > > If so then please check and share the content of /db/CURRENT > file. > > > Thanks, > > Igor > > On 5/17/2021 1:32 PM, Boris Behrens wrote: > > Hi Igor, > > I posted it on pastebin: https://pastebin.com/Ze9EuCMD > > > > Cheers > > Boris > > > > Am Mo., 17. Mai 2021 um 12:22 Uhr schrieb Igor Fedotov >: > > > >> Hi Boris, > >> > >> could you please share full OSD startup log and file listing for > >> '/var/lib/ceph/osd/ceph-68'? 
> >> > >> > >> Thanks, > >> > >> Igor > >> > >> On 5/17/2021 1:09 PM, Boris Behrens wrote: > >>> Hi, > >>> sorry for replying to this old thread: > >>> > >>> I tried to add a block.db to an OSD but now the OSD can not start with > >> the > >>> error: > >>> Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -7> > 2021-05-17 > >>> 09:50:38.362 7fc48ec94a80 -1 rocksdb: Corruption: CURRENT file does not > >> end > >>> with newline > >>> Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -6> > 2021-05-17 > >>> 09:50:38.362 7fc48ec94a80 -1 bluestore(/var/lib/ceph/osd/ceph-68) > >> _open_db > >>> erroring opening db: > >>> Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -1> > 2021-05-17 > >>> 09:50:38.866 7fc48ec94a80 -1 > >>> > >> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc: > >>> In function 'int BlueStore::_upgrade_super()' thread 7fc4
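The bluefs-export invocation quoted above lost its placeholders in the archive; restored, it looks roughly like this (paths are examples, the output directory needs enough free space for the whole DB, ~12G in this case, and the exact option name may differ between releases):

ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-68 --out-dir /root/export-bluefs

# sanity checks on the exported tree
ls /root/export-bluefs/db           # expect *.sst plus CURRENT, IDENTITY, LOCK, MANIFEST-*, OPTIONS-*
cat /root/export-bluefs/db/CURRENT  # should contain only the active manifest name, e.g. MANIFEST-020084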
[ceph-users] Re: Process for adding a separate block.db to an osd
Hi Igor, I posted it on pastebin: https://pastebin.com/Ze9EuCMD Cheers Boris Am Mo., 17. Mai 2021 um 12:22 Uhr schrieb Igor Fedotov : > Hi Boris, > > could you please share full OSD startup log and file listing for > '/var/lib/ceph/osd/ceph-68'? > > > Thanks, > > Igor > > On 5/17/2021 1:09 PM, Boris Behrens wrote: > > Hi, > > sorry for replying to this old thread: > > > > I tried to add a block.db to an OSD but now the OSD can not start with > the > > error: > > Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -7> 2021-05-17 > > 09:50:38.362 7fc48ec94a80 -1 rocksdb: Corruption: CURRENT file does not > end > > with newline > > Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -6> 2021-05-17 > > 09:50:38.362 7fc48ec94a80 -1 bluestore(/var/lib/ceph/osd/ceph-68) > _open_db > > erroring opening db: > > Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -1> 2021-05-17 > > 09:50:38.866 7fc48ec94a80 -1 > > > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc: > > In function 'int BlueStore::_upgrade_super()' thread 7fc48ec94a80 time > > 2021-05-17 09:50:38.865204 > > Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: > > > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc: > > 10647: FAILED ceph_assert(ondisk_format > 0) > > > > I tried to run an fsck/repair on the disk: > > [root@s3db10 osd]# ceph-bluestore-tool --path ceph-68 repair > > 2021-05-17 10:05:25.695 7f714dea3ec0 -1 rocksdb: Corruption: CURRENT file > > does not end with newline > > 2021-05-17 10:05:25.695 7f714dea3ec0 -1 bluestore(ceph-68) _open_db > > erroring opening db: > > error from fsck: (5) Input/output error > > [root@s3db10 osd]# ceph-bluestore-tool --path ceph-68 fsck > > 2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 rocksdb: Corruption: CURRENT file > > does not end with newline > > 2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 bluestore(ceph-68) _open_db > > erroring opening db: > > error from fsck: (5) Input/output error > > > > These are the steps I did to add the disk: > > $ CEPH_ARGS="--bluestore-block-db-size 53687091200 > > --bluestore_block_db_create=true" ceph-bluestore-tool bluefs-bdev-new-db > > --path /var/lib/ceph/osd/ceph-68 --dev-target /dev/sdj1 > > $ chown -h ceph:ceph /var/lib/ceph/osd/ceph-68/block.db > > $ lvchange --addtag ceph.db_device=/dev/sdj1 > > > /dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6 > > $ lvchange --addtag ceph.db_uuid=463dd37c-fd49-4ccb-849f-c5827d3d9df2 > > > /dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6 > > $ ceph-volume lvm activate --all > > > > The UUIDs > > later I tried this: > > $ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 --devs-source > > /var/lib/ceph/osd/ceph-68/block --dev-target > > /var/lib/ceph/osd/ceph-68/block.db bluefs-bdev-migrate > > > > Any ideas how I can get the rocksdb fixed? > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Process for adding a separate block.db to an osd
Hi, sorry for replying to this old thread: I tried to add a block.db to an OSD but now the OSD can not start with the error: Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -7> 2021-05-17 09:50:38.362 7fc48ec94a80 -1 rocksdb: Corruption: CURRENT file does not end with newline Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -6> 2021-05-17 09:50:38.362 7fc48ec94a80 -1 bluestore(/var/lib/ceph/osd/ceph-68) _open_db erroring opening db: Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -1> 2021-05-17 09:50:38.866 7fc48ec94a80 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_upgrade_super()' thread 7fc48ec94a80 time 2021-05-17 09:50:38.865204 Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc: 10647: FAILED ceph_assert(ondisk_format > 0) I tried to run an fsck/repair on the disk: [root@s3db10 osd]# ceph-bluestore-tool --path ceph-68 repair 2021-05-17 10:05:25.695 7f714dea3ec0 -1 rocksdb: Corruption: CURRENT file does not end with newline 2021-05-17 10:05:25.695 7f714dea3ec0 -1 bluestore(ceph-68) _open_db erroring opening db: error from fsck: (5) Input/output error [root@s3db10 osd]# ceph-bluestore-tool --path ceph-68 fsck 2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 rocksdb: Corruption: CURRENT file does not end with newline 2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 bluestore(ceph-68) _open_db erroring opening db: error from fsck: (5) Input/output error These are the steps I did to add the disk: $ CEPH_ARGS="--bluestore-block-db-size 53687091200 --bluestore_block_db_create=true" ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-68 --dev-target /dev/sdj1 $ chown -h ceph:ceph /var/lib/ceph/osd/ceph-68/block.db $ lvchange --addtag ceph.db_device=/dev/sdj1 /dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6 $ lvchange --addtag ceph.db_uuid=463dd37c-fd49-4ccb-849f-c5827d3d9df2 /dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6 $ ceph-volume lvm activate --all The UUIDs later I tried this: $ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 --devs-source /var/lib/ceph/osd/ceph-68/block --dev-target /var/lib/ceph/osd/ceph-68/block.db bluefs-bdev-migrate Any ideas how I can get the rocksdb fixed? -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)
I actually WAS the amount of watchers... narf.. This is so embarissing.. Thanks a lot for all your input. Am Di., 11. Mai 2021 um 13:54 Uhr schrieb Boris Behrens : > I tried to debug it with --debug-ms=1. > Maybe someone could help me to wrap my head around it? > https://pastebin.com/LD9qrm3x > > > > Am Di., 11. Mai 2021 um 11:17 Uhr schrieb Boris Behrens : > >> Good call. I just restarted the whole cluster, but the problem still >> persists. >> I don't think it is a problem with the rados, but with the radosgw. >> >> But I still struggle to pin the issue. >> >> Am Di., 11. Mai 2021 um 10:45 Uhr schrieb Thomas Schneider < >> thomas.schneider-...@ruhr-uni-bochum.de>: >> >>> Hey all, >>> >>> we had slow RGW access when some OSDs were slow due to an (to us) >>> unknown OSD bug that made PG access either slow or impossible. (It showed >>> itself through slowness of the mgr as well, but nothing other than that). >>> We restarted all OSDs that held RGW data and the problem was gone. >>> I have no good way to debug the problem since it never occured again >>> after we restarted the OSDs. >>> >>> Kind regards, >>> Thomas >>> >>> >>> Am 11. Mai 2021 08:47:06 MESZ schrieb Boris Behrens : >>> >Hi Amit, >>> > >>> >I just pinged the mons from every system and they are all available. >>> > >>> >Am Mo., 10. Mai 2021 um 21:18 Uhr schrieb Amit Ghadge < >>> amitg@gmail.com>: >>> > >>> >> We seen slowness due to unreachable one of them mgr service, maybe >>> here >>> >> are different, you can check monmap/ ceph.conf mon entry and then >>> verify >>> >> all nodes are successfully ping. >>> >> >>> >> >>> >> -AmitG >>> >> >>> >> >>> >> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens wrote: >>> >> >>> >>> Hi guys, >>> >>> >>> >>> does someone got any idea? >>> >>> >>> >>> Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens >> >: >>> >>> >>> >>> > Hi, >>> >>> > since a couple of days we experience a strange slowness on some >>> >>> > radosgw-admin operations. >>> >>> > What is the best way to debug this? >>> >>> > >>> >>> > For example creating a user takes over 20s. >>> >>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user >>> >>> > --display-name=test-bb-user >>> >>> > 2021-05-05 14:08:14.297 7f6942286840 1 robust_notify: If at first >>> you >>> >>> > don't succeed: (110) Connection timed out >>> >>> > 2021-05-05 14:08:14.297 7f6942286840 0 ERROR: failed to distribute >>> >>> cache >>> >>> > for eu-central-1.rgw.users.uid:test-bb-user >>> >>> > 2021-05-05 14:08:24.335 7f6942286840 1 robust_notify: If at first >>> you >>> >>> > don't succeed: (110) Connection timed out >>> >>> > 2021-05-05 14:08:24.335 7f6942286840 0 ERROR: failed to distribute >>> >>> cache >>> >>> > for eu-central-1.rgw.users.keys: >>> >>> > { >>> >>> > "user_id": "test-bb-user", >>> >>> > "display_name": "test-bb-user", >>> >>> > >>> >>> > } >>> >>> > real 0m20.557s >>> >>> > user 0m0.087s >>> >>> > sys 0m0.030s >>> >>> > >>> >>> > First I thought that rados operations might be slow, but adding and >>> >>> > deleting objects in rados are fast as usual (at least from my >>> >>> perspective). >>> >>> > Also uploading to buckets is fine. >>> >>> > >>> >>> > We changed some things and I think it might have to do with this: >>> >>> > * We have a HAProxy that distributes via leastconn between the 3 >>> >>> radosgw's >>> >>> > (this did not change) >>> >>> > * We had three times a daemon with the name "eu-central-1" running >>> (on >>> >>> the >>> >>> >
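For anyone who lands here with the same "failed to distribute cache" timeouts: one way to see which (and how many) watchers have to acknowledge each cache notify is to list the watchers on the RGW control objects. A sketch, with the pool name taken from the zone used in this thread and notify.0 being one of the default control objects:

rados -p eu-central-1.rgw.control ls
rados -p eu-central-1.rgw.control listwatchers notify.0

# every registered radosgw instance shows up here; going from 1 daemon to 14 (as described
# below) multiplies the number of watchers that must acknowledge the notify, and watchers
# that never respond turn into the ~20s "Connection timed out" seen in the original post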
[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)
I tried to debug it with --debug-ms=1. Maybe someone could help me to wrap my head around it? https://pastebin.com/LD9qrm3x Am Di., 11. Mai 2021 um 11:17 Uhr schrieb Boris Behrens : > Good call. I just restarted the whole cluster, but the problem still > persists. > I don't think it is a problem with the rados, but with the radosgw. > > But I still struggle to pin the issue. > > Am Di., 11. Mai 2021 um 10:45 Uhr schrieb Thomas Schneider < > thomas.schneider-...@ruhr-uni-bochum.de>: > >> Hey all, >> >> we had slow RGW access when some OSDs were slow due to an (to us) unknown >> OSD bug that made PG access either slow or impossible. (It showed itself >> through slowness of the mgr as well, but nothing other than that). >> We restarted all OSDs that held RGW data and the problem was gone. >> I have no good way to debug the problem since it never occured again >> after we restarted the OSDs. >> >> Kind regards, >> Thomas >> >> >> Am 11. Mai 2021 08:47:06 MESZ schrieb Boris Behrens : >> >Hi Amit, >> > >> >I just pinged the mons from every system and they are all available. >> > >> >Am Mo., 10. Mai 2021 um 21:18 Uhr schrieb Amit Ghadge < >> amitg@gmail.com>: >> > >> >> We seen slowness due to unreachable one of them mgr service, maybe here >> >> are different, you can check monmap/ ceph.conf mon entry and then >> verify >> >> all nodes are successfully ping. >> >> >> >> >> >> -AmitG >> >> >> >> >> >> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens wrote: >> >> >> >>> Hi guys, >> >>> >> >>> does someone got any idea? >> >>> >> >>> Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens > >: >> >>> >> >>> > Hi, >> >>> > since a couple of days we experience a strange slowness on some >> >>> > radosgw-admin operations. >> >>> > What is the best way to debug this? >> >>> > >> >>> > For example creating a user takes over 20s. >> >>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user >> >>> > --display-name=test-bb-user >> >>> > 2021-05-05 14:08:14.297 7f6942286840 1 robust_notify: If at first >> you >> >>> > don't succeed: (110) Connection timed out >> >>> > 2021-05-05 14:08:14.297 7f6942286840 0 ERROR: failed to distribute >> >>> cache >> >>> > for eu-central-1.rgw.users.uid:test-bb-user >> >>> > 2021-05-05 14:08:24.335 7f6942286840 1 robust_notify: If at first >> you >> >>> > don't succeed: (110) Connection timed out >> >>> > 2021-05-05 14:08:24.335 7f6942286840 0 ERROR: failed to distribute >> >>> cache >> >>> > for eu-central-1.rgw.users.keys: >> >>> > { >> >>> > "user_id": "test-bb-user", >> >>> > "display_name": "test-bb-user", >> >>> > >> >>> > } >> >>> > real 0m20.557s >> >>> > user 0m0.087s >> >>> > sys 0m0.030s >> >>> > >> >>> > First I thought that rados operations might be slow, but adding and >> >>> > deleting objects in rados are fast as usual (at least from my >> >>> perspective). >> >>> > Also uploading to buckets is fine. 
>> >>> > >> >>> > We changed some things and I think it might have to do with this: >> >>> > * We have a HAProxy that distributes via leastconn between the 3 >> >>> radosgw's >> >>> > (this did not change) >> >>> > * We had three times a daemon with the name "eu-central-1" running >> (on >> >>> the >> >>> > 3 radosgw's) >> >>> > * Because this might have led to our data duplication problem, we >> have >> >>> > split that up so now the daemons are named per host >> (eu-central-1-s3db1, >> >>> > eu-central-1-s3db2, eu-central-1-s3db3) >> >>> > * We also added dedicated rgw daemons for garbage collection, >> because >> >>> the >> >>> > current one were not able to keep up. >> >>> > * So basically ceph status went from "rgw: 1 daemon active >> >>> (eu-central-1)" >> >>> > to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2, >> >>> > eu-central-1-s3db3, gc-s3db12, gc-s3db13...) >> >>> > >> >>> > >> >>> > Cheers >> >>> > Boris >> >>> > >> >>> >> >>> >> >>> -- >> >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend >> im >> >>> groüen Saal. >> >>> ___ >> >>> ceph-users mailing list -- ceph-users@ceph.io >> >>> To unsubscribe send an email to ceph-users-le...@ceph.io >> >>> >> >> >> > >> >> -- >> Thomas Schneider >> IT.SERVICES >> Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780 >> Bochum >> Telefon: +49 234 32 23939 >> http://www.it-services.rub.de/ >> > > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] "radosgw-admin bucket radoslist" loops when a multipart upload is happening
Hi all, I am still searching for orphan objects and came across a strange bug: there is a huge multipart upload in progress (around 4 TB), and listing the rados objects in the bucket loops over that multipart upload. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)
Hi Amit, it is the same physical interface but different VLANs. I checked all IP adresses from all systems and everything is direct connected, without any gateway hops. Am Di., 11. Mai 2021 um 10:59 Uhr schrieb Amit Ghadge : > I hope you are using a single network interface for the public and cluster? > > On Tue, May 11, 2021 at 2:15 PM Thomas Schneider < > thomas.schneider-...@ruhr-uni-bochum.de> wrote: > >> Hey all, >> >> we had slow RGW access when some OSDs were slow due to an (to us) unknown >> OSD bug that made PG access either slow or impossible. (It showed itself >> through slowness of the mgr as well, but nothing other than that). >> We restarted all OSDs that held RGW data and the problem was gone. >> I have no good way to debug the problem since it never occured again >> after we restarted the OSDs. >> >> Kind regards, >> Thomas >> >> >> Am 11. Mai 2021 08:47:06 MESZ schrieb Boris Behrens : >> >Hi Amit, >> > >> >I just pinged the mons from every system and they are all available. >> > >> >Am Mo., 10. Mai 2021 um 21:18 Uhr schrieb Amit Ghadge < >> amitg@gmail.com>: >> > >> >> We seen slowness due to unreachable one of them mgr service, maybe here >> >> are different, you can check monmap/ ceph.conf mon entry and then >> verify >> >> all nodes are successfully ping. >> >> >> >> >> >> -AmitG >> >> >> >> >> >> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens wrote: >> >> >> >>> Hi guys, >> >>> >> >>> does someone got any idea? >> >>> >> >>> Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens > >: >> >>> >> >>> > Hi, >> >>> > since a couple of days we experience a strange slowness on some >> >>> > radosgw-admin operations. >> >>> > What is the best way to debug this? >> >>> > >> >>> > For example creating a user takes over 20s. >> >>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user >> >>> > --display-name=test-bb-user >> >>> > 2021-05-05 14:08:14.297 7f6942286840 1 robust_notify: If at first >> you >> >>> > don't succeed: (110) Connection timed out >> >>> > 2021-05-05 14:08:14.297 7f6942286840 0 ERROR: failed to distribute >> >>> cache >> >>> > for eu-central-1.rgw.users.uid:test-bb-user >> >>> > 2021-05-05 14:08:24.335 7f6942286840 1 robust_notify: If at first >> you >> >>> > don't succeed: (110) Connection timed out >> >>> > 2021-05-05 14:08:24.335 7f6942286840 0 ERROR: failed to distribute >> >>> cache >> >>> > for eu-central-1.rgw.users.keys: >> >>> > { >> >>> > "user_id": "test-bb-user", >> >>> > "display_name": "test-bb-user", >> >>> > >> >>> > } >> >>> > real 0m20.557s >> >>> > user 0m0.087s >> >>> > sys 0m0.030s >> >>> > >> >>> > First I thought that rados operations might be slow, but adding and >> >>> > deleting objects in rados are fast as usual (at least from my >> >>> perspective). >> >>> > Also uploading to buckets is fine. >> >>> > >> >>> > We changed some things and I think it might have to do with this: >> >>> > * We have a HAProxy that distributes via leastconn between the 3 >> >>> radosgw's >> >>> > (this did not change) >> >>> > * We had three times a daemon with the name "eu-central-1" running >> (on >> >>> the >> >>> > 3 radosgw's) >> >>> > * Because this might have led to our data duplication problem, we >> have >> >>> > split that up so now the daemons are named per host >> (eu-central-1-s3db1, >> >>> > eu-central-1-s3db2, eu-central-1-s3db3) >> >>> > * We also added dedicated rgw daemons for garbage collection, >> because >> >>> the >> >>> > current one were not able to keep up. 
>> >>> > * So basically ceph status went from "rgw: 1 daemon active >> >>> (eu-central-1)" >> >>> > to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2, >> >>> > eu-central-1-s3db3, gc-s3db12, gc-s3db13...) >> >>> > >> >>> > >> >>> > Cheers >> >>> > Boris >> >>> > >> >>> >> >>> >> >>> -- >> >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend >> im >> >>> groüen Saal. >> >>> ___ >> >>> ceph-users mailing list -- ceph-users@ceph.io >> >>> To unsubscribe send an email to ceph-users-le...@ceph.io >> >>> >> >> >> > >> >> -- >> Thomas Schneider >> IT.SERVICES >> Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780 >> Bochum >> Telefon: +49 234 32 23939 >> http://www.it-services.rub.de/ >> > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)
Good call. I just restarted the whole cluster, but the problem still persists. I don't think it is a problem with the rados, but with the radosgw. But I still struggle to pin the issue. Am Di., 11. Mai 2021 um 10:45 Uhr schrieb Thomas Schneider < thomas.schneider-...@ruhr-uni-bochum.de>: > Hey all, > > we had slow RGW access when some OSDs were slow due to an (to us) unknown > OSD bug that made PG access either slow or impossible. (It showed itself > through slowness of the mgr as well, but nothing other than that). > We restarted all OSDs that held RGW data and the problem was gone. > I have no good way to debug the problem since it never occured again after > we restarted the OSDs. > > Kind regards, > Thomas > > > Am 11. Mai 2021 08:47:06 MESZ schrieb Boris Behrens : > >Hi Amit, > > > >I just pinged the mons from every system and they are all available. > > > >Am Mo., 10. Mai 2021 um 21:18 Uhr schrieb Amit Ghadge < > amitg@gmail.com>: > > > >> We seen slowness due to unreachable one of them mgr service, maybe here > >> are different, you can check monmap/ ceph.conf mon entry and then verify > >> all nodes are successfully ping. > >> > >> > >> -AmitG > >> > >> > >> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens wrote: > >> > >>> Hi guys, > >>> > >>> does someone got any idea? > >>> > >>> Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens : > >>> > >>> > Hi, > >>> > since a couple of days we experience a strange slowness on some > >>> > radosgw-admin operations. > >>> > What is the best way to debug this? > >>> > > >>> > For example creating a user takes over 20s. > >>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user > >>> > --display-name=test-bb-user > >>> > 2021-05-05 14:08:14.297 7f6942286840 1 robust_notify: If at first > you > >>> > don't succeed: (110) Connection timed out > >>> > 2021-05-05 14:08:14.297 7f6942286840 0 ERROR: failed to distribute > >>> cache > >>> > for eu-central-1.rgw.users.uid:test-bb-user > >>> > 2021-05-05 14:08:24.335 7f6942286840 1 robust_notify: If at first > you > >>> > don't succeed: (110) Connection timed out > >>> > 2021-05-05 14:08:24.335 7f6942286840 0 ERROR: failed to distribute > >>> cache > >>> > for eu-central-1.rgw.users.keys: > >>> > { > >>> > "user_id": "test-bb-user", > >>> > "display_name": "test-bb-user", > >>> > > >>> > } > >>> > real 0m20.557s > >>> > user 0m0.087s > >>> > sys 0m0.030s > >>> > > >>> > First I thought that rados operations might be slow, but adding and > >>> > deleting objects in rados are fast as usual (at least from my > >>> perspective). > >>> > Also uploading to buckets is fine. > >>> > > >>> > We changed some things and I think it might have to do with this: > >>> > * We have a HAProxy that distributes via leastconn between the 3 > >>> radosgw's > >>> > (this did not change) > >>> > * We had three times a daemon with the name "eu-central-1" running > (on > >>> the > >>> > 3 radosgw's) > >>> > * Because this might have led to our data duplication problem, we > have > >>> > split that up so now the daemons are named per host > (eu-central-1-s3db1, > >>> > eu-central-1-s3db2, eu-central-1-s3db3) > >>> > * We also added dedicated rgw daemons for garbage collection, because > >>> the > >>> > current one were not able to keep up. > >>> > * So basically ceph status went from "rgw: 1 daemon active > >>> (eu-central-1)" > >>> > to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2, > >>> > eu-central-1-s3db3, gc-s3db12, gc-s3db13...) 
> >>> > > >>> > > >>> > Cheers > >>> > Boris > >>> > > >>> > >>> > >>> -- > >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend > im > >>> groüen Saal. > >>> ___ > >>> ceph-users mailing list -- ceph-users@ceph.io > >>> To unsubscribe send an email to ceph-users-le...@ceph.io > >>> > >> > > > > -- > Thomas Schneider > IT.SERVICES > Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780 > Bochum > Telefon: +49 234 32 23939 > http://www.it-services.rub.de/ > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)
Hi Amit, I just pinged the mons from every system and they are all available. Am Mo., 10. Mai 2021 um 21:18 Uhr schrieb Amit Ghadge : > We seen slowness due to unreachable one of them mgr service, maybe here > are different, you can check monmap/ ceph.conf mon entry and then verify > all nodes are successfully ping. > > > -AmitG > > > On Tue, 11 May 2021 at 12:12 AM, Boris Behrens wrote: > >> Hi guys, >> >> does someone got any idea? >> >> Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens : >> >> > Hi, >> > since a couple of days we experience a strange slowness on some >> > radosgw-admin operations. >> > What is the best way to debug this? >> > >> > For example creating a user takes over 20s. >> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user >> > --display-name=test-bb-user >> > 2021-05-05 14:08:14.297 7f6942286840 1 robust_notify: If at first you >> > don't succeed: (110) Connection timed out >> > 2021-05-05 14:08:14.297 7f6942286840 0 ERROR: failed to distribute >> cache >> > for eu-central-1.rgw.users.uid:test-bb-user >> > 2021-05-05 14:08:24.335 7f6942286840 1 robust_notify: If at first you >> > don't succeed: (110) Connection timed out >> > 2021-05-05 14:08:24.335 7f6942286840 0 ERROR: failed to distribute >> cache >> > for eu-central-1.rgw.users.keys: >> > { >> > "user_id": "test-bb-user", >> > "display_name": "test-bb-user", >> > >> > } >> > real 0m20.557s >> > user 0m0.087s >> > sys 0m0.030s >> > >> > First I thought that rados operations might be slow, but adding and >> > deleting objects in rados are fast as usual (at least from my >> perspective). >> > Also uploading to buckets is fine. >> > >> > We changed some things and I think it might have to do with this: >> > * We have a HAProxy that distributes via leastconn between the 3 >> radosgw's >> > (this did not change) >> > * We had three times a daemon with the name "eu-central-1" running (on >> the >> > 3 radosgw's) >> > * Because this might have led to our data duplication problem, we have >> > split that up so now the daemons are named per host (eu-central-1-s3db1, >> > eu-central-1-s3db2, eu-central-1-s3db3) >> > * We also added dedicated rgw daemons for garbage collection, because >> the >> > current one were not able to keep up. >> > * So basically ceph status went from "rgw: 1 daemon active >> (eu-central-1)" >> > to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2, >> > eu-central-1-s3db3, gc-s3db12, gc-s3db13...) >> > >> > >> > Cheers >> > Boris >> > >> >> >> -- >> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im >> groüen Saal. >> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io >> > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)
Hi guys, does someone got any idea? Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens : > Hi, > since a couple of days we experience a strange slowness on some > radosgw-admin operations. > What is the best way to debug this? > > For example creating a user takes over 20s. > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user > --display-name=test-bb-user > 2021-05-05 14:08:14.297 7f6942286840 1 robust_notify: If at first you > don't succeed: (110) Connection timed out > 2021-05-05 14:08:14.297 7f6942286840 0 ERROR: failed to distribute cache > for eu-central-1.rgw.users.uid:test-bb-user > 2021-05-05 14:08:24.335 7f6942286840 1 robust_notify: If at first you > don't succeed: (110) Connection timed out > 2021-05-05 14:08:24.335 7f6942286840 0 ERROR: failed to distribute cache > for eu-central-1.rgw.users.keys: > { > "user_id": "test-bb-user", > "display_name": "test-bb-user", > > } > real 0m20.557s > user 0m0.087s > sys 0m0.030s > > First I thought that rados operations might be slow, but adding and > deleting objects in rados are fast as usual (at least from my perspective). > Also uploading to buckets is fine. > > We changed some things and I think it might have to do with this: > * We have a HAProxy that distributes via leastconn between the 3 radosgw's > (this did not change) > * We had three times a daemon with the name "eu-central-1" running (on the > 3 radosgw's) > * Because this might have led to our data duplication problem, we have > split that up so now the daemons are named per host (eu-central-1-s3db1, > eu-central-1-s3db2, eu-central-1-s3db3) > * We also added dedicated rgw daemons for garbage collection, because the > current one were not able to keep up. > * So basically ceph status went from "rgw: 1 daemon active (eu-central-1)" > to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2, > eu-central-1-s3db3, gc-s3db12, gc-s3db13...) > > > Cheers > Boris > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] radosgw-admin user create takes a long time (with failed to distribute cache message)
Hi, since a couple of days we experience a strange slowness on some radosgw-admin operations. What is the best way to debug this? For example creating a user takes over 20s. [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user --display-name=test-bb-user 2021-05-05 14:08:14.297 7f6942286840 1 robust_notify: If at first you don't succeed: (110) Connection timed out 2021-05-05 14:08:14.297 7f6942286840 0 ERROR: failed to distribute cache for eu-central-1.rgw.users.uid:test-bb-user 2021-05-05 14:08:24.335 7f6942286840 1 robust_notify: If at first you don't succeed: (110) Connection timed out 2021-05-05 14:08:24.335 7f6942286840 0 ERROR: failed to distribute cache for eu-central-1.rgw.users.keys: { "user_id": "test-bb-user", "display_name": "test-bb-user", } real 0m20.557s user 0m0.087s sys 0m0.030s First I thought that rados operations might be slow, but adding and deleting objects in rados are fast as usual (at least from my perspective). Also uploading to buckets is fine. We changed some things and I think it might have to do with this: * We have a HAProxy that distributes via leastconn between the 3 radosgw's (this did not change) * We had three times a daemon with the name "eu-central-1" running (on the 3 radosgw's) * Because this might have led to our data duplication problem, we have split that up so now the daemons are named per host (eu-central-1-s3db1, eu-central-1-s3db2, eu-central-1-s3db3) * We also added dedicated rgw daemons for garbage collection, because the current one were not able to keep up. * So basically ceph status went from "rgw: 1 daemon active (eu-central-1)" to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2, eu-central-1-s3db3, gc-s3db12, gc-s3db13...) Cheers Boris ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] global multipart lc policy in radosgw
Hi, I have a lot of multipart uploads that look like they never finished. Some of them date back to 2019. Is there a way to clean them up if they haven't finished within 28 days? I know I can implement an LC policy per bucket, but how do I implement it cluster-wide? Cheers Boris -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
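As far as I know there is no single cluster-wide knob for this; per bucket it can be done with the S3 lifecycle API, and one could loop that over all buckets. A sketch (endpoint, bucket name and the aws CLI usage are assumptions, not something from this thread; older RGW releases may want the legacy top-level Prefix form instead of Filter):

cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "abort-stale-multipart",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 28 }
    }
  ]
}
EOF

# apply to one bucket; repeat or loop for every bucket
aws --endpoint-url https://s3.example.com \
    s3api put-bucket-lifecycle-configuration \
    --bucket mybucket \
    --lifecycle-configuration file://lifecycle.json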
[ceph-users] Re: how to handle rgw leaked data (aka data that is not available via buckets but eats diskspace)
So, maybe somebody can answer me the following question: I have ~150m objects in the ceph cluster (ceph status shows (objects: 152.68M objects, 316 TiB). How can radosgw-admin bucket --bucket BUCKET radoslist create an output that is 252677729 and still growing? Am Di., 27. Apr. 2021 um 06:59 Uhr schrieb Boris Behrens : > Hi Anthony, > > yes we are using replication, the lost space is calculated before it's > replicated. > RAW STORAGE: > CLASS SIZEAVAIL USEDRAW USED %RAW USED > hdd 1.1 PiB 191 TiB 968 TiB 968 TiB 83.55 > TOTAL 1.1 PiB 191 TiB 968 TiB 968 TiB 83.55 > > POOLS: > POOLID PGS STORED > OBJECTS USED%USED MAX AVAIL > rbd 0 64 0 B > 0 0 B 013 TiB > .rgw.root1 64 99 KiB > 119 99 KiB 013 TiB > eu-central-1.rgw.control 2 64 0 B > 8 0 B 013 TiB > eu-central-1.rgw.data.root 3 64 947 KiB > 2.82k 947 KiB 013 TiB > eu-central-1.rgw.gc 4 64 101 MiB > 128 101 MiB 013 TiB > eu-central-1.rgw.log 5 64 267 MiB > 500 267 MiB 013 TiB > eu-central-1.rgw.users.uid 6 64 2.9 MiB > 6.91k 2.9 MiB 013 TiB > eu-central-1.rgw.users.keys 7 64 263 KiB > 6.73k 263 KiB 013 TiB > eu-central-1.rgw.meta8 64 384 KiB > 1k 384 KiB 013 TiB > eu-central-1.rgw.users.email 9 6440 B > 140 B 013 TiB > eu-central-1.rgw.buckets.index 10 64 10 GiB > 67.28k 10 GiB 0.0313 TiB > eu-central-1.rgw.buckets.data 11 2048 313 TiB > 151.71M 313 TiB 89.2513 TiB > ... > > EC profile is pretty standard > [root@s3db16 ~]# ceph osd erasure-code-profile ls > default > [root@s3db16 ~]# ceph osd erasure-code-profile get default > k=2 > m=1 > plugin=jerasure > technique=reed_sol_van > > We use mainly ceph 14.2.18. There is an OSD host with 14.2.19 and one with > 14.2.20 > > Object populations is mixed, but the most amount of data is in huge files. > We store our platforms RBD snapshots in it. > > Cheers > Boris > > > Am Di., 27. Apr. 2021 um 06:49 Uhr schrieb Anthony D'Atri < > anthony.da...@gmail.com>: > >> Are you using Replication? EC? How many copies / which profile? >> On which Ceph release were your OSDs built? BlueStore? Filestore? >> What is your RGW object population like? Lots of small objects? Mostly >> large objects? Average / median object size? >> >> > On Apr 26, 2021, at 9:32 PM, Boris Behrens wrote: >> > >> > HI, >> > >> > we still have the problem that our rgw eats more diskspace than it >> should. >> > Summing up the "size_kb_actual" of all buckets show only half of the >> used >> > diskspace. >> > >> > There are 312TiB stored acording to "ceph df" but we only need around >> 158TB. >> > >> > I've already wrote to this ML with the problem, but there were no >> solutions >> > that would help. >> > I've doug through the ML archive and found some interesting threads >> > regarding orphan objects and these kind of issues. >> > >> > Did someone ever solved this problem? >> > Or do you just add more disk space. >> > >> > we tried to: >> > * use the "radosgw-admin orphan find/finish" tool (didn't work) >> > * manually triggering the GC (didn't work) >> > >> > currently running (since yesterday evening): >> > * rgw-orphan-list, which procused 270GB of text output, and it's not >> done >> > yet (I have 60GB diskspace left) >> > >> > -- >> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im >> > groüen Saal. >> > ___ >> > ceph-users mailing list -- ceph-users@ceph.io >> > To unsubscribe send an email to ceph-users-le...@ceph.io >> >> > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. 
> -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
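A rough way to compare the two numbers, essentially what rgw-orphan-list automates (a sketch; radoslist output for in-progress multipart uploads can repeat entries, as described above, which is one way the line count can exceed the real object count):

# what RGW believes belongs to the bucket -- may contain duplicates
radosgw-admin bucket radoslist --bucket BUCKET > radoslist.txt
sort -u radoslist.txt | wc -l

# what is actually stored in the data pool
rados -p eu-central-1.rgw.buckets.data ls | sort > rados-ls.txt
wc -l < rados-ls.txt

# objects in the pool that no bucket references (orphan candidates);
# only meaningful once radoslist has been collected for *all* buckets
comm -23 rados-ls.txt <(sort -u radoslist.txt)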
[ceph-users] Re: how to handle rgw leaked data (aka data that is not available via buckets but eats diskspace)
Hi Anthony, yes we are using replication, the lost space is calculated before it's replicated. RAW STORAGE: CLASS SIZEAVAIL USEDRAW USED %RAW USED hdd 1.1 PiB 191 TiB 968 TiB 968 TiB 83.55 TOTAL 1.1 PiB 191 TiB 968 TiB 968 TiB 83.55 POOLS: POOLID PGS STORED OBJECTS USED%USED MAX AVAIL rbd 0 64 0 B 0 0 B 013 TiB .rgw.root1 64 99 KiB 119 99 KiB 013 TiB eu-central-1.rgw.control 2 64 0 B 8 0 B 013 TiB eu-central-1.rgw.data.root 3 64 947 KiB 2.82k 947 KiB 013 TiB eu-central-1.rgw.gc 4 64 101 MiB 128 101 MiB 013 TiB eu-central-1.rgw.log 5 64 267 MiB 500 267 MiB 013 TiB eu-central-1.rgw.users.uid 6 64 2.9 MiB 6.91k 2.9 MiB 013 TiB eu-central-1.rgw.users.keys 7 64 263 KiB 6.73k 263 KiB 013 TiB eu-central-1.rgw.meta8 64 384 KiB 1k 384 KiB 013 TiB eu-central-1.rgw.users.email 9 6440 B 1 40 B 013 TiB eu-central-1.rgw.buckets.index 10 64 10 GiB 67.28k 10 GiB 0.0313 TiB eu-central-1.rgw.buckets.data 11 2048 313 TiB 151.71M 313 TiB 89.2513 TiB ... EC profile is pretty standard [root@s3db16 ~]# ceph osd erasure-code-profile ls default [root@s3db16 ~]# ceph osd erasure-code-profile get default k=2 m=1 plugin=jerasure technique=reed_sol_van We use mainly ceph 14.2.18. There is an OSD host with 14.2.19 and one with 14.2.20 Object populations is mixed, but the most amount of data is in huge files. We store our platforms RBD snapshots in it. Cheers Boris Am Di., 27. Apr. 2021 um 06:49 Uhr schrieb Anthony D'Atri < anthony.da...@gmail.com>: > Are you using Replication? EC? How many copies / which profile? > On which Ceph release were your OSDs built? BlueStore? Filestore? > What is your RGW object population like? Lots of small objects? Mostly > large objects? Average / median object size? > > > On Apr 26, 2021, at 9:32 PM, Boris Behrens wrote: > > > > HI, > > > > we still have the problem that our rgw eats more diskspace than it > should. > > Summing up the "size_kb_actual" of all buckets show only half of the used > > diskspace. > > > > There are 312TiB stored acording to "ceph df" but we only need around > 158TB. > > > > I've already wrote to this ML with the problem, but there were no > solutions > > that would help. > > I've doug through the ML archive and found some interesting threads > > regarding orphan objects and these kind of issues. > > > > Did someone ever solved this problem? > > Or do you just add more disk space. > > > > we tried to: > > * use the "radosgw-admin orphan find/finish" tool (didn't work) > > * manually triggering the GC (didn't work) > > > > currently running (since yesterday evening): > > * rgw-orphan-list, which procused 270GB of text output, and it's not done > > yet (I have 60GB diskspace left) > > > > -- > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > > groüen Saal. > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
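A quick back-of-the-envelope check on those numbers (a sketch, not exact accounting): if eu-central-1.rgw.buckets.data is effectively 3x replicated, the reported STORED already accounts for nearly all of the raw usage, which would put the "missing" space inside STORED itself rather than in replication overhead.

echo "313 * 3" | bc    # ~939 TiB raw for 313 TiB stored at 3x, close to the 968 TiB RAW USED above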
[ceph-users] how to handle rgw leaked data (aka data that is not available via buckets but eats diskspace)
HI, we still have the problem that our rgw eats more diskspace than it should. Summing up the "size_kb_actual" of all buckets show only half of the used diskspace. There are 312TiB stored acording to "ceph df" but we only need around 158TB. I've already wrote to this ML with the problem, but there were no solutions that would help. I've doug through the ML archive and found some interesting threads regarding orphan objects and these kind of issues. Did someone ever solved this problem? Or do you just add more disk space. we tried to: * use the "radosgw-admin orphan find/finish" tool (didn't work) * manually triggering the GC (didn't work) currently running (since yesterday evening): * rgw-orphan-list, which procused 270GB of text output, and it's not done yet (I have 60GB diskspace left) -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
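For the record, a sketch of how that per-bucket sum can be produced (assuming jq is available; the usage category key is what Nautilus emits, and buckets without data are treated as zero):

radosgw-admin bucket stats > all-buckets.json
# sum size_kb_actual over all buckets and print the result in TiB
jq '[.[].usage["rgw.main"].size_kb_actual // 0] | add / 1024 / 1024 / 1024' all-buckets.json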
[ceph-users] Re: rbd snap create not working and just hangs forever
Am Fr., 23. Apr. 2021 um 12:16 Uhr schrieb Ilya Dryomov : > On Fri, Apr 23, 2021 at 12:03 PM Boris Behrens wrote: > > > > > > > > Am Fr., 23. Apr. 2021 um 11:52 Uhr schrieb Ilya Dryomov < > idryo...@gmail.com>: > >> > >> > >> This snippet confirms my suspicion. Unfortunately without a verbose > >> log from that VM from three days ago (i.e. when it got into this state) > >> it's hard to tell what exactly went wrong. > >> > >> The problem is that the VM doesn't consider itself to be the rightful > >> owner of the lock and so when "rbd snap create" requests the lock from > >> it in order to make a snapshot, the VM just ignores the request because > >> even though it owns the lock, its record appears to be of sync. > >> > >> I'd suggest to kick it by restarting osd36. If the VM is active, it > >> should reacquire the lock and hopefully update its internal record as > >> expected. If "rbd snap create" still hangs after that, it would mean > >> that we have a reproducer and can gather logs on the VM side. > >> > >> What version of qemu/librbd and ceph is in use (both on the VM side and > >> on the side you are running "rbd snap create"? > >> > > I just stopped the OSD, waited some seconds and started it again. > > I still can't create snapshots. > > > > Ceph version is 14.2.18 accross the board > > qemu is 4.1.0-1 > > as we use krbd, the kernel version is 5.2.9-arch1-1-ARCH > > > > How can I gather more logs to debug it? > > Are you saying that this image is mapped and the lock is held by the > kernel client? It doesn't look that way from the logs you shared. > > Thanks, > > Ilya > We use krbd instead of librbd (at least this is what I think I know), but qemu is doing the kvm/rbd stuff. -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: rbd snap create not working and just hangs forever
Am Fr., 23. Apr. 2021 um 11:52 Uhr schrieb Ilya Dryomov : > > This snippet confirms my suspicion. Unfortunately without a verbose > log from that VM from three days ago (i.e. when it got into this state) > it's hard to tell what exactly went wrong. > > The problem is that the VM doesn't consider itself to be the rightful > owner of the lock and so when "rbd snap create" requests the lock from > it in order to make a snapshot, the VM just ignores the request because > even though it owns the lock, its record appears to be of sync. > > I'd suggest to kick it by restarting osd36. If the VM is active, it > should reacquire the lock and hopefully update its internal record as > expected. If "rbd snap create" still hangs after that, it would mean > that we have a reproducer and can gather logs on the VM side. > > What version of qemu/librbd and ceph is in use (both on the VM side and > on the side you are running "rbd snap create"? > > I just stopped the OSD, waited some seconds and started it again. I still can't create snapshots. Ceph version is 14.2.18 accross the board qemu is 4.1.0-1 as we use krbd, the kernel version is 5.2.9-arch1-1-ARCH How can I gather more logs to debug it? -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: s3 requires twice the space it should use
So I am following the orphans trail. Now I have a job that is running since 3 1/2 days. Can I hit finish on a job that is in the comparing state? It is in this since 2 days and the messages in the output are repeating and look like this: leaked: ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.578__shadow_95a2980b-7012-43dd-81f2-07577cfcb9f0/25bc235b-3bf9-4db2-ace0-d149653bfd8b/e79909ed-e52e-4d16-a3a9-e84e332d37fa.lz4.2~YVwbe-JPoLioLOSEQtTwYOPt_wmUCHn.4310_2 This is the find job. I created it with: radosgw-admin orphans find --job-id bb-orphan-2021-04-19 --bucket=BUCKET --yes-i-really-mean-it --pool eu-central-1.rgw.buckets.data { "orphan_search_state": { "info": { "orphan_search_info": { "job_name": "bb-orphan-2021-04-19", "pool": "eu-central-1.rgw.buckets.data", "num_shards": 64, "start_time": "2021-04-19 16:42:45.993615Z" } }, "stage": { "orphan_search_stage": { "search_stage": "comparing", "shard": 0, "marker": "" } } } }, Am Fr., 16. Apr. 2021 um 10:57 Uhr schrieb Boris Behrens : > Could this also be failed multipart uploads? > > Am Do., 15. Apr. 2021 um 18:23 Uhr schrieb Boris Behrens : > >> Cheers, >> >> [root@s3db1 ~]# ceph daemon osd.23 perf dump | grep numpg >> "numpg": 187, >> "numpg_primary": 64, >> "numpg_replica": 121, >> "numpg_stray": 2, >> "numpg_removing": 0, >> >> >> Am Do., 15. Apr. 2021 um 18:18 Uhr schrieb 胡 玮文 : >> >>> Hi Boris, >>> >>> Could you check something like >>> >>> ceph daemon osd.23 perf dump | grep numpg >>> >>> to see if there are some stray or removing PG? >>> >>> Weiwen Hu >>> >>> > 在 2021年4月15日,22:53,Boris Behrens 写道: >>> > >>> > Ah you are right. >>> > [root@s3db1 ~]# ceph daemon osd.23 config get >>> bluestore_min_alloc_size_hdd >>> > { >>> >"bluestore_min_alloc_size_hdd": "65536" >>> > } >>> > But I also checked how many objects our s3 hold and the numbers just >>> do not >>> > add up. >>> > There are only 26509200 objects, which would result in around 1TB >>> "waste" >>> > if every object would be empty. >>> > >>> > I think the problem began when I updated the PG count from 1024 to >>> 2048. >>> > Could there be an issue where the data is written twice? >>> > >>> > >>> >> Am Do., 15. Apr. 2021 um 16:48 Uhr schrieb Amit Ghadge < >>> amitg@gmail.com >>> >>> : >>> >> >>> >> verify those two parameter values ,bluestore_min_alloc_size_hdd & >>> >> bluestore_min_alloc_size_sdd, If you are using hdd disk then >>> >> bluestore_min_alloc_size_hdd are applicable. >>> >> >>> >>> On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens wrote: >>> >>> >>> >>> So, I need to live with it? A value of zero leads to use the default? >>> >>> [root@s3db1 ~]# ceph daemon osd.23 config get >>> bluestore_min_alloc_size >>> >>> { >>> >>>"bluestore_min_alloc_size": "0" >>> >>> } >>> >>> >>> >>> I also checked the fragmentation on the bluestore OSDs and it is >>> around >>> >>> 0.80 - 0.89 on most OSDs. yikes. >>> >>> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block >>> >>> { >>> >>>"fragmentation_rating": 0.85906054329923576 >>> >>> } >>> >>> >>> >>> The problem I currently have is, that I barely keep up with adding >>> OSD >>> >>> disks. >>> >>> >>> >>> Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge < >>> >>> amitg@gmail.com>: >>> >>> >>> >>>> size_kb_actual are actually bucket object size but on OSD level the >>> >>>> bluestore_min_alloc_size default 64KB and SSD are 16KB >>> >>&g
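For completeness, the orphans job lifecycle (as documented for Nautilus; the tool was later deprecated in favour of rgw-orphan-list) looks roughly like this, with the job id and pool taken from the thread:

radosgw-admin orphans list-jobs
radosgw-admin orphans find --job-id=bb-orphan-2021-04-19 --pool=eu-central-1.rgw.buckets.data
radosgw-admin orphans finish --job-id=bb-orphan-2021-04-19   # cleans up the job's intermediate search data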
[ceph-users] Re: rbd snap create not working and just hangs forever
Am Do., 22. Apr. 2021 um 17:27 Uhr schrieb Ilya Dryomov : > On Thu, Apr 22, 2021 at 5:08 PM Boris Behrens wrote: > > > > > > > > Am Do., 22. Apr. 2021 um 16:43 Uhr schrieb Ilya Dryomov < > idryo...@gmail.com>: > >> > >> On Thu, Apr 22, 2021 at 4:20 PM Boris Behrens wrote: > >> > > >> > Hi, > >> > > >> > I have a customer VM that is running fine, but I can not make > snapshots > >> > anymore. > >> > rbd snap create rbd/IMAGE@test-bb-1 > >> > just hangs forever. > >> > >> Hi Boris, > >> > >> Run > >> > >> $ rbd snap create rbd/IMAGE@test-bb-1 --debug-ms=1 --debug-rbd=20 > >> > >> let it hang for a few minutes and attach the output. > > > > > > I just pasted a short snip here: https://pastebin.com/B3Xgpbzd > > If you need more I can give it to you, but the output is very large. > > Paste the first couple thousand lines (i.e. from the very beginning), > that should be enough. > > sure: https://pastebin.com/GsKpLbqG good luck :) -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: rbd snap create not working and just hangs forever
Am Do., 22. Apr. 2021 um 16:43 Uhr schrieb Ilya Dryomov : > On Thu, Apr 22, 2021 at 4:20 PM Boris Behrens wrote: > > > > Hi, > > > > I have a customer VM that is running fine, but I can not make snapshots > > anymore. > > rbd snap create rbd/IMAGE@test-bb-1 > > just hangs forever. > > Hi Boris, > > Run > > $ rbd snap create rbd/IMAGE@test-bb-1 --debug-ms=1 --debug-rbd=20 > > let it hang for a few minutes and attach the output. > I just pasted a short snip here: https://pastebin.com/B3Xgpbzd If you need more I can give it to you, but the output is very large. > > > > > When I checked the status with > > rbd status rbd/IMAGE > > it shows one watcher, the cpu node where the VM is running. > > > > What can I do to investigate further, without restarting the VM. > > This is the only affected VM and it stopped working three days ago. > > Can you think of any event related to the cluster, that VM or the > VM fleet in general that occurred three days ago? > > We had an incident where the cpu nodes connected to the wrong cluster, but this VM was not affected IIRC. Cheers Boris -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] rbd snap create not working and just hangs forever
Hi, I have a customer VM that is running fine, but I cannot make snapshots anymore. rbd snap create rbd/IMAGE@test-bb-1 just hangs forever. When I check the status with rbd status rbd/IMAGE it shows one watcher, the CPU node where the VM is running. What can I do to investigate further without restarting the VM? This is the only affected VM, and it stopped working three days ago. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
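A few read-only commands that can help narrow down who holds the image open, assuming the image lives in the rbd pool and IMAGE is a placeholder:

rbd status rbd/IMAGE                        # watchers, as above
rbd lock ls rbd/IMAGE                       # exclusive-lock holder, if any
rbd info rbd/IMAGE                          # note the id in block_name_prefix (rbd_data.<id>)
rados -p rbd listwatchers rbd_header.<id>   # watchers on the image header object itself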
[ceph-users] Re: [Suspicious newsletter] cleanup multipart in radosgw
Hi Istvan, both of them require bucket access, correct? Is there a way to add the LC policy globally? Cheers Boris Am Mo., 19. Apr. 2021 um 11:58 Uhr schrieb Szabo, Istvan (Agoda) < istvan.sz...@agoda.com>: > Hi, > > You have 2 ways: > > First is using s3vrowser app and in the menu select the multipart uploads > and clean it up. > The other is like this: > > Set lifecycle policy > On the client: > vim lifecyclepolicy > > http://s3.amazonaws.com/doc/2006-03-01/";> > > Incomplete Multipart > Uploads > > Enabled > > > > 1 > > > > > > /bin/s3cmd setlifecycle lifecyclepolicy s3://bucketname > On mon node process manually > radosgw-admin lc list > radosgw-admin lc process > > Istvan Szabo > Senior Infrastructure Engineer > --- > Agoda Services Co., Ltd. > e: istvan.sz...@agoda.com > --- > > -Original Message- > From: Boris Behrens > Sent: Monday, April 19, 2021 4:10 PM > To: ceph-users@ceph.io > Subject: [Suspicious newsletter] [ceph-users] cleanup multipart in radosgw > > Hi, > is there a way to remove multipart uploads that are older than X days? > > It doesn't need to be build into ceph or is automated to the end. Just > something I don't need to build on my own. > > I currently try to debug a problem where ceph reports a lot more used > space than it actually requires ( > https://www.mail-archive.com/ceph-users@ceph.io/msg09810.html). > > I came across a lot of old _multipart_ files in some buckets and now I > want to clean them up. > I don't know if this will fix my problem but I would love to rule that out. > > radosgw-admin bucket check --bucket=bucket --check-objects --fix does not > work because it is a shareded bucket. > > I have also some buckets that look like this, and contain 100% _multipart_ > files which are >2 years old: > "buckets": [ > { > "bucket": "ncprod", > "tenant": "", > "num_objects": -482, > "num_shards": 0, > "objects_per_shard": -482, > "fill_status": "OVER 180143985094819%" > } > ] > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. > ___ > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an > email to ceph-users-le...@ceph.io > > > This message is confidential and is for the sole use of the intended > recipient(s). It may also be privileged or otherwise protected by copyright > or other legal rules. If you have received it by mistake please let us know > by reply email and delete it from your system. It is prohibited to copy > this message or disclose its content to anyone. Any confidentiality or > privilege is not waived or lost by any mistaken delivery or unauthorized > disclosure of the message. All messages sent to and from Agoda may be > monitored to ensure compliance with company policies, to protect the > company's interests and to remove potential malware. Electronic messages > may be intercepted, amended, lost or deleted, or contain viruses. > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
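Since the XML tags of the quoted policy were stripped in the archive, here is a best-effort reconstruction of such a rule (element values like the ID and the empty Prefix are assumptions), applied the same way as described above:

cat > lifecyclepolicy <<'EOF'
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>Incomplete Multipart Uploads</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <AbortIncompleteMultipartUpload>
      <DaysAfterInitiation>1</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
  </Rule>
</LifecycleConfiguration>
EOF
s3cmd setlifecycle lifecyclepolicy s3://bucketname
radosgw-admin lc list
radosgw-admin lc process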
[ceph-users] cleanup multipart in radosgw
Hi, is there a way to remove multipart uploads that are older than X days? It doesn't need to be built into ceph or fully automated; just something I don't need to build on my own. I am currently trying to debug a problem where ceph reports a lot more used space than it actually requires ( https://www.mail-archive.com/ceph-users@ceph.io/msg09810.html). I came across a lot of old _multipart_ files in some buckets and now I want to clean them up. I don't know if this will fix my problem, but I would love to rule it out. radosgw-admin bucket check --bucket=bucket --check-objects --fix does not work because it is a sharded bucket. I also have some buckets that look like this and contain only _multipart_ files which are >2 years old: "buckets": [ { "bucket": "ncprod", "tenant": "", "num_objects": -482, "num_shards": 0, "objects_per_shard": -482, "fill_status": "OVER 180143985094819%" } ] -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
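If a lifecycle policy is not an option, stale uploads can also be aborted by hand with s3cmd (bucket name, key and upload id below are placeholders, and the credentials need access to the bucket):

s3cmd multipart s3://bucketname                      # list in-progress multipart uploads and their ids
s3cmd abortmp s3://bucketname/OBJECT_KEY UPLOAD_ID   # abort one stale upload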
[ceph-users] Re: s3 requires twice the space it should use
Could this also be failed multipart uploads? Am Do., 15. Apr. 2021 um 18:23 Uhr schrieb Boris Behrens : > Cheers, > > [root@s3db1 ~]# ceph daemon osd.23 perf dump | grep numpg > "numpg": 187, > "numpg_primary": 64, > "numpg_replica": 121, > "numpg_stray": 2, > "numpg_removing": 0, > > > Am Do., 15. Apr. 2021 um 18:18 Uhr schrieb 胡 玮文 : > >> Hi Boris, >> >> Could you check something like >> >> ceph daemon osd.23 perf dump | grep numpg >> >> to see if there are some stray or removing PG? >> >> Weiwen Hu >> >> > 在 2021年4月15日,22:53,Boris Behrens 写道: >> > >> > Ah you are right. >> > [root@s3db1 ~]# ceph daemon osd.23 config get >> bluestore_min_alloc_size_hdd >> > { >> >"bluestore_min_alloc_size_hdd": "65536" >> > } >> > But I also checked how many objects our s3 hold and the numbers just do >> not >> > add up. >> > There are only 26509200 objects, which would result in around 1TB >> "waste" >> > if every object would be empty. >> > >> > I think the problem began when I updated the PG count from 1024 to 2048. >> > Could there be an issue where the data is written twice? >> > >> > >> >> Am Do., 15. Apr. 2021 um 16:48 Uhr schrieb Amit Ghadge < >> amitg@gmail.com >> >>> : >> >> >> >> verify those two parameter values ,bluestore_min_alloc_size_hdd & >> >> bluestore_min_alloc_size_sdd, If you are using hdd disk then >> >> bluestore_min_alloc_size_hdd are applicable. >> >> >> >>> On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens wrote: >> >>> >> >>> So, I need to live with it? A value of zero leads to use the default? >> >>> [root@s3db1 ~]# ceph daemon osd.23 config get >> bluestore_min_alloc_size >> >>> { >> >>>"bluestore_min_alloc_size": "0" >> >>> } >> >>> >> >>> I also checked the fragmentation on the bluestore OSDs and it is >> around >> >>> 0.80 - 0.89 on most OSDs. yikes. >> >>> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block >> >>> { >> >>>"fragmentation_rating": 0.85906054329923576 >> >>> } >> >>> >> >>> The problem I currently have is, that I barely keep up with adding OSD >> >>> disks. >> >>> >> >>> Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge < >> >>> amitg@gmail.com>: >> >>> >> >>>> size_kb_actual are actually bucket object size but on OSD level the >> >>>> bluestore_min_alloc_size default 64KB and SSD are 16KB >> >>>> >> >>>> >> >>>> >> https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Faccess.redhat.com%2Fdocumentation%2Fen-us%2Fred_hat_ceph_storage%2F3%2Fhtml%2Fadministration_guide%2Fosd-bluestore&data=04%7C01%7C%7Cba98c0dff13941ea96ff08d9001e3759%7C84df9e7fe9f640afb435%7C1%7C0%7C637540952043049058%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=wfSqqyiDHRXp4ypOGTxx4p%2Buy902OGPEmGkNfJ2BF6I%3D&reserved=0 >> >>>> >> >>>> -AmitG >> >>>> >> >>>> On Thu, Apr 15, 2021 at 7:29 PM Boris Behrens wrote: >> >>>> >> >>>>> Hi, >> >>>>> >> >>>>> maybe it is just a problem in my understanding, but it looks like >> our s3 >> >>>>> requires twice the space it should use. >> >>>>> >> >>>>> I ran "radosgw-admin bucket stats", and added all "size_kb_actual" >> >>>>> values >> >>>>> up and divided to TB (/1024/1024/1024). >> >>>>> The resulting space is 135,1636733 TB. When I tripple it because of >> >>>>> replication I end up with around 405TB which is nearly half the >> space of >> >>>>> what ceph df tells me. >> >>>>> >> >>>>> Hope someone can help me. >> >>>>> >> >>>>> ceph df shows >> >>>>> RAW STORAGE: >> >>>>>CLASS SIZE AVAIL USEDRAW USED %RAW &
[ceph-users] Re: s3 requires twice the space it should use
Cheers, [root@s3db1 ~]# ceph daemon osd.23 perf dump | grep numpg "numpg": 187, "numpg_primary": 64, "numpg_replica": 121, "numpg_stray": 2, "numpg_removing": 0, Am Do., 15. Apr. 2021 um 18:18 Uhr schrieb 胡 玮文 : > Hi Boris, > > Could you check something like > > ceph daemon osd.23 perf dump | grep numpg > > to see if there are some stray or removing PG? > > Weiwen Hu > > > 在 2021年4月15日,22:53,Boris Behrens 写道: > > > > Ah you are right. > > [root@s3db1 ~]# ceph daemon osd.23 config get > bluestore_min_alloc_size_hdd > > { > >"bluestore_min_alloc_size_hdd": "65536" > > } > > But I also checked how many objects our s3 hold and the numbers just do > not > > add up. > > There are only 26509200 objects, which would result in around 1TB "waste" > > if every object would be empty. > > > > I think the problem began when I updated the PG count from 1024 to 2048. > > Could there be an issue where the data is written twice? > > > > > >> Am Do., 15. Apr. 2021 um 16:48 Uhr schrieb Amit Ghadge < > amitg@gmail.com > >>> : > >> > >> verify those two parameter values ,bluestore_min_alloc_size_hdd & > >> bluestore_min_alloc_size_sdd, If you are using hdd disk then > >> bluestore_min_alloc_size_hdd are applicable. > >> > >>> On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens wrote: > >>> > >>> So, I need to live with it? A value of zero leads to use the default? > >>> [root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size > >>> { > >>>"bluestore_min_alloc_size": "0" > >>> } > >>> > >>> I also checked the fragmentation on the bluestore OSDs and it is around > >>> 0.80 - 0.89 on most OSDs. yikes. > >>> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block > >>> { > >>>"fragmentation_rating": 0.85906054329923576 > >>> } > >>> > >>> The problem I currently have is, that I barely keep up with adding OSD > >>> disks. > >>> > >>> Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge < > >>> amitg@gmail.com>: > >>> > >>>> size_kb_actual are actually bucket object size but on OSD level the > >>>> bluestore_min_alloc_size default 64KB and SSD are 16KB > >>>> > >>>> > >>>> > https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Faccess.redhat.com%2Fdocumentation%2Fen-us%2Fred_hat_ceph_storage%2F3%2Fhtml%2Fadministration_guide%2Fosd-bluestore&data=04%7C01%7C%7Cba98c0dff13941ea96ff08d9001e3759%7C84df9e7fe9f640afb435%7C1%7C0%7C637540952043049058%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=wfSqqyiDHRXp4ypOGTxx4p%2Buy902OGPEmGkNfJ2BF6I%3D&reserved=0 > >>>> > >>>> -AmitG > >>>> > >>>> On Thu, Apr 15, 2021 at 7:29 PM Boris Behrens wrote: > >>>> > >>>>> Hi, > >>>>> > >>>>> maybe it is just a problem in my understanding, but it looks like > our s3 > >>>>> requires twice the space it should use. > >>>>> > >>>>> I ran "radosgw-admin bucket stats", and added all "size_kb_actual" > >>>>> values > >>>>> up and divided to TB (/1024/1024/1024). > >>>>> The resulting space is 135,1636733 TB. When I tripple it because of > >>>>> replication I end up with around 405TB which is nearly half the > space of > >>>>> what ceph df tells me. > >>>>> > >>>>> Hope someone can help me. > >>>>> > >>>>> ceph df shows > >>>>> RAW STORAGE: > >>>>>CLASS SIZE AVAIL USEDRAW USED %RAW > >>>>> USED > >>>>>hdd 1009 TiB 189 TiB 820 TiB 820 TiB > >>>>> 81.26 > >>>>>TOTAL 1009 TiB 189 TiB 820 TiB 820 TiB > >>>>> 81.26 > >>>>> > >>>>> POOLS: > >>>>>POOLID PGS STORED > >>>>> OBJECTS > >>>>>USED%USED MAX AVAIL > >>>>>
[ceph-users] Re: s3 requires twice the space it should use
Ah you are right. [root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size_hdd { "bluestore_min_alloc_size_hdd": "65536" } But I also checked how many objects our s3 hold and the numbers just do not add up. There are only 26509200 objects, which would result in around 1TB "waste" if every object would be empty. I think the problem began when I updated the PG count from 1024 to 2048. Could there be an issue where the data is written twice? Am Do., 15. Apr. 2021 um 16:48 Uhr schrieb Amit Ghadge : > verify those two parameter values ,bluestore_min_alloc_size_hdd & > bluestore_min_alloc_size_sdd, If you are using hdd disk then > bluestore_min_alloc_size_hdd are applicable. > > On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens wrote: > >> So, I need to live with it? A value of zero leads to use the default? >> [root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size >> { >> "bluestore_min_alloc_size": "0" >> } >> >> I also checked the fragmentation on the bluestore OSDs and it is around >> 0.80 - 0.89 on most OSDs. yikes. >> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block >> { >> "fragmentation_rating": 0.85906054329923576 >> } >> >> The problem I currently have is, that I barely keep up with adding OSD >> disks. >> >> Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge < >> amitg@gmail.com>: >> >>> size_kb_actual are actually bucket object size but on OSD level the >>> bluestore_min_alloc_size default 64KB and SSD are 16KB >>> >>> >>> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/administration_guide/osd-bluestore >>> >>> -AmitG >>> >>> On Thu, Apr 15, 2021 at 7:29 PM Boris Behrens wrote: >>> >>>> Hi, >>>> >>>> maybe it is just a problem in my understanding, but it looks like our s3 >>>> requires twice the space it should use. >>>> >>>> I ran "radosgw-admin bucket stats", and added all "size_kb_actual" >>>> values >>>> up and divided to TB (/1024/1024/1024). >>>> The resulting space is 135,1636733 TB. When I tripple it because of >>>> replication I end up with around 405TB which is nearly half the space of >>>> what ceph df tells me. >>>> >>>> Hope someone can help me. >>>> >>>> ceph df shows >>>> RAW STORAGE: >>>> CLASS SIZE AVAIL USEDRAW USED %RAW >>>> USED >>>> hdd 1009 TiB 189 TiB 820 TiB 820 TiB >>>> 81.26 >>>> TOTAL 1009 TiB 189 TiB 820 TiB 820 TiB >>>> 81.26 >>>> >>>> POOLS: >>>> POOLID PGS STORED >>>> OBJECTS >>>> USED%USED MAX AVAIL >>>> rbd 0 64 0 B >>>>0 >>>> 0 B 018 TiB >>>> .rgw.root1 64 99 KiB >>>> 119 >>>> 99 KiB 018 TiB >>>> eu-central-1.rgw.control 2 64 0 B >>>>8 >>>> 0 B 018 TiB >>>> eu-central-1.rgw.data.root 3 64 1.0 MiB >>>> 3.15k >>>> 1.0 MiB 018 TiB >>>> eu-central-1.rgw.gc 4 64 71 MiB >>>> 32 >>>> 71 MiB 018 TiB >>>> eu-central-1.rgw.log 5 64 267 MiB >>>> 564 >>>> 267 MiB 018 TiB >>>> eu-central-1.rgw.users.uid 6 64 2.8 MiB >>>> 6.91k >>>> 2.8 MiB 018 TiB >>>> eu-central-1.rgw.users.keys 7 64 263 KiB >>>> 6.73k >>>> 263 KiB 018 TiB >>>> eu-central-1.rgw.meta8 64 384 KiB >>>> 1k >>>> 384 KiB 018 TiB >>>> eu-central-1.rgw.users.email 9 6440 B >>>>1 >>>>40 B 018 TiB >>>> eu-central-1.rgw.buckets.index 10 64 10 GiB >>>> 67.61k >>>> 10 GiB 0.0218 TiB >>>
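A back-of-the-envelope check of the worst-case allocation padding (every object wasting one full 64 KiB min_alloc unit, before replication):

echo $(( 26509200 * 65536 / 1024 / 1024 / 1024 ))   # ≈ 1618 GiB, i.e. roughly 1.6 TiB, or ~4.8 TiB raw at 3x

Either way, allocation padding alone is nowhere near the discrepancy of well over 100 TiB discussed in this thread.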
[ceph-users] Re: s3 requires twice the space it should use
So, I need to live with it? A value of zero leads to use the default? [root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size { "bluestore_min_alloc_size": "0" } I also checked the fragmentation on the bluestore OSDs and it is around 0.80 - 0.89 on most OSDs. yikes. [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block { "fragmentation_rating": 0.85906054329923576 } The problem I currently have is, that I barely keep up with adding OSD disks. Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge : > size_kb_actual are actually bucket object size but on OSD level the > bluestore_min_alloc_size default 64KB and SSD are 16KB > > > https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/administration_guide/osd-bluestore > > -AmitG > > On Thu, Apr 15, 2021 at 7:29 PM Boris Behrens wrote: > >> Hi, >> >> maybe it is just a problem in my understanding, but it looks like our s3 >> requires twice the space it should use. >> >> I ran "radosgw-admin bucket stats", and added all "size_kb_actual" values >> up and divided to TB (/1024/1024/1024). >> The resulting space is 135,1636733 TB. When I tripple it because of >> replication I end up with around 405TB which is nearly half the space of >> what ceph df tells me. >> >> Hope someone can help me. >> >> ceph df shows >> RAW STORAGE: >> CLASS SIZE AVAIL USEDRAW USED %RAW USED >> hdd 1009 TiB 189 TiB 820 TiB 820 TiB 81.26 >> TOTAL 1009 TiB 189 TiB 820 TiB 820 TiB 81.26 >> >> POOLS: >> POOLID PGS STORED >> OBJECTS >> USED%USED MAX AVAIL >> rbd 0 64 0 B >> 0 >> 0 B 018 TiB >> .rgw.root1 64 99 KiB >> 119 >> 99 KiB 018 TiB >> eu-central-1.rgw.control 2 64 0 B >> 8 >> 0 B 018 TiB >> eu-central-1.rgw.data.root 3 64 1.0 MiB >> 3.15k >> 1.0 MiB 018 TiB >> eu-central-1.rgw.gc 4 64 71 MiB >> 32 >> 71 MiB 018 TiB >> eu-central-1.rgw.log 5 64 267 MiB >> 564 >> 267 MiB 018 TiB >> eu-central-1.rgw.users.uid 6 64 2.8 MiB >> 6.91k >> 2.8 MiB 018 TiB >> eu-central-1.rgw.users.keys 7 64 263 KiB >> 6.73k >> 263 KiB 018 TiB >> eu-central-1.rgw.meta8 64 384 KiB >> 1k >> 384 KiB 018 TiB >> eu-central-1.rgw.users.email 9 6440 B >> 1 >>40 B 018 TiB >> eu-central-1.rgw.buckets.index 10 64 10 GiB >> 67.61k >> 10 GiB 0.0218 TiB >> eu-central-1.rgw.buckets.data 11 2048 264 TiB >> 138.31M >> 264 TiB 83.3718 TiB >> eu-central-1.rgw.buckets.non-ec 12 64 297 MiB >> 11.32k >> 297 MiB 018 TiB >> eu-central-1.rgw.usage 13 64 536 MiB >> 32 >> 536 MiB 018 TiB >> eu-msg-1.rgw.control56 64 0 B >> 8 >> 0 B 018 TiB >> eu-msg-1.rgw.data.root 57 64 72 KiB >> 227 >> 72 KiB 018 TiB >> eu-msg-1.rgw.gc 58 64 300 KiB >> 32 >> 300 KiB 018 TiB >> eu-msg-1.rgw.log59 64 835 KiB >> 242 >> 835 KiB 018 TiB >> eu-msg-1.rgw.users.uid 60 64 56 KiB >> 104 >> 56 KiB 018 TiB >> eu-msg-1.rgw.usage 61 64 37 MiB >> 25 >> 37 MiB 018 TiB >> eu-msg-1.rgw.users.keys 62 64 3.8 KiB >> 97 >> 3.8 KiB 018 TiB >> eu-msg-1.rgw.meta 63 64 607 KiB >> 1.60k >> 607 KiB 018 TiB >> eu-msg-1.rgw.buckets.index 64 64 71 MiB >> 119 >> 71 MiB 018 TiB >> eu-msg-1.rgw.users
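To pull the fragmentation rating for every OSD on a host in one go, something like this should work, assuming the default admin socket path:

for sock in /var/run/ceph/ceph-osd.*.asok; do
  echo -n "$sock: "
  ceph daemon "$sock" bluestore allocator score block | grep fragmentation_rating
done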
[ceph-users] s3 requires twice the space it should use
Hi, maybe it is just a problem in my understanding, but it looks like our s3 requires twice the space it should use. I ran "radosgw-admin bucket stats", and added all "size_kb_actual" values up and divided to TB (/1024/1024/1024). The resulting space is 135,1636733 TB. When I tripple it because of replication I end up with around 405TB which is nearly half the space of what ceph df tells me. Hope someone can help me. ceph df shows RAW STORAGE: CLASS SIZE AVAIL USEDRAW USED %RAW USED hdd 1009 TiB 189 TiB 820 TiB 820 TiB 81.26 TOTAL 1009 TiB 189 TiB 820 TiB 820 TiB 81.26 POOLS: POOLID PGS STORED OBJECTS USED%USED MAX AVAIL rbd 0 64 0 B 0 0 B 018 TiB .rgw.root1 64 99 KiB 119 99 KiB 018 TiB eu-central-1.rgw.control 2 64 0 B 8 0 B 018 TiB eu-central-1.rgw.data.root 3 64 1.0 MiB 3.15k 1.0 MiB 018 TiB eu-central-1.rgw.gc 4 64 71 MiB 32 71 MiB 018 TiB eu-central-1.rgw.log 5 64 267 MiB 564 267 MiB 018 TiB eu-central-1.rgw.users.uid 6 64 2.8 MiB 6.91k 2.8 MiB 018 TiB eu-central-1.rgw.users.keys 7 64 263 KiB 6.73k 263 KiB 018 TiB eu-central-1.rgw.meta8 64 384 KiB 1k 384 KiB 018 TiB eu-central-1.rgw.users.email 9 6440 B 1 40 B 018 TiB eu-central-1.rgw.buckets.index 10 64 10 GiB 67.61k 10 GiB 0.0218 TiB eu-central-1.rgw.buckets.data 11 2048 264 TiB 138.31M 264 TiB 83.3718 TiB eu-central-1.rgw.buckets.non-ec 12 64 297 MiB 11.32k 297 MiB 018 TiB eu-central-1.rgw.usage 13 64 536 MiB 32 536 MiB 018 TiB eu-msg-1.rgw.control56 64 0 B 8 0 B 018 TiB eu-msg-1.rgw.data.root 57 64 72 KiB 227 72 KiB 018 TiB eu-msg-1.rgw.gc 58 64 300 KiB 32 300 KiB 018 TiB eu-msg-1.rgw.log59 64 835 KiB 242 835 KiB 018 TiB eu-msg-1.rgw.users.uid 60 64 56 KiB 104 56 KiB 018 TiB eu-msg-1.rgw.usage 61 64 37 MiB 25 37 MiB 018 TiB eu-msg-1.rgw.users.keys 62 64 3.8 KiB 97 3.8 KiB 018 TiB eu-msg-1.rgw.meta 63 64 607 KiB 1.60k 607 KiB 018 TiB eu-msg-1.rgw.buckets.index 64 64 71 MiB 119 71 MiB 018 TiB eu-msg-1.rgw.users.email65 64 0 B 0 0 B 018 TiB eu-msg-1.rgw.buckets.data 66 64 2.9 TiB 1.16M 2.9 TiB 5.3018 TiB eu-msg-1.rgw.buckets.non-ec 67 64 2.2 MiB 354 2.2 MiB 018 TiB default.rgw.control 69 32 0 B 8 0 B 018 TiB default.rgw.data.root 70 32 0 B 0 0 B 018 TiB default.rgw.gc 71 32 0 B 0 0 B 018 TiB default.rgw.log 72 32 0 B 0 0 B 018 TiB default.rgw.users.uid 73 32 0 B 0 0 B 018 TiB fra-1.rgw.control 74 32 0 B 8 0 B 018 TiB fra-1.rgw.meta 75 32 0 B 0 0 B 018 TiB fra-1.rgw.log 76 3250 B 28 50 B 018 TiB -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: should I increase the amount of PGs?
I raised the backfillfull_ratio to .91 to see what happens, now I am waiting. Some OSDs were around 89-91%, some are around 50-60% The pgp_num is on 1946 since one week. I think this will solve itself, when the cluster becomes a bit more tidy. Am Di., 30. März 2021 um 15:23 Uhr schrieb Dan van der Ster < d...@vanderster.com>: > You started with 1024 PGs, and are splitting to 2048. > Currently there are 1946 PGs used .. so it is nearly there at the goal. > > You need to watch that value 1946 and see if it increases slowly. If > it does not increase, then those backfill_toofull PGs are probably > splitting PGs, and they are blocked by not having enough free space. > > To solve that free space problem, you could either increase the > backfillfull_ratio like we discussed earlier, or add capacity. > I prefer the former, if the OSDs are just over the 90% default limit. > > -- dan > > On Tue, Mar 30, 2021 at 3:18 PM Boris Behrens wrote: > > > > The output from ceph osd pool ls detail tell me nothing, except that the > pgp_num is not where it should be. Can you help me to read the output? How > do I estimate how long the split will take? > > > > [root@s3db1 ~]# ceph status > > cluster: > > id: dca79fff-ffd0-58f4-1cff-82a2feea05f4 > > health: HEALTH_WARN > > noscrub,nodeep-scrub flag(s) set > > 10 backfillfull osd(s) > > 19 nearfull osd(s) > > 37 pool(s) backfillfull > > BlueFS spillover detected on 1 OSD(s) > > 13 large omap objects > > Low space hindering backfill (add storage if this doesn't > resolve itself): 234 pgs backfill_toofull > > ... > > data: > > pools: 37 pools, 4032 pgs > > objects: 121.40M objects, 199 TiB > > usage: 627 TiB used, 169 TiB / 795 TiB avail > > pgs: 45263471/364213596 objects misplaced (12.428%) > > 3719 active+clean > > 209 active+remapped+backfill_wait+backfill_toofull > > 59 active+remapped+backfill_wait > > 24 active+remapped+backfill_toofull > > 20 active+remapped+backfilling > > 1active+remapped+forced_backfill+backfill_toofull > > > > io: > > client: 8.4 MiB/s rd, 127 MiB/s wr, 208 op/s rd, 163 op/s wr > > recovery: 276 MiB/s, 164 objects/s > > > > [root@s3db1 ~]# ceph osd pool ls detail > > ... > > pool 10 'eu-central-1.rgw.buckets.index' replicated size 3 min_size 1 > crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn > last_change 320966 lfor 0/193276/306366 flags hashpspool,backfillfull > stripe_width 0 application rgw > > pool 11 'eu-central-1.rgw.buckets.data' replicated size 3 min_size 2 > crush_rule 0 object_hash rjenkins pg_num 2048 pgp_num 1946 pgp_num_target > 2048 autoscale_mode warn last_change 320966 lfor 0/263549/317774 flags > hashpspool,backfillfull stripe_width 0 application rgw > > ... > > > > Am Di., 30. März 2021 um 15:07 Uhr schrieb Dan van der Ster < > d...@vanderster.com>: > >> > >> It would be safe to turn off the balancer, yes go ahead. > >> > >> To know if adding more hardware will help, we need to see how much > >> longer this current splitting should take. This will help: > >> > >> ceph status > >> ceph osd pool ls detail > >> > >> -- dan > >> > >> On Tue, Mar 30, 2021 at 3:00 PM Boris Behrens wrote: > >> > > >> > I would think due to splitting, because the balancer doesn't refuses > it's work, because to many misplaced objects. > >> > I also think to turn it off for now, so it doesn't begin it's work at > 5% missplaced objects. > >> > > >> > Would adding more hardware help? We wanted to insert another OSD node > with 7x8TB disks anyway, but postponed it due to the rebalancing. > >> > > >> > Am Di., 30. 
März 2021 um 14:23 Uhr schrieb Dan van der Ster < > d...@vanderster.com>: > >> >> > >> >> Are those PGs backfilling due to splitting or due to balancing? > >> >> If it's the former, I don't think there's a way to pause them with > >> >> upmap or any other trick. > >> >> > >> >> -- dan > >> >> > >> >> On Tue, Mar 30, 2021 at 2:07 PM Boris Behrens wrote: > >> >> > > >> >> > One week later the ceph is still balancing. > >> >> > What worries me like hel
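For reference, the ratio bump mentioned above and a way to confirm it (a temporary workaround for OSDs just over the default 0.90, not a substitute for adding capacity):

ceph osd set-backfillfull-ratio 0.91
ceph osd dump | grep ratio        # shows full_ratio / backfillfull_ratio / nearfull_ratio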
[ceph-users] Re: should I increase the amount of PGs?
The output from ceph osd pool ls detail tell me nothing, except that the pgp_num is not where it should be. Can you help me to read the output? How do I estimate how long the split will take? [root@s3db1 ~]# ceph status cluster: id: dca79fff-ffd0-58f4-1cff-82a2feea05f4 health: HEALTH_WARN noscrub,nodeep-scrub flag(s) set 10 backfillfull osd(s) 19 nearfull osd(s) 37 pool(s) backfillfull BlueFS spillover detected on 1 OSD(s) 13 large omap objects Low space hindering backfill (add storage if this doesn't resolve itself): 234 pgs backfill_toofull ... data: pools: 37 pools, 4032 pgs objects: 121.40M objects, 199 TiB usage: 627 TiB used, 169 TiB / 795 TiB avail pgs: 45263471/364213596 objects misplaced (12.428%) 3719 active+clean 209 active+remapped+backfill_wait+backfill_toofull 59 active+remapped+backfill_wait 24 active+remapped+backfill_toofull 20 active+remapped+backfilling 1active+remapped+forced_backfill+backfill_toofull io: client: 8.4 MiB/s rd, 127 MiB/s wr, 208 op/s rd, 163 op/s wr recovery: 276 MiB/s, 164 objects/s [root@s3db1 ~]# ceph osd pool ls detail ... pool 10 'eu-central-1.rgw.buckets.index' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 320966 lfor 0/193276/306366 flags hashpspool,backfillfull stripe_width 0 application rgw pool 11 'eu-central-1.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2048 pgp_num 1946 pgp_num_target 2048 autoscale_mode warn last_change 320966 lfor 0/263549/317774 flags hashpspool,backfillfull stripe_width 0 application rgw ... Am Di., 30. März 2021 um 15:07 Uhr schrieb Dan van der Ster < d...@vanderster.com>: > It would be safe to turn off the balancer, yes go ahead. > > To know if adding more hardware will help, we need to see how much > longer this current splitting should take. This will help: > > ceph status > ceph osd pool ls detail > > -- dan > > On Tue, Mar 30, 2021 at 3:00 PM Boris Behrens wrote: > > > > I would think due to splitting, because the balancer doesn't refuses > it's work, because to many misplaced objects. > > I also think to turn it off for now, so it doesn't begin it's work at 5% > missplaced objects. > > > > Would adding more hardware help? We wanted to insert another OSD node > with 7x8TB disks anyway, but postponed it due to the rebalancing. > > > > Am Di., 30. März 2021 um 14:23 Uhr schrieb Dan van der Ster < > d...@vanderster.com>: > >> > >> Are those PGs backfilling due to splitting or due to balancing? > >> If it's the former, I don't think there's a way to pause them with > >> upmap or any other trick. > >> > >> -- dan > >> > >> On Tue, Mar 30, 2021 at 2:07 PM Boris Behrens wrote: > >> > > >> > One week later the ceph is still balancing. > >> > What worries me like hell is the %USE on a lot of those OSDs. Does > ceph > >> > resolv this on it's own? We are currently down to 5TB space in the > cluster. > >> > Rebalancing single OSDs doesn't work well and it increases the > "missplaced > >> > objects". > >> > > >> > I thought about letting upmap do some rebalancing. Anyone know if > this is a > >> > good idea? Or if I should bite my nails an wait as I am the headache > of my > >> > life. 
> >> > [root@s3db1 ~]# ceph osd getmap -o om; osdmaptool om --upmap out.txt > >> > --upmap-pool eu-central-1.rgw.buckets.data --upmap-max 10; cat out.txt > >> > got osdmap epoch 321975 > >> > osdmaptool: osdmap file 'om' > >> > writing upmap command output to: out.txt > >> > checking for upmap cleanups > >> > upmap, max-count 10, max deviation 5 > >> > limiting to pools eu-central-1.rgw.buckets.data ([11]) > >> > pools eu-central-1.rgw.buckets.data > >> > prepared 10/10 changes > >> > ceph osd rm-pg-upmap-items 11.209 > >> > ceph osd rm-pg-upmap-items 11.253 > >> > ceph osd pg-upmap-items 11.7f 79 88 > >> > ceph osd pg-upmap-items 11.fc 53 31 105 78 > >> > ceph osd pg-upmap-items 11.1d8 84 50 > >> > ceph osd pg-upmap-items 11.47f 94 86 > >> > ceph osd pg-upmap-items 11.49c 44 71 > >> > ceph osd pg-upmap-items 11.553 74 50 > >> > ceph osd pg-upmap-items 11.6c3 66 63 > >> > ceph osd pg-u
[ceph-users] Re: forceful remap PGs
I reweighted the OSD to 0.0 and then forced the backfilling. How long does it take for ceph to free up space? It looks like it was doing this, but it could also be the "backup cleanup job" that removed images from the buckets. Am Di., 30. März 2021 um 14:41 Uhr schrieb Stefan Kooman : > On 3/30/21 12:55 PM, Boris Behrens wrote: > > I just move one PG away from the OSD, but the diskspace will not get > freed. > > How did you move? I would suggest you use upmap: > > ceph osd pg-upmap-items > Invalid command: missing required parameter pgid() > osd pg-upmap-items <pgid> <id|osd.id> [<id|osd.id>...] : set pg_upmap_items mapping <pgid>:{<id> to <id>, > [...]} (developers only) > > So you specify which PG has to move to which OSD. > > > Do I need to do something to clean obsolete objects from the osd? > > No. The OSD will trim PG data that is not needed anymore. > > Gr. Stefan > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
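A sketch of the drain-and-prioritise approach described above (the OSD and PG ids are illustrative, taken from elsewhere in the thread):

ceph osd reweight 105 0        # drain osd.105 via the override weight; the CRUSH weight is untouched
ceph pg force-backfill 11.fc   # push one PG to the front of the backfill queue
ceph pg ls-by-osd 105          # watch the PG count on the drained OSD shrink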
[ceph-users] Re: should I increase the amount of PGs?
I would think due to splitting, because the balancer doesn't refuses it's work, because to many misplaced objects. I also think to turn it off for now, so it doesn't begin it's work at 5% missplaced objects. Would adding more hardware help? We wanted to insert another OSD node with 7x8TB disks anyway, but postponed it due to the rebalancing. Am Di., 30. März 2021 um 14:23 Uhr schrieb Dan van der Ster < d...@vanderster.com>: > Are those PGs backfilling due to splitting or due to balancing? > If it's the former, I don't think there's a way to pause them with > upmap or any other trick. > > -- dan > > On Tue, Mar 30, 2021 at 2:07 PM Boris Behrens wrote: > > > > One week later the ceph is still balancing. > > What worries me like hell is the %USE on a lot of those OSDs. Does ceph > > resolv this on it's own? We are currently down to 5TB space in the > cluster. > > Rebalancing single OSDs doesn't work well and it increases the > "missplaced > > objects". > > > > I thought about letting upmap do some rebalancing. Anyone know if this > is a > > good idea? Or if I should bite my nails an wait as I am the headache of > my > > life. > > [root@s3db1 ~]# ceph osd getmap -o om; osdmaptool om --upmap out.txt > > --upmap-pool eu-central-1.rgw.buckets.data --upmap-max 10; cat out.txt > > got osdmap epoch 321975 > > osdmaptool: osdmap file 'om' > > writing upmap command output to: out.txt > > checking for upmap cleanups > > upmap, max-count 10, max deviation 5 > > limiting to pools eu-central-1.rgw.buckets.data ([11]) > > pools eu-central-1.rgw.buckets.data > > prepared 10/10 changes > > ceph osd rm-pg-upmap-items 11.209 > > ceph osd rm-pg-upmap-items 11.253 > > ceph osd pg-upmap-items 11.7f 79 88 > > ceph osd pg-upmap-items 11.fc 53 31 105 78 > > ceph osd pg-upmap-items 11.1d8 84 50 > > ceph osd pg-upmap-items 11.47f 94 86 > > ceph osd pg-upmap-items 11.49c 44 71 > > ceph osd pg-upmap-items 11.553 74 50 > > ceph osd pg-upmap-items 11.6c3 66 63 > > ceph osd pg-upmap-items 11.7ad 43 50 > > > > ID CLASS WEIGHTREWEIGHT SIZERAW USE DATA OMAP META > > AVAIL%USE VAR PGS STATUS TYPE NAME > > -1 795.42548- 795 TiB 626 TiB 587 TiB 82 GiB 1.4 TiB > 170 > > TiB 78.64 1.00 -root default > > 56 hdd 7.32619 1.0 7.3 TiB 6.4 TiB 6.4 TiB 684 MiB 16 GiB > 910 > > GiB 87.87 1.12 129 up osd.56 > > 67 hdd 7.27739 1.0 7.3 TiB 6.4 TiB 6.4 TiB 582 MiB 16 GiB > 865 > > GiB 88.40 1.12 115 up osd.67 > > 79 hdd 3.63689 1.0 3.6 TiB 3.2 TiB 432 GiB 1.9 GiB 0 B > 432 > > GiB 88.40 1.12 63 up osd.79 > > 53 hdd 7.32619 1.0 7.3 TiB 6.5 TiB 6.4 TiB 971 MiB 22 GiB > 864 > > GiB 88.48 1.13 114 up osd.53 > > 51 hdd 7.27739 1.0 7.3 TiB 6.5 TiB 6.4 TiB 734 MiB 15 GiB > 837 > > GiB 88.77 1.13 120 up osd.51 > > 73 hdd 14.55269 1.0 15 TiB 13 TiB 13 TiB 1.8 GiB 39 GiB > 1.6 > > TiB 88.97 1.13 246 up osd.73 > > 55 hdd 7.32619 1.0 7.3 TiB 6.5 TiB 6.5 TiB 259 MiB 15 GiB > 825 > > GiB 89.01 1.13 118 up osd.55 > > 70 hdd 7.27739 1.0 7.3 TiB 6.5 TiB 6.5 TiB 291 MiB 16 GiB > 787 > > GiB 89.44 1.14 119 up osd.70 > > 42 hdd 3.73630 1.0 3.7 TiB 3.4 TiB 3.3 TiB 685 MiB 8.2 GiB > 374 > > GiB 90.23 1.15 60 up osd.42 > > 94 hdd 3.63869 1.0 3.6 TiB 3.3 TiB 3.3 TiB 132 MiB 7.7 GiB > 345 > > GiB 90.75 1.15 64 up osd.94 > > 25 hdd 3.73630 1.0 3.7 TiB 3.4 TiB 3.3 TiB 3.2 MiB 8.1 GiB > 352 > > GiB 90.79 1.15 53 up osd.25 > > 31 hdd 7.32619 1.0 7.3 TiB 6.7 TiB 6.6 TiB 223 MiB 15 GiB > 690 > > GiB 90.80 1.15 117 up osd.31 > > 84 hdd 7.52150 1.0 7.5 TiB 6.8 TiB 6.6 TiB 159 MiB 16 GiB > 699 > > GiB 90.93 1.16 121 up osd.84 > > 82 hdd 3.63689 1.0 3.6 TiB 3.3 TiB 332 GiB 1.0 
GiB 0 B > 332 > > GiB 91.08 1.16 59 up osd.82 > > 89 hdd 7.52150 1.0 7.5 TiB 6.9 TiB 6.6 TiB 400 MiB 15 GiB > 670 > > GiB 91.29 1.16 126 up osd.89 > > 33 hdd 3.73630 1.0 3.7 TiB 3.4 TiB 3.3 TiB 382 MiB 8.6 GiB > 327 > > GiB 91.46 1.16 66 up osd.33 > > 90 hdd 7.52150 1.0 7.5 TiB 6.9 TiB 6.6 TiB 338 MiB 15 GiB > 658 > > GiB 91.46 1.16 112 up osd.90 > > 105 hdd 3.6
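Pausing the balancer while the split backfills is a single command and easy to revert:

ceph balancer status
ceph balancer off              # re-enable later with: ceph balancer on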
[ceph-users] Re: should I increase the amount of PGs?
One week later the ceph is still balancing. What worries me like hell is the %USE on a lot of those OSDs. Does ceph resolv this on it's own? We are currently down to 5TB space in the cluster. Rebalancing single OSDs doesn't work well and it increases the "missplaced objects". I thought about letting upmap do some rebalancing. Anyone know if this is a good idea? Or if I should bite my nails an wait as I am the headache of my life. [root@s3db1 ~]# ceph osd getmap -o om; osdmaptool om --upmap out.txt --upmap-pool eu-central-1.rgw.buckets.data --upmap-max 10; cat out.txt got osdmap epoch 321975 osdmaptool: osdmap file 'om' writing upmap command output to: out.txt checking for upmap cleanups upmap, max-count 10, max deviation 5 limiting to pools eu-central-1.rgw.buckets.data ([11]) pools eu-central-1.rgw.buckets.data prepared 10/10 changes ceph osd rm-pg-upmap-items 11.209 ceph osd rm-pg-upmap-items 11.253 ceph osd pg-upmap-items 11.7f 79 88 ceph osd pg-upmap-items 11.fc 53 31 105 78 ceph osd pg-upmap-items 11.1d8 84 50 ceph osd pg-upmap-items 11.47f 94 86 ceph osd pg-upmap-items 11.49c 44 71 ceph osd pg-upmap-items 11.553 74 50 ceph osd pg-upmap-items 11.6c3 66 63 ceph osd pg-upmap-items 11.7ad 43 50 ID CLASS WEIGHTREWEIGHT SIZERAW USE DATA OMAP META AVAIL%USE VAR PGS STATUS TYPE NAME -1 795.42548- 795 TiB 626 TiB 587 TiB 82 GiB 1.4 TiB 170 TiB 78.64 1.00 -root default 56 hdd 7.32619 1.0 7.3 TiB 6.4 TiB 6.4 TiB 684 MiB 16 GiB 910 GiB 87.87 1.12 129 up osd.56 67 hdd 7.27739 1.0 7.3 TiB 6.4 TiB 6.4 TiB 582 MiB 16 GiB 865 GiB 88.40 1.12 115 up osd.67 79 hdd 3.63689 1.0 3.6 TiB 3.2 TiB 432 GiB 1.9 GiB 0 B 432 GiB 88.40 1.12 63 up osd.79 53 hdd 7.32619 1.0 7.3 TiB 6.5 TiB 6.4 TiB 971 MiB 22 GiB 864 GiB 88.48 1.13 114 up osd.53 51 hdd 7.27739 1.0 7.3 TiB 6.5 TiB 6.4 TiB 734 MiB 15 GiB 837 GiB 88.77 1.13 120 up osd.51 73 hdd 14.55269 1.0 15 TiB 13 TiB 13 TiB 1.8 GiB 39 GiB 1.6 TiB 88.97 1.13 246 up osd.73 55 hdd 7.32619 1.0 7.3 TiB 6.5 TiB 6.5 TiB 259 MiB 15 GiB 825 GiB 89.01 1.13 118 up osd.55 70 hdd 7.27739 1.0 7.3 TiB 6.5 TiB 6.5 TiB 291 MiB 16 GiB 787 GiB 89.44 1.14 119 up osd.70 42 hdd 3.73630 1.0 3.7 TiB 3.4 TiB 3.3 TiB 685 MiB 8.2 GiB 374 GiB 90.23 1.15 60 up osd.42 94 hdd 3.63869 1.0 3.6 TiB 3.3 TiB 3.3 TiB 132 MiB 7.7 GiB 345 GiB 90.75 1.15 64 up osd.94 25 hdd 3.73630 1.0 3.7 TiB 3.4 TiB 3.3 TiB 3.2 MiB 8.1 GiB 352 GiB 90.79 1.15 53 up osd.25 31 hdd 7.32619 1.0 7.3 TiB 6.7 TiB 6.6 TiB 223 MiB 15 GiB 690 GiB 90.80 1.15 117 up osd.31 84 hdd 7.52150 1.0 7.5 TiB 6.8 TiB 6.6 TiB 159 MiB 16 GiB 699 GiB 90.93 1.16 121 up osd.84 82 hdd 3.63689 1.0 3.6 TiB 3.3 TiB 332 GiB 1.0 GiB 0 B 332 GiB 91.08 1.16 59 up osd.82 89 hdd 7.52150 1.0 7.5 TiB 6.9 TiB 6.6 TiB 400 MiB 15 GiB 670 GiB 91.29 1.16 126 up osd.89 33 hdd 3.73630 1.0 3.7 TiB 3.4 TiB 3.3 TiB 382 MiB 8.6 GiB 327 GiB 91.46 1.16 66 up osd.33 90 hdd 7.52150 1.0 7.5 TiB 6.9 TiB 6.6 TiB 338 MiB 15 GiB 658 GiB 91.46 1.16 112 up osd.90 105 hdd 3.63869 0.8 3.6 TiB 3.3 TiB 3.3 TiB 206 MiB 8.1 GiB 301 GiB 91.91 1.17 56 up osd.105 66 hdd 7.27739 0.95000 7.3 TiB 6.7 TiB 6.7 TiB 322 MiB 16 GiB 548 GiB 92.64 1.18 121 up osd.66 46 hdd 7.27739 1.0 7.3 TiB 6.8 TiB 6.7 TiB 316 MiB 16 GiB 536 GiB 92.81 1.18 119 up osd.46 Am Di., 23. März 2021 um 19:59 Uhr schrieb Boris Behrens : > Good point. Thanks for the hint. I changed it for all OSDs from 5 to 1 > *crossing finger* > > Am Di., 23. März 2021 um 19:45 Uhr schrieb Dan van der Ster < > d...@vanderster.com>: > >> I see. 
When splitting PGs, the OSDs will increase is used space >> temporarily to make room for the new PGs. >> When going from 1024->2048 PGs, that means that half of the objects from >> each PG will be copied to a new PG, and then the previous PGs will have >> those objects deleted. >> >> Make sure osd_max_backfills is set to 1, so that not too many PGs are >> moving concurrently. >> >> >> >> On Tue, Mar 23, 2021, 7:39 PM Boris Behrens wrote: >> >>> Thank you. >>> Currently I do not have any full OSDs (all <90%) but I keep this in mind. >>> What worries me is the ever increasing %USE metric (it went up from >>> around 72% to 75% in three hours). It looks like there is comming a lot of >>> data (there comes barely new data at the moment), but I think this might >&
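A sketch of how the pieces above fit together while the split runs: throttle backfill, then apply the moves that osdmaptool wrote into out.txt (the file simply contains "ceph osd pg-upmap-items ..." commands):

ceph config set osd osd_max_backfills 1                # persistent setting
ceph tell 'osd.*' injectargs '--osd_max_backfills=1'   # apply to running OSDs immediately
ceph osd getmap -o om
osdmaptool om --upmap out.txt --upmap-pool eu-central-1.rgw.buckets.data --upmap-max 10
bash out.txt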
[ceph-users] Re: forceful remap PGs
I just move one PG away from the OSD, but the diskspace will not get freed. Do I need to do something to clean obsolete objects from the osd? Am Di., 30. März 2021 um 11:47 Uhr schrieb Boris Behrens : > Hi, > I have a couple OSDs that currently get a lot of data, and are running > towards 95% fillrate. > > I would like to forcefully remap some PGs (they are around 100GB) to more > empty OSDs and drop them from the full OSDs. I know this would lead to > degraded objects, but I am not sure how long the cluster will stay in a > state where it can allocate objects. > > OSD.105 grew from around 85% to 92% in the last 4 hours. > > This is the current state > cluster: > id: dca79fff-ffd0-58f4-1cff-82a2feea05f4 > health: HEALTH_WARN > noscrub,nodeep-scrub flag(s) set > 9 backfillfull osd(s) > 19 nearfull osd(s) > 37 pool(s) backfillfull > BlueFS spillover detected on 1 OSD(s) > 13 large omap objects > Low space hindering backfill (add storage if this doesn't > resolve itself): 248 pgs backfill_toofull > Degraded data redundancy: 18115/362288820 objects degraded > (0.005%), 1 pg degraded, 1 pg undersized > > services: > mon: 3 daemons, quorum ceph-s3-mon1,ceph-s3-mon2,ceph-s3-mon3 (age 6d) > mgr: ceph-mgr2(active, since 6d), standbys: ceph-mgr3, ceph-mgr1 > mds: 3 up:standby > osd: 110 osds: 110 up (since 4d), 110 in (since 6d); 324 remapped pgs > flags noscrub,nodeep-scrub > rgw: 4 daemons active (admin, eu-central-1, eu-msg-1, eu-secure-1) > > task status: > > data: > pools: 37 pools, 4032 pgs > objects: 120.76M objects, 197 TiB > usage: 620 TiB used, 176 TiB / 795 TiB avail > pgs: 18115/362288820 objects degraded (0.005%) > 47144186/362288820 objects misplaced (13.013%) > 3708 active+clean > 241 active+remapped+backfill_wait+backfill_toofull > 63 active+remapped+backfill_wait > 11 active+remapped+backfilling > 6active+remapped+backfill_toofull > 1active+remapped+backfilling+forced_backfill > 1active+remapped+forced_backfill+backfill_toofull > 1active+undersized+degraded+remapped+backfilling > > io: > client: 23 MiB/s rd, 252 MiB/s wr, 347 op/s rd, 381 op/s wr > recovery: 194 MiB/s, 112 objects/s > --- > ID CLASS WEIGHTREWEIGHT SIZERAW USE DATAOMAP META > AVAIL%USE VAR PGS STATUS TYPE NAME > -1 795.42548- 795 TiB 620 TiB 582 TiB 82 GiB 1.4 TiB 176 > TiB 77.90 1.00 -root default > 84 hdd 7.52150 1.0 7.5 TiB 6.8 TiB 6.5 TiB 158 MiB 15 GiB 764 > GiB 90.07 1.16 121 up osd.84 > 79 hdd 3.63689 1.0 3.6 TiB 3.3 TiB 367 GiB 1.9 GiB 0 B 367 > GiB 90.15 1.16 64 up osd.79 > 70 hdd 7.27739 1.0 7.3 TiB 6.6 TiB 6.5 TiB 268 MiB 15 GiB 730 > GiB 90.20 1.16 121 up osd.70 > 82 hdd 3.63689 1.0 3.6 TiB 3.3 TiB 364 GiB 1.1 GiB 0 B 364 > GiB 90.23 1.16 59 up osd.82 > 89 hdd 7.52150 1.0 7.5 TiB 6.8 TiB 6.6 TiB 395 MiB 16 GiB 735 > GiB 90.45 1.16 126 up osd.89 > 90 hdd 7.52150 1.0 7.5 TiB 6.8 TiB 6.6 TiB 338 MiB 15 GiB 723 > GiB 90.62 1.16 112 up osd.90 > 33 hdd 3.73630 1.0 3.7 TiB 3.4 TiB 3.3 TiB 382 MiB 8.6 GiB 358 > GiB 90.64 1.16 66 up osd.33 > 66 hdd 7.27739 0.95000 7.3 TiB 6.7 TiB 6.7 TiB 313 MiB 16 GiB 605 > GiB 91.88 1.18 122 up osd.66 > 46 hdd 7.27739 1.0 7.3 TiB 6.7 TiB 6.7 TiB 312 MiB 16 GiB 601 > GiB 91.93 1.18 119 up osd.46 > 105 hdd 3.63869 0.8 3.6 TiB 3.4 TiB 3.4 TiB 206 MiB 8.1 GiB 281 > GiB 92.45 1.19 58 up osd.105 > > -- > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im > groüen Saal. > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] forceful remap PGs
Hi, I have a couple OSDs that currently get a lot of data, and are running towards 95% fillrate. I would like to forcefully remap some PGs (they are around 100GB) to more empty OSDs and drop them from the full OSDs. I know this would lead to degraded objects, but I am not sure how long the cluster will stay in a state where it can allocate objects. OSD.105 grew from around 85% to 92% in the last 4 hours. This is the current state cluster: id: dca79fff-ffd0-58f4-1cff-82a2feea05f4 health: HEALTH_WARN noscrub,nodeep-scrub flag(s) set 9 backfillfull osd(s) 19 nearfull osd(s) 37 pool(s) backfillfull BlueFS spillover detected on 1 OSD(s) 13 large omap objects Low space hindering backfill (add storage if this doesn't resolve itself): 248 pgs backfill_toofull Degraded data redundancy: 18115/362288820 objects degraded (0.005%), 1 pg degraded, 1 pg undersized services: mon: 3 daemons, quorum ceph-s3-mon1,ceph-s3-mon2,ceph-s3-mon3 (age 6d) mgr: ceph-mgr2(active, since 6d), standbys: ceph-mgr3, ceph-mgr1 mds: 3 up:standby osd: 110 osds: 110 up (since 4d), 110 in (since 6d); 324 remapped pgs flags noscrub,nodeep-scrub rgw: 4 daemons active (admin, eu-central-1, eu-msg-1, eu-secure-1) task status: data: pools: 37 pools, 4032 pgs objects: 120.76M objects, 197 TiB usage: 620 TiB used, 176 TiB / 795 TiB avail pgs: 18115/362288820 objects degraded (0.005%) 47144186/362288820 objects misplaced (13.013%) 3708 active+clean 241 active+remapped+backfill_wait+backfill_toofull 63 active+remapped+backfill_wait 11 active+remapped+backfilling 6active+remapped+backfill_toofull 1active+remapped+backfilling+forced_backfill 1active+remapped+forced_backfill+backfill_toofull 1active+undersized+degraded+remapped+backfilling io: client: 23 MiB/s rd, 252 MiB/s wr, 347 op/s rd, 381 op/s wr recovery: 194 MiB/s, 112 objects/s --- ID CLASS WEIGHTREWEIGHT SIZERAW USE DATAOMAP METAAVAIL %USE VAR PGS STATUS TYPE NAME -1 795.42548- 795 TiB 620 TiB 582 TiB 82 GiB 1.4 TiB 176 TiB 77.90 1.00 -root default 84 hdd 7.52150 1.0 7.5 TiB 6.8 TiB 6.5 TiB 158 MiB 15 GiB 764 GiB 90.07 1.16 121 up osd.84 79 hdd 3.63689 1.0 3.6 TiB 3.3 TiB 367 GiB 1.9 GiB 0 B 367 GiB 90.15 1.16 64 up osd.79 70 hdd 7.27739 1.0 7.3 TiB 6.6 TiB 6.5 TiB 268 MiB 15 GiB 730 GiB 90.20 1.16 121 up osd.70 82 hdd 3.63689 1.0 3.6 TiB 3.3 TiB 364 GiB 1.1 GiB 0 B 364 GiB 90.23 1.16 59 up osd.82 89 hdd 7.52150 1.0 7.5 TiB 6.8 TiB 6.6 TiB 395 MiB 16 GiB 735 GiB 90.45 1.16 126 up osd.89 90 hdd 7.52150 1.0 7.5 TiB 6.8 TiB 6.6 TiB 338 MiB 15 GiB 723 GiB 90.62 1.16 112 up osd.90 33 hdd 3.73630 1.0 3.7 TiB 3.4 TiB 3.3 TiB 382 MiB 8.6 GiB 358 GiB 90.64 1.16 66 up osd.33 66 hdd 7.27739 0.95000 7.3 TiB 6.7 TiB 6.7 TiB 313 MiB 16 GiB 605 GiB 91.88 1.18 122 up osd.66 46 hdd 7.27739 1.0 7.3 TiB 6.7 TiB 6.7 TiB 312 MiB 16 GiB 601 GiB 91.93 1.18 119 up osd.46 105 hdd 3.63869 0.8 3.6 TiB 3.4 TiB 3.4 TiB 206 MiB 8.1 GiB 281 GiB 92.45 1.19 58 up osd.105 -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
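If waiting is not an option, individual PGs can be steered off the full OSDs with explicit upmap entries (requires min-compat-client luminous; the PG id below is a placeholder and must be picked from your own "ceph pg ls-by-osd" output, the target OSD from "ceph osd df"):

ceph pg ls-by-osd 105                    # list PGs with a copy on the full osd.105
ceph osd pg-upmap-items <pgid> 105 23    # remap that PG's copy from osd.105 to the emptier osd.23
ceph osd rm-pg-upmap-items <pgid>        # drop the exception again once the pressure is gone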