[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?

2021-11-10 Thread Boris Behrens
I am just creating a bucket with a lot of files to test it. Who would have
thought that uploading a million 1k files would take days?

On Tue, Nov 9, 2021 at 00:50, prosergey07 wrote:

> When resharding is performed, I believe it is considered a bucket operation
> and updates the bucket stats: a new set of bucket index shards is created,
> which can increase the number of objects reported in the bucket stats.
>  If it broke during resharding, you could check the current bucket id with:
>  radosgw-admin metadata get "bucket:BUCKET_NAME"
>
> That would give an idea of which bucket index objects to keep.
>
>  Then you could remove the corrupted bucket shards (i.e. the
> .dir.corrupted_bucket_index.SHARD_NUM objects that do not carry the bucket id
> from the previous command) from the bucket.index pool:
>
> rados -p bucket.index rm .dir.corrupted_bucket_index.SHARD_NUM
>
> Where SHARD_NUM is the shard number you want to delete.
>
>  And then running "radosgw-admin bucket check --fix --bucket=BUCKET_NAME"
>
>  That should resolve your issue with the number of objects.
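(A rough consolidation of those steps as commands; bucket, pool and shard names are placeholders, and the index pool may be named differently in your setup, e.g. default.rgw.buckets.index, so double-check the current bucket id before deleting anything:)

    # current (good) bucket id; index objects named .dir.<this id>.<shard> must be kept
    radosgw-admin metadata get "bucket:BUCKET_NAME" | jq -r .data.bucket.bucket_id

    # list the index objects and identify the stale instance ids
    rados -p bucket.index ls | grep '^\.dir\.'

    # remove one stale shard object (never one carrying the current bucket id)
    rados -p bucket.index rm .dir.OLD_BUCKET_ID.SHARD_NUM

    # rebuild the bucket stats afterwards
    radosgw-admin bucket check --fix --bucket=BUCKET_NAME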
>
>  As for slow object deletion: do you run your rgw metadata pools, specifically
> the bucket.index pool, on NVMe drives? The problem is that you have a lot of
> objects and probably not enough shards. Radosgw retrieves the list of objects
> from bucket.index and, if I remember correctly, it retrieves them as an ordered
> list, which is a very expensive operation. Hence a good amount of time might be
> spent just on getting the object list.
>
>  We get 1000 objects per second deleted  inside our storage.
>
>
> I would not recommend using "--inconsistent-index", to avoid more
> consistency issues.
>
>
>
>
> Sent from a Galaxy device
>
>
> ---- Original message ----
> From: mhnx 
> Date: 08.11.21 13:28 (GMT+02:00)
> To: Сергей Процун 
> Cc: "Szabo, Istvan (Agoda)" , Boris Behrens <
> b...@kervyn.de>, Ceph Users 
> Subject: Re: [ceph-users] Re: large bucket index in multisite environement
> (how to deal with large omap objects warning)?
>
> (There should not be any issues using rgw for other buckets while
> re-sharding.)
> If there are, then disabling access to the bucket should work, right? Sync
> should also be disabled.
>
> Yes, after the manual reshard it should clear the leftovers, but in my
> situation resharding failed and I got double entries for that bucket.
> I didn't push further; instead I divided the bucket into new buckets and
> reduced the object count with a new bucket tree. I copied all of the objects
> with rclone and started the bucket removal with "radosgw-admin bucket rm
> --bucket=mybucket --bypass-gc --purge-objects --max-concurrent-ios=128". It has
> been running for a very long time (started on Sep 08) and it is still working.
> There were 250M objects in that bucket, and after the manual reshard failure I
> got a 500M object count when checking with bucket stats num_objects. Now I have:
> "size_kb": 10648067645,
> "num_objects": 132270190
>
> The removal speed is 50-60 objects per second. It's not because of the cluster
> speed; the cluster is fine.
> I have space, so I let it run. When I see a stable object count I will stop
> the remove process and start again with the "--inconsistent-index" parameter.
> I wonder, is it safe to use that parameter with referenced objects? I want
> to learn how "--inconsistent-index" works and what it does.
>
> On Fri, Nov 5, 2021 at 17:46, Сергей Процун wrote:
>
>> There should not be any issues using rgw for other buckets while
>> re-sharding.
>>
>> As for the doubled number of objects after the reshard, that is an interesting
>> situation. After the manual reshard is done, there might be leftovers from
>> the old bucket index, as during a reshard new .dir.new_bucket_index objects
>> are created. They contain all data related to the objects which are stored
>> in the buckets.data pool. Just wondering if the issue with the doubled number
>> of objects was related to the old bucket index. If so, it is safe to delete
>> the old bucket index.
>>
>>  In a perfect world, it would be ideal to know the eventual number of
>> objects inside the bucket and set the number of shards accordingly from the
>> start.
>>
>>  In the real world, when the client re-purposes the usage of the bucket, we
>> have to deal with reshards.
>>
>> On Fri, Nov 5, 2021 at 14:43, mhnx wrote:
>>
>>> I also use this method and I hate it.
>>>
>>> Stopping all of the RGW clients is never an option! It shouldn't be.
>>> Sharding is hell. I had 250M objects in a bucket and the reshard faile

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-10 Thread Boris Behrens
Hi,
we use enterprise SSDs like SAMSUNG MZ7KM1T9.
They work very well for our block storage. Some NVMe would be a lot nicer,
but we have had good experience with them.

One SSD failure taking down 10 OSDs might sound harsh, but this would be an
acceptable risk. Most of the tunables are default in our setup, and it looks
like PGs have a failure domain of host. I restart the systems on a regular
basis for kernel updates anyway.
Also, disk I/O checked with dstat seems to be rather low on the SSDs (below
1k IOPS):
root@s3db18:~# dstat --disk --io  -T  -D sdd
--dsk/sdd-- ---io/sdd-- --epoch---
 read  writ| read  writ|  epoch
 214k 1656k|7.21   126 |1636536603
 144k 1176k|2.00   200 |1636536604
 128k 1400k|2.00   230 |1636536605

Normally I would now try this configuration:
1 SSD / 10 OSDs - having 150GB of block.db and block.wal, both on the same
partition as someone stated before, and 200GB extra to move all pools
except the .data pool to SSDs.

But thinking about 10 downed OSDs if one SSD fails makes me wonder how to
recover from that.
IIRC the per-OSD configuration is in the LVM tags:
root@s3db18:~# lvs -o lv_tags
  LV Tags

ceph.block_device=...,ceph.db_device=/dev/sdd8,ceph.db_uuid=011275a3-4201-8840-a678-c2e23d38bfd6,...

When the SSD fails, can I just remove the tags and restart the OSDs with
"ceph-volume lvm activate --all"? And after replacing the failed SSD, re-add
the tags with the correct IDs? Do I need to do anything else to prepare a
block.db partition?
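(For reference, a sketch of what that tag surgery might look like; the VG/LV names and the replacement partition are placeholders, and it is not verified that this alone is sufficient:)

    # inspect the tags ceph-volume stored on the OSD logical volume
    lvs -o lv_name,lv_tags

    # drop the references to the failed block.db partition ...
    lvchange --deltag "ceph.db_device=/dev/sdd8" ceph-VG/osd-block-LV
    lvchange --deltag "ceph.db_uuid=011275a3-4201-8840-a678-c2e23d38bfd6" ceph-VG/osd-block-LV

    # ... or re-add them pointing at the replacement partition
    lvchange --addtag "ceph.db_device=/dev/sdX8" ceph-VG/osd-block-LV
    lvchange --addtag "ceph.db_uuid=NEW_PART_UUID" ceph-VG/osd-block-LV

    # then let ceph-volume rediscover and start the OSDs
    ceph-volume lvm activate --all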

Cheers
 Boris


On Tue, Nov 9, 2021 at 22:15, prosergey07 wrote:

> Not sure how much OSDs backed by SSD db and wal devices would help the
> performance. Even if you go this route with one SSD per 10 HDDs, you
> might want to set the failure domain to host in the CRUSH rules in case an
> SSD goes out of service.
>
>  But in practice an SSD will not boost the performance very much, especially
> when shared between 10 HDDs.
>
>  We use NVMe db+wal per OSD and separate NVMe specifically for the metadata
> pools. There will be a lot of I/O on the bucket.index pool and the rgw pool
> which stores user and bucket metadata, so you might want to put them on
> separate fast storage.
>
>  Also, if there will not be too many objects (e.g. huge objects, but not
> tens or hundreds of millions of them), then the bucket index will have less
> pressure, and SSD might be okay for the metadata pools in that case.
>
>
>
> Sent from a Galaxy device
>
>
>  Original message 
> From: Boris Behrens 
> Date: 08.11.21 13:08 (GMT+02:00)
> To: ceph-users@ceph.io
> Subject: [ceph-users] Question if WAL/block.db partition will benefit us
>
> Hi,
> we run a larger octopus s3 cluster with only rotating disks.
> 1.3 PiB with 177 OSDs, some with a SSD block.db and some without.
>
> We have a ton of spare 2TB disks and we just wondered if we can bring them
> to good use.
> For every 10 spinning disks we could add one 2TB SSD and we would create
> two partitions per OSD (130GB for block.db and 20GB for block.wal). This
> would leave some empty space on the SSD for wear leveling.
>
> The question now is: would we benefit from this? Most of the data that is
> written to the cluster is very large (50GB and above). This would take a
> lot of work into restructuring the cluster and also two other clusters.
>
> And does it make a difference to have only a block.db partition or a
> block.db and a block.wal partition?
>
> Cheers
> Boris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-08 Thread Boris Behrens
> That does not seem like a lot. Having SSD based metadata pools might
> reduce latency though.
>
So block.db and block.wal don't make sense? I would like to have a
consistent cluster.
In either case I would need to remove or add SSDs, because we currently
have a mix.

It does waste a lot of space. But might be worth it if performance
> improves a lot. You might also be able to separate small objects from
> large objects based on placement targets / storage classes [1]. This
> would allow you to store small objects on SSD. Those might be more
> latency sensitive than large objects anyway?
>
> Gr. Stefan
>
> [1]: https://docs.ceph.com/en/latest/radosgw/placement/
>

Puh, large topic.
Would removing the smaller files from the spinning disks release enough
pressure from the flying heads to speed up large file uploads? It could be a
test, but I don't know if it would work as expected. I can imagine that
this leads to larger problems when the SSD OSDs run out of space.
Also, I would rather add more spinning disks, because we also need a lot of
space.
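(For reference, the placement-target approach from [1] would be set up roughly like this; the zonegroup/zone names, pool names and placement id are placeholders:)

    radosgw-admin zonegroup placement add --rgw-zonegroup=ZONEGROUP --placement-id=ssd-placement
    radosgw-admin zone placement add --rgw-zone=ZONE --placement-id=ssd-placement \
        --data-pool=ZONE.rgw.buckets.data-ssd \
        --index-pool=ZONE.rgw.buckets.index \
        --data-extra-pool=ZONE.rgw.buckets.non-ec
    radosgw-admin period update --commit
    # new buckets created by this user would then land on the SSD-backed placement
    # (setting a user's default placement this way is my reading of the placement docs)
    radosgw-admin user modify --uid=SOMEUSER --placement-id=ssd-placement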
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?

2021-11-08 Thread Boris Behrens
yday and delete it at the end of the week or
>>> month
>>> then you should definitely use a temp bucket.  No versioning, No
>>> multisite,
>>> No index if it's possible.
>>>
>>>
>>>
>>> On Fri, Nov 5, 2021 at 12:30, Szabo, Istvan (Agoda) wrote:
>>>
>>> > You mean prepare or reshard?
>>> > Prepare:
>>> > I collect as much information from the users as possible before onboarding,
>>> > so I can prepare for their use case in the future and set things up.
>>> >
>>> > Preshard:
>>> > After creating the bucket:
>>> > radosgw-admin bucket reshard --bucket=ex-bucket --num-shards=101
>>> >
>>> > Also when you shard the buckets, you need to use prime numbers.
>>> >
>>> > Istvan Szabo
>>> > Senior Infrastructure Engineer
>>> > ---
>>> > Agoda Services Co., Ltd.
>>> > e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
>>> > ---
>>> >
>>> > From: Boris Behrens 
>>> > Sent: Friday, November 5, 2021 4:22 PM
>>> > To: Szabo, Istvan (Agoda) ; ceph-users@ceph.io
>>> > Subject: Re: [ceph-users] large bucket index in multisite environement
>>> > (how to deal with large omap objects warning)?
>>> >
>>> > Email received from the internet. If in doubt, don't click any link nor
>>> > open any attachment !
>>> > 
>>> > Cheers Istvan,
>>> >
>>> > how do you do this?
>>> >
>>> > On Thu, Nov 4, 2021 at 19:45, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
>>> > This one you need to prepare for: you need to preshard the bucket which you
>>> > know will hold more than millions of objects.
>>> >
>>> > I have a bucket where we store 1.2 billion objects with 24xxx
>>> shards.
>>> > No omap issue.
>>> > Istvan Szabo
>>> > Senior Infrastructure Engineer
>>> > ---
>>> > Agoda Services Co., Ltd.
>>> > e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
>>> > ---
>>> >
>>> >
>>> >
>>> > --
>>> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
>>> im
>>> > groüen Saal.
>>> > ___
>>> > ceph-users mailing list -- ceph-users@ceph.io
>>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>> >
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-08 Thread Boris Behrens
Hi Stefan,

for a 6:1 or 3:1 ratio we do not have enough slots (I think).
There is some read load, but I don't know if this counts as a lot:
client:   27 MiB/s rd, 289 MiB/s wr, 1.07k op/s rd, 261 op/s wr

Putting them to use for some special rgw pools also came to my mind.
But would this make a lot of difference?
POOL                             ID   PGS   STORED   OBJECTS     USED  %USED  MAX AVAIL
.rgw.root                         1    64  150 KiB       142   26 MiB      0     42 TiB
eu-central-1.rgw.control          2    64      0 B         8      0 B      0     42 TiB
eu-central-1.rgw.data.root        3    64  1.2 MiB     3.96k  743 MiB      0     42 TiB
eu-central-1.rgw.gc               4    64  329 MiB       128  998 MiB      0     42 TiB
eu-central-1.rgw.log              5    64  939 KiB       370  3.1 MiB      0     42 TiB
eu-central-1.rgw.users.uid        6    64   12 MiB     7.10k  1.2 GiB      0     42 TiB
eu-central-1.rgw.users.keys       7    64  297 KiB     7.40k  1.4 GiB      0     42 TiB
eu-central-1.rgw.meta             8    64  392 KiB        1k  191 MiB      0     42 TiB
eu-central-1.rgw.users.email      9    64     40 B         1  192 KiB      0     42 TiB
eu-central-1.rgw.buckets.index   10    64   22 GiB     2.55k   67 GiB   0.05     42 TiB
eu-central-1.rgw.buckets.data    11  2048  318 TiB   132.31M  961 TiB  88.38     42 TiB
eu-central-1.rgw.buckets.non-ec  12    64  467 MiB    13.28k  2.4 GiB      0     42 TiB
eu-central-1.rgw.usage           13    64  767 MiB        32  2.2 GiB      0     42 TiB

I would have put the rgw.buckets.index and maybe the rgw.meta pools on it,
but it looks like a waste of space: a 2TB OSD in every chassis that
only handles 23GB of data.
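(If it helps: pinning just those pools to the SSDs is usually done with a device-class CRUSH rule, roughly like this; the rule name is arbitrary and the pool names follow the listing above:)

    # replicated rule that only picks OSDs with the "ssd" device class
    ceph osd crush rule create-replicated replicated-ssd default host ssd
    # move the latency-sensitive pools onto it
    ceph osd pool set eu-central-1.rgw.buckets.index crush_rule replicated-ssd
    ceph osd pool set eu-central-1.rgw.meta crush_rule replicated-ssd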

On Mon, Nov 8, 2021 at 12:30, Stefan Kooman wrote:

> On 11/8/21 12:07, Boris Behrens wrote:
> > Hi,
> > we run a larger octopus s3 cluster with only rotating disks.
> > 1.3 PiB with 177 OSDs, some with a SSD block.db and some without.
> >
> > We have a ton of spare 2TB disks and we just wondered if we can bring them
> > to good use.
> > For every 10 spinning disks we could add one 2TB SSD and we would create
> > two partitions per OSD (130GB for block.db and 20GB for block.wal). This
> > would leave some empty space on the SSD for wear leveling.
>
> A 10:1 ratio looks rather high. Discussions on this list indicate this
> ratio normally is in the 3:1 up to 6:1 range (for high end NVMe / SSD).
>
> >
> > The question now is: would we benefit from this? Most of the data that is
> > written to the cluster is very large (50GB and above). This would take a
> > lot of work into restructuring the cluster and also two other clusters.
> >
> > And does it make a difference to have only a block.db partition or a
> > block.db and a block.wal partition?
>
> Does this cluster also get a lot of reads? I wonder if using the SSD
> drives for S3 metadata pools would make more sense. And also be a lot
> less work.
>
> Gr. Stefan
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Question if WAL/block.db partition will benefit us

2021-11-08 Thread Boris Behrens
Hi,
we run a larger octopus s3 cluster with only rotating disks.
1.3 PiB with 177 OSDs, some with a SSD block.db and some without.

We have a ton of spare 2TB disks and we just wondered if we can bring them
to good use.
For every 10 spinning disks we could add one 2TB SSD and we would create
two partitions per OSD (130GB for block.db and 20GB for block.wal). This
would leave some empty space on the SSD for wear leveling.
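(For context, provisioning an OSD with the DB and WAL split out would look something like this; the device names are placeholders:)

    # HDD as the data device, two partitions on the shared SSD for DB and WAL
    ceph-volume lvm create --bluestore --data /dev/sdb \
        --block.db /dev/sde1 --block.wal /dev/sde2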

The question now is: would we benefit from this? Most of the data that is
written to the cluster is very large (50GB and above). This would take a
lot of work into restructuring the cluster and also two other clusters.

And does it make a difference to have only a block.db partition or a
block.db and a block.wal partition?

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?

2021-11-05 Thread Boris Behrens
Cheers Istvan,

how do you do this?

On Thu, Nov 4, 2021 at 19:45, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:

> This one you need to prepare for: you need to preshard the bucket which you
> know will hold more than millions of objects.
>
> I have a bucket where we store 1.2 billion objects with 24xxx shards.
> No omap issue.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?

2021-11-05 Thread Boris Behrens
Hi Teoman,

I don't sync the bucket content; it's just the metadata that gets synced.
But turning off access to our s3 is not an option, because our customers
rely on it (they make backups and serve objects for their web applications
through it).

On Thu, Nov 4, 2021 at 18:20, Teoman Onay wrote:

> AFAIK dynamic resharding is not supported for multisite setups but you can
> reshard manually.
> Note that this is a very expensive process which requires you to:
>
> - disable the sync of the bucket you want to reshard.
> - Stop all the RGWs (no more access to your Ceph cluster)
> - On a node of the master zone, reshard the bucket
> - On the secondary zone, purge the bucket
> - Restart the RGW(s)
> - re-enable sync of the bucket.
>
> 4m objects/bucket is way too much...
>
> Regards
>
> Teoman
>
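(A hedged sketch of the steps Teoman lists above as commands; the bucket name and shard count are placeholders, and the exact procedure should be checked against the docs for your release:)

    # master zone, with sync disabled and all RGWs stopped
    radosgw-admin bucket sync disable --bucket=BUCKET
    radosgw-admin bucket reshard --bucket=BUCKET --num-shards=101
    # secondary zone: purge the stale copy so it can re-sync
    radosgw-admin bucket rm --bucket=BUCKET --purge-objects
    # restart the RGWs, then re-enable sync
    radosgw-admin bucket sync enable --bucket=BUCKET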
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] large bucket index in multisite environement (how to deal with large omap objects warning)?

2021-11-04 Thread Boris Behrens
Hi everybody,

we maintain three ceph clusters (2x octopus, 1x nautilus) that use three
zonegroups to sync metadata, without syncing the actual data (only one zone
per zonegroup).

Some customers have buckets with >4m objects in our largest cluster (the
other two are very fresh, with close to 0 data in them).

How do I handle that in regard to the "Large OMAP objects" warning?
- Sharding is not an option, because it is a multisite environment (at
least that's what I read everywhere)
- Limiting the customers is not a great option, because they already have that
huge amount of files in their buckets
- Disabling the warning / increasing the threshold is IMHO a bad option
(people might have put some thinking into that limit, and having 40x the limit
is far off the "just roll with it" threshold)

I really hope that someone does have an answer, or maybe there is some
roadmap which addresses this issue.

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-29 Thread Boris Behrens
Hi guys,
we just updated the cluster to the latest octopus, but we still cannot list
multipart uploads if there are more than 2k parts.

Is there any way to show the multiparts and maybe cancel them?
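(For what it's worth, the S3 API itself can list and abort them, although it may run into the same listing loop on the affected buckets; a sketch with the AWS CLI, where the endpoint and bucket are placeholders:)

    aws --endpoint-url https://s3.example.com s3api list-multipart-uploads --bucket BUCKET
    aws --endpoint-url https://s3.example.com s3api abort-multipart-upload \
        --bucket BUCKET --key OBJECT_KEY --upload-id UPLOAD_ID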

On Mon, Oct 25, 2021 at 16:23, Boris Behrens wrote:

> Hi Casey,
>
> thanks a lot for that hint. That sounds a lot like this is the problem.
> Is there a way to show incomplete multipart uploads via radosgw-admin?
>
> So I would be able to cancel it.
>
> Upgrading to octopus might take a TON of time, as we have 1.1 PiB in 160
> OSDs on rotational disks. :)
>
> On Mon, Oct 25, 2021 at 16:19, Casey Bodley <cbod...@redhat.com> wrote:
>
>> hi Boris, this sounds a lot like
>> https://tracker.ceph.com/issues/49206, which says "When deleting a
>> bucket with an incomplete multipart upload that has about 2000 parts
>> uploaded, we noticed an infinite loop, which stopped s3cmd from
>> deleting the bucket forever."
>>
>> i'm afraid this fix was merged after nautilus went end-of-life, so
>> you'd need to upgrade to octopus for it
>>
>> On Mon, Oct 25, 2021 at 9:52 AM Boris Behrens  wrote:
>> >
>> > Good day everybody,
>> >
>> > I just came across very strange behavior. I have two buckets where s3cmd
>> > hangs when I try to show current multipart uploads.
>> >
>> > When I use --debug I see that it loops over the same response.
>> > What I tried to fix it on one bucket:
>> > * radosgw-admin bucket check --bucket=BUCKETNAME
>> > * radosgw-admin bucket check --check-objects --fix --bucket=BUCKETNAME
>> >
>> > The check command now reports an empty array [], but I still can't show
>> the
>> > multiparts. I can interact very normal with the bucket (list/put/get
>> > objects).
>> >
>> > The debug output shows always the same data and
>> > DEBUG: Listing continues after 'FILENAME'
>> >
>> > Did someone already came across this error?
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>>
>>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: upgrade OSDs before mon

2021-10-26 Thread Boris Behrens
Hi Yury,
unfortunately not. It's a package installation and there are no nautilus
packages for ubuntu 20.04 (just realised this).

Now the question: downgrade ubuntu to 18.04 and start over, or keep the
octopus OSDs in a nautilus cluster? It would be cool if the latter works
properly.

On Tue, Oct 26, 2021 at 15:47, Yury Kirsanov <y.kirsa...@gmail.com> wrote:

> You can downgrade any Ceph packages if you want to. Just specify the
> version number you'd like to go to.
>
> On Wed, Oct 27, 2021 at 12:36 AM Boris Behrens  wrote:
>
>> Hi,
>> I just added new storage to our s3 cluster and saw that ubuntu didn't
>> prioritize the nautilus package over the octopus package.
>>
>> Now I have 10 OSDs with octopus in a pure nautilus cluster.
>>
>> Can I leave it this way, or should I remove the OSDs and first upgrade the
>> mons?
>>
>> Cheers
>>  Boris
>>
>> --
>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
>> groüen Saal.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] upgrade OSDs before mon

2021-10-26 Thread Boris Behrens
Hi,
I just added new storage to our s3 cluster and saw that ubuntu didn't
prioritize the nautilus package over the octopus package.

Now I have 10 OSDs with octopus in a pure nautilus cluster.

Can I leave it this way, or should I remove the OSDs and first upgrade the
mons?

Cheers
 Boris

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-25 Thread Boris Behrens
Hi Casey,

thanks a lot for that hint. That sounds a lot like this is the problem.
Is there a way to show incomplete multipart uploads via radosgw-admin?

So that I would be able to cancel them.

Upgrading to octopus might take a TON of time, as we have 1.1 PiB in 160
OSDs on rotational disks. :)

On Mon, Oct 25, 2021 at 16:19, Casey Bodley wrote:

> hi Boris, this sounds a lot like
> https://tracker.ceph.com/issues/49206, which says "When deleting a
> bucket with an incomplete multipart upload that has about 2000 parts
> uploaded, we noticed an infinite loop, which stopped s3cmd from
> deleting the bucket forever."
>
> i'm afraid this fix was merged after nautilus went end-of-life, so
> you'd need to upgrade to octopus for it
>
> On Mon, Oct 25, 2021 at 9:52 AM Boris Behrens  wrote:
> >
> > Good day everybody,
> >
> > I just came across very strange behavior. I have two buckets where s3cmd
> > hangs when I try to show current multipart uploads.
> >
> > When I use --debug I see that it loops over the same response.
> > What I tried to fix it on one bucket:
> > * radosgw-admin bucket check --bucket=BUCKETNAME
> > * radosgw-admin bucket check --check-objects --fix --bucket=BUCKETNAME
> >
> > The check command now reports an empty array [], but I still can't show
> the
> > multiparts. I can interact very normal with the bucket (list/put/get
> > objects).
> >
> > The debug output shows always the same data and
> > DEBUG: Listing continues after 'FILENAME'
> >
> > Did someone already came across this error?
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-25 Thread Boris Behrens
Good day everybody,

I just came across very strange behavior. I have two buckets where s3cmd
hangs when I try to show current multipart uploads.

When I use --debug I see that it loops over the same response.
What I tried to fix it on one bucket:
* radosgw-admin bucket check --bucket=BUCKETNAME
* radosgw-admin bucket check --check-objects --fix --bucket=BUCKETNAME

The check command now reports an empty array [], but I still can't show the
multiparts. I can otherwise interact with the bucket normally (list/put/get
objects).

The debug output always shows the same data and
DEBUG: Listing continues after 'FILENAME'

Has someone already come across this error?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: recreate a period in radosgw

2021-10-14 Thread Boris Behrens
What I've tested so far in a testcluster:

1. create a new realm with the same name (just swap two letters)
2. remove the realm
3. get the periods file from the .rgw.root pool
4. correct the name at the end and swap the two realm ids
5. upload the file again
6. change the period for the realm with realm set
7. period update; period update --commit
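(Roughly what steps 3-5 look like on the rados level; the exact object name is whatever the listing shows, not a given:)

    # find the period object(s) in the root pool
    rados -p .rgw.root ls | grep period
    # download, edit (fix the realm name / realm id), and re-upload
    rados -p .rgw.root get periods.PERIOD_ID.EPOCH period.json
    vi period.json
    rados -p .rgw.root put periods.PERIOD_ID.EPOCH period.json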

This looks correct, but I am not sure if it is the right way.

Does someone have another way to do this?

On Thu, Oct 14, 2021 at 15:44, Boris Behrens wrote:

> Hi,
> is there a way to restore a deleted period?
>
> The realm, zonegroup and zone are still there, but I can't apply any
> changes, because the period is missing.
>
> Cheers
>  Boris
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] recreate a period in radosgw

2021-10-14 Thread Boris Behrens
Hi,
is there a way to restore a deleted period?

The realm, zonegroup and zone are still there, but I can't apply any
changes, because the period is missing.

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] shards falling behind on multisite metadata sync

2021-10-01 Thread Boris Behrens
Hi,

does someone have a quick fix for shards falling behind in the metadata sync?

I can do a radosgw-admin metadata sync init and restart the rgw daemons to
get a full sync, but after a day the first shard falls behind, and after
two days I also get the message "oldest incremental change not applied
...

[root@3cecef5afc28 ~]# radosgw-admin sync status
  realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (company)
  zonegroup f6f3f550-89f0-4c0d-b9b0-301a06c52c16 (bc01)
   zone a7edb6fe-737f-4a1c-a333-0ba0566bb3dd (bc01)
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is behind on 6 shards
behind shards: [13,31,35,36,46,60]
oldest incremental change not applied: 2021-09-30
17:54:22.0.270207s [35]

I've tried to check the sync errors, but there I get a lot of "failed to
read remote metadata entry: (5) Input/output error", and trimming them does
not seem to work.
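(For reference, the commands meant here are along these lines:)

    radosgw-admin sync error list
    radosgw-admin sync error trim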

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: debugging radosgw sync errors

2021-09-20 Thread Boris Behrens
Ah found it.
It was an SSL certificate that was invalid (from some PoC that had started to mold).

Now the sync is running fine, but there is one bucket that has a ton of
data in the mdlog.
[root@s3db16 ~]# radosgw-admin mdlog list | grep temonitor | wc -l
No --period given, using current period=e8fc96f1-ae86-4dc1-b432-470b0772fded
284760
[root@s3db16 ~]# radosgw-admin mdlog list | grep name | wc -l
No --period given, using current period=e8fc96f1-ae86-4dc1-b432-470b0772fded
343078

Is it safe to clear the mdlog?

On Mon, Sep 20, 2021 at 01:00, Boris Behrens wrote:

> I just deleted the rados object from .rgw.data.root and this removed the
> bucket.instance, but this did not solve the problem.
>
> It looks like there is some access error when I try to radosgw-admin
> metadata sync init.
> The 403 http response code on the post to the /admin/realm/period endpoint.
>
> I checked the system_key and added a new system user and set the keys with
> zone modify and period update --commit on both sides.
> This also did not help.
>
> After a weekend digging through the mailing list and trying to fix it, I
> am totally stuck.
> I hope that someone of you people can help me.
>
>
>
>
> On Fri, Sep 17, 2021 at 17:54, Boris Behrens wrote:
>
>> While searching for other things I came across this:
>> [root ~]# radosgw-admin metadata list bucket | grep www1
>> "www1",
>> [root ~]# radosgw-admin metadata list bucket.instance | grep www1
>> "www1:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.31103",
>> "www1.company.dev",
>> [root ~]# radosgw-admin bucket list | grep www1
>> "www1",
>> [root ~]# radosgw-admin metadata rm bucket.instance:www1.company.dev
>> ERROR: can't remove key: (22) Invalid argument
>>
>> Maybe this is part of the problem.
>>
>> Did somebody saw this and know what to do?
>> --
>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
>> groüen Saal.
>>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: debugging radosgw sync errors

2021-09-19 Thread Boris Behrens
I just deleted the rados object from .rgw.data.root and this removed the
bucket.instance, but this did not solve the problem.

It looks like there is some access error when I try radosgw-admin
metadata sync init:
I get a 403 HTTP response code on the POST to the /admin/realm/period endpoint.

I checked the system_key and added a new system user and set the keys with
zone modify and period update --commit on both sides.
This also did not help.
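(The key rotation described above was done roughly like this on both sides, with the keys being placeholders:)

    radosgw-admin zone modify --rgw-zone=ZONE --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY
    radosgw-admin period update --commit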

After a weekend of digging through the mailing list and trying to fix it, I am
totally stuck.
I hope that someone of you can help me.




On Fri, Sep 17, 2021 at 17:54, Boris Behrens wrote:

> While searching for other things I came across this:
> [root ~]# radosgw-admin metadata list bucket | grep www1
> "www1",
> [root ~]# radosgw-admin metadata list bucket.instance | grep www1
> "www1:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.31103",
> "www1.company.dev",
> [root ~]# radosgw-admin bucket list | grep www1
> "www1",
> [root ~]# radosgw-admin metadata rm bucket.instance:www1.company.dev
> ERROR: can't remove key: (22) Invalid argument
>
> Maybe this is part of the problem.
>
> Did somebody saw this and know what to do?
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: debugging radosgw sync errors

2021-09-17 Thread Boris Behrens
While searching for other things I came across this:
[root ~]# radosgw-admin metadata list bucket | grep www1
"www1",
[root ~]# radosgw-admin metadata list bucket.instance | grep www1
"www1:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.31103",
"www1.company.dev",
[root ~]# radosgw-admin bucket list | grep www1
"www1",
[root ~]# radosgw-admin metadata rm bucket.instance:www1.company.dev
ERROR: can't remove key: (22) Invalid argument

Maybe this is part of the problem.

Did somebody see this and know what to do?
-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw find buckets which use the s3website feature

2021-09-17 Thread Boris Behrens
Found it:

# print every bucket instance whose metadata contains a website_conf section
for bucket in `radosgw-admin metadata list bucket.instance | jq .[] | cut -f2 -d\"`; do
  if radosgw-admin metadata get --metadata-key=bucket.instance:$bucket | grep --silent website_conf; then
    echo $bucket
  fi
done

On Thu, Sep 16, 2021 at 09:49, Boris Behrens wrote:

> Hi people,
>
> is there a way to find bucket that use the s3website feature?
>
> Cheers
>  Boris
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] debugging radosgw sync errors

2021-09-17 Thread Boris Behrens
Hello again,

as my tests with some fresh clusters answered most of my config questions, I
now wanted to start with our production cluster. The basic setup looks
good, but the sync does not work:

[root@3cecef5afb05 ~]# radosgw-admin sync status
  realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (company)
  zonegroup f6f3f550-89f0-4c0d-b9b0-301a06c52c16 (bc01)
   zone a7edb6fe-737f-4a1c-a333-0ba0566bb3dd (bc01)
  metadata sync preparing for full sync
full sync: 64/64 shards
full sync: 0 entries to sync
failed to fetch master sync status: (5) Input/output error

[root@3cecef5afb05 ~]# radosgw-admin metadata sync run
2021-09-17 16:23:08.346 7f6c83c63840  0 meta sync: ERROR: failed to fetch
metadata sections
ERROR: sync.run() returned ret=-5
2021-09-17 16:23:08.474 7f6c83c63840  0 RGW-SYNC:meta: ERROR: failed to
fetch all metadata keys (r=-5)

And when I check "radosgw-admin period get", the sync_status is just an
array of empty strings:
[root@3cecef5afb05 ~]# radosgw-admin period get
{
"id": "e8fc96f1-ae86-4dc1-b432-470b0772fded",
"epoch": 71,
"predecessor_uuid": "5349ac85-3d6d-4088-993f-7a1d4be3835a",
"sync_status": [
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",

How can I debug what is going wrong?
I tried to dig into the logs and I see a lot of these messages:
2021-09-17 14:06:04.144 7f755b4e7700  1 civetweb: 0x5641a22b33a8:
IPV6_OF_OUR_HAPROXY - - [17/Sep/2021:14:06:04 +] "GET
/admin/log/?type=metadata&status&rgwx-zonegroup=da651dc1-2663-4e1b-af2e-ac4454f24c9d
HTTP/1.1" 403 439 - -
2021-09-17 14:06:11.646 7f755f4ef700  1 civetweb: 0x5641a22ae4e8:
IPV6_OF_OUR_HAPROXY - - [17/Sep/2021:14:06:11 +] "POST
/admin/realm/period?period=e8fc96f1-ae86-4dc1-b432-470b0772fded&epoch=71&rgwx-zonegroup=da651dc1-2663-4e1b-af2e-ac4454f24c9d
HTTP/1.1" 403 439 - -

The 403 status makes me think I might have an access problem, but pulling
the realm/period from the master was successful. Also the period commit
from the new cluster worked fine.
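(One way to get more detail out of the failing sync is to raise the client-side debug levels for a single run, for example:)

    radosgw-admin metadata sync run --debug-rgw=20 --debug-ms=1 2>&1 | tee metadata-sync.log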
-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] radosgw find buckets which use the s3website feature

2021-09-16 Thread Boris Behrens
Hi people,

is there a way to find buckets that use the s3website feature?

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about multiple zonegroups (was Problem with multi zonegroup configuration)

2021-09-15 Thread Boris Behrens
Ok, I think I found the basic problem.
I used to talk to the endpoint that is also the domain for the s3websites.

After switching the domains around, everything worked fine. :partyemote:
I have written down how I think things work together (wrote it down here
IYAI https://pastebin.com/6Gj9Q5hJ), and I have three additional questions:
* how do I "pull" a zone to another storage cluster?
* how to make the syncing user more secure?
* how to limit users to a specific zone or zonegroup?
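(Regarding the first question: the realm/period pull into a second cluster is generally done along these lines; the URL and keys are placeholders:)

    radosgw-admin realm pull --url=https://MASTER_ENDPOINT --access-key=SYNC_ACCESS_KEY --secret=SYNC_SECRET_KEY
    radosgw-admin period pull --url=https://MASTER_ENDPOINT --access-key=SYNC_ACCESS_KEY --secret=SYNC_SECRET_KEY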

cheers :)
-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem with multi zonegroup configuration

2021-09-13 Thread Boris Behrens
Does someone have any ideas?
I am not even sure if I am thinking about it correctly.

I only want to have users and bucket names synced, so they are unique, but
not the data. I don't want redundancy.

The documentation reads like I need multiple zonegroups with a single zone
each.

On Mon, Sep 13, 2021 at 11:47, Boris Behrens wrote:

> Dear ceph community,
>
> I am still stuck with the multi zonegroup configuration. I did these steps:
> 1. Create realm (company), zonegroup(eu), zone(eu-central-1), sync user on
> the site fra1
> 2. Pulled the realm and the period in fra2
> 3. Creted the zonegroup(eu-central-2), zone (eu-central-2), modified zone
> (eu-centrla-2)
>with the credentials of the sunc user on the site fra2.
> 4. Did a 'period update --commit' and 'metadata sync init; metadata sync
> run' on the site fra2.
>
> Syncing now seem to work. If I create a user it will be synced. If the
> user creates a bucket,
> this also gets synced, without data (I don't want to sync data. Only
> metadata).
>
> But I still have some issues with working with these clusters. I am not
> able to upload any data.
> If I try to list bucket, I receive "NoSuchBucket".
>
> I currently think it is a configuration problem with mit period and
> ceph.conf
>
> Down below:
> * The output from s3cmd
> * my s3cmd config
> * radosgw-admin period get
> * ceph.conf (fra1/fra2)
>
> ##
> [workstation]# s3cmd --config ~/.s3cfg_testing_fra1 la
> ERROR: Error parsing xml: no element found: line 9, column 0
> ERROR: b'\n 404 Not Found\n \n
>  404 Not Found\n  \n   Code: NoSuchBucket\n
> RequestId: tx0130d0071-00613f1c58-69a6e-eu-central-1\n
>   HostId: 69a6e-eu-central-1-eu\n'
> ERROR: S3 error: 404 (Not Found)
>
> ##
> [workstation]# cat ~/.s3cfg_testing_fra1
> [default]
> access_key = 
> bucket_location = eu-central-1
> host_base = eu-central-1.company.dev
> host_bucket = %(bucket)s.eu-central-1.company.dev
> secret_key = Y
> website_endpoint = https://%(bucket)s.eu-central-1.company.dev
>
> ##
> [fra1]# radosgw-admin period get
> {
> "id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5",
> "epoch": 42,
> "predecessor_uuid": "c748ead2-424a-4209-b183-b0989c8bda0c",
> "sync_status": [],
> "period_map": {
> "id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5",
> "zonegroups": [
> {
> "id": "61dfe354-bf61-4a08-9e4d-e7a2228cc651",
> "name": "eu-central-2",
> "api_name": "eu-central-2",
> "is_master": "false",
> "endpoints": [
> "https://eu-central-2.company.dev";
> ],
> "hostnames": [
> "eu-central-2.company.dev"
> ],
> "hostnames_s3website": [
> "eu-central-2.company.dev"
> ],
> "master_zone": "aafa8c61-84f0-48f0-a4f1-110306f83bce",
> "zones": [
> {
> "id": "aafa8c61-84f0-48f0-a4f1-110306f83bce",
> "name": "eu-central-2",
> "endpoints": [
> "https://eu-central-2.company.dev";
> ],
> "log_meta": "false",
> "log_data": "false",
> "bucket_index_max_shards": 11,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": [],
> "storage_classes": [
> "STANDARD"
> ]
> }
> ],
> "

[ceph-users] Re: [Suspicious newsletter] Problem with multi zonegroup configuration

2021-09-13 Thread Boris Behrens
I don't want to sync data between zones.
I only want to sync the metadata.

This is meant to keep users and buckets unique across multiple datacenters,
but not to build a mirror of the data.

On Mon, Sep 13, 2021 at 13:14, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:

> I don't see any sync rule of the kind you would want for directional sync
> between the 2 zones; there is no pipe and no flow either.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> -Original Message-
> From: Boris Behrens 
> Sent: Monday, September 13, 2021 4:48 PM
> To: ceph-users@ceph.io
> Subject: [Suspicious newsletter] [ceph-users] Problem with multi zonegroup
> configuration
>
> Email received from the internet. If in doubt, don't click any link nor
> open any attachment !
> 
>
> Dear ceph community,
>
> I am still stuck with the multi zonegroup configuration. I did these steps:
> 1. Create realm (company), zonegroup(eu), zone(eu-central-1), sync user on
> the site fra1 2. Pulled the realm and the period in fra2 3. Creted the
> zonegroup(eu-central-2), zone (eu-central-2), modified zone
> (eu-centrla-2)
>with the credentials of the sunc user on the site fra2.
> 4. Did a 'period update --commit' and 'metadata sync init; metadata sync
> run' on the site fra2.
>
> Syncing now seem to work. If I create a user it will be synced. If the
> user creates a bucket, this also gets synced, without data (I don't want to
> sync data. Only metadata).
>
> But I still have some issues with working with these clusters. I am not
> able to upload any data.
> If I try to list bucket, I receive "NoSuchBucket".
>
> I currently think it is a configuration problem with mit period and
> ceph.conf
>
> Down below:
> * The output from s3cmd
> * my s3cmd config
> * radosgw-admin period get
> * ceph.conf (fra1/fra2)
>
> ##
> [workstation]# s3cmd --config ~/.s3cfg_testing_fra1 la
> ERROR: Error parsing xml: no element found: line 9, column 0
> ERROR: b'\n 404 Not Found\n \n
>  404 Not Found\n  \n   Code: NoSuchBucket\n
> RequestId: tx0130d0071-00613f1c58-69a6e-eu-central-1\n
>   HostId: 69a6e-eu-central-1-eu\n'
> ERROR: S3 error: 404 (Not Found)
>
> ##
> [workstation]# cat ~/.s3cfg_testing_fra1 [default] access_key =
>  bucket_location = eu-central-1 host_base =
> eu-central-1.company.dev host_bucket = %(bucket)s.eu-central-1.company.dev
> secret_key = Y
> website_endpoint = https://%(bucket)s.eu-central-1.company.dev
>
> ##
> [fra1]# radosgw-admin period get
> {
> "id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5",
> "epoch": 42,
> "predecessor_uuid": "c748ead2-424a-4209-b183-b0989c8bda0c",
> "sync_status": [],
> "period_map": {
> "id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5",
> "zonegroups": [
> {
> "id": "61dfe354-bf61-4a08-9e4d-e7a2228cc651",
> "name": "eu-central-2",
> "api_name": "eu-central-2",
> "is_master": "false",
> "endpoints": [
> "https://eu-central-2.company.dev";
> ],
> "hostnames": [
> "eu-central-2.company.dev"
> ],
> "hostnames_s3website": [
> "eu-central-2.company.dev"
> ],
> "master_zone": "aafa8c61-84f0-48f0-a4f1-110306f83bce",
> "zones": [
> {
> "id": "aafa8c61-84f0-48f0-a4f1-110306f83bce",
> "name": "eu-central-2",
> "endpoints": [
> "https://eu-central-2.company.dev";
> ],
> "log_meta": "false",
> "log_data": "false",
> "bucket_index_max_shards": 11,
> "read_only": "false",
> "tier_type": "",
> "sync_fr

[ceph-users] Problem with multi zonegroup configuration

2021-09-13 Thread Boris Behrens
Dear ceph community,

I am still stuck with the multi zonegroup configuration. I did these steps:
1. Created the realm (company), zonegroup (eu), zone (eu-central-1) and a sync
user on the site fra1
2. Pulled the realm and the period in fra2
3. Created the zonegroup (eu-central-2) and zone (eu-central-2), and modified
the zone (eu-central-2)
   with the credentials of the sync user on the site fra2.
4. Did a 'period update --commit' and 'metadata sync init; metadata sync
run' on the site fra2.

Syncing now seems to work. If I create a user, it gets synced. If the user
creates a bucket,
this also gets synced, without data (I don't want to sync data, only
metadata).

But I still have some issues working with these clusters. I am not
able to upload any data.
If I try to list a bucket, I receive "NoSuchBucket".

I currently think it is a configuration problem with my period and
ceph.conf.
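(The ceph.conf knobs that usually matter for this kind of hostname mixup are the RGW DNS names, something like the following per site; these are illustrative values only, not the actual config:)

    [client.rgw.fra1]
    rgw_zone = eu-central-1
    rgw_dns_name = eu-central-1.company.dev
    rgw_dns_s3website_name = website.eu-central-1.company.dev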

Down below:
* The output from s3cmd
* my s3cmd config
* radosgw-admin period get
* ceph.conf (fra1/fra2)

##
[workstation]# s3cmd --config ~/.s3cfg_testing_fra1 la
ERROR: Error parsing xml: no element found: line 9, column 0
ERROR: b'\n 404 Not Found\n \n
 404 Not Found\n  \n   Code: NoSuchBucket\n
RequestId: tx0130d0071-00613f1c58-69a6e-eu-central-1\n
  HostId: 69a6e-eu-central-1-eu\n'
ERROR: S3 error: 404 (Not Found)

##
[workstation]# cat ~/.s3cfg_testing_fra1
[default]
access_key = 
bucket_location = eu-central-1
host_base = eu-central-1.company.dev
host_bucket = %(bucket)s.eu-central-1.company.dev
secret_key = Y
website_endpoint = https://%(bucket)s.eu-central-1.company.dev

##
[fra1]# radosgw-admin period get
{
"id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5",
"epoch": 42,
"predecessor_uuid": "c748ead2-424a-4209-b183-b0989c8bda0c",
"sync_status": [],
"period_map": {
"id": "f8aed695-8f57-47dd-a0b9-de847ccc5cb5",
"zonegroups": [
{
"id": "61dfe354-bf61-4a08-9e4d-e7a2228cc651",
"name": "eu-central-2",
"api_name": "eu-central-2",
"is_master": "false",
"endpoints": [
"https://eu-central-2.company.dev";
],
"hostnames": [
"eu-central-2.company.dev"
],
"hostnames_s3website": [
"eu-central-2.company.dev"
],
"master_zone": "aafa8c61-84f0-48f0-a4f1-110306f83bce",
"zones": [
{
"id": "aafa8c61-84f0-48f0-a4f1-110306f83bce",
"name": "eu-central-2",
"endpoints": [
"https://eu-central-2.company.dev";
],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "be137deb-1072-447c-bd96-def84626872f"
},
{
"id": "b65bbdfd-0555-43eb-9365-8bc72df2efd5",
"name": "eu",
"api_name": "eu",
"is_master": "true",
"endpoints": [
"https://eu-central-1.company.dev";
],
"hostnames": [
"eu-central-1.company.dev"
],
"hostnames_s3website": [
"eu-central-1.company.dev"
],
"master_zone": "6afad715-c0e1-4100-9db2-98ed31de0123",
"zones": [
{
"id": "6afad715-c0e1-4100-9db2-98ed31de0123",
"name": "eu-central-1",
"endpoints": [
"https://eu-central-1.company.dev";
],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{

[ceph-users] Re: [Suspicious newsletter] Re: create a Multi-zone-group sync setup

2021-08-17 Thread Boris Behrens
Yes,
I want to open up a new DC where people can store their objects, but I want
the bucket names and users to be unique over both DCs.
After some reading I found that I need one realm with multiple zonegroups,
each containing only one zone.

No sync of actual user data, but metadata like users or used bucket names.

So I created a test setup which contains three servers on each side; each
server is used for mon, mgr, osd and radosgw.
One is a nautilus installation (the master) and the other is an octopus
installation.

I've set up the realm, the first zonegroup with its zone, and a sync user in
the master setup, and committed.
Then I've pulled the period on the 2nd setup and added a 2nd zonegroup
with a zone and committed.

Now I can create users in the master setup, but not in the 2nd (as it
doesn't sync back). But I am not able to create a bucket or anything else with
the credentials of the users I created.

On Wed, Aug 18, 2021 at 06:08, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:

> Hi,
>
> " but have a global namespace where all buckets and users are uniqe."
>
> You mean manage multiple clusters from 1 "master" cluster but no sync? So
> 1 realm, multiple DCs BUT no sync?
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> -------
>
> -Original Message-
> From: Boris Behrens 
> Sent: Tuesday, August 17, 2021 8:51 PM
> To: ceph-users@ceph.io
> Subject: [Suspicious newsletter] [ceph-users] Re: create a
> Multi-zone-group sync setup
>
> Email received from the internet. If in doubt, don't click any link nor
> open any attachment !
> 
>
> Hi, after some trial and error I got it working, so users will get synced.
>
> However, If I try to create a bucket via s3cmd I receive the following
> error:
> s3cmd --access_key=XX --secret_key=YY --host=HOST mb s3://test
> ERROR: S3 error: 403 (InvalidAccessKeyId)
>
> When I try the same with ls I just get an empty response (because there
> are no buckets to list).
>
> I get this against both radosgw locations.
> I have an nginx in between the internet and radosgw that will just proxy
> pass every address and sets host and x-forwarded-for header.
>
>
> On Fri, Jul 30, 2021 at 16:46, Boris Behrens wrote:
>
> > Hi people,
> >
> > I try to create a Multi-zone-group setup (like it is described here:
> > https://docs.ceph.com/en/latest/radosgw/multisite/)
> >
> > But I simply fail.
> >
> > I just created a testcluster to mess with it, and no matter how I try to.
> >
> > Is there a howto avaialable?
> >
> > I don't want to get a multi-zone setup, where I sync the actual zone
> > data, but have a global namespace where all buckets and users are uniqe.
> >
> > Cheers
> >  Boris
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > im groüen Saal.
> >
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: create a Multi-zone-group sync setup

2021-08-17 Thread Boris Behrens
Hi, after some trial and error I got it working, so users will get synced.

However, if I try to create a bucket via s3cmd I receive the following
error:
s3cmd --access_key=XX --secret_key=YY --host=HOST mb s3://test
ERROR: S3 error: 403 (InvalidAccessKeyId)

When I try the same with ls I just get an empty response (because there are
no buckets to list).

I get this against both radosgw locations.
I have an nginx between the internet and radosgw that just proxy
passes every address and sets the Host and X-Forwarded-For headers.
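(The relevant nginx bits are roughly these; the upstream address is a placeholder:)

    location / {
        proxy_pass http://127.0.0.1:7480;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }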


On Fri, Jul 30, 2021 at 16:46, Boris Behrens wrote:

> Hi people,
>
> I try to create a Multi-zone-group setup (like it is described here:
> https://docs.ceph.com/en/latest/radosgw/multisite/)
>
> But I simply fail.
>
> I just created a testcluster to mess with it, and no matter how I try to.
>
> Is there a howto avaialable?
>
> I don't want to get a multi-zone setup, where I sync the actual zone data,
> but have a global namespace where all buckets and users are uniqe.
>
> Cheers
>  Boris
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Discard / Trim does not shrink rbd image size when disk is partitioned

2021-08-13 Thread Boris Behrens
Hi Janne,
thanks for the hint. I was aware of that, but it is good to add that
knowledge to the question for future Google searchers.

Hi Ilya,
that fixed it. Do we know why the discard does not work when the partition
table is not aligned? We provide OS templates to our customers, but they can
also create and attach an empty block device, and they will certainly not
check whether the partitions are aligned correctly.
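(A quick sketch of how one could check from inside a guest whether a partition is aligned and the device advertises discard:)

    # reports whether partition 1 sits on the device's optimal I/O boundaries
    parted /dev/sda align-check opt 1
    # non-zero DISC-GRAN/DISC-MAX values mean discard is usable on the device
    lsblk --discard /dev/sda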

Cheers
 Boris


On Fri, Aug 13, 2021 at 08:44, Janne Johansson <icepic...@gmail.com> wrote:

> On Thu, Aug 12, 2021 at 17:04, Boris Behrens wrote:
> > Hi everybody,
> > we just stumbled over a problem where the rbd image does not shrink, when
> > files are removed.
> > This only happenes when the rbd image is partitioned.
> >
> > * We tested it with centos8/ubuntu20.04 with ext4 and a gpt partition
> table
> > (/boot and /)
> > * the kvm device is virtio-scsi-pci with krbd
> > * Mount option discard is set
> > * command to create large file: dd if=/dev/zero of=testfile bs=64M
> > count=1000
> > * the image grows in the size we expect
> > * when we remove the testfile the rbd image stays at the size
> > * we wen recreate the deleted file with the command the rbd image grows
> > further
>
> Just a small nit on this single point, regardless of if trim/discard
> works or not:
> There is no guarantee that writing a file, removing it and then
> re-writing a file
> will ever end up in the same spot again. In fact, most modern filesystems
> will
> probably make sure to NOT place things at the same spot again.
> Since the second write ends up in a different place, it will once again
> expand
> your sparse/thin image by the amount of written bytes, this is very much
> to be expected.
>
> I'm sorry if you already knew this and I am just stating the obvious to
> you, but
> your text came over as if you expected the second write to not increase the
> image since that "space" was already blown up on the first write.
>
> Trim/discard should still be investigated so you can make it shrink back
> again somehow, just wanted to point this out for the records.
>
>
> --
> May the most significant bit of your life be positive.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Discard / Trim does not shrink rbd image size when disk is partitioned

2021-08-12 Thread Boris Behrens
Hi everybody,

we just stumbled over a problem where the rbd image does not shrink when
files are removed.
This only happens when the rbd image is partitioned.

* We tested it with centos8/ubuntu20.04 with ext4 and a gpt partition table
(/boot and /)
* the kvm device is virtio-scsi-pci with krbd
* Mount option discard is set
* command to create large file: dd if=/dev/zero of=testfile bs=64M
count=1000
* the image grows by the size we expect
* when we remove the testfile the rbd image stays at that size
* when we recreate the deleted file with the same command, the rbd image grows
further
* using fstrim does not work
* when adding a new disk and initializing ext4 directly on the disk (without
partitioning), trim does work and the rbd image shrinks back to a couple of
GB
* we use ceph 14.2.21

Does anybody experienced the same issue and maybe know how to solve the
problem?

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] create a Multi-zone-group sync setup

2021-07-30 Thread Boris Behrens
Hi people,

I am trying to create a multi-zone-group setup (as described here:
https://docs.ceph.com/en/latest/radosgw/multisite/)

But I simply fail.

I just created a test cluster to mess with it, and it does not work no matter
what I try.

Is there a howto available?

I don't want a multi-zone setup where I sync the actual zone data,
but a global namespace where all buckets and users are unique.

Cheers
 Boris

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] understanding multisite radosgw syncing

2021-07-27 Thread Boris Behrens
Hi,

I wanted to set up a multisite radosgw environment where only bucket names
and user info get synced.

Basically, I don't want user data to be synced, but buckets and user ids
should still be unique inside the zonegroup.

For this I've gone through this howto (
https://docs.ceph.com/en/latest/radosgw/multisite/) and have set my
zonegroup config to this:

"zones": [
{
"id": "07cdb1c7-8c8e-4a23-ab1e-fcfb88982f38",
"name": "eu-central-2",
"endpoints": [
"https://gw2/";
],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
},
{
"id": "ff7a8b0c-07e6-463a-861b-78f0adeba8ad",
"name": "eu-central-1",
"endpoints": [
"https://gw1/";
],
"log_meta": "true",
"log_data": "false",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}

but the sync status on both systems looks like it wants to replicate data:
gw2
  realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (world)
  zonegroup da651dc1-2663-4e1b-af2e-ac4454f24c9d (eu)
   zone 07cdb1c7-8c8e-4a23-ab1e-fcfb88982f38 (eu-central-2)
  metadata sync preparing for full sync
full sync: 64/64 shards
full sync: 0 entries to sync
failed to fetch master sync status: (5) Input/output error
2021-07-27T11:24:06.772+ 7f4638c23b40  0 data sync zone:ff7a8b0c ERROR:
failed to fetch datalog info
  data sync source: ff7a8b0c-07e6-463a-861b-78f0adeba8ad (eu-central-1)
init
full sync: 128/128 shards
full sync: 0 buckets to sync
incremental sync: 0/128 shards

gw1
  realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (world)
  zonegroup da651dc1-2663-4e1b-af2e-ac4454f24c9d (eu)
   zone ff7a8b0c-07e6-463a-861b-78f0adeba8ad (eu-central-1)
  metadata sync no sync (zone is master)
2021-07-27 11:24:24.645 7fe30fc07840  0 data sync zone:07cdb1c7 ERROR:
failed to fetch datalog info
  data sync source: 07cdb1c7-8c8e-4a23-ab1e-fcfb88982f38 (eu-central-2)
failed to retrieve sync info: (13) Permission denied


The gw2 is not set up yet, so the sync from gw1 will not happen (and I am
still figuring out why gw2 can not pull from gw1, but this is something I
will worry about later).

Do I need to change the zonegroup config to not have the data synced?
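
The only knob I have found so far would be something like the following,
but I am not sure it is the right one or whether it fully stops the data
sync inside a zonegroup:

radosgw-admin zone modify --rgw-zone=eu-central-2 --sync-from-all=false
radosgw-admin zone modify --rgw-zone=eu-central-1 --sync-from-all=false
radosgw-admin period update --commit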

Cheers
 Boris
-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Deleting large objects via s3 API leads to orphan objects

2021-07-27 Thread Boris Behrens
Hello my dear ceph community,

I am now dealing with a lot of orphan objects and today I got the time
to dig into it.
What I basically found is that large objects get removed from radosgw,
but not from rados. This leads to a huge amount of orphan objects.

I"ve found this RH bug from last year
(https://bugzilla.redhat.com/show_bug.cgi?id=1844720), and would like
to know if there is any workaround.

Together with this bug (https://tracker.ceph.com/issues/50293) it
makes it very tedious to search for those objects, because we have a
nearly constant ingress of data in some of those buckets.

We are currently running 14.2.21 across the board.
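
In case someone wants to check the same thing on their cluster: this is
roughly how I look at whether the tails of deleted objects are at least
queued for garbage collection or already orphaned (a sketch, commands as I
use them on 14.2.21):

radosgw-admin gc list --include-all | grep oid | wc -l   # tail objects queued for deletion
radosgw-admin gc process --include-all                   # force a GC run, then re-check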

Cheers
 Boris

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-23 Thread Boris Behrens
Hi Dan,
hi Rafael,

we found the issue.
It was a cleanup script that didn't work correctly.
Basically it removed files via rados, and the bucket index didn't get updated.

Thanks a lot for your help. (I will also close the bug on the ceph tracker.)

Am Fr., 23. Juli 2021 um 01:16 Uhr schrieb Rafael Lopez
:
>
> Thanks for further clarification Dan.
>
> Boris, if you have a test/QA environment on the same code as production, you 
> can confirm if the problem is as above. Do NOT do this in production - if the 
> problem exists it might result in losing production data.
>
> 1. Upload large S3 object that would take 10+ seconds to download (several GB)
> 2. Download object to ensure it is working
> 3. Set "rgw_gc_obj_min_wait" to very low value (2-3 seconds)
> 4. Download object
>
> Step (4) may succeed, but run this:
> `radosgw-admin gc list`
>
> And check for shadow objects associated with the S3 object.
>
> Once the garbage collection completes, you will get the 404 NoSuchKey return 
> when you try to download the S3 object, although it will still be listed as 
> an object in the bucket.
> Also recommend setting the "rgw_gc_obj_min_wait" back to a high value after 
> you finish testing.
>
> On Thu, 22 Jul 2021 at 19:45, Dan van der Ster  wrote:
>>
>> Boris,
>>
>> To check if your issue is related to Rafael's, could you check your
>> access logs for requests on the missing objects which lasted longer
>> than one hour?
>>
>> I ask because Nautilus also has rgw_gc_obj_min_wait (2hr by default),
>> which is the main config option related to
>> https://tracker.ceph.com/issues/47866
>>
>>
>> -- Dan
>>
>> On Thu, Jul 22, 2021 at 11:12 AM Dan van der Ster  
>> wrote:
>> >
>> > Hi Rafael,
>> >
>> > AFAIU, that gc issue was not relevant for N -- the bug is in the new
>> > rgw_gc code which landed in Octopus and was not backported to N.
>> >
>> > Well, RHCEPH had the new rgw_gc cls backported to it, and RHCEPH has
>> > the bugfix you refer to:
>> > * Wed Dec 02 2020 Ceph Jenkins  2:14.2.11-86
>> > - rgw: during GC defer, prevent new GC enqueue (rhbz#1892644)
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1892644
>> >
>> > But still, I think it shouldn't apply to the upstream community
>> > Nautilus that we run.
>> >
>> > That said, this indeed looks really similar so perhaps Nautilus has
>> > similar faulty gc logic.
>> >
>> > Cheers, Dan
>> >
>> > On Thu, Jul 22, 2021 at 6:47 AM Rafael Lopez  
>> > wrote:
>> > >
>> > > hi boris,
>> > >
>> > > We hit an issue late last year that sounds similar to what you are 
>> > > experiencing. I am not sure if the fix was backported to nautilus, I 
>> > > can't see any reference to a nautilus backport so it's possible it was 
>> > > only backported to octopus (15.x), exception being red hat ceph nautilus.
>> > >
>> > > https://tracker.ceph.com/issues/47866?next_issue_id=48255#note-59
>> > > https://www.mail-archive.com/ceph-users@ceph.io/msg05312.html
>> > >
>> > > Basically, a read request on a s3/swift object that took a very long 
>> > > time to complete would cause the associated rados data objects to be put 
>> > > in the GC queue, but the head object would still be present. So the s3 
>> > > object would still show as present, `rados bi list` would show it (since 
>> > > head object was present) but the data objects would be gone, resulting 
>> > > in 404 NoSuchKey when retrieving the object.
>> > >
>> > > raf
>> > >
>> > > On Wed, 21 Jul 2021 at 18:12, Boris Behrens  wrote:
>> > >>
>> > >> Good morning everybody,
>> > >>
>> > >> we've dug further into it but still don't know how this could happen.
>> > >> What we ruled out for now:
>> > >> * Orphan objects cleanup process.
>> > >> ** There is only one bucket with missing data (I checked all other
>> > >> buckets yesterday)
>> > >> ** The "keep this files" list is generated by radosgw-admin bukcet
>> > >> rados list. I would doubt that there were files listed, that are
>> > >> accessible via radosgw
>> > >> ** The deleted files are somewhat random, but always with their
>> > >> corresponding counterparts (per folder there are 2-3 files tha

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-21 Thread Boris Behrens
Good morning everybody,

we've dug further into it but still don't know how this could happen.
What we ruled out for now:
* Orphan objects cleanup process.
** There is only one bucket with missing data (I checked all other
buckets yesterday)
** The "keep this files" list is generated by radosgw-admin bukcet
rados list. I would doubt that there were files listed, that are
accessible via radosgw
** The deleted files are somewhat random, but always with their
corresponding counterparts (per folder there are 2-3 files that belong
together)

* Customer removed his data, but radosgw didn't clean up the bucket index
** there are no delete requests in the bucket's usage log.
** the customer told us that they do not have a delete job for this bucket

So I am out of ideas for what else I could check, and hope that you people
might be able to help with further suggestions.




-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-19 Thread Boris Behrens
Hi Dan,
unfortunately there are no versioned objects in the bucket. All objects
are of type plain.
I will create a bug ticket.

Am Mo., 19. Juli 2021 um 18:35 Uhr schrieb Dan van der Ster
:
>
> Here's a recipe, from the when I had the same question:
>
>
> "[ceph-users] Re: rgw index shard much larger than others - ceph-users - 
> lists.ceph.io" 
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/MO7IHRGJ7TGPKT3GXCKMFLR674G3YGUX/
>
> On Mon, 19 Jul 2021, 18:00 Boris Behrens,  wrote:
>>
>> Hi Dan,
>> how do I find out if a bucket got versioning enabled?
>>
>> Am Mo., 19. Juli 2021 um 17:00 Uhr schrieb Dan van der Ster
>> :
>> >
>> > Hi Boris,
>> >
>> > Does the bucket have object versioning enabled?
>> > We saw something like this once a while ago: `s3cmd ls` showed an
>> > entry for an object, but when we tried to get it we had 404.
>> > We didn't find a good explanation in the end -- our user was able to
>> > re-upload the object and it didn't recur so we didn't debug further.
>> >
>> > I suggest you open a ticket in the tracker with all the evidence so
>> > the developers can help diagnose the problem.
>> >
>> > Cheers, Dan
>> >
>> >
>> > On Fri, Jul 16, 2021 at 6:45 PM Boris Behrens  wrote:
>> > >
>> > > Hi Jean-Sebastien,
>> > >
>> > > I have the exact opposite. Files can be listed (the are in the bucket
>> > > index), but are not available anymore.
>> > >
>> > > Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry
>> > > :
>> > > >
>> > > > Hi Boris, I don't have any answer for you, but I have situation similar
>> > > > to yours.
>> > > >
>> > > > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/
>> > > >
>> > > > I didn't try radoslist, I should have.
>> > > >
>> > > > Is this new, or it just that the client realised this lately?
>> > > > All the data seems missing or just some paths?
>> > > > Did you reshard lately?
>> > > > Did you test using client programs like s3cmd & rclone...?
>> > > >
>> > > > I didn't have time to work on that this week, but I have to find a
>> > > > solution too.
>> > > > Meanwhile, I run with a lower shard number and my customer can access
>> > > > all his data.
>> > > > Cheers!
>> > > >
>> > > > On 7/16/21 11:36 AM, Boris Behrens wrote:
>> > > > > [Externe UL*]
>> > > > >
>> > > > > Hi everybody,
>> > > > > a customer mentioned that he got problems in accessing hist rgw data.
>> > > > > I checked the bucket index and the file should be available. Then I
>> > > > > pulled a list with radosgw-admin radoslist --bucket BUCKET and it
>> > > > > seems that the file is gone.
>> > > > >
>> > > > > beside the "yaiks, is there a way the file might be somewhere else in
>> > > > > ceph?" how can this happen?
>> > > > >
>> > > > > We do occational orphan objects cleanups but this does not pull the
>> > > > > bucket index into account.
>> > > > >
>> > > > > It is a large bucket with 2.1m files in it and with 34 shards.
>> > > > >
>> > > > > Cheers and happy weekend
>> > > > >   Boris
>> > > > >
>> > > > > --
>> > > > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
>> > > > > im groüen Saal.
>> > > > > ___
>> > > > > ceph-users mailing list -- ceph-users@ceph.io
>> > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> > > > > *ATTENTION : L’émetteur de ce courriel est externe à l’Université 
>> > > > > Laval.
>> > > > > Évitez de cliquer sur un hyperlien, d’ouvrir une pièce jointe ou de 
>> > > > > transmettre des informations si vous ne connaissez pas l’expéditeur 
>> > > > > du courriel. En cas de doute, contactez l’équipe de soutien 
>> > > > > informatique de votre unité ou hameconn...@ulaval.ca.
>> > > > > 
>> > > > >
>> > > > ___
>> > > > ceph-users mailing list -- ceph-users@ceph.io
>> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> > >
>> > >
>> > >
>> > > --
>> > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
>> > > im groüen Saal.
>> > > ___
>> > > ceph-users mailing list -- ceph-users@ceph.io
>> > > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>>
>> --
>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
>> im groüen Saal.



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-19 Thread Boris Behrens
Hi Dan,
how do I find out if a bucket got versioning enabled?

Am Mo., 19. Juli 2021 um 17:00 Uhr schrieb Dan van der Ster
:
>
> Hi Boris,
>
> Does the bucket have object versioning enabled?
> We saw something like this once a while ago: `s3cmd ls` showed an
> entry for an object, but when we tried to get it we had 404.
> We didn't find a good explanation in the end -- our user was able to
> re-upload the object and it didn't recur so we didn't debug further.
>
> I suggest you open a ticket in the tracker with all the evidence so
> the developers can help diagnose the problem.
>
> Cheers, Dan
>
>
> On Fri, Jul 16, 2021 at 6:45 PM Boris Behrens  wrote:
> >
> > Hi Jean-Sebastien,
> >
> > I have the exact opposite. Files can be listed (the are in the bucket
> > index), but are not available anymore.
> >
> > Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry
> > :
> > >
> > > Hi Boris, I don't have any answer for you, but I have situation similar
> > > to yours.
> > >
> > > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/
> > >
> > > I didn't try radoslist, I should have.
> > >
> > > Is this new, or it just that the client realised this lately?
> > > All the data seems missing or just some paths?
> > > Did you reshard lately?
> > > Did you test using client programs like s3cmd & rclone...?
> > >
> > > I didn't have time to work on that this week, but I have to find a
> > > solution too.
> > > Meanwhile, I run with a lower shard number and my customer can access
> > > all his data.
> > > Cheers!
> > >
> > > On 7/16/21 11:36 AM, Boris Behrens wrote:
> > > > [Externe UL*]
> > > >
> > > > Hi everybody,
> > > > a customer mentioned that he got problems in accessing hist rgw data.
> > > > I checked the bucket index and the file should be available. Then I
> > > > pulled a list with radosgw-admin radoslist --bucket BUCKET and it
> > > > seems that the file is gone.
> > > >
> > > > beside the "yaiks, is there a way the file might be somewhere else in
> > > > ceph?" how can this happen?
> > > >
> > > > We do occational orphan objects cleanups but this does not pull the
> > > > bucket index into account.
> > > >
> > > > It is a large bucket with 2.1m files in it and with 34 shards.
> > > >
> > > > Cheers and happy weekend
> > > >   Boris
> > > >
> > > > --
> > > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > > > im groüen Saal.
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > *ATTENTION : L’émetteur de ce courriel est externe à l’Université Laval.
> > > > Évitez de cliquer sur un hyperlien, d’ouvrir une pièce jointe ou de 
> > > > transmettre des informations si vous ne connaissez pas l’expéditeur du 
> > > > courriel. En cas de doute, contactez l’équipe de soutien informatique 
> > > > de votre unité ou hameconn...@ulaval.ca.
> > > > 
> > > >
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > im groüen Saal.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-19 Thread Boris Behrens
Does someone have an idea how this could happen?

* The files are present in the output of "radosgw-admin bi list --bucket BUCKET"
* The files are missing in the output of "radosgw-admin bucket
radoslist --bucket BUCKET"
* I have strange shadow objects that don't seem to have a filename
(_shadow_.Sxj4BEhZS6PZg1HhsvSeqJM4Y0wRCto_4)

It doesn't seem to be a careless "rados -p POOL rm OBJECT", because then
the object would still be in the "radosgw-admin bucket radoslist
--bucket BUCKET" output (I just tested that on a test bucket).

Am Fr., 16. Juli 2021 um 17:36 Uhr schrieb Boris Behrens :
>
> Hi everybody,
> a customer mentioned that he got problems in accessing hist rgw data.
> I checked the bucket index and the file should be available. Then I
> pulled a list with radosgw-admin radoslist --bucket BUCKET and it
> seems that the file is gone.
>
> beside the "yaiks, is there a way the file might be somewhere else in
> ceph?" how can this happen?
>
> We do occational orphan objects cleanups but this does not pull the
> bucket index into account.
>
> It is a large bucket with 2.1m files in it and with 34 shards.
>
> Cheers and happy weekend
>  Boris
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> im groüen Saal.



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-19 Thread Boris Behrens
Just digging: I have a ton of files in the radosgw-admin bucket radoslist
output that look like this:
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.LRSp5qOg4cDn2ImWxeXtJlRvfLNZ-8R_1
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_1
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_2
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_3
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_4
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_5

What are those files? o0


Am Sa., 17. Juli 2021 um 22:54 Uhr schrieb Boris Behrens :
>
> Hi k,
>
> all systems run 14.2.21
>
> Cheers
>  Boris
>
> Am Sa., 17. Juli 2021 um 22:12 Uhr schrieb Konstantin Shalygin 
> :
> >
> > Boris, what is your Ceph version?
> >
> >
> > k
> >
> > On 17 Jul 2021, at 11:04, Boris Behrens  wrote:
> >
> > I really need help with this issue.
> >
> >
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> im groüen Saal.



--
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-17 Thread Boris Behrens
Hi k,

all systems run 14.2.21

Cheers
 Boris

Am Sa., 17. Juli 2021 um 22:12 Uhr schrieb Konstantin Shalygin :
>
> Boris, what is your Ceph version?
>
>
> k
>
> On 17 Jul 2021, at 11:04, Boris Behrens  wrote:
>
> I really need help with this issue.
>
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-17 Thread Boris Behrens
Is it possible to not complete a file upload so the actual file is not
there, but it is listed in the bucket index?

I really need help with this issue.

Am Fr., 16. Juli 2021 um 19:35 Uhr schrieb Boris Behrens :
>
> exactly.
> rados rm wouldn't remove it from the "radosgw-admin bucket radoslist"
> list, correct?
>
> our usage statistics are not really usable because it fluctuates in a
> 200tb range.
>
> I also hope that I find the files in the "rados ls" list, but I don't
> have much hope.
>
> For me it is key to understand how this happeneds and how I can verify
> the integrity of all buckets.
> Dataloss is the worst kind of problem for me.
>
> Am Fr., 16. Juli 2021 um 19:21 Uhr schrieb Jean-Sebastien Landry
> :
> >
> > Ok, so everything looks normal from the sysadmin "bi list" & the
> > customer "s3cmd ls" views, except that the GET give a 404 NoSuchKey?
> >
> >  > Is there way to remove a file from a bucket without removing it from
> > the bucketindex?
> >
> > using rados rm probably, but from the customer ends, I hope not.
> >
> > Do you have any usage stats that can confirm that the data has been
> > deleted and/or are still there. (at the pool level maybe?)
> > Hopping for you that it's just a data/index/shard mismatch...
> >
> >
> > On 7/16/21 12:44 PM, Boris Behrens wrote:
> > > [Externe UL*]
> > >
> > > Hi Jean-Sebastien,
> > >
> > > I have the exact opposite. Files can be listed (the are in the bucket
> > > index), but are not available anymore.
> > >
> > > Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry
> > > :
> > >> Hi Boris, I don't have any answer for you, but I have situation similar
> > >> to yours.
> > >>
> > >> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/
> > >>
> > >> I didn't try radoslist, I should have.
> > >>
> > >> Is this new, or it just that the client realised this lately?
> > >> All the data seems missing or just some paths?
> > >> Did you reshard lately?
> > >> Did you test using client programs like s3cmd & rclone...?
> > >>
> > >> I didn't have time to work on that this week, but I have to find a
> > >> solution too.
> > >> Meanwhile, I run with a lower shard number and my customer can access
> > >> all his data.
> > >> Cheers!
> > >>
> > >> On 7/16/21 11:36 AM, Boris Behrens wrote:
> > >>> [Externe UL*]
> > >>>
> > >>> Hi everybody,
> > >>> a customer mentioned that he got problems in accessing hist rgw data.
> > >>> I checked the bucket index and the file should be available. Then I
> > >>> pulled a list with radosgw-admin radoslist --bucket BUCKET and it
> > >>> seems that the file is gone.
> > >>>
> > >>> beside the "yaiks, is there a way the file might be somewhere else in
> > >>> ceph?" how can this happen?
> > >>>
> > >>> We do occational orphan objects cleanups but this does not pull the
> > >>> bucket index into account.
> > >>>
> > >>> It is a large bucket with 2.1m files in it and with 34 shards.
> > >>>
> > >>> Cheers and happy weekend
> > >>>Boris
> > >>>
> > >>> --
> > >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > >>> im groüen Saal.
> > >>> ___
> > >>> ceph-users mailing list -- ceph-users@ceph.io
> > >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> > >>> 
> > >>>
> > >> ___
> > >> ceph-users mailing list -- ceph-users@ceph.io
> > >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > >
> > > --
> > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > > im groüen Saal.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > 
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> im groüen Saal.



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-17 Thread Boris Behrens
Am Fr., 16. Juli 2021 um 19:35 Uhr schrieb Boris Behrens :
>
> exactly.
> rados rm wouldn't remove it from the "radosgw-admin bucket radoslist"
> list, correct?
>
> our usage statistics are not really usable because it fluctuates in a
> 200tb range.
>
> I also hope that I find the files in the "rados ls" list, but I don't
> have much hope.
>
> For me it is key to understand how this happeneds and how I can verify
> the integrity of all buckets.
> Dataloss is the worst kind of problem for me.
>
> Am Fr., 16. Juli 2021 um 19:21 Uhr schrieb Jean-Sebastien Landry
> :
> >
> > Ok, so everything looks normal from the sysadmin "bi list" & the
> > customer "s3cmd ls" views, except that the GET give a 404 NoSuchKey?
> >
> >  > Is there way to remove a file from a bucket without removing it from
> > the bucketindex?
> >
> > using rados rm probably, but from the customer ends, I hope not.
> >
> > Do you have any usage stats that can confirm that the data has been
> > deleted and/or are still there. (at the pool level maybe?)
> > Hopping for you that it's just a data/index/shard mismatch...
> >
> >
> > On 7/16/21 12:44 PM, Boris Behrens wrote:
> > > [Externe UL*]
> > >
> > > Hi Jean-Sebastien,
> > >
> > > I have the exact opposite. Files can be listed (the are in the bucket
> > > index), but are not available anymore.
> > >
> > > Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry
> > > :
> > >> Hi Boris, I don't have any answer for you, but I have situation similar
> > >> to yours.
> > >>
> > >> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/
> > >>
> > >> I didn't try radoslist, I should have.
> > >>
> > >> Is this new, or it just that the client realised this lately?
> > >> All the data seems missing or just some paths?
> > >> Did you reshard lately?
> > >> Did you test using client programs like s3cmd & rclone...?
> > >>
> > >> I didn't have time to work on that this week, but I have to find a
> > >> solution too.
> > >> Meanwhile, I run with a lower shard number and my customer can access
> > >> all his data.
> > >> Cheers!
> > >>
> > >> On 7/16/21 11:36 AM, Boris Behrens wrote:
> > >>> [Externe UL*]
> > >>>
> > >>> Hi everybody,
> > >>> a customer mentioned that he got problems in accessing hist rgw data.
> > >>> I checked the bucket index and the file should be available. Then I
> > >>> pulled a list with radosgw-admin radoslist --bucket BUCKET and it
> > >>> seems that the file is gone.
> > >>>
> > >>> beside the "yaiks, is there a way the file might be somewhere else in
> > >>> ceph?" how can this happen?
> > >>>
> > >>> We do occational orphan objects cleanups but this does not pull the
> > >>> bucket index into account.
> > >>>
> > >>> It is a large bucket with 2.1m files in it and with 34 shards.
> > >>>
> > >>> Cheers and happy weekend
> > >>>Boris
> > >>>
> > >>> --
> > >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > >>> im groüen Saal.
> > >>> ___
> > >>> ceph-users mailing list -- ceph-users@ceph.io
> > >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> > >>> 
> > >>>
> > >> ___
> > >> ceph-users mailing list -- ceph-users@ceph.io
> > >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > >
> > > --
> > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > > im groüen Saal.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > 
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> im groüen Saal.



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: difference between rados ls and radosgw-admin bucket radoslist

2021-07-17 Thread Boris Behrens
I thought that too, but for orphan object handling you diff the
lists from rados ls and radosgw-admin bucket radoslist, so there must
be some kind of difference.

I really need this info to debug a problem we have.
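
For context, this is the rough idea of the diff I mean (only a sketch, not
the real orphan tooling, and on a pool of our size it takes very long):

rados -p eu-central-1.rgw.buckets.data ls | sort -u > rados_ls.txt
for b in $(radosgw-admin bucket list | jq -r '.[]'); do
  radosgw-admin bucket radoslist --bucket="$b"
done | sort -u > radoslist.txt
comm -23 rados_ls.txt radoslist.txt > orphan_candidates.txt   # in the pool, but referenced by no bucket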

Am Fr., 16. Juli 2021 um 20:10 Uhr schrieb Jean-Sebastien Landry
:
>
> My understanding is that radoslist is the same (or "very much alike") as
> rados ls, except that it limits the scope to the given bucket.
>
> To be confirmed, I don't want to spread false information, but when you
> do a
> radosgw-admin bucket check --check-objects --fix,
> it rebuilds the "bi" from the pool level (rados ls), so I'm not sure the
> bucket index is "that" important, knowing that you can rebuild it
> from the pool. (?)
>
>
>
>
> On 7/16/21 1:47 PM, Boris Behrens wrote:
> > [Externe UL*]
> >
> > Hi,
> > is there a difference between those two?
> > I always thought that radosgw-admin radoslist only shows the objects
> > that are somehow associated with a bucket. But if the bucketindex is
> > broken, would this reflect in the output?
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > im groüen Saal.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > 
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] difference between rados ls and radosgw-admin bucket radoslist

2021-07-16 Thread Boris Behrens
Hi,
is there a difference between those two?
I always thought that radosgw-admin radoslist only shows the objects
that are somehow associated with a bucket. But if the bucket index is
broken, would this be reflected in the output?

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-16 Thread Boris Behrens
Exactly.
rados rm wouldn't remove it from the "radosgw-admin bucket radoslist"
list, correct?

Our usage statistics are not really usable because they fluctuate in a
200 TB range.

I also hope to find the files in the "rados ls" list, but I don't
have much hope.

For me it is key to understand how this happened and how I can verify
the integrity of all buckets.
Data loss is the worst kind of problem for me.

Am Fr., 16. Juli 2021 um 19:21 Uhr schrieb Jean-Sebastien Landry
:
>
> Ok, so everything looks normal from the sysadmin "bi list" & the
> customer "s3cmd ls" views, except that the GET give a 404 NoSuchKey?
>
>  > Is there way to remove a file from a bucket without removing it from
> the bucketindex?
>
> using rados rm probably, but from the customer ends, I hope not.
>
> Do you have any usage stats that can confirm that the data has been
> deleted and/or are still there. (at the pool level maybe?)
> Hopping for you that it's just a data/index/shard mismatch...
>
>
> On 7/16/21 12:44 PM, Boris Behrens wrote:
> > [Externe UL*]
> >
> > Hi Jean-Sebastien,
> >
> > I have the exact opposite. Files can be listed (the are in the bucket
> > index), but are not available anymore.
> >
> > Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry
> > :
> >> Hi Boris, I don't have any answer for you, but I have situation similar
> >> to yours.
> >>
> >> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/
> >>
> >> I didn't try radoslist, I should have.
> >>
> >> Is this new, or it just that the client realised this lately?
> >> All the data seems missing or just some paths?
> >> Did you reshard lately?
> >> Did you test using client programs like s3cmd & rclone...?
> >>
> >> I didn't have time to work on that this week, but I have to find a
> >> solution too.
> >> Meanwhile, I run with a lower shard number and my customer can access
> >> all his data.
> >> Cheers!
> >>
> >> On 7/16/21 11:36 AM, Boris Behrens wrote:
> >>> [Externe UL*]
> >>>
> >>> Hi everybody,
> >>> a customer mentioned that he got problems in accessing hist rgw data.
> >>> I checked the bucket index and the file should be available. Then I
> >>> pulled a list with radosgw-admin radoslist --bucket BUCKET and it
> >>> seems that the file is gone.
> >>>
> >>> beside the "yaiks, is there a way the file might be somewhere else in
> >>> ceph?" how can this happen?
> >>>
> >>> We do occational orphan objects cleanups but this does not pull the
> >>> bucket index into account.
> >>>
> >>> It is a large bucket with 2.1m files in it and with 34 shards.
> >>>
> >>> Cheers and happy weekend
> >>>Boris
> >>>
> >>> --
> >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> >>> im groüen Saal.
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>> 
> >>>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > im groüen Saal.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > 
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-16 Thread Boris Behrens
Hi Jean-Sebastien,

I have the exact opposite. Files can be listed (they are in the bucket
index), but are not available anymore.

Am Fr., 16. Juli 2021 um 18:41 Uhr schrieb Jean-Sebastien Landry
:
>
> Hi Boris, I don't have any answer for you, but I have situation similar
> to yours.
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/7E6O6ILGE5JCI4ISU66HZ6ZVZP6N6T3M/
>
> I didn't try radoslist, I should have.
>
> Is this new, or it just that the client realised this lately?
> All the data seems missing or just some paths?
> Did you reshard lately?
> Did you test using client programs like s3cmd & rclone...?
>
> I didn't have time to work on that this week, but I have to find a
> solution too.
> Meanwhile, I run with a lower shard number and my customer can access
> all his data.
> Cheers!
>
> On 7/16/21 11:36 AM, Boris Behrens wrote:
> > [Externe UL*]
> >
> > Hi everybody,
> > a customer mentioned that he got problems in accessing hist rgw data.
> > I checked the bucket index and the file should be available. Then I
> > pulled a list with radosgw-admin radoslist --bucket BUCKET and it
> > seems that the file is gone.
> >
> > beside the "yaiks, is there a way the file might be somewhere else in
> > ceph?" how can this happen?
> >
> > We do occational orphan objects cleanups but this does not pull the
> > bucket index into account.
> >
> > It is a large bucket with 2.1m files in it and with 34 shards.
> >
> > Cheers and happy weekend
> >   Boris
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > im groüen Saal.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > *ATTENTION : L’émetteur de ce courriel est externe à l’Université Laval.
> > Évitez de cliquer sur un hyperlien, d’ouvrir une pièce jointe ou de 
> > transmettre des informations si vous ne connaissez pas l’expéditeur du 
> > courriel. En cas de doute, contactez l’équipe de soutien informatique de 
> > votre unité ou hameconn...@ulaval.ca.
> > 
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-16 Thread Boris Behrens
Is there a way to remove a file from a bucket without removing it from
the bucket index?

Am Fr., 16. Juli 2021 um 17:36 Uhr schrieb Boris Behrens :
>
> Hi everybody,
> a customer mentioned that he got problems in accessing hist rgw data.
> I checked the bucket index and the file should be available. Then I
> pulled a list with radosgw-admin radoslist --bucket BUCKET and it
> seems that the file is gone.
>
> beside the "yaiks, is there a way the file might be somewhere else in
> ceph?" how can this happen?
>
> We do occational orphan objects cleanups but this does not pull the
> bucket index into account.
>
> It is a large bucket with 2.1m files in it and with 34 shards.
>
> Cheers and happy weekend
>  Boris
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> im groüen Saal.



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Files listed in radosgw BI but is not available in ceph

2021-07-16 Thread Boris Behrens
Hi everybody,
a customer mentioned that he has problems accessing his rgw data.
I checked the bucket index and the file should be available. Then I
pulled a list with radosgw-admin radoslist --bucket BUCKET and it
seems that the file is gone.

Besides the "yikes, is there a way the file might be somewhere else in
ceph?", how can this happen?

We do occasional orphan object cleanups, but these do not take the
bucket index into account.

It is a large bucket with 2.1m files in it and with 34 shards.

Cheers and happy weekend
 Boris

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: best practice balance mode in HAproxy in front of RGW?

2021-05-27 Thread Boris Behrens
Am Do., 27. Mai 2021 um 07:47 Uhr schrieb Janne Johansson :
>
> Den ons 26 maj 2021 kl 16:33 skrev Boris Behrens :
> >
> > Hi Janne,
> > do you know if there can be data duplication which leads to orphan objects?
> >
> > I am currently huntin strange errors (there is a lot more data in the
> > pool, than accessible via rgw) and want to be sure it doesn't come
> > from the HAproxy.
>
> No, I don't think the HAProxy (or any other load balancing setup) in
> itself would cause a lot of orphans. Put the other way around: the
> stateless, multipart way S3 works always allows for half-uploads and
> broken connections, which would leave orphans even if you did not have
> HAProxy in between. In both cases you should periodically run the orphan
> finding commands and trim usage logs you no longer require, and so on.
>
>
> --
> May the most significant bit of your life be positive.

Well, this takes a lot of pressure off my shoulders.
Is there a way to reduce the probability of creating orphan objects?
We use s3 for rbd backups (create a snapshot, compress it and then copy
it to s3 via s3cmd) and we created 25m orphan objects in 4 weeks.
If there is any option / best practice I can apply, I will happily use it :)
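
What I will probably try first is cleaning up half-finished multipart
uploads from aborted backup runs. As far as I understand s3cmd, something
like this should show and abort them (bucket name is made up, treat it as
a sketch):

s3cmd multipart s3://rbd-backups                   # list open multipart uploads
s3cmd abortmp s3://rbd-backups/OBJECT UPLOAD_ID    # abort a specific one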
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: best practice balance mode in HAproxy in front of RGW?

2021-05-26 Thread Boris Behrens
Hi Janne,
do you know if there can be data duplication which leads to orphan objects?

I am currently hunting strange errors (there is a lot more data in the
pool than is accessible via rgw) and want to be sure it doesn't come
from the HAProxy.

Am Mi., 26. Mai 2021 um 13:12 Uhr schrieb Janne Johansson :
>
> I guess normal round robin should work out fine too, regardless of
> whether there are a few clients making several separate connections or
> many clients making a few.
>
> Den ons 26 maj 2021 kl 12:32 skrev Boris Behrens :
> >
> > Hello togehter,
> >
> > is there any best practive on the balance mode when I have a HAproxy
> > in front of my rgw_frontend?
> >
> > currently we use "balance leastconn".
> >
> > Cheers
> >  Boris
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> May the most significant bit of your life be positive.



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] best practice balance mode in HAproxy in front of RGW?

2021-05-26 Thread Boris Behrens
Hello everyone,

is there any best practice for the balance mode when I have an HAProxy
in front of my rgw_frontend?

currently we use "balance leastconn".
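
For context, our backend section currently looks roughly like this (IPs,
ports and the health check are simplified placeholders):

backend rgw
    balance leastconn
    option httpchk HEAD /
    server rgw1 10.0.0.11:8080 check
    server rgw2 10.0.0.12:8080 check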

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
The more files I delete, the more space is used.

How can this be?




Am Di., 25. Mai 2021 um 14:41 Uhr schrieb Boris Behrens :
>
> Am Di., 25. Mai 2021 um 09:23 Uhr schrieb Boris Behrens :
> >
> > Hi,
> > I am still searching for a reason why these two values differ so much.
> >
> > I am currently deleting a giant amount of orphan objects (43mio, most
> > of them under 64kb), but the difference get larger instead of smaller.
> >
> > This was the state two days ago:
> > >
> > > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | 
> > > awk '{ print $2 }' | tr -d , | paste -sd+ - | bc
> > > 175977343264
> > >
> > > [root@s3db1 ~]# rados df
> > > POOL_NAME  USED   OBJECTS CLONESCOPIES 
> > > MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS  RDWR_OPS  WR 
> > > USED COMPR UNDER COMPR
> > > ...
> > > eu-central-1.rgw.buckets.data   766 TiB 134632397  0 403897191
> > >   0   00 1076480853  45 TiB 532045864 551 TiB
> > > 0 B 0 B
> > > ...
> > > total_objects135866676
> > >
> > > [root@s3db1 ~]# ceph df...
> > > eu-central-1.rgw.buckets.data   11 2048 253 TiB 
> > > 134.63M 766 TiB 90.3227 TiB
> >
> > And this is todays state:
> > >
> > > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | 
> > > awk '{ print $2 }' | tr -d , | paste -sd+ - | bc
> > > 177144806812
> > >
> > > [root@s3db1 ~]# rados df
> > > ...
> > > eu-central-1.rgw.buckets.data   786 TiB 120025590  0 360076770
> > > ...
> > > total_objects121261889
> > >
> > > [root@s3db1 ~]# ceph df
> > > ...
> > > eu-central-1.rgw.buckets.data   11 2048 260 TiB 
> > > 120.02M 786 TiB 92.5921 TiB
> >
> > I would love to free up the missing 80TB :)
> > Any suggestions?
>
> As Konstatin mentioned, maybe it was the GC, but I just processes all
> objects (with --include-all), but the situation did not change.
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> im groüen Saal.



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
Am Di., 25. Mai 2021 um 09:23 Uhr schrieb Boris Behrens :
>
> Hi,
> I am still searching for a reason why these two values differ so much.
>
> I am currently deleting a giant amount of orphan objects (43mio, most
> of them under 64kb), but the difference get larger instead of smaller.
>
> This was the state two days ago:
> >
> > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk 
> > '{ print $2 }' | tr -d , | paste -sd+ - | bc
> > 175977343264
> >
> > [root@s3db1 ~]# rados df
> > POOL_NAME  USED   OBJECTS CLONESCOPIES 
> > MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS  RDWR_OPS  WR 
> > USED COMPR UNDER COMPR
> > ...
> > eu-central-1.rgw.buckets.data   766 TiB 134632397  0 403897191  
> > 0   00 1076480853  45 TiB 532045864 551 TiB0 B  
> >0 B
> > ...
> > total_objects135866676
> >
> > [root@s3db1 ~]# ceph df...
> > eu-central-1.rgw.buckets.data   11 2048 253 TiB 134.63M 
> > 766 TiB 90.3227 TiB
>
> And this is todays state:
> >
> > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk 
> > '{ print $2 }' | tr -d , | paste -sd+ - | bc
> > 177144806812
> >
> > [root@s3db1 ~]# rados df
> > ...
> > eu-central-1.rgw.buckets.data   786 TiB 120025590  0 360076770
> > ...
> > total_objects121261889
> >
> > [root@s3db1 ~]# ceph df
> > ...
> > eu-central-1.rgw.buckets.data   11 2048 260 TiB 120.02M 
> > 786 TiB 92.5921 TiB
>
> I would love to free up the missing 80TB :)
> Any suggestions?

As Konstantin mentioned, maybe it was the GC, but I just processed all
objects (with --include-all) and the situation did not change.

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
Am Di., 25. Mai 2021 um 09:39 Uhr schrieb Konstantin Shalygin :
>
> Hi,
>
> On 25 May 2021, at 10:23, Boris Behrens  wrote:
>
> I am still searching for a reason why these two values differ so much.
>
> I am currently deleting a giant amount of orphan objects (43mio, most
> of them under 64kb), but the difference get larger instead of smaller.
>
>
> When a user makes a delete through the API, the objects are just marked
> as deleted; the ceph-radosgw GC then performs the actual delete. You can
> see the queue via `radosgw-admin gc list`.
> I think you can speed up the process via the rgw_gc_ options.
>
>
> Cheers,
> k

Hi K,

I thought about the GC, but it doesn't look like this is the issue:
>
> [root@s3db1 ~]# radosgw-admin gc list --include-all | grep oid | wc -l
> 563598
> [root@s3db1 ~]# radosgw-admin gc list | grep oid | wc -l
> 43768


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
Hi,
I am still searching for a reason why these two values differ so much.

I am currently deleting a giant amount of orphan objects (43 million, most
of them under 64 KB), but the difference gets larger instead of smaller.

This was the state two days ago:
>
> [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk '{ print $2 }' | tr -d , | paste -sd+ - | bc
> 175977343264
>
> [root@s3db1 ~]# rados df
> POOL_NAME                      USED     OBJECTS    CLONES  COPIES     MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS      RD      WR_OPS     WR       USED COMPR  UNDER COMPR
> ...
> eu-central-1.rgw.buckets.data  766 TiB  134632397  0       403897191  0                   0        0         1076480853  45 TiB  532045864  551 TiB  0 B         0 B
> ...
> total_objects    135866676
>
> [root@s3db1 ~]# ceph df
> ...
> eu-central-1.rgw.buckets.data  11  2048  253 TiB  134.63M  766 TiB  90.32  27 TiB

And this is today's state:
>
> [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk '{ print $2 }' | tr -d , | paste -sd+ - | bc
> 177144806812
>
> [root@s3db1 ~]# rados df
> ...
> eu-central-1.rgw.buckets.data  786 TiB  120025590  0  360076770
> ...
> total_objects    121261889
>
> [root@s3db1 ~]# ceph df
> ...
> eu-central-1.rgw.buckets.data  11  2048  260 TiB  120.02M  786 TiB  92.59  21 TiB

I would love to free up the missing 80TB :)
Any suggestions?

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
im groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] question regarding markers in radosgw

2021-05-21 Thread Boris Behrens
Hello everybody,

It seems that I have a metric ton of orphan objects in my s3 cluster.
They look like this:
$ rados -p eu-central-1.rgw.buckets.data stat ff7a8b0c-07e6-463a-861b-78f0adeba8ad.811806.9_1063978/features/2018-02-23.json
eu-central-1.rgw.buckets.data/ff7a8b0c-07e6-463a-861b-78f0adeba8ad.811806.9_1063978/features/2018-02-23.json mtime 2018-02-23 20:59:32.00, size 608

Now I would imagine that 811806.9 is the marker, but when I do a simple
radosgw-admin bucket stats | grep -F 811806.9
I get back no results.

Can I just delete these files? And if I can delete them, how can I
delete them fast?
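
In case it matters for the answer: the naive approach I have in mind is to
collect the orphan object names in a file (after double-checking that no
bucket references them) and delete them in parallel, roughly like this
sketch (orphan_objects.txt is hypothetical):

cat orphan_objects.txt | xargs -P 8 -n 100 rados -p eu-central-1.rgw.buckets.data rm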

Cheers
 Boris
-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: "radosgw-admin bucket radoslist" loops when a multipart upload is happening

2021-05-20 Thread Boris Behrens
Reading through the bugtracker: https://tracker.ceph.com/issues/50293

Thanks for your patience.

Am Do., 20. Mai 2021 um 15:10 Uhr schrieb Boris Behrens :

> I try to bump it once more, because it makes finding orphan objects nearly
> impossible.
>
> Am Di., 11. Mai 2021 um 13:03 Uhr schrieb Boris Behrens :
>
>> Hi together,
>>
>> I still search for orphan objects and came across a strange bug:
>> There is a huge multipart upload happening (around 4TB), and listing the
>> rados objects in the bucket loops over the multipart upload.
>>
>>
>>
>> --
>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
>> groüen Saal.
>>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: "radosgw-admin bucket radoslist" loops when a multipart upload is happening

2021-05-20 Thread Boris Behrens
I'll bump this once more, because it makes finding orphan objects nearly
impossible.

Am Di., 11. Mai 2021 um 13:03 Uhr schrieb Boris Behrens :

> Hi together,
>
> I still search for orphan objects and came across a strange bug:
> There is a huge multipart upload happening (around 4TB), and listing the
> rados objects in the bucket loops over the multipart upload.
>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-19 Thread Boris Behrens
This helped: https://tracker.ceph.com/issues/44509

$ systemctl stop ceph-osd@68
$ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 --devs-source
/var/lib/ceph/osd/ceph-68/block --dev-target
/var/lib/ceph/osd/ceph-68/block.db  bluefs-bdev-migrate
$ systemctl start ceph-osd@68

Thanks a lot for your support Igor <3

Am Di., 18. Mai 2021 um 09:54 Uhr schrieb Boris Behrens :

> One more question:
> How do I get rid of the bluestore spillover message?
>  osd.68 spilled over 64 KiB metadata from 'db' device (13 GiB used of
> 50 GiB) to slow device
>
> I tried an offline compactation, which did not help.
>
> Am Mo., 17. Mai 2021 um 15:56 Uhr schrieb Boris Behrens :
>
>> I have no idea why, but it worked.
>>
>> As the fsck went well, I just re did the bluefs-bdev-new-db and now the
>> OSD is back up, with a block.db device.
>>
>> Thanks a lot
>>
>> Am Mo., 17. Mai 2021 um 15:28 Uhr schrieb Igor Fedotov > >:
>>
>>> If you haven't had successful OSD.68 starts with standalone DB I think
>>> it's safe to revert previous DB adding and just retry it.
>>>
>>> At first suggest to run bluefs-bdev-new-db command only and then do fsck
>>> again. If it's OK - proceed with bluefs migrate followed by another
>>> fsck. And then finalize with adding lvm tags and OSD activation.
>>>
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>> On 5/17/2021 3:47 PM, Boris Behrens wrote:
>>> > The FSCK looks good:
>>> >
>>> > [root@s3db10 export-bluefs2]# ceph-bluestore-tool --path
>>> > /var/lib/ceph/osd/ceph-68  fsck
>>> > fsck success
>>> >
>>> > Am Mo., 17. Mai 2021 um 14:39 Uhr schrieb Boris Behrens >> >:
>>> >
>>> >> Here is the new output. I kept both for now.
>>> >>
>>> >> [root@s3db10 export-bluefs2]# ls *
>>> >> db:
>>> >> 018215.sst  018444.sst  018839.sst  019074.sst  019210.sst  019381.sst
>>> >>   019560.sst  019755.sst  019849.sst  019888.sst  019958.sst
>>> 019995.sst
>>> >>   020007.sst  020042.sst  020067.sst  020098.sst  020115.sst
>>> >> 018216.sst  018445.sst  018840.sst  019075.sst  019211.sst  019382.sst
>>> >>   019670.sst  019756.sst  019877.sst  019889.sst  019959.sst
>>> 019996.sst
>>> >>   020008.sst  020043.sst  020068.sst  020104.sst  CURRENT
>>> >> 018273.sst  018446.sst  018876.sst  019076.sst  019256.sst  019383.sst
>>> >>   019671.sst  019757.sst  019878.sst  019890.sst  019960.sst
>>> 019997.sst
>>> >>   020030.sst  020055.sst  020069.sst  020105.sst  IDENTITY
>>> >> 018300.sst  018447.sst  018877.sst  019081.sst  019257.sst  019395.sst
>>> >>   019672.sst  019762.sst  019879.sst  019918.sst  019961.sst
>>> 019998.sst
>>> >>   020031.sst  020056.sst  020070.sst  020106.sst  LOCK
>>> >> 018301.sst  018448.sst  018904.sst  019082.sst  019344.sst  019396.sst
>>> >>   019673.sst  019763.sst  019880.sst  019919.sst  019962.sst
>>> 01.sst
>>> >>   020032.sst  020057.sst  020071.sst  020107.sst  MANIFEST-020084
>>> >> 018326.sst  018449.sst  018950.sst  019083.sst  019345.sst  019400.sst
>>> >>   019674.sst  019764.sst  019881.sst  019920.sst  019963.sst
>>> 02.sst
>>> >>   020035.sst  020058.sst  020072.sst  020108.sst  OPTIONS-020084
>>> >> 018327.sst  018540.sst  018952.sst  019126.sst  019346.sst  019470.sst
>>> >>   019675.sst  019765.sst  019882.sst  019921.sst  019964.sst
>>> 020001.sst
>>> >>   020036.sst  020059.sst  020073.sst  020109.sst  OPTIONS-020087
>>> >> 018328.sst  018541.sst  018953.sst  019127.sst  019370.sst  019471.sst
>>> >>   019676.sst  019766.sst  019883.sst  019922.sst  019965.sst
>>> 020002.sst
>>> >>   020037.sst  020060.sst  020074.sst  020110.sst
>>> >> 018329.sst  018590.sst  018954.sst  019128.sst  019371.sst  019472.sst
>>> >>   019677.sst  019845.sst  019884.sst  019923.sst  019989.sst
>>> 020003.sst
>>> >>   020038.sst  020061.sst  020075.sst  020111.sst
>>> >> 018406.sst  018591.sst  018995.sst  019174.sst  019372.sst  019473.sst
>>> >>   019678.sst  019846.sst  019885.sst  019950.sst  019992.sst
>>> 020004.sst
>>> >>   020039.sst  020062.sst  020094.sst  020112.sst
>>> >> 018407.sst  018727.sst  018996.sst  019175.sst  019373.sst  019474.ss

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-18 Thread Boris Behrens
One more question:
How do I get rid of the bluestore spillover message?
 osd.68 spilled over 64 KiB metadata from 'db' device (13 GiB used of
50 GiB) to slow device

I tried an offline compaction, which did not help.

Am Mo., 17. Mai 2021 um 15:56 Uhr schrieb Boris Behrens :

> I have no idea why, but it worked.
>
> As the fsck went well, I just re did the bluefs-bdev-new-db and now the
> OSD is back up, with a block.db device.
>
> Thanks a lot
>
> Am Mo., 17. Mai 2021 um 15:28 Uhr schrieb Igor Fedotov :
>
>> If you haven't had successful OSD.68 starts with standalone DB I think
>> it's safe to revert previous DB adding and just retry it.
>>
>> At first suggest to run bluefs-bdev-new-db command only and then do fsck
>> again. If it's OK - proceed with bluefs migrate followed by another
>> fsck. And then finalize with adding lvm tags and OSD activation.
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 5/17/2021 3:47 PM, Boris Behrens wrote:
>> > The FSCK looks good:
>> >
>> > [root@s3db10 export-bluefs2]# ceph-bluestore-tool --path
>> > /var/lib/ceph/osd/ceph-68  fsck
>> > fsck success
>> >
>> > Am Mo., 17. Mai 2021 um 14:39 Uhr schrieb Boris Behrens :
>> >
>> >> Here is the new output. I kept both for now.
>> >>
>> >> [root@s3db10 export-bluefs2]# ls *
>> >> db:
>> >> 018215.sst  018444.sst  018839.sst  019074.sst  019210.sst  019381.sst
>> >>   019560.sst  019755.sst  019849.sst  019888.sst  019958.sst
>> 019995.sst
>> >>   020007.sst  020042.sst  020067.sst  020098.sst  020115.sst
>> >> 018216.sst  018445.sst  018840.sst  019075.sst  019211.sst  019382.sst
>> >>   019670.sst  019756.sst  019877.sst  019889.sst  019959.sst
>> 019996.sst
>> >>   020008.sst  020043.sst  020068.sst  020104.sst  CURRENT
>> >> 018273.sst  018446.sst  018876.sst  019076.sst  019256.sst  019383.sst
>> >>   019671.sst  019757.sst  019878.sst  019890.sst  019960.sst
>> 019997.sst
>> >>   020030.sst  020055.sst  020069.sst  020105.sst  IDENTITY
>> >> 018300.sst  018447.sst  018877.sst  019081.sst  019257.sst  019395.sst
>> >>   019672.sst  019762.sst  019879.sst  019918.sst  019961.sst
>> 019998.sst
>> >>   020031.sst  020056.sst  020070.sst  020106.sst  LOCK
>> >> 018301.sst  018448.sst  018904.sst  019082.sst  019344.sst  019396.sst
>> >>   019673.sst  019763.sst  019880.sst  019919.sst  019962.sst
>> 01.sst
>> >>   020032.sst  020057.sst  020071.sst  020107.sst  MANIFEST-020084
>> >> 018326.sst  018449.sst  018950.sst  019083.sst  019345.sst  019400.sst
>> >>   019674.sst  019764.sst  019881.sst  019920.sst  019963.sst
>> 02.sst
>> >>   020035.sst  020058.sst  020072.sst  020108.sst  OPTIONS-020084
>> >> 018327.sst  018540.sst  018952.sst  019126.sst  019346.sst  019470.sst
>> >>   019675.sst  019765.sst  019882.sst  019921.sst  019964.sst
>> 020001.sst
>> >>   020036.sst  020059.sst  020073.sst  020109.sst  OPTIONS-020087
>> >> 018328.sst  018541.sst  018953.sst  019127.sst  019370.sst  019471.sst
>> >>   019676.sst  019766.sst  019883.sst  019922.sst  019965.sst
>> 020002.sst
>> >>   020037.sst  020060.sst  020074.sst  020110.sst
>> >> 018329.sst  018590.sst  018954.sst  019128.sst  019371.sst  019472.sst
>> >>   019677.sst  019845.sst  019884.sst  019923.sst  019989.sst
>> 020003.sst
>> >>   020038.sst  020061.sst  020075.sst  020111.sst
>> >> 018406.sst  018591.sst  018995.sst  019174.sst  019372.sst  019473.sst
>> >>   019678.sst  019846.sst  019885.sst  019950.sst  019992.sst
>> 020004.sst
>> >>   020039.sst  020062.sst  020094.sst  020112.sst
>> >> 018407.sst  018727.sst  018996.sst  019175.sst  019373.sst  019474.sst
>> >>   019753.sst  019847.sst  019886.sst  019955.sst  019993.sst
>> 020005.sst
>> >>   020040.sst  020063.sst  020095.sst  020113.sst
>> >> 018443.sst  018728.sst  019073.sst  019176.sst  019380.sst  019475.sst
>> >>   019754.sst  019848.sst  019887.sst  019956.sst  019994.sst
>> 020006.sst
>> >>   020041.sst  020064.sst  020096.sst  020114.sst
>> >>
>> >> db.slow:
>> >>
>> >> db.wal:
>> >> 020085.log  020088.log
>> >> [root@s3db10 export-bluefs2]# du -hs
>> >> 12G .
>> >> [root@s3db10 export-bluefs2]# cat db/CURRENT
>> >> MANIFEST-020084
>> >>

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
I have no idea why, but it worked.

As the fsck went well, I just redid the bluefs-bdev-new-db and now the OSD
is back up, with a block.db device.
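
For anyone finding this later, this is roughly the sequence Igor outlined, as
pieced together from this thread (the device paths, the 50 GiB DB size and the
lvm tag values are the ones from my earlier mail and will differ per setup):

# create the new standalone DB on the target device (OSD stopped)
$ CEPH_ARGS="--bluestore-block-db-size 53687091200 --bluestore_block_db_create=true" \
  ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-68 --dev-target /dev/sdj1

# verify before doing anything else
$ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 fsck

# migrate the existing DB data over to the new device, then fsck again
$ ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-68 \
  --devs-source /var/lib/ceph/osd/ceph-68/block --dev-target /var/lib/ceph/osd/ceph-68/block.db
$ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 fsck

# add the lvm tags and activate (LV path and uuid as in the original mail)
$ lvchange --addtag ceph.db_device=/dev/sdj1 <osd block LV>
$ lvchange --addtag ceph.db_uuid=<uuid of /dev/sdj1> <osd block LV>
$ ceph-volume lvm activate --all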

Thanks a lot

Am Mo., 17. Mai 2021 um 15:28 Uhr schrieb Igor Fedotov :

> If you haven't had successful OSD.68 starts with standalone DB I think
> it's safe to revert previous DB adding and just retry it.
>
> At first suggest to run bluefs-bdev-new-db command only and then do fsck
> again. If it's OK - proceed with bluefs migrate followed by another
> fsck. And then finalize with adding lvm tags and OSD activation.
>
>
> Thanks,
>
> Igor
>
> On 5/17/2021 3:47 PM, Boris Behrens wrote:
> > The FSCK looks good:
> >
> > [root@s3db10 export-bluefs2]# ceph-bluestore-tool --path
> > /var/lib/ceph/osd/ceph-68  fsck
> > fsck success
> >
> > Am Mo., 17. Mai 2021 um 14:39 Uhr schrieb Boris Behrens :
> >
> >> Here is the new output. I kept both for now.
> >>
> >> [root@s3db10 export-bluefs2]# ls *
> >> db:
> >> 018215.sst  018444.sst  018839.sst  019074.sst  019210.sst  019381.sst
> >>   019560.sst  019755.sst  019849.sst  019888.sst  019958.sst  019995.sst
> >>   020007.sst  020042.sst  020067.sst  020098.sst  020115.sst
> >> 018216.sst  018445.sst  018840.sst  019075.sst  019211.sst  019382.sst
> >>   019670.sst  019756.sst  019877.sst  019889.sst  019959.sst  019996.sst
> >>   020008.sst  020043.sst  020068.sst  020104.sst  CURRENT
> >> 018273.sst  018446.sst  018876.sst  019076.sst  019256.sst  019383.sst
> >>   019671.sst  019757.sst  019878.sst  019890.sst  019960.sst  019997.sst
> >>   020030.sst  020055.sst  020069.sst  020105.sst  IDENTITY
> >> 018300.sst  018447.sst  018877.sst  019081.sst  019257.sst  019395.sst
> >>   019672.sst  019762.sst  019879.sst  019918.sst  019961.sst  019998.sst
> >>   020031.sst  020056.sst  020070.sst  020106.sst  LOCK
> >> 018301.sst  018448.sst  018904.sst  019082.sst  019344.sst  019396.sst
> >>   019673.sst  019763.sst  019880.sst  019919.sst  019962.sst  01.sst
> >>   020032.sst  020057.sst  020071.sst  020107.sst  MANIFEST-020084
> >> 018326.sst  018449.sst  018950.sst  019083.sst  019345.sst  019400.sst
> >>   019674.sst  019764.sst  019881.sst  019920.sst  019963.sst  02.sst
> >>   020035.sst  020058.sst  020072.sst  020108.sst  OPTIONS-020084
> >> 018327.sst  018540.sst  018952.sst  019126.sst  019346.sst  019470.sst
> >>   019675.sst  019765.sst  019882.sst  019921.sst  019964.sst  020001.sst
> >>   020036.sst  020059.sst  020073.sst  020109.sst  OPTIONS-020087
> >> 018328.sst  018541.sst  018953.sst  019127.sst  019370.sst  019471.sst
> >>   019676.sst  019766.sst  019883.sst  019922.sst  019965.sst  020002.sst
> >>   020037.sst  020060.sst  020074.sst  020110.sst
> >> 018329.sst  018590.sst  018954.sst  019128.sst  019371.sst  019472.sst
> >>   019677.sst  019845.sst  019884.sst  019923.sst  019989.sst  020003.sst
> >>   020038.sst  020061.sst  020075.sst  020111.sst
> >> 018406.sst  018591.sst  018995.sst  019174.sst  019372.sst  019473.sst
> >>   019678.sst  019846.sst  019885.sst  019950.sst  019992.sst  020004.sst
> >>   020039.sst  020062.sst  020094.sst  020112.sst
> >> 018407.sst  018727.sst  018996.sst  019175.sst  019373.sst  019474.sst
> >>   019753.sst  019847.sst  019886.sst  019955.sst  019993.sst  020005.sst
> >>   020040.sst  020063.sst  020095.sst  020113.sst
> >> 018443.sst  018728.sst  019073.sst  019176.sst  019380.sst  019475.sst
> >>   019754.sst  019848.sst  019887.sst  019956.sst  019994.sst  020006.sst
> >>   020041.sst  020064.sst  020096.sst  020114.sst
> >>
> >> db.slow:
> >>
> >> db.wal:
> >> 020085.log  020088.log
> >> [root@s3db10 export-bluefs2]# du -hs
> >> 12G .
> >> [root@s3db10 export-bluefs2]# cat db/CURRENT
> >> MANIFEST-020084
> >>
> >> Am Mo., 17. Mai 2021 um 14:28 Uhr schrieb Igor Fedotov <
> ifedo...@suse.de>:
> >>
> >>> On 5/17/2021 2:53 PM, Boris Behrens wrote:
> >>>> Like this?
> >>> Yeah.
> >>>
> >>> so DB dir structure is more or less OK but db/CURRENT looks corrupted.
> It
> >>> should contain something like: MANIFEST-020081
> >>>
> >>> Could you please remove (or even just rename)  block.db symlink and do
> >>> the export again? Preferably to preserve the results for the first
> export.
> >>>
> >>> if export reveals proper CURRENT content - you 

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
See my last mail :)

Am Mo., 17. Mai 2021 um 14:52 Uhr schrieb Igor Fedotov :

> Would you try fsck without standalone DB?
>
> On 5/17/2021 3:39 PM, Boris Behrens wrote:
> > Here is the new output. I kept both for now.
> >
> > [root@s3db10 export-bluefs2]# ls *
> > db:
> > 018215.sst  018444.sst  018839.sst  019074.sst  019210.sst  019381.sst
> >   019560.sst  019755.sst  019849.sst  019888.sst  019958.sst  019995.sst
> >   020007.sst  020042.sst  020067.sst  020098.sst  020115.sst
> > 018216.sst  018445.sst  018840.sst  019075.sst  019211.sst  019382.sst
> >   019670.sst  019756.sst  019877.sst  019889.sst  019959.sst  019996.sst
> >   020008.sst  020043.sst  020068.sst  020104.sst  CURRENT
> > 018273.sst  018446.sst  018876.sst  019076.sst  019256.sst  019383.sst
> >   019671.sst  019757.sst  019878.sst  019890.sst  019960.sst  019997.sst
> >   020030.sst  020055.sst  020069.sst  020105.sst  IDENTITY
> > 018300.sst  018447.sst  018877.sst  019081.sst  019257.sst  019395.sst
> >   019672.sst  019762.sst  019879.sst  019918.sst  019961.sst  019998.sst
> >   020031.sst  020056.sst  020070.sst  020106.sst  LOCK
> > 018301.sst  018448.sst  018904.sst  019082.sst  019344.sst  019396.sst
> >   019673.sst  019763.sst  019880.sst  019919.sst  019962.sst  01.sst
> >   020032.sst  020057.sst  020071.sst  020107.sst  MANIFEST-020084
> > 018326.sst  018449.sst  018950.sst  019083.sst  019345.sst  019400.sst
> >   019674.sst  019764.sst  019881.sst  019920.sst  019963.sst  02.sst
> >   020035.sst  020058.sst  020072.sst  020108.sst  OPTIONS-020084
> > 018327.sst  018540.sst  018952.sst  019126.sst  019346.sst  019470.sst
> >   019675.sst  019765.sst  019882.sst  019921.sst  019964.sst  020001.sst
> >   020036.sst  020059.sst  020073.sst  020109.sst  OPTIONS-020087
> > 018328.sst  018541.sst  018953.sst  019127.sst  019370.sst  019471.sst
> >   019676.sst  019766.sst  019883.sst  019922.sst  019965.sst  020002.sst
> >   020037.sst  020060.sst  020074.sst  020110.sst
> > 018329.sst  018590.sst  018954.sst  019128.sst  019371.sst  019472.sst
> >   019677.sst  019845.sst  019884.sst  019923.sst  019989.sst  020003.sst
> >   020038.sst  020061.sst  020075.sst  020111.sst
> > 018406.sst  018591.sst  018995.sst  019174.sst  019372.sst  019473.sst
> >   019678.sst  019846.sst  019885.sst  019950.sst  019992.sst  020004.sst
> >   020039.sst  020062.sst  020094.sst  020112.sst
> > 018407.sst  018727.sst  018996.sst  019175.sst  019373.sst  019474.sst
> >   019753.sst  019847.sst  019886.sst  019955.sst  019993.sst  020005.sst
> >   020040.sst  020063.sst  020095.sst  020113.sst
> > 018443.sst  018728.sst  019073.sst  019176.sst  019380.sst  019475.sst
> >   019754.sst  019848.sst  019887.sst  019956.sst  019994.sst  020006.sst
> >   020041.sst  020064.sst  020096.sst  020114.sst
> >
> > db.slow:
> >
> > db.wal:
> > 020085.log  020088.log
> > [root@s3db10 export-bluefs2]# du -hs
> > 12G .
> > [root@s3db10 export-bluefs2]# cat db/CURRENT
> > MANIFEST-020084
> >
> > Am Mo., 17. Mai 2021 um 14:28 Uhr schrieb Igor Fedotov  >:
> >
> >> On 5/17/2021 2:53 PM, Boris Behrens wrote:
> >>> Like this?
> >> Yeah.
> >>
> >> so DB dir structure is more or less OK but db/CURRENT looks corrupted. It
> >> should contain something like: MANIFEST-020081
> >>
> >> Could you please remove (or even just rename)  block.db symlink and do
> the
> >> export again? Preferably to preserve the results for the first export.
> >>
> >> if export reveals proper CURRENT content - you might want to run fsck on
> >> the OSD...
> >>
> >>> [root@s3db10 export-bluefs]# ls *
> >>> db:
> >>> 018215.sst  018444.sst  018839.sst  019074.sst  019174.sst  019372.sst
> >>>019470.sst  019675.sst  019765.sst  019882.sst  019918.sst
> 019961.sst
> >>>019997.sst  020022.sst  020042.sst  020061.sst  020073.sst
> >>> 018216.sst  018445.sst  018840.sst  019075.sst  019175.sst  019373.sst
> >>>019471.sst  019676.sst  019766.sst  019883.sst  019919.sst
> 019962.sst
> >>>019998.sst  020023.sst  020043.sst  020062.sst  020074.sst
> >>> 018273.sst  018446.sst  018876.sst  019076.sst  019176.sst  019380.sst
> >>>019472.sst  019677.sst  019845.sst  019884.sst  019920.sst
> 019963.sst
> >>>01.sst  020030.sst  020049.sst  020063.sst  020075.sst
> >>> 018300.sst  018447.sst  018877.sst  019077.sst  019210.sst  019381.sst
> >>>019473.sst  019678.sst  0

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
The FSCK looks good:

[root@s3db10 export-bluefs2]# ceph-bluestore-tool --path
/var/lib/ceph/osd/ceph-68  fsck
fsck success

Am Mo., 17. Mai 2021 um 14:39 Uhr schrieb Boris Behrens :

> Here is the new output. I kept both for now.
>
> [root@s3db10 export-bluefs2]# ls *
> db:
> 018215.sst  018444.sst  018839.sst  019074.sst  019210.sst  019381.sst
>  019560.sst  019755.sst  019849.sst  019888.sst  019958.sst  019995.sst
>  020007.sst  020042.sst  020067.sst  020098.sst  020115.sst
> 018216.sst  018445.sst  018840.sst  019075.sst  019211.sst  019382.sst
>  019670.sst  019756.sst  019877.sst  019889.sst  019959.sst  019996.sst
>  020008.sst  020043.sst  020068.sst  020104.sst  CURRENT
> 018273.sst  018446.sst  018876.sst  019076.sst  019256.sst  019383.sst
>  019671.sst  019757.sst  019878.sst  019890.sst  019960.sst  019997.sst
>  020030.sst  020055.sst  020069.sst  020105.sst  IDENTITY
> 018300.sst  018447.sst  018877.sst  019081.sst  019257.sst  019395.sst
>  019672.sst  019762.sst  019879.sst  019918.sst  019961.sst  019998.sst
>  020031.sst  020056.sst  020070.sst  020106.sst  LOCK
> 018301.sst  018448.sst  018904.sst  019082.sst  019344.sst  019396.sst
>  019673.sst  019763.sst  019880.sst  019919.sst  019962.sst  01.sst
>  020032.sst  020057.sst  020071.sst  020107.sst  MANIFEST-020084
> 018326.sst  018449.sst  018950.sst  019083.sst  019345.sst  019400.sst
>  019674.sst  019764.sst  019881.sst  019920.sst  019963.sst  02.sst
>  020035.sst  020058.sst  020072.sst  020108.sst  OPTIONS-020084
> 018327.sst  018540.sst  018952.sst  019126.sst  019346.sst  019470.sst
>  019675.sst  019765.sst  019882.sst  019921.sst  019964.sst  020001.sst
>  020036.sst  020059.sst  020073.sst  020109.sst  OPTIONS-020087
> 018328.sst  018541.sst  018953.sst  019127.sst  019370.sst  019471.sst
>  019676.sst  019766.sst  019883.sst  019922.sst  019965.sst  020002.sst
>  020037.sst  020060.sst  020074.sst  020110.sst
> 018329.sst  018590.sst  018954.sst  019128.sst  019371.sst  019472.sst
>  019677.sst  019845.sst  019884.sst  019923.sst  019989.sst  020003.sst
>  020038.sst  020061.sst  020075.sst  020111.sst
> 018406.sst  018591.sst  018995.sst  019174.sst  019372.sst  019473.sst
>  019678.sst  019846.sst  019885.sst  019950.sst  019992.sst  020004.sst
>  020039.sst  020062.sst  020094.sst  020112.sst
> 018407.sst  018727.sst  018996.sst  019175.sst  019373.sst  019474.sst
>  019753.sst  019847.sst  019886.sst  019955.sst  019993.sst  020005.sst
>  020040.sst  020063.sst  020095.sst  020113.sst
> 018443.sst  018728.sst  019073.sst  019176.sst  019380.sst  019475.sst
>  019754.sst  019848.sst  019887.sst  019956.sst  019994.sst  020006.sst
>  020041.sst  020064.sst  020096.sst  020114.sst
>
> db.slow:
>
> db.wal:
> 020085.log  020088.log
> [root@s3db10 export-bluefs2]# du -hs
> 12G .
> [root@s3db10 export-bluefs2]# cat db/CURRENT
> MANIFEST-020084
>
> Am Mo., 17. Mai 2021 um 14:28 Uhr schrieb Igor Fedotov :
>
>> On 5/17/2021 2:53 PM, Boris Behrens wrote:
>> > Like this?
>>
>> Yeah.
>>
>> so DB dir structure is more or less OK but db/CURRENT looks corrupted. It
>> should contain something like: MANIFEST-020081
>>
>> Could you please remove (or even just rename)  block.db symlink and do
>> the export again? Preferably to preserve the results for the first export.
>>
>> if export reveals proper CURRENT content - you might want to run fsck on
>> the OSD...
>>
>> >
>> > [root@s3db10 export-bluefs]# ls *
>> > db:
>> > 018215.sst  018444.sst  018839.sst  019074.sst  019174.sst  019372.sst
>> >   019470.sst  019675.sst  019765.sst  019882.sst  019918.sst  019961.sst
>> >   019997.sst  020022.sst  020042.sst  020061.sst  020073.sst
>> > 018216.sst  018445.sst  018840.sst  019075.sst  019175.sst  019373.sst
>> >   019471.sst  019676.sst  019766.sst  019883.sst  019919.sst  019962.sst
>> >   019998.sst  020023.sst  020043.sst  020062.sst  020074.sst
>> > 018273.sst  018446.sst  018876.sst  019076.sst  019176.sst  019380.sst
>> >   019472.sst  019677.sst  019845.sst  019884.sst  019920.sst  019963.sst
>> >   01.sst  020030.sst  020049.sst  020063.sst  020075.sst
>> > 018300.sst  018447.sst  018877.sst  019077.sst  019210.sst  019381.sst
>> >   019473.sst  019678.sst  019846.sst  019885.sst  019921.sst  019964.sst
>> >   02.sst  020031.sst  020051.sst  020064.sst  020077.sst
>> > 018301.sst  018448.sst  018904.sst  019081.sst  019211.sst  019382.sst
>> >   019474.sst  019753.sst  019847.sst  019886.sst  019922.sst  019965.sst
>> >   020001.sst  020032.sst  020052.sst  020065.sst  020080.sst
>> > 018326.sst  018

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
Here is the new output. I kept both for now.

[root@s3db10 export-bluefs2]# ls *
db:
018215.sst  018444.sst  018839.sst  019074.sst  019210.sst  019381.sst
 019560.sst  019755.sst  019849.sst  019888.sst  019958.sst  019995.sst
 020007.sst  020042.sst  020067.sst  020098.sst  020115.sst
018216.sst  018445.sst  018840.sst  019075.sst  019211.sst  019382.sst
 019670.sst  019756.sst  019877.sst  019889.sst  019959.sst  019996.sst
 020008.sst  020043.sst  020068.sst  020104.sst  CURRENT
018273.sst  018446.sst  018876.sst  019076.sst  019256.sst  019383.sst
 019671.sst  019757.sst  019878.sst  019890.sst  019960.sst  019997.sst
 020030.sst  020055.sst  020069.sst  020105.sst  IDENTITY
018300.sst  018447.sst  018877.sst  019081.sst  019257.sst  019395.sst
 019672.sst  019762.sst  019879.sst  019918.sst  019961.sst  019998.sst
 020031.sst  020056.sst  020070.sst  020106.sst  LOCK
018301.sst  018448.sst  018904.sst  019082.sst  019344.sst  019396.sst
 019673.sst  019763.sst  019880.sst  019919.sst  019962.sst  01.sst
 020032.sst  020057.sst  020071.sst  020107.sst  MANIFEST-020084
018326.sst  018449.sst  018950.sst  019083.sst  019345.sst  019400.sst
 019674.sst  019764.sst  019881.sst  019920.sst  019963.sst  02.sst
 020035.sst  020058.sst  020072.sst  020108.sst  OPTIONS-020084
018327.sst  018540.sst  018952.sst  019126.sst  019346.sst  019470.sst
 019675.sst  019765.sst  019882.sst  019921.sst  019964.sst  020001.sst
 020036.sst  020059.sst  020073.sst  020109.sst  OPTIONS-020087
018328.sst  018541.sst  018953.sst  019127.sst  019370.sst  019471.sst
 019676.sst  019766.sst  019883.sst  019922.sst  019965.sst  020002.sst
 020037.sst  020060.sst  020074.sst  020110.sst
018329.sst  018590.sst  018954.sst  019128.sst  019371.sst  019472.sst
 019677.sst  019845.sst  019884.sst  019923.sst  019989.sst  020003.sst
 020038.sst  020061.sst  020075.sst  020111.sst
018406.sst  018591.sst  018995.sst  019174.sst  019372.sst  019473.sst
 019678.sst  019846.sst  019885.sst  019950.sst  019992.sst  020004.sst
 020039.sst  020062.sst  020094.sst  020112.sst
018407.sst  018727.sst  018996.sst  019175.sst  019373.sst  019474.sst
 019753.sst  019847.sst  019886.sst  019955.sst  019993.sst  020005.sst
 020040.sst  020063.sst  020095.sst  020113.sst
018443.sst  018728.sst  019073.sst  019176.sst  019380.sst  019475.sst
 019754.sst  019848.sst  019887.sst  019956.sst  019994.sst  020006.sst
 020041.sst  020064.sst  020096.sst  020114.sst

db.slow:

db.wal:
020085.log  020088.log
[root@s3db10 export-bluefs2]# du -hs
12G .
[root@s3db10 export-bluefs2]# cat db/CURRENT
MANIFEST-020084

Am Mo., 17. Mai 2021 um 14:28 Uhr schrieb Igor Fedotov :

> On 5/17/2021 2:53 PM, Boris Behrens wrote:
> > Like this?
>
> Yeah.
>
> so DB dir structure is more or less OK but db/CURRENT looks corrupted. It
> should contain something like: MANIFEST-020081
>
> Could you please remove (or even just rename)  block.db symlink and do the
> export again? Preferably to preserve the results for the first export.
>
> if export reveals proper CURRENT content - you might want to run fsck on
> the OSD...
>
> >
> > [root@s3db10 export-bluefs]# ls *
> > db:
> > 018215.sst  018444.sst  018839.sst  019074.sst  019174.sst  019372.sst
> >   019470.sst  019675.sst  019765.sst  019882.sst  019918.sst  019961.sst
> >   019997.sst  020022.sst  020042.sst  020061.sst  020073.sst
> > 018216.sst  018445.sst  018840.sst  019075.sst  019175.sst  019373.sst
> >   019471.sst  019676.sst  019766.sst  019883.sst  019919.sst  019962.sst
> >   019998.sst  020023.sst  020043.sst  020062.sst  020074.sst
> > 018273.sst  018446.sst  018876.sst  019076.sst  019176.sst  019380.sst
> >   019472.sst  019677.sst  019845.sst  019884.sst  019920.sst  019963.sst
> >   01.sst  020030.sst  020049.sst  020063.sst  020075.sst
> > 018300.sst  018447.sst  018877.sst  019077.sst  019210.sst  019381.sst
> >   019473.sst  019678.sst  019846.sst  019885.sst  019921.sst  019964.sst
> >   02.sst  020031.sst  020051.sst  020064.sst  020077.sst
> > 018301.sst  018448.sst  018904.sst  019081.sst  019211.sst  019382.sst
> >   019474.sst  019753.sst  019847.sst  019886.sst  019922.sst  019965.sst
> >   020001.sst  020032.sst  020052.sst  020065.sst  020080.sst
> > 018326.sst  018449.sst  018950.sst  019082.sst  019256.sst  019383.sst
> >   019475.sst  019754.sst  019848.sst  019887.sst  019923.sst  019986.sst
> >   020002.sst  020035.sst  020053.sst  020066.sst  CURRENT
> > 018327.sst  018540.sst  018952.sst  019083.sst  019257.sst  019395.sst
> >   019560.sst  019755.sst  019849.sst  019888.sst  019950.sst  019989.sst
> >   020003.sst  020036.sst  020055.sst  020067.sst  IDENTITY
> > 018328.sst  018541.sst  018953.sst  019124.sst  019344.sst  019396.sst
> >   019670.sst  019756.sst  019877.sst  019889.s

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
Like this?

[root@s3db10 export-bluefs]# ls *
db:
018215.sst  018444.sst  018839.sst  019074.sst  019174.sst  019372.sst
 019470.sst  019675.sst  019765.sst  019882.sst  019918.sst  019961.sst
 019997.sst  020022.sst  020042.sst  020061.sst  020073.sst
018216.sst  018445.sst  018840.sst  019075.sst  019175.sst  019373.sst
 019471.sst  019676.sst  019766.sst  019883.sst  019919.sst  019962.sst
 019998.sst  020023.sst  020043.sst  020062.sst  020074.sst
018273.sst  018446.sst  018876.sst  019076.sst  019176.sst  019380.sst
 019472.sst  019677.sst  019845.sst  019884.sst  019920.sst  019963.sst
 01.sst  020030.sst  020049.sst  020063.sst  020075.sst
018300.sst  018447.sst  018877.sst  019077.sst  019210.sst  019381.sst
 019473.sst  019678.sst  019846.sst  019885.sst  019921.sst  019964.sst
 02.sst  020031.sst  020051.sst  020064.sst  020077.sst
018301.sst  018448.sst  018904.sst  019081.sst  019211.sst  019382.sst
 019474.sst  019753.sst  019847.sst  019886.sst  019922.sst  019965.sst
 020001.sst  020032.sst  020052.sst  020065.sst  020080.sst
018326.sst  018449.sst  018950.sst  019082.sst  019256.sst  019383.sst
 019475.sst  019754.sst  019848.sst  019887.sst  019923.sst  019986.sst
 020002.sst  020035.sst  020053.sst  020066.sst  CURRENT
018327.sst  018540.sst  018952.sst  019083.sst  019257.sst  019395.sst
 019560.sst  019755.sst  019849.sst  019888.sst  019950.sst  019989.sst
 020003.sst  020036.sst  020055.sst  020067.sst  IDENTITY
018328.sst  018541.sst  018953.sst  019124.sst  019344.sst  019396.sst
 019670.sst  019756.sst  019877.sst  019889.sst  019955.sst  019992.sst
 020004.sst  020037.sst  020056.sst  020068.sst  LOCK
018329.sst  018590.sst  018954.sst  019125.sst  019345.sst  019400.sst
 019671.sst  019757.sst  019878.sst  019890.sst  019956.sst  019993.sst
 020005.sst  020038.sst  020057.sst  020069.sst  MANIFEST-020081
018406.sst  018591.sst  018995.sst  019126.sst  019346.sst  019467.sst
 019672.sst  019762.sst  019879.sst  019915.sst  019958.sst  019994.sst
 020006.sst  020039.sst  020058.sst  020070.sst  OPTIONS-020081
018407.sst  018727.sst  018996.sst  019127.sst  019370.sst  019468.sst
 019673.sst  019763.sst  019880.sst  019916.sst  019959.sst  019995.sst
 020007.sst  020040.sst  020059.sst  020071.sst  OPTIONS-020084
018443.sst  018728.sst  019073.sst  019128.sst  019371.sst  019469.sst
 019674.sst  019764.sst  019881.sst  019917.sst  019960.sst  019996.sst
 020008.sst  020041.sst  020060.sst  020072.sst

db.slow:

db.wal:
020082.log
[root@s3db10 export-bluefs]# du -hs
12G .
[root@s3db10 export-bluefs]# cat db/CURRENT
�g�U
   uN�[�+p[root@s3db10 export-bluefs]#

Am Mo., 17. Mai 2021 um 13:45 Uhr schrieb Igor Fedotov :

> You might want to check the file structure at the new DB using
> ceph-bluestore-tool's bluefs-export command:
>
> ceph-bluestore-tool --path <osd path> --command bluefs-export --out
> <target dir>
>
> <target dir> needs to have enough free space to fit the DB data.
>
> Once exported - does <target dir> contain a valid BlueFS directory
> structure - multiple .sst files, CURRENT and IDENTITY files etc?
>
> If so then please check and share the content of <target dir>/db/CURRENT
> file.
>
>
> Thanks,
>
> Igor
>
> On 5/17/2021 1:32 PM, Boris Behrens wrote:
> > Hi Igor,
> > I posted it on pastebin: https://pastebin.com/Ze9EuCMD
> >
> > Cheers
> >   Boris
> >
> > Am Mo., 17. Mai 2021 um 12:22 Uhr schrieb Igor Fedotov  >:
> >
> >> Hi Boris,
> >>
> >> could you please share full OSD startup log and file listing for
> >> '/var/lib/ceph/osd/ceph-68'?
> >>
> >>
> >> Thanks,
> >>
> >> Igor
> >>
> >> On 5/17/2021 1:09 PM, Boris Behrens wrote:
> >>> Hi,
> >>> sorry for replying to this old thread:
> >>>
> >>> I tried to add a block.db to an OSD but now the OSD can not start with
> >> the
> >>> error:
> >>> Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -7>
> 2021-05-17
> >>> 09:50:38.362 7fc48ec94a80 -1 rocksdb: Corruption: CURRENT file does not
> >> end
> >>> with newline
> >>> Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -6>
> 2021-05-17
> >>> 09:50:38.362 7fc48ec94a80 -1 bluestore(/var/lib/ceph/osd/ceph-68)
> >> _open_db
> >>> erroring opening db:
> >>> Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -1>
> 2021-05-17
> >>> 09:50:38.866 7fc48ec94a80 -1
> >>>
> >>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc:
> >>> In function 'int BlueStore::_upgrade_super()' thread 7fc4

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
Hi Igor,
I posted it on pastebin: https://pastebin.com/Ze9EuCMD

Cheers
 Boris

Am Mo., 17. Mai 2021 um 12:22 Uhr schrieb Igor Fedotov :

> Hi Boris,
>
> could you please share full OSD startup log and file listing for
> '/var/lib/ceph/osd/ceph-68'?
>
>
> Thanks,
>
> Igor
>
> On 5/17/2021 1:09 PM, Boris Behrens wrote:
> > Hi,
> > sorry for replying to this old thread:
> >
> > I tried to add a block.db to an OSD but now the OSD can not start with
> the
> > error:
> > Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -7> 2021-05-17
> > 09:50:38.362 7fc48ec94a80 -1 rocksdb: Corruption: CURRENT file does not
> end
> > with newline
> > Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -6> 2021-05-17
> > 09:50:38.362 7fc48ec94a80 -1 bluestore(/var/lib/ceph/osd/ceph-68)
> _open_db
> > erroring opening db:
> > Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -1> 2021-05-17
> > 09:50:38.866 7fc48ec94a80 -1
> >
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc:
> > In function 'int BlueStore::_upgrade_super()' thread 7fc48ec94a80 time
> > 2021-05-17 09:50:38.865204
> > Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]:
> >
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc:
> > 10647: FAILED ceph_assert(ondisk_format > 0)
> >
> > I tried to run an fsck/repair on the disk:
> > [root@s3db10 osd]# ceph-bluestore-tool --path ceph-68  repair
> > 2021-05-17 10:05:25.695 7f714dea3ec0 -1 rocksdb: Corruption: CURRENT file
> > does not end with newline
> > 2021-05-17 10:05:25.695 7f714dea3ec0 -1 bluestore(ceph-68) _open_db
> > erroring opening db:
> > error from fsck: (5) Input/output error
> > [root@s3db10 osd]# ceph-bluestore-tool --path ceph-68  fsck
> > 2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 rocksdb: Corruption: CURRENT file
> > does not end with newline
> > 2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 bluestore(ceph-68) _open_db
> > erroring opening db:
> > error from fsck: (5) Input/output error
> >
> > These are the steps I did to add the disk:
> > $ CEPH_ARGS="--bluestore-block-db-size 53687091200
> > --bluestore_block_db_create=true" ceph-bluestore-tool bluefs-bdev-new-db
> > --path /var/lib/ceph/osd/ceph-68 --dev-target /dev/sdj1
> > $ chown -h ceph:ceph /var/lib/ceph/osd/ceph-68/block.db
> > $ lvchange --addtag ceph.db_device=/dev/sdj1
> >
> /dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6
> > $ lvchange --addtag ceph.db_uuid=463dd37c-fd49-4ccb-849f-c5827d3d9df2
> >
> /dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6
> > $ ceph-volume lvm activate --all
> >
> > The UUIDs
> > later I tried this:
> > $ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 --devs-source
> > /var/lib/ceph/osd/ceph-68/block --dev-target
> > /var/lib/ceph/osd/ceph-68/block.db bluefs-bdev-migrate
> >
> > Any ideas how I can get the rocksdb fixed?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
Hi,
sorry for replying to this old thread:

I tried to add a block.db to an OSD but now the OSD can not start with the
error:
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -7> 2021-05-17
09:50:38.362 7fc48ec94a80 -1 rocksdb: Corruption: CURRENT file does not end
with newline
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -6> 2021-05-17
09:50:38.362 7fc48ec94a80 -1 bluestore(/var/lib/ceph/osd/ceph-68) _open_db
erroring opening db:
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -1> 2021-05-17
09:50:38.866 7fc48ec94a80 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc:
In function 'int BlueStore::_upgrade_super()' thread 7fc48ec94a80 time
2021-05-17 09:50:38.865204
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc:
10647: FAILED ceph_assert(ondisk_format > 0)

I tried to run an fsck/repair on the disk:
[root@s3db10 osd]# ceph-bluestore-tool --path ceph-68  repair
2021-05-17 10:05:25.695 7f714dea3ec0 -1 rocksdb: Corruption: CURRENT file
does not end with newline
2021-05-17 10:05:25.695 7f714dea3ec0 -1 bluestore(ceph-68) _open_db
erroring opening db:
error from fsck: (5) Input/output error
[root@s3db10 osd]# ceph-bluestore-tool --path ceph-68  fsck
2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 rocksdb: Corruption: CURRENT file
does not end with newline
2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 bluestore(ceph-68) _open_db
erroring opening db:
error from fsck: (5) Input/output error

These are the steps I did to add the disk:
$ CEPH_ARGS="--bluestore-block-db-size 53687091200
--bluestore_block_db_create=true" ceph-bluestore-tool bluefs-bdev-new-db
--path /var/lib/ceph/osd/ceph-68 --dev-target /dev/sdj1
$ chown -h ceph:ceph /var/lib/ceph/osd/ceph-68/block.db
$ lvchange --addtag ceph.db_device=/dev/sdj1
/dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6
$ lvchange --addtag ceph.db_uuid=463dd37c-fd49-4ccb-849f-c5827d3d9df2
/dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6
$ ceph-volume lvm activate --all

The UUIDs
later I tried this:
$ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 --devs-source
/var/lib/ceph/osd/ceph-68/block --dev-target
/var/lib/ceph/osd/ceph-68/block.db bluefs-bdev-migrate

Any ideas how I can get the rocksdb fixed?
-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
It actually WAS the amount of watchers... narf..

This is so embarrassing.. Thanks a lot for all your input.
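
For anyone hitting the same "failed to distribute cache" symptom: the watchers
can be listed directly on the control pool's notify objects. A rough sketch,
assuming the pool name used in this cluster (eu-central-1.rgw.control) and the
default notify.0..notify.7 objects - every running (or stale) radosgw instance
shows up as a watcher there:

$ for i in $(seq 0 7); do echo "notify.$i:"; rados -p eu-central-1.rgw.control listwatchers notify.$i; done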

Am Di., 11. Mai 2021 um 13:54 Uhr schrieb Boris Behrens :

> I tried to debug it with --debug-ms=1.
> Maybe someone could help me to wrap my head around it?
> https://pastebin.com/LD9qrm3x
>
>
>
> Am Di., 11. Mai 2021 um 11:17 Uhr schrieb Boris Behrens :
>
>> Good call. I just restarted the whole cluster, but the problem still
>> persists.
>> I don't think it is a problem with the rados, but with the radosgw.
>>
>> But I still struggle to pin the issue.
>>
>> Am Di., 11. Mai 2021 um 10:45 Uhr schrieb Thomas Schneider <
>> thomas.schneider-...@ruhr-uni-bochum.de>:
>>
>>> Hey all,
>>>
>>> we had slow RGW access when some OSDs were slow due to an (to us)
>>> unknown OSD bug that made PG access either slow or impossible. (It showed
>>> itself through slowness of the mgr as well, but nothing other than that).
>>> We restarted all OSDs that held RGW data and the problem was gone.
>>> I have no good way to debug the problem since it never occurred again
>>> after we restarted the OSDs.
>>>
>>> Kind regards,
>>> Thomas
>>>
>>>
>>> Am 11. Mai 2021 08:47:06 MESZ schrieb Boris Behrens :
>>> >Hi Amit,
>>> >
>>> >I just pinged the mons from every system and they are all available.
>>> >
>>> >Am Mo., 10. Mai 2021 um 21:18 Uhr schrieb Amit Ghadge <
>>> amitg@gmail.com>:
>>> >
>>> >> We seen slowness due to unreachable one of them mgr service, maybe
>>> here
>>> >> are different, you can check monmap/ ceph.conf mon entry and then
>>> verify
>>> >> all nodes are successfully ping.
>>> >>
>>> >>
>>> >> -AmitG
>>> >>
>>> >>
>>> >> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens  wrote:
>>> >>
>>> >>> Hi guys,
>>> >>>
>>> >>> does someone got any idea?
>>> >>>
>>> >>> Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens >> >:
>>> >>>
>>> >>> > Hi,
>>> >>> > since a couple of days we experience a strange slowness on some
>>> >>> > radosgw-admin operations.
>>> >>> > What is the best way to debug this?
>>> >>> >
>>> >>> > For example creating a user takes over 20s.
>>> >>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user
>>> >>> > --display-name=test-bb-user
>>> >>> > 2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first
>>> you
>>> >>> > don't succeed: (110) Connection timed out
>>> >>> > 2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute
>>> >>> cache
>>> >>> > for eu-central-1.rgw.users.uid:test-bb-user
>>> >>> > 2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first
>>> you
>>> >>> > don't succeed: (110) Connection timed out
>>> >>> > 2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute
>>> >>> cache
>>> >>> > for eu-central-1.rgw.users.keys:
>>> >>> > {
>>> >>> > "user_id": "test-bb-user",
>>> >>> > "display_name": "test-bb-user",
>>> >>> >
>>> >>> > }
>>> >>> > real 0m20.557s
>>> >>> > user 0m0.087s
>>> >>> > sys 0m0.030s
>>> >>> >
>>> >>> > First I thought that rados operations might be slow, but adding and
>>> >>> > deleting objects in rados are fast as usual (at least from my
>>> >>> perspective).
>>> >>> > Also uploading to buckets is fine.
>>> >>> >
>>> >>> > We changed some things and I think it might have to do with this:
>>> >>> > * We have a HAProxy that distributes via leastconn between the 3
>>> >>> radosgw's
>>> >>> > (this did not change)
>>> >>> > * We had three times a daemon with the name "eu-central-1" running
>>> (on
>>> >>> the
>>> >>> >

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
I tried to debug it with --debug-ms=1.
Maybe someone could help me to wrap my head around it?
https://pastebin.com/LD9qrm3x
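
In case it helps someone reproduce this, the trace was captured roughly like
this (I only used --debug-ms=1; --debug-rgw=20 is an additional level that may
help, not an official recommendation):

$ radosgw-admin user create --uid=test-bb-user --display-name=test-bb-user \
  --debug-ms=1 --debug-rgw=20 2>&1 | tee /tmp/radosgw-admin-debug.log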



Am Di., 11. Mai 2021 um 11:17 Uhr schrieb Boris Behrens :

> Good call. I just restarted the whole cluster, but the problem still
> persists.
> I don't think it is a problem with the rados, but with the radosgw.
>
> But I still struggle to pin the issue.
>
> Am Di., 11. Mai 2021 um 10:45 Uhr schrieb Thomas Schneider <
> thomas.schneider-...@ruhr-uni-bochum.de>:
>
>> Hey all,
>>
>> we had slow RGW access when some OSDs were slow due to an (to us) unknown
>> OSD bug that made PG access either slow or impossible. (It showed itself
>> through slowness of the mgr as well, but nothing other than that).
>> We restarted all OSDs that held RGW data and the problem was gone.
>> I have no good way to debug the problem since it never occurred again
>> after we restarted the OSDs.
>>
>> Kind regards,
>> Thomas
>>
>>
>> Am 11. Mai 2021 08:47:06 MESZ schrieb Boris Behrens :
>> >Hi Amit,
>> >
>> >I just pinged the mons from every system and they are all available.
>> >
>> >Am Mo., 10. Mai 2021 um 21:18 Uhr schrieb Amit Ghadge <
>> amitg@gmail.com>:
>> >
>> >> We seen slowness due to unreachable one of them mgr service, maybe here
>> >> are different, you can check monmap/ ceph.conf mon entry and then
>> verify
>> >> all nodes are successfully ping.
>> >>
>> >>
>> >> -AmitG
>> >>
>> >>
>> >> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens  wrote:
>> >>
>> >>> Hi guys,
>> >>>
>> >>> does someone got any idea?
>> >>>
>> >>> Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens > >:
>> >>>
>> >>> > Hi,
>> >>> > since a couple of days we experience a strange slowness on some
>> >>> > radosgw-admin operations.
>> >>> > What is the best way to debug this?
>> >>> >
>> >>> > For example creating a user takes over 20s.
>> >>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user
>> >>> > --display-name=test-bb-user
>> >>> > 2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first
>> you
>> >>> > don't succeed: (110) Connection timed out
>> >>> > 2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute
>> >>> cache
>> >>> > for eu-central-1.rgw.users.uid:test-bb-user
>> >>> > 2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first
>> you
>> >>> > don't succeed: (110) Connection timed out
>> >>> > 2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute
>> >>> cache
>> >>> > for eu-central-1.rgw.users.keys:
>> >>> > {
>> >>> > "user_id": "test-bb-user",
>> >>> > "display_name": "test-bb-user",
>> >>> >
>> >>> > }
>> >>> > real 0m20.557s
>> >>> > user 0m0.087s
>> >>> > sys 0m0.030s
>> >>> >
>> >>> > First I thought that rados operations might be slow, but adding and
>> >>> > deleting objects in rados are fast as usual (at least from my
>> >>> perspective).
>> >>> > Also uploading to buckets is fine.
>> >>> >
>> >>> > We changed some things and I think it might have to do with this:
>> >>> > * We have a HAProxy that distributes via leastconn between the 3
>> >>> radosgw's
>> >>> > (this did not change)
>> >>> > * We had three times a daemon with the name "eu-central-1" running
>> (on
>> >>> the
>> >>> > 3 radosgw's)
>> >>> > * Because this might have led to our data duplication problem, we
>> have
>> >>> > split that up so now the daemons are named per host
>> (eu-central-1-s3db1,
>> >>> > eu-central-1-s3db2, eu-central-1-s3db3)
>> >>> > * We also added dedicated rgw daemons for garbage collection,
>> because
>> >>> the
>> >>> > current one were not able to keep up.
>> >>> > * So basically ceph status went from "rgw: 1 daemon active
>> >>> (eu-central-1)"
>> >>> > to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2,
>> >>> > eu-central-1-s3db3, gc-s3db12, gc-s3db13...)
>> >>> >
>> >>> >
>> >>> > Cheers
>> >>> >  Boris
>> >>> >
>> >>>
>> >>>
>> >>> --
>> >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
>> im
>> >>> groüen Saal.
>> >>> ___
>> >>> ceph-users mailing list -- ceph-users@ceph.io
>> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>>
>> >>
>> >
>>
>> --
>> Thomas Schneider
>> IT.SERVICES
>> Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780
>> Bochum
>> Telefon: +49 234 32 23939
>> http://www.it-services.rub.de/
>>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] "radosgw-admin bucket radoslist" loops when a multipart upload is happening

2021-05-11 Thread Boris Behrens
Hi together,

I still search for orphan objects and came across a strange bug:
There is a huge multipart upload happening (around 4TB), and listing the
rados objects in the bucket loops over the multipart upload.



-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
Hi Amit,
it is the same physical interface but different VLANs. I checked all IP
addresses from all systems and everything is directly connected, without any
gateway hops.

Am Di., 11. Mai 2021 um 10:59 Uhr schrieb Amit Ghadge :

> I hope you are using a single network interface for the public and cluster?
>
> On Tue, May 11, 2021 at 2:15 PM Thomas Schneider <
> thomas.schneider-...@ruhr-uni-bochum.de> wrote:
>
>> Hey all,
>>
>> we had slow RGW access when some OSDs were slow due to an (to us) unknown
>> OSD bug that made PG access either slow or impossible. (It showed itself
>> through slowness of the mgr as well, but nothing other than that).
>> We restarted all OSDs that held RGW data and the problem was gone.
>> I have no good way to debug the problem since it never occurred again
>> after we restarted the OSDs.
>>
>> Kind regards,
>> Thomas
>>
>>
>> Am 11. Mai 2021 08:47:06 MESZ schrieb Boris Behrens :
>> >Hi Amit,
>> >
>> >I just pinged the mons from every system and they are all available.
>> >
>> >Am Mo., 10. Mai 2021 um 21:18 Uhr schrieb Amit Ghadge <
>> amitg@gmail.com>:
>> >
>> >> We seen slowness due to unreachable one of them mgr service, maybe here
>> >> are different, you can check monmap/ ceph.conf mon entry and then
>> verify
>> >> all nodes are successfully ping.
>> >>
>> >>
>> >> -AmitG
>> >>
>> >>
>> >> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens  wrote:
>> >>
>> >>> Hi guys,
>> >>>
>> >>> does someone got any idea?
>> >>>
>> >>> Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens > >:
>> >>>
>> >>> > Hi,
>> >>> > since a couple of days we experience a strange slowness on some
>> >>> > radosgw-admin operations.
>> >>> > What is the best way to debug this?
>> >>> >
>> >>> > For example creating a user takes over 20s.
>> >>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user
>> >>> > --display-name=test-bb-user
>> >>> > 2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first
>> you
>> >>> > don't succeed: (110) Connection timed out
>> >>> > 2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute
>> >>> cache
>> >>> > for eu-central-1.rgw.users.uid:test-bb-user
>> >>> > 2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first
>> you
>> >>> > don't succeed: (110) Connection timed out
>> >>> > 2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute
>> >>> cache
>> >>> > for eu-central-1.rgw.users.keys:
>> >>> > {
>> >>> > "user_id": "test-bb-user",
>> >>> > "display_name": "test-bb-user",
>> >>> >
>> >>> > }
>> >>> > real 0m20.557s
>> >>> > user 0m0.087s
>> >>> > sys 0m0.030s
>> >>> >
>> >>> > First I thought that rados operations might be slow, but adding and
>> >>> > deleting objects in rados are fast as usual (at least from my
>> >>> perspective).
>> >>> > Also uploading to buckets is fine.
>> >>> >
>> >>> > We changed some things and I think it might have to do with this:
>> >>> > * We have a HAProxy that distributes via leastconn between the 3
>> >>> radosgw's
>> >>> > (this did not change)
>> >>> > * We had three times a daemon with the name "eu-central-1" running
>> (on
>> >>> the
>> >>> > 3 radosgw's)
>> >>> > * Because this might have led to our data duplication problem, we
>> have
>> >>> > split that up so now the daemons are named per host
>> (eu-central-1-s3db1,
>> >>> > eu-central-1-s3db2, eu-central-1-s3db3)
>> >>> > * We also added dedicated rgw daemons for garbage collection,
>> because
>> >>> the
>> >>> > current one were not able to keep up.
>> >>> > * So basically ceph status went from "rgw: 1 daemon active
>> >>> (eu-central-1)"
>> >>> > to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2,
>> >>> > eu-central-1-s3db3, gc-s3db12, gc-s3db13...)
>> >>> >
>> >>> >
>> >>> > Cheers
>> >>> >  Boris
>> >>> >
>> >>>
>> >>>
>> >>> --
>> >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
>> im
>> >>> groüen Saal.
>> >>> ___
>> >>> ceph-users mailing list -- ceph-users@ceph.io
>> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>>
>> >>
>> >
>>
>> --
>> Thomas Schneider
>> IT.SERVICES
>> Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780
>> Bochum
>> Telefon: +49 234 32 23939
>> http://www.it-services.rub.de/
>>
>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
Good call. I just restarted the whole cluster, but the problem still
persists.
I don't think it is a problem with the rados, but with the radosgw.

But I still struggle to pin the issue.

Am Di., 11. Mai 2021 um 10:45 Uhr schrieb Thomas Schneider <
thomas.schneider-...@ruhr-uni-bochum.de>:

> Hey all,
>
> we had slow RGW access when some OSDs were slow due to an (to us) unknown
> OSD bug that made PG access either slow or impossible. (It showed itself
> through slowness of the mgr as well, but nothing other than that).
> We restarted all OSDs that held RGW data and the problem was gone.
> I have no good way to debug the problem since it never occurred again after
> we restarted the OSDs.
>
> Kind regards,
> Thomas
>
>
> Am 11. Mai 2021 08:47:06 MESZ schrieb Boris Behrens :
> >Hi Amit,
> >
> >I just pinged the mons from every system and they are all available.
> >
> >Am Mo., 10. Mai 2021 um 21:18 Uhr schrieb Amit Ghadge <
> amitg@gmail.com>:
> >
> >> We seen slowness due to unreachable one of them mgr service, maybe here
> >> are different, you can check monmap/ ceph.conf mon entry and then verify
> >> all nodes are successfully ping.
> >>
> >>
> >> -AmitG
> >>
> >>
> >> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens  wrote:
> >>
> >>> Hi guys,
> >>>
> >>> does someone got any idea?
> >>>
> >>> Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens :
> >>>
> >>> > Hi,
> >>> > since a couple of days we experience a strange slowness on some
> >>> > radosgw-admin operations.
> >>> > What is the best way to debug this?
> >>> >
> >>> > For example creating a user takes over 20s.
> >>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user
> >>> > --display-name=test-bb-user
> >>> > 2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first
> you
> >>> > don't succeed: (110) Connection timed out
> >>> > 2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute
> >>> cache
> >>> > for eu-central-1.rgw.users.uid:test-bb-user
> >>> > 2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first
> you
> >>> > don't succeed: (110) Connection timed out
> >>> > 2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute
> >>> cache
> >>> > for eu-central-1.rgw.users.keys:
> >>> > {
> >>> > "user_id": "test-bb-user",
> >>> > "display_name": "test-bb-user",
> >>> >
> >>> > }
> >>> > real 0m20.557s
> >>> > user 0m0.087s
> >>> > sys 0m0.030s
> >>> >
> >>> > First I thought that rados operations might be slow, but adding and
> >>> > deleting objects in rados are fast as usual (at least from my
> >>> perspective).
> >>> > Also uploading to buckets is fine.
> >>> >
> >>> > We changed some things and I think it might have to do with this:
> >>> > * We have a HAProxy that distributes via leastconn between the 3
> >>> radosgw's
> >>> > (this did not change)
> >>> > * We had three times a daemon with the name "eu-central-1" running
> (on
> >>> the
> >>> > 3 radosgw's)
> >>> > * Because this might have led to our data duplication problem, we
> have
> >>> > split that up so now the daemons are named per host
> (eu-central-1-s3db1,
> >>> > eu-central-1-s3db2, eu-central-1-s3db3)
> >>> > * We also added dedicated rgw daemons for garbage collection, because
> >>> the
> >>> > current one were not able to keep up.
> >>> > * So basically ceph status went from "rgw: 1 daemon active
> >>> (eu-central-1)"
> >>> > to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2,
> >>> > eu-central-1-s3db3, gc-s3db12, gc-s3db13...)
> >>> >
> >>> >
> >>> > Cheers
> >>> >  Boris
> >>> >
> >>>
> >>>
> >>> --
> >>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> im
> >>> groüen Saal.
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>>
> >>
> >
>
> --
> Thomas Schneider
> IT.SERVICES
> Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780
> Bochum
> Telefon: +49 234 32 23939
> http://www.it-services.rub.de/
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-10 Thread Boris Behrens
Hi Amit,

I just pinged the mons from every system and they are all available.

Am Mo., 10. Mai 2021 um 21:18 Uhr schrieb Amit Ghadge :

> We have seen slowness when one of the mgr services was unreachable; it may
> be different here. You can check the monmap / ceph.conf mon entries and then
> verify that all nodes can be pinged successfully.
>
>
> -AmitG
>
>
> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens  wrote:
>
>> Hi guys,
>>
>> does someone got any idea?
>>
>> Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens :
>>
>> > Hi,
>> > since a couple of days we experience a strange slowness on some
>> > radosgw-admin operations.
>> > What is the best way to debug this?
>> >
>> > For example creating a user takes over 20s.
>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user
>> > --display-name=test-bb-user
>> > 2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first you
>> > don't succeed: (110) Connection timed out
>> > 2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute
>> cache
>> > for eu-central-1.rgw.users.uid:test-bb-user
>> > 2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first you
>> > don't succeed: (110) Connection timed out
>> > 2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute
>> cache
>> > for eu-central-1.rgw.users.keys:
>> > {
>> > "user_id": "test-bb-user",
>> > "display_name": "test-bb-user",
>> >
>> > }
>> > real 0m20.557s
>> > user 0m0.087s
>> > sys 0m0.030s
>> >
>> > First I thought that rados operations might be slow, but adding and
>> > deleting objects in rados are fast as usual (at least from my
>> perspective).
>> > Also uploading to buckets is fine.
>> >
>> > We changed some things and I think it might have to do with this:
>> > * We have a HAProxy that distributes via leastconn between the 3
>> radosgw's
>> > (this did not change)
>> > * We had three times a daemon with the name "eu-central-1" running (on
>> the
>> > 3 radosgw's)
>> > * Because this might have led to our data duplication problem, we have
>> > split that up so now the daemons are named per host (eu-central-1-s3db1,
>> > eu-central-1-s3db2, eu-central-1-s3db3)
>> > * We also added dedicated rgw daemons for garbage collection, because
>> the
>> > current one were not able to keep up.
>> > * So basically ceph status went from "rgw: 1 daemon active
>> (eu-central-1)"
>> > to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2,
>> > eu-central-1-s3db3, gc-s3db12, gc-s3db13...)
>> >
>> >
>> > Cheers
>> >  Boris
>> >
>>
>>
>> --
>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
>> groüen Saal.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-10 Thread Boris Behrens
Hi guys,

does someone got any idea?

Am Mi., 5. Mai 2021 um 16:16 Uhr schrieb Boris Behrens :

> Hi,
> since a couple of days we experience a strange slowness on some
> radosgw-admin operations.
> What is the best way to debug this?
>
> For example creating a user takes over 20s.
> [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user
> --display-name=test-bb-user
> 2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first you
> don't succeed: (110) Connection timed out
> 2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute cache
> for eu-central-1.rgw.users.uid:test-bb-user
> 2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first you
> don't succeed: (110) Connection timed out
> 2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute cache
> for eu-central-1.rgw.users.keys:
> {
> "user_id": "test-bb-user",
> "display_name": "test-bb-user",
>
> }
> real 0m20.557s
> user 0m0.087s
> sys 0m0.030s
>
> First I thought that rados operations might be slow, but adding and
> deleting objects in rados are fast as usual (at least from my perspective).
> Also uploading to buckets is fine.
>
> We changed some things and I think it might have to do with this:
> * We have a HAProxy that distributes via leastconn between the 3 radosgw's
> (this did not change)
> * We had three times a daemon with the name "eu-central-1" running (on the
> 3 radosgw's)
> * Because this might have led to our data duplication problem, we have
> split that up so now the daemons are named per host (eu-central-1-s3db1,
> eu-central-1-s3db2, eu-central-1-s3db3)
> * We also added dedicated rgw daemons for garbage collection, because the
> current one were not able to keep up.
> * So basically ceph status went from "rgw: 1 daemon active (eu-central-1)"
> to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2,
> eu-central-1-s3db3, gc-s3db12, gc-s3db13...)
>
>
> Cheers
>  Boris
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-05 Thread Boris Behrens
Hi,
for a couple of days now we have been experiencing a strange slowness on some
radosgw-admin operations.
What is the best way to debug this?

For example creating a user takes over 20s.
[root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user
--display-name=test-bb-user
2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first you
don't succeed: (110) Connection timed out
2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute cache
for eu-central-1.rgw.users.uid:test-bb-user
2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first you
don't succeed: (110) Connection timed out
2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute cache
for eu-central-1.rgw.users.keys:
{
"user_id": "test-bb-user",
"display_name": "test-bb-user",
   
}
real 0m20.557s
user 0m0.087s
sys 0m0.030s

First I thought that rados operations might be slow, but adding and
deleting objects in rados is as fast as usual (at least from my perspective).
Also uploading to buckets is fine.
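
Before restarting anything I plan to re-run the command with more logging
and to check which daemons are still registered on the cache notify
objects (just a sketch; the pool and object names are taken from our zone
and may differ):

radosgw-admin user create --uid test-bb-user --display-name=test-bb-user \
    --debug-rgw=20 --debug-ms=1
rados -p eu-central-1.rgw.control ls
rados -p eu-central-1.rgw.control listwatchers notify.0

The idea is to see whether stale rgw instances are still listed as watchers
there, which would explain the notify timeouts.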

We changed some things and I think it might have to do with this:
* We have a HAProxy that distributes via leastconn between the 3 radosgw's
(this did not change)
* We had a daemon with the name "eu-central-1" running three times (on the
3 radosgw's)
* Because this might have led to our data duplication problem, we have
split that up so now the daemons are named per host (eu-central-1-s3db1,
eu-central-1-s3db2, eu-central-1-s3db3)
* We also added dedicated rgw daemons for garbage collection, because the
current ones were not able to keep up.
* So basically ceph status went from "rgw: 1 daemon active (eu-central-1)"
to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2,
eu-central-1-s3db3, gc-s3db12, gc-s3db13...)"


Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] global multipart lc policy in radosgw

2021-05-02 Thread Boris Behrens
Hi,
I have a lot of multipart uploads that look like they never finished. Some
of them date back to 2019.

Is there a way to clean them up if they didn't finish within 28 days?

I know I can implement an LC policy per bucket, but how do I implement it
cluster-wide?
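
The only workaround I can think of would be scripting it over all buckets,
roughly like this sketch, assuming jq is available and "lifecyclepolicy" is
a policy file that aborts incomplete multipart uploads:

for b in $(radosgw-admin bucket list | jq -r '.[]'); do
    s3cmd setlifecycle lifecyclepolicy "s3://$b"
done

But that still needs credentials with access to every bucket, so it is not
really a global setting.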

Cheers
 Boris

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how to handle rgw leaked data (aka data that is not available via buckets but eats diskspace)

2021-04-27 Thread Boris Behrens
So,
maybe somebody can answer me the following question:

I have ~150M objects in the ceph cluster (ceph status shows: 152.68M
objects, 316 TiB).
How can
radosgw-admin bucket --bucket BUCKET radoslist
produce an output with 252677729 entries that is still growing?
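
To rule out a counting mistake on my side I am comparing unique entries
against the raw pool listing, roughly like this (pool name from our setup;
sort needs a lot of temp space with that many lines):

radosgw-admin bucket radoslist --bucket=BUCKET | sort -u | wc -l
rados -p eu-central-1.rgw.buckets.data ls | wc -l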

Am Di., 27. Apr. 2021 um 06:59 Uhr schrieb Boris Behrens :

> Hi Anthony,
>
> yes we are using replication, the lost space is calculated before it's
> replicated.
> RAW STORAGE:
> CLASS SIZEAVAIL   USEDRAW USED %RAW USED
> hdd   1.1 PiB 191 TiB 968 TiB  968 TiB 83.55
> TOTAL 1.1 PiB 191 TiB 968 TiB  968 TiB 83.55
>
> POOLS:
> POOLID PGS  STORED
>  OBJECTS USED%USED MAX AVAIL
> rbd  0   64 0 B
> 0 0 B 013 TiB
> .rgw.root1   64  99 KiB
> 119  99 KiB 013 TiB
> eu-central-1.rgw.control 2   64 0 B
> 8 0 B 013 TiB
> eu-central-1.rgw.data.root   3   64 947 KiB
> 2.82k 947 KiB 013 TiB
> eu-central-1.rgw.gc  4   64 101 MiB
> 128 101 MiB 013 TiB
> eu-central-1.rgw.log 5   64 267 MiB
> 500 267 MiB 013 TiB
> eu-central-1.rgw.users.uid   6   64 2.9 MiB
> 6.91k 2.9 MiB 013 TiB
> eu-central-1.rgw.users.keys  7   64 263 KiB
> 6.73k 263 KiB 013 TiB
> eu-central-1.rgw.meta8   64 384 KiB
>  1k 384 KiB 013 TiB
> eu-central-1.rgw.users.email 9   6440 B
> 140 B 013 TiB
> eu-central-1.rgw.buckets.index  10   64  10 GiB
>  67.28k  10 GiB  0.0313 TiB
> eu-central-1.rgw.buckets.data   11 2048 313 TiB
> 151.71M 313 TiB 89.2513 TiB
> ...
>
> EC profile is pretty standard
> [root@s3db16 ~]# ceph osd erasure-code-profile ls
> default
> [root@s3db16 ~]# ceph osd erasure-code-profile get default
> k=2
> m=1
> plugin=jerasure
> technique=reed_sol_van
>
> We use mainly ceph 14.2.18. There is an OSD host with 14.2.19 and one with
> 14.2.20
>
> Object populations is mixed, but the most amount of data is in huge files.
> We store our platforms RBD snapshots in it.
>
> Cheers
>  Boris
>
>
> Am Di., 27. Apr. 2021 um 06:49 Uhr schrieb Anthony D'Atri <
> anthony.da...@gmail.com>:
>
>> Are you using Replication?  EC? How many copies / which profile?
>> On which Ceph release were your OSDs built?  BlueStore? Filestore?
>> What is your RGW object population like?  Lots of small objects?  Mostly
>> large objects?  Average / median object size?
>>
>> > On Apr 26, 2021, at 9:32 PM, Boris Behrens  wrote:
>> >
>> > HI,
>> >
>> > we still have the problem that our rgw eats more diskspace than it
>> should.
>> > Summing up the "size_kb_actual" of all buckets show only half of the
>> used
>> > diskspace.
>> >
>> > There are 312TiB stored acording to "ceph df" but we only need around
>> 158TB.
>> >
>> > I've already wrote to this ML with the problem, but there were no
>> solutions
>> > that would help.
>> > I've doug through the ML archive and found some interesting threads
>> > regarding orphan objects and these kind of issues.
>> >
>> > Did someone ever solved this problem?
>> > Or do you just add more disk space.
>> >
>> > we tried to:
>> > * use the "radosgw-admin orphan find/finish" tool (didn't work)
>> > * manually triggering the GC (didn't work)
>> >
>> > currently running (since yesterday evening):
>> > * rgw-orphan-list, which procused 270GB of text output, and it's not
>> done
>> > yet (I have 60GB diskspace left)
>> >
>> > --
>> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
>> > groüen Saal.
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how to handle rgw leaked data (aka data that is not available via buckets but eats diskspace)

2021-04-26 Thread Boris Behrens
Hi Anthony,

yes we are using replication, the lost space is calculated before it's
replicated.
RAW STORAGE:
CLASS SIZEAVAIL   USEDRAW USED %RAW USED
hdd   1.1 PiB 191 TiB 968 TiB  968 TiB 83.55
TOTAL 1.1 PiB 191 TiB 968 TiB  968 TiB 83.55

POOLS:
POOLID PGS  STORED  OBJECTS
USED%USED MAX AVAIL
rbd  0   64 0 B   0
0 B 013 TiB
.rgw.root1   64  99 KiB 119
 99 KiB 013 TiB
eu-central-1.rgw.control 2   64 0 B   8
0 B 013 TiB
eu-central-1.rgw.data.root   3   64 947 KiB   2.82k
947 KiB 013 TiB
eu-central-1.rgw.gc  4   64 101 MiB 128
101 MiB 013 TiB
eu-central-1.rgw.log 5   64 267 MiB 500
267 MiB 013 TiB
eu-central-1.rgw.users.uid   6   64 2.9 MiB   6.91k
2.9 MiB 013 TiB
eu-central-1.rgw.users.keys  7   64 263 KiB   6.73k
263 KiB 013 TiB
eu-central-1.rgw.meta8   64 384 KiB  1k
384 KiB 013 TiB
eu-central-1.rgw.users.email 9   6440 B   1
   40 B 013 TiB
eu-central-1.rgw.buckets.index  10   64  10 GiB  67.28k
 10 GiB  0.0313 TiB
eu-central-1.rgw.buckets.data   11 2048 313 TiB 151.71M
313 TiB 89.2513 TiB
...

EC profile is pretty standard
[root@s3db16 ~]# ceph osd erasure-code-profile ls
default
[root@s3db16 ~]# ceph osd erasure-code-profile get default
k=2
m=1
plugin=jerasure
technique=reed_sol_van

We use mainly ceph 14.2.18. There is an OSD host with 14.2.19 and one with
14.2.20

The object population is mixed, but most of the data is in huge files.
We store our platform's RBD snapshots in it.

Cheers
 Boris


Am Di., 27. Apr. 2021 um 06:49 Uhr schrieb Anthony D'Atri <
anthony.da...@gmail.com>:

> Are you using Replication?  EC? How many copies / which profile?
> On which Ceph release were your OSDs built?  BlueStore? Filestore?
> What is your RGW object population like?  Lots of small objects?  Mostly
> large objects?  Average / median object size?
>
> > On Apr 26, 2021, at 9:32 PM, Boris Behrens  wrote:
> >
> > HI,
> >
> > we still have the problem that our rgw eats more diskspace than it
> should.
> > Summing up the "size_kb_actual" of all buckets show only half of the used
> > diskspace.
> >
> > There are 312TiB stored acording to "ceph df" but we only need around
> 158TB.
> >
> > I've already wrote to this ML with the problem, but there were no
> solutions
> > that would help.
> > I've doug through the ML archive and found some interesting threads
> > regarding orphan objects and these kind of issues.
> >
> > Did someone ever solved this problem?
> > Or do you just add more disk space.
> >
> > we tried to:
> > * use the "radosgw-admin orphan find/finish" tool (didn't work)
> > * manually triggering the GC (didn't work)
> >
> > currently running (since yesterday evening):
> > * rgw-orphan-list, which procused 270GB of text output, and it's not done
> > yet (I have 60GB diskspace left)
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> > groüen Saal.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] how to handle rgw leaked data (aka data that is not available via buckets but eats diskspace)

2021-04-26 Thread Boris Behrens
Hi,

we still have the problem that our rgw eats more diskspace than it should.
Summing up the "size_kb_actual" of all buckets show only half of the used
diskspace.

There are 312TiB stored according to "ceph df" but we only need around 158TB.

I've already written to this ML about the problem, but there were no
solutions that would help.
I've dug through the ML archive and found some interesting threads
regarding orphan objects and these kinds of issues.

Has anyone ever solved this problem?
Or do you just add more disk space?

we tried to:
* use the "radosgw-admin orphan find/finish" tool (didn't work)
* manually trigger the GC (didn't work)

currently running (since yesterday evening):
* rgw-orphan-list, which produced 270GB of text output and is not done yet
(I have 60GB diskspace left); what I plan to do with the result is sketched
below
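
The comparison itself should then be roughly this (a sketch for a single
bucket; for the real thing the radoslist output of every bucket would have
to be concatenated first):

rados -p eu-central-1.rgw.buckets.data ls | sort > pool-objects.txt
radosgw-admin bucket radoslist --bucket=BUCKET | sort -u > bucket-objects.txt
comm -23 pool-objects.txt bucket-objects.txt > possibly-leaked.txt

i.e. everything that exists in the data pool but is not referenced by any
bucket, which is more or less what rgw-orphan-list computes anyway.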

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd snap create not working and just hangs forever

2021-04-23 Thread Boris Behrens
Am Fr., 23. Apr. 2021 um 12:16 Uhr schrieb Ilya Dryomov :

> On Fri, Apr 23, 2021 at 12:03 PM Boris Behrens  wrote:
> >
> >
> >
> > Am Fr., 23. Apr. 2021 um 11:52 Uhr schrieb Ilya Dryomov <
> idryo...@gmail.com>:
> >>
> >>
> >> This snippet confirms my suspicion.  Unfortunately without a verbose
> >> log from that VM from three days ago (i.e. when it got into this state)
> >> it's hard to tell what exactly went wrong.
> >>
> >> The problem is that the VM doesn't consider itself to be the rightful
> >> owner of the lock and so when "rbd snap create" requests the lock from
> >> it in order to make a snapshot, the VM just ignores the request because
> >> even though it owns the lock, its record appears to be out of sync.
> >>
> >> I'd suggest to kick it by restarting osd36.  If the VM is active, it
> >> should reacquire the lock and hopefully update its internal record as
> >> expected.  If "rbd snap create" still hangs after that, it would mean
> >> that we have a reproducer and can gather logs on the VM side.
> >>
> >> What version of qemu/librbd and ceph is in use (both on the VM side and
> >> on the side you are running "rbd snap create"?
> >>
> > I just stopped the OSD, waited some seconds and started it again.
> > I still can't create snapshots.
> >
> > Ceph version is 14.2.18 accross the board
> > qemu is 4.1.0-1
> > as we use krbd, the kernel version is 5.2.9-arch1-1-ARCH
> >
> > How can I gather more logs to debug it?
>
> Are you saying that this image is mapped and the lock is held by the
> kernel client?  It doesn't look that way from the logs you shared.
>
> Thanks,
>
> Ilya
>

We use krbd instead of librbd (at least that is what I think I know), but
qemu is doing the kvm/rbd stuff.
-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd snap create not working and just hangs forever

2021-04-23 Thread Boris Behrens
Am Fr., 23. Apr. 2021 um 11:52 Uhr schrieb Ilya Dryomov :

>
> This snippet confirms my suspicion.  Unfortunately without a verbose
> log from that VM from three days ago (i.e. when it got into this state)
> it's hard to tell what exactly went wrong.
>
> The problem is that the VM doesn't consider itself to be the rightful
> owner of the lock and so when "rbd snap create" requests the lock from
> it in order to make a snapshot, the VM just ignores the request because
> even though it owns the lock, its record appears to be out of sync.
>
> I'd suggest to kick it by restarting osd36.  If the VM is active, it
> should reacquire the lock and hopefully update its internal record as
> expected.  If "rbd snap create" still hangs after that, it would mean
> that we have a reproducer and can gather logs on the VM side.
>
> What version of qemu/librbd and ceph is in use (both on the VM side and
> on the side you are running "rbd snap create"?
>
I just stopped the OSD, waited some seconds and started it again.
I still can't create snapshots.

Ceph version is 14.2.18 across the board
qemu is 4.1.0-1
as we use krbd, the kernel version is 5.2.9-arch1-1-ARCH

How can I gather more logs to debug it?
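
In the meantime I am also looking at the lock from the rbd side, e.g.:

rbd lock ls rbd/IMAGE

which, if I read the docs right, should at least show which client
currently holds the exclusive lock.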

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 requires twice the space it should use

2021-04-23 Thread Boris Behrens
So I am following the orphans trail.

Now I have a job that has been running for 3 1/2 days. Can I hit finish on
a job that is in the comparing state? It has been in this state for 2 days
and the messages in the output are repeating and look like this:

leaked:
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.578__shadow_95a2980b-7012-43dd-81f2-07577cfcb9f0/25bc235b-3bf9-4db2-ace0-d149653bfd8b/e79909ed-e52e-4d16-a3a9-e84e332d37fa.lz4.2~YVwbe-JPoLioLOSEQtTwYOPt_wmUCHn.4310_2

This is the find job. I created it with: radosgw-admin orphans find
--job-id bb-orphan-2021-04-19 --bucket=BUCKET --yes-i-really-mean-it --pool
eu-central-1.rgw.buckets.data
{
"orphan_search_state": {
"info": {
"orphan_search_info": {
"job_name": "bb-orphan-2021-04-19",
"pool": "eu-central-1.rgw.buckets.data",
"num_shards": 64,
"start_time": "2021-04-19 16:42:45.993615Z"
}
},
"stage": {
"orphan_search_stage": {
"search_stage": "comparing",
    "shard": 0,
"marker": ""
}
}
    }
},
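
Once it is done, or if I give up on it, the cleanup should just be the
matching finish call, if I understand the tool correctly:

radosgw-admin orphans finish --job-id=bb-orphan-2021-04-19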

Am Fr., 16. Apr. 2021 um 10:57 Uhr schrieb Boris Behrens :

> Could this also be failed multipart uploads?
>
> Am Do., 15. Apr. 2021 um 18:23 Uhr schrieb Boris Behrens :
>
>> Cheers,
>>
>> [root@s3db1 ~]#  ceph daemon osd.23 perf dump | grep numpg
>> "numpg": 187,
>> "numpg_primary": 64,
>> "numpg_replica": 121,
>> "numpg_stray": 2,
>> "numpg_removing": 0,
>>
>>
>> Am Do., 15. Apr. 2021 um 18:18 Uhr schrieb 胡 玮文 :
>>
>>> Hi Boris,
>>>
>>> Could you check something like
>>>
>>> ceph daemon osd.23 perf dump | grep numpg
>>>
>>> to see if there are some stray or removing PG?
>>>
>>> Weiwen Hu
>>>
>>> > 在 2021年4月15日,22:53,Boris Behrens  写道:
>>> >
>>> > Ah you are right.
>>> > [root@s3db1 ~]# ceph daemon osd.23 config get
>>> bluestore_min_alloc_size_hdd
>>> > {
>>> >"bluestore_min_alloc_size_hdd": "65536"
>>> > }
>>> > But I also checked how many objects our s3 hold and the numbers just
>>> do not
>>> > add up.
>>> > There are only 26509200 objects, which would result in around 1TB
>>> "waste"
>>> > if every object would be empty.
>>> >
>>> > I think the problem began when I updated the PG count from 1024 to
>>> 2048.
>>> > Could there be an issue where the data is written twice?
>>> >
>>> >
>>> >> Am Do., 15. Apr. 2021 um 16:48 Uhr schrieb Amit Ghadge <
>>> amitg@gmail.com
>>> >>> :
>>> >>
>>> >> verify those two parameter values ,bluestore_min_alloc_size_hdd &
>>> >> bluestore_min_alloc_size_sdd, If you are using hdd disk then
>>> >> bluestore_min_alloc_size_hdd are applicable.
>>> >>
>>> >>> On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens  wrote:
>>> >>>
>>> >>> So, I need to live with it? A value of zero leads to use the default?
>>> >>> [root@s3db1 ~]# ceph daemon osd.23 config get
>>> bluestore_min_alloc_size
>>> >>> {
>>> >>>"bluestore_min_alloc_size": "0"
>>> >>> }
>>> >>>
>>> >>> I also checked the fragmentation on the bluestore OSDs and it is
>>> around
>>> >>> 0.80 - 0.89 on most OSDs. yikes.
>>> >>> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block
>>> >>> {
>>> >>>"fragmentation_rating": 0.85906054329923576
>>> >>> }
>>> >>>
>>> >>> The problem I currently have is, that I barely keep up with adding
>>> OSD
>>> >>> disks.
>>> >>>
>>> >>> Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge <
>>> >>> amitg@gmail.com>:
>>> >>>
>>> >>>> size_kb_actual are actually bucket object size but on OSD level the
>>> >>>> bluestore_min_alloc_size default 64KB and SSD are 16KB

[ceph-users] Re: rbd snap create not working and just hangs forever

2021-04-22 Thread Boris Behrens
Am Do., 22. Apr. 2021 um 17:27 Uhr schrieb Ilya Dryomov :

> On Thu, Apr 22, 2021 at 5:08 PM Boris Behrens  wrote:
> >
> >
> >
> > Am Do., 22. Apr. 2021 um 16:43 Uhr schrieb Ilya Dryomov <
> idryo...@gmail.com>:
> >>
> >> On Thu, Apr 22, 2021 at 4:20 PM Boris Behrens  wrote:
> >> >
> >> > Hi,
> >> >
> >> > I have a customer VM that is running fine, but I can not make
> snapshots
> >> > anymore.
> >> > rbd snap create rbd/IMAGE@test-bb-1
> >> > just hangs forever.
> >>
> >> Hi Boris,
> >>
> >> Run
> >>
> >> $ rbd snap create rbd/IMAGE@test-bb-1 --debug-ms=1 --debug-rbd=20
> >>
> >> let it hang for a few minutes and attach the output.
> >
> >
> > I just pasted a short snip here: https://pastebin.com/B3Xgpbzd
> > If you need more I can give it to you, but the output is very large.
>
> Paste the first couple thousand lines (i.e. from the very beginning),
> that should be enough.
>
sure: https://pastebin.com/GsKpLbqG

good luck :)

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd snap create not working and just hangs forever

2021-04-22 Thread Boris Behrens
Am Do., 22. Apr. 2021 um 16:43 Uhr schrieb Ilya Dryomov :

> On Thu, Apr 22, 2021 at 4:20 PM Boris Behrens  wrote:
> >
> > Hi,
> >
> > I have a customer VM that is running fine, but I can not make snapshots
> > anymore.
> > rbd snap create rbd/IMAGE@test-bb-1
> > just hangs forever.
>
> Hi Boris,
>
> Run
>
> $ rbd snap create rbd/IMAGE@test-bb-1 --debug-ms=1 --debug-rbd=20
>
> let it hang for a few minutes and attach the output.
>

I just pasted a short snip here: https://pastebin.com/B3Xgpbzd
If you need more I can give it to you, but the output is very large.

>
> >
> > When I checked the status with
> > rbd status rbd/IMAGE
> > it shows one watcher, the cpu node where the VM is running.
> >
> > What can I do to investigate further, without restarting the VM.
> > This is the only affected VM and it stopped working three days ago.
>
> Can you think of any event related to the cluster, that VM or the
> VM fleet in general that occurred three days ago?
>
We had an incident where the cpu nodes connected to the wrong cluster, but
this VM was not affected IIRC.

Cheers
 Boris

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rbd snap create not working and just hangs forever

2021-04-22 Thread Boris Behrens
Hi,

I have a customer VM that is running fine, but I can not make snapshots
anymore.
rbd snap create rbd/IMAGE@test-bb-1
just hangs forever.

When I checked the status with
rbd status rbd/IMAGE
it shows one watcher, the cpu node where the VM is running.

What can I do to investigate further without restarting the VM?
This is the only affected VM and it stopped working three days ago.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Suspicious newsletter] cleanup multipart in radosgw

2021-04-19 Thread Boris Behrens
Hi Istvan,

both of them require bucket access, correct?
Is there a way to add the LC policy globally?

Cheers
 Boris

Am Mo., 19. Apr. 2021 um 11:58 Uhr schrieb Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com>:

> Hi,
>
> You have 2 ways:
>
> First is using s3vrowser app and in the menu select the multipart uploads
> and clean it up.
> The other is like this:
>
> Set lifecycle policy
> On the client:
> vim lifecyclepolicy
> <LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
>   <Rule>
>     <ID>Incomplete Multipart Uploads</ID>
>     <Prefix></Prefix>
>     <Status>Enabled</Status>
>     <AbortIncompleteMultipartUpload>
>       <DaysAfterInitiation>1</DaysAfterInitiation>
>     </AbortIncompleteMultipartUpload>
>   </Rule>
> </LifecycleConfiguration>
>
> /bin/s3cmd setlifecycle lifecyclepolicy  s3://bucketname
> On mon node process manually
> radosgw-admin lc list
> radosgw-admin lc process
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> -Original Message-
> From: Boris Behrens 
> Sent: Monday, April 19, 2021 4:10 PM
> To: ceph-users@ceph.io
> Subject: [Suspicious newsletter] [ceph-users] cleanup multipart in radosgw
>
> Hi,
> is there a way to remove multipart uploads that are older than X days?
>
> It doesn't need to be build into ceph or is automated to the end. Just
> something I don't need to build on my own.
>
> I currently try to debug a problem where ceph reports a lot more used
> space than it actually requires (
> https://www.mail-archive.com/ceph-users@ceph.io/msg09810.html).
>
> I came across a lot of old _multipart_ files in some buckets and now I
> want to clean them up.
> I don't know if this will fix my problem but I would love to rule that out.
>
> radosgw-admin bucket check --bucket=bucket --check-objects --fix does not
> work because it is a shareded bucket.
>
> I have also some buckets that look like this, and contain 100% _multipart_
> files which are >2 years old:
> "buckets": [
> {
> "bucket": "ncprod",
> "tenant": "",
> "num_objects": -482,
> "num_shards": 0,
> "objects_per_shard": -482,
> "fill_status": "OVER 180143985094819%"
> }
> ]
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
>
> 
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by copyright
> or other legal rules. If you have received it by mistake please let us know
> by reply email and delete it from your system. It is prohibited to copy
> this message or disclose its content to anyone. Any confidentiality or
> privilege is not waived or lost by any mistaken delivery or unauthorized
> disclosure of the message. All messages sent to and from Agoda may be
> monitored to ensure compliance with company policies, to protect the
> company's interests and to remove potential malware. Electronic messages
> may be intercepted, amended, lost or deleted, or contain viruses.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cleanup multipart in radosgw

2021-04-19 Thread Boris Behrens
Hi,
is there a way to remove multipart uploads that are older than X days?

It doesn't need to be built into ceph or fully automated. Just
something I don't need to build on my own.
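
If nothing like that exists I could probably script it against the S3 API
per bucket, something like this sketch (aws cli, endpoint and names are
placeholders):

aws --endpoint-url=https://MY-RGW-ENDPOINT s3api list-multipart-uploads \
    --bucket BUCKET
aws --endpoint-url=https://MY-RGW-ENDPOINT s3api abort-multipart-upload \
    --bucket BUCKET --key KEY --upload-id UPLOADID

filtering the list output for uploads whose "Initiated" date is older than
X days. But that is exactly the kind of thing I was hoping not to build
myself.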

I am currently trying to debug a problem where ceph reports a lot more used space
than it actually requires (
https://www.mail-archive.com/ceph-users@ceph.io/msg09810.html).

I came across a lot of old _multipart_ files in some buckets and now I want
to clean them up.
I don't know if this will fix my problem but I would love to rule that out.

radosgw-admin bucket check --bucket=bucket --check-objects --fix does not
work because it is a sharded bucket.

I also have some buckets that look like this, and they contain 100% _multipart_
files which are >2 years old:
"buckets": [
{
"bucket": "ncprod",
"tenant": "",
"num_objects": -482,
"num_shards": 0,
"objects_per_shard": -482,
"fill_status": "OVER 180143985094819%"
}
]

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 requires twice the space it should use

2021-04-16 Thread Boris Behrens
Could this also be failed multipart uploads?

Am Do., 15. Apr. 2021 um 18:23 Uhr schrieb Boris Behrens :

> Cheers,
>
> [root@s3db1 ~]#  ceph daemon osd.23 perf dump | grep numpg
> "numpg": 187,
> "numpg_primary": 64,
> "numpg_replica": 121,
> "numpg_stray": 2,
> "numpg_removing": 0,
>
>
> Am Do., 15. Apr. 2021 um 18:18 Uhr schrieb 胡 玮文 :
>
>> Hi Boris,
>>
>> Could you check something like
>>
>> ceph daemon osd.23 perf dump | grep numpg
>>
>> to see if there are some stray or removing PG?
>>
>> Weiwen Hu
>>
>> > 在 2021年4月15日,22:53,Boris Behrens  写道:
>> >
>> > Ah you are right.
>> > [root@s3db1 ~]# ceph daemon osd.23 config get
>> bluestore_min_alloc_size_hdd
>> > {
>> >"bluestore_min_alloc_size_hdd": "65536"
>> > }
>> > But I also checked how many objects our s3 hold and the numbers just do
>> not
>> > add up.
>> > There are only 26509200 objects, which would result in around 1TB
>> "waste"
>> > if every object would be empty.
>> >
>> > I think the problem began when I updated the PG count from 1024 to 2048.
>> > Could there be an issue where the data is written twice?
>> >
>> >
>> >> Am Do., 15. Apr. 2021 um 16:48 Uhr schrieb Amit Ghadge <
>> amitg@gmail.com
>> >>> :
>> >>
>> >> verify those two parameter values ,bluestore_min_alloc_size_hdd &
>> >> bluestore_min_alloc_size_sdd, If you are using hdd disk then
>> >> bluestore_min_alloc_size_hdd are applicable.
>> >>
>> >>> On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens  wrote:
>> >>>
>> >>> So, I need to live with it? A value of zero leads to use the default?
>> >>> [root@s3db1 ~]# ceph daemon osd.23 config get
>> bluestore_min_alloc_size
>> >>> {
>> >>>"bluestore_min_alloc_size": "0"
>> >>> }
>> >>>
>> >>> I also checked the fragmentation on the bluestore OSDs and it is
>> around
>> >>> 0.80 - 0.89 on most OSDs. yikes.
>> >>> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block
>> >>> {
>> >>>"fragmentation_rating": 0.85906054329923576
>> >>> }
>> >>>
>> >>> The problem I currently have is, that I barely keep up with adding OSD
>> >>> disks.
>> >>>
>> >>> Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge <
>> >>> amitg@gmail.com>:
>> >>>
>> >>>> size_kb_actual are actually bucket object size but on OSD level the
>> >>>> bluestore_min_alloc_size default 64KB and SSD are 16KB
>> >>>>
>> >>>>
>> >>>>
>> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/administration_guide/osd-bluestore
>> >>>>
>> >>>> -AmitG
>> >>>>
>> >>>> On Thu, Apr 15, 2021 at 7:29 PM Boris Behrens  wrote:
>> >>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> maybe it is just a problem in my understanding, but it looks like
>> our s3
>> >>>>> requires twice the space it should use.
>> >>>>>
>> >>>>> I ran "radosgw-admin bucket stats", and added all "size_kb_actual"
>> >>>>> values
>> >>>>> up and divided to TB (/1024/1024/1024).
>> >>>>> The resulting space is 135,1636733 TB. When I tripple it because of
>> >>>>> replication I end up with around 405TB which is nearly half the
>> space of
>> >>>>> what ceph df tells me.
>> >>>>>
>> >>>>> Hope someone can help me.
>> >>>>>
>> >>>>> ceph df shows
>> >>>>> RAW STORAGE:
>> >>>>>CLASS SIZE AVAIL   USEDRAW USED %RAW

[ceph-users] Re: s3 requires twice the space it should use

2021-04-15 Thread Boris Behrens
Cheers,

[root@s3db1 ~]#  ceph daemon osd.23 perf dump | grep numpg
"numpg": 187,
"numpg_primary": 64,
"numpg_replica": 121,
"numpg_stray": 2,
"numpg_removing": 0,


Am Do., 15. Apr. 2021 um 18:18 Uhr schrieb 胡 玮文 :

> Hi Boris,
>
> Could you check something like
>
> ceph daemon osd.23 perf dump | grep numpg
>
> to see if there are some stray or removing PG?
>
> Weiwen Hu
>
> > 在 2021年4月15日,22:53,Boris Behrens  写道:
> >
> > Ah you are right.
> > [root@s3db1 ~]# ceph daemon osd.23 config get
> bluestore_min_alloc_size_hdd
> > {
> >"bluestore_min_alloc_size_hdd": "65536"
> > }
> > But I also checked how many objects our s3 hold and the numbers just do
> not
> > add up.
> > There are only 26509200 objects, which would result in around 1TB "waste"
> > if every object would be empty.
> >
> > I think the problem began when I updated the PG count from 1024 to 2048.
> > Could there be an issue where the data is written twice?
> >
> >
> >> Am Do., 15. Apr. 2021 um 16:48 Uhr schrieb Amit Ghadge <
> amitg@gmail.com
> >>> :
> >>
> >> verify those two parameter values ,bluestore_min_alloc_size_hdd &
> >> bluestore_min_alloc_size_sdd, If you are using hdd disk then
> >> bluestore_min_alloc_size_hdd are applicable.
> >>
> >>> On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens  wrote:
> >>>
> >>> So, I need to live with it? A value of zero leads to use the default?
> >>> [root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size
> >>> {
> >>>"bluestore_min_alloc_size": "0"
> >>> }
> >>>
> >>> I also checked the fragmentation on the bluestore OSDs and it is around
> >>> 0.80 - 0.89 on most OSDs. yikes.
> >>> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block
> >>> {
> >>>"fragmentation_rating": 0.85906054329923576
> >>> }
> >>>
> >>> The problem I currently have is, that I barely keep up with adding OSD
> >>> disks.
> >>>
> >>> Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge <
> >>> amitg@gmail.com>:
> >>>
> >>>> size_kb_actual are actually bucket object size but on OSD level the
> >>>> bluestore_min_alloc_size default 64KB and SSD are 16KB
> >>>>
> >>>>
> >>>>
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/administration_guide/osd-bluestore
> >>>>
> >>>> -AmitG
> >>>>
> >>>> On Thu, Apr 15, 2021 at 7:29 PM Boris Behrens  wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> maybe it is just a problem in my understanding, but it looks like
> our s3
> >>>>> requires twice the space it should use.
> >>>>>
> >>>>> I ran "radosgw-admin bucket stats", and added all "size_kb_actual"
> >>>>> values
> >>>>> up and divided to TB (/1024/1024/1024).
> >>>>> The resulting space is 135,1636733 TB. When I tripple it because of
> >>>>> replication I end up with around 405TB which is nearly half the
> space of
> >>>>> what ceph df tells me.
> >>>>>
> >>>>> Hope someone can help me.
> >>>>>
> >>>>> ceph df shows
> >>>>> RAW STORAGE:
> >>>>>CLASS SIZE AVAIL   USEDRAW USED %RAW
> >>>>> USED
> >>>>>hdd   1009 TiB 189 TiB 820 TiB  820 TiB
> >>>>> 81.26
> >>>>>TOTAL 1009 TiB 189 TiB 820 TiB  820 TiB
> >>>>> 81.26
> >>>>>
> >>>>> POOLS:
> >>>>>POOLID PGS  STORED
> >>>>> OBJECTS
> >>>>>USED%USED MAX AVAIL
> >>>>> 

[ceph-users] Re: s3 requires twice the space it should use

2021-04-15 Thread Boris Behrens
Ah you are right.
[root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size_hdd
{
"bluestore_min_alloc_size_hdd": "65536"
}
But I also checked how many objects our s3 holds and the numbers just do
not add up.
There are only 26509200 objects, which would result in around 1TB of "waste"
if every object were empty.
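(Back-of-the-envelope, assuming bluestore_min_alloc_size_hdd = 64 KiB:
26509200 * 64 KiB is roughly 1.6 TiB per replica in the absolute worst
case, so allocation padding alone cannot explain the missing ~150TB.)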

I think the problem began when I updated the PG count from 1024 to 2048.
Could there be an issue where the data is written twice?


Am Do., 15. Apr. 2021 um 16:48 Uhr schrieb Amit Ghadge :

> verify those two parameter values ,bluestore_min_alloc_size_hdd &
> bluestore_min_alloc_size_sdd, If you are using hdd disk then
> bluestore_min_alloc_size_hdd are applicable.
>
> On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens  wrote:
>
>> So, I need to live with it? A value of zero leads to use the default?
>> [root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size
>> {
>> "bluestore_min_alloc_size": "0"
>> }
>>
>> I also checked the fragmentation on the bluestore OSDs and it is around
>> 0.80 - 0.89 on most OSDs. yikes.
>> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block
>> {
>> "fragmentation_rating": 0.85906054329923576
>> }
>>
>> The problem I currently have is, that I barely keep up with adding OSD
>> disks.
>>
>> Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge <
>> amitg@gmail.com>:
>>
>>> size_kb_actual are actually bucket object size but on OSD level the
>>> bluestore_min_alloc_size default 64KB and SSD are 16KB
>>>
>>>
>>> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/administration_guide/osd-bluestore
>>>
>>> -AmitG
>>>
>>> On Thu, Apr 15, 2021 at 7:29 PM Boris Behrens  wrote:
>>>
>>>> Hi,
>>>>
>>>> maybe it is just a problem in my understanding, but it looks like our s3
>>>> requires twice the space it should use.
>>>>
>>>> I ran "radosgw-admin bucket stats", and added all "size_kb_actual"
>>>> values
>>>> up and divided to TB (/1024/1024/1024).
>>>> The resulting space is 135,1636733 TB. When I tripple it because of
>>>> replication I end up with around 405TB which is nearly half the space of
>>>> what ceph df tells me.
>>>>
>>>> Hope someone can help me.
>>>>
>>>> ceph df shows
>>>> RAW STORAGE:
>>>> CLASS SIZE AVAIL   USEDRAW USED %RAW
>>>> USED
>>>> hdd   1009 TiB 189 TiB 820 TiB  820 TiB
>>>>  81.26
>>>> TOTAL 1009 TiB 189 TiB 820 TiB  820 TiB
>>>>  81.26
>>>>
>>>> POOLS:
>>>> POOLID PGS  STORED
>>>> OBJECTS
>>>> USED%USED MAX AVAIL
>>>> rbd  0   64 0 B
>>>>0
>>>> 0 B 018 TiB
>>>> .rgw.root1   64  99 KiB
>>>>  119
>>>>  99 KiB 018 TiB
>>>> eu-central-1.rgw.control 2   64 0 B
>>>>8
>>>> 0 B 018 TiB
>>>> eu-central-1.rgw.data.root   3   64 1.0 MiB
>>>>  3.15k
>>>> 1.0 MiB 018 TiB
>>>> eu-central-1.rgw.gc  4   64  71 MiB
>>>>   32
>>>>  71 MiB 018 TiB
>>>> eu-central-1.rgw.log 5   64 267 MiB
>>>>  564
>>>> 267 MiB 018 TiB
>>>> eu-central-1.rgw.users.uid   6   64 2.8 MiB
>>>>  6.91k
>>>> 2.8 MiB 018 TiB
>>>> eu-central-1.rgw.users.keys  7   64 263 KiB
>>>>  6.73k
>>>> 263 KiB 018 TiB
>>>> eu-central-1.rgw.meta8   64 384 KiB
>>>>   1k
>>>> 384 KiB 018 TiB
>>>> eu-central-1.rgw.users.email 9   6440 B
>>>>1
>>>>40 B 018 TiB
>>>> eu-central-1.rgw.buckets.index  10   64  10 GiB
>>>> 67.61k
>>>>  10 GiB  0.0218 TiB
>>>

[ceph-users] Re: s3 requires twice the space it should use

2021-04-15 Thread Boris Behrens
So, I need to live with it? Does a value of zero mean that the default is used?
[root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size
{
"bluestore_min_alloc_size": "0"
}

I also checked the fragmentation on the bluestore OSDs and it is around
0.80 - 0.89 on most OSDs. yikes.
[root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block
{
"fragmentation_rating": 0.85906054329923576
}

The problem I currently have is that I can barely keep up with adding OSD
disks.

Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge :

> size_kb_actual are actually bucket object size but on OSD level the
> bluestore_min_alloc_size default 64KB and SSD are 16KB
>
>
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/administration_guide/osd-bluestore
>
> -AmitG
>
> On Thu, Apr 15, 2021 at 7:29 PM Boris Behrens  wrote:
>
>> Hi,
>>
>> maybe it is just a problem in my understanding, but it looks like our s3
>> requires twice the space it should use.
>>
>> I ran "radosgw-admin bucket stats", and added all "size_kb_actual" values
>> up and divided to TB (/1024/1024/1024).
>> The resulting space is 135,1636733 TB. When I tripple it because of
>> replication I end up with around 405TB which is nearly half the space of
>> what ceph df tells me.
>>
>> Hope someone can help me.
>>
>> ceph df shows
>> RAW STORAGE:
>> CLASS SIZE AVAIL   USEDRAW USED %RAW USED
>> hdd   1009 TiB 189 TiB 820 TiB  820 TiB 81.26
>> TOTAL 1009 TiB 189 TiB 820 TiB  820 TiB 81.26
>>
>> POOLS:
>> POOLID PGS  STORED
>> OBJECTS
>> USED%USED MAX AVAIL
>> rbd  0   64 0 B
>>  0
>> 0 B 018 TiB
>> .rgw.root1   64  99 KiB
>>  119
>>  99 KiB 018 TiB
>> eu-central-1.rgw.control 2   64 0 B
>>  8
>> 0 B 018 TiB
>> eu-central-1.rgw.data.root   3   64 1.0 MiB
>>  3.15k
>> 1.0 MiB 018 TiB
>> eu-central-1.rgw.gc  4   64  71 MiB
>> 32
>>  71 MiB 018 TiB
>> eu-central-1.rgw.log 5   64 267 MiB
>>  564
>> 267 MiB 018 TiB
>> eu-central-1.rgw.users.uid   6   64 2.8 MiB
>>  6.91k
>> 2.8 MiB 018 TiB
>> eu-central-1.rgw.users.keys  7   64 263 KiB
>>  6.73k
>> 263 KiB 018 TiB
>> eu-central-1.rgw.meta8   64 384 KiB
>> 1k
>> 384 KiB 018 TiB
>> eu-central-1.rgw.users.email 9   6440 B
>>  1
>>40 B 018 TiB
>> eu-central-1.rgw.buckets.index  10   64  10 GiB
>> 67.61k
>>  10 GiB  0.0218 TiB
>> eu-central-1.rgw.buckets.data   11 2048 264 TiB
>>  138.31M
>> 264 TiB 83.3718 TiB
>> eu-central-1.rgw.buckets.non-ec 12   64 297 MiB
>> 11.32k
>> 297 MiB 018 TiB
>> eu-central-1.rgw.usage  13   64 536 MiB
>> 32
>> 536 MiB 018 TiB
>> eu-msg-1.rgw.control56   64 0 B
>>  8
>> 0 B 018 TiB
>> eu-msg-1.rgw.data.root  57   64  72 KiB
>>  227
>>  72 KiB 018 TiB
>> eu-msg-1.rgw.gc 58   64 300 KiB
>> 32
>> 300 KiB 018 TiB
>> eu-msg-1.rgw.log59   64 835 KiB
>>  242
>> 835 KiB 018 TiB
>> eu-msg-1.rgw.users.uid  60   64  56 KiB
>>  104
>>  56 KiB 018 TiB
>> eu-msg-1.rgw.usage  61   64  37 MiB
>> 25
>>  37 MiB 018 TiB
>> eu-msg-1.rgw.users.keys 62   64 3.8 KiB
>> 97
>> 3.8 KiB 018 TiB
>> eu-msg-1.rgw.meta   63   64 607 KiB
>>  1.60k
>> 607 KiB 018 TiB
>> eu-msg-1.rgw.buckets.index  64   64  71 MiB
>>  119
>>  71 MiB 018 TiB
>> eu-msg-1.rgw.users

[ceph-users] s3 requires twice the space it should use

2021-04-15 Thread Boris Behrens
Hi,

maybe it is just a problem in my understanding, but it looks like our s3
requires twice the space it should use.

I ran "radosgw-admin bucket stats", and added all "size_kb_actual" values
up and divided to TB (/1024/1024/1024).
The resulting space is 135,1636733 TB. When I tripple it because of
replication I end up with around 405TB which is nearly half the space of
what ceph df tells me.
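
For reference, this is roughly how I summed it up (assuming the usual
bucket stats layout with the per-bucket data under usage."rgw.main"):

radosgw-admin bucket stats | \
    jq '[.[].usage."rgw.main".size_kb_actual] | add / 1024 / 1024 / 1024'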

Hope someone can help me.

ceph df shows
RAW STORAGE:
CLASS SIZE AVAIL   USEDRAW USED %RAW USED
hdd   1009 TiB 189 TiB 820 TiB  820 TiB 81.26
TOTAL 1009 TiB 189 TiB 820 TiB  820 TiB 81.26

POOLS:
POOLID PGS  STORED  OBJECTS
USED%USED MAX AVAIL
rbd  0   64 0 B   0
0 B 018 TiB
.rgw.root1   64  99 KiB 119
 99 KiB 018 TiB
eu-central-1.rgw.control 2   64 0 B   8
0 B 018 TiB
eu-central-1.rgw.data.root   3   64 1.0 MiB   3.15k
1.0 MiB 018 TiB
eu-central-1.rgw.gc  4   64  71 MiB  32
 71 MiB 018 TiB
eu-central-1.rgw.log 5   64 267 MiB 564
267 MiB 018 TiB
eu-central-1.rgw.users.uid   6   64 2.8 MiB   6.91k
2.8 MiB 018 TiB
eu-central-1.rgw.users.keys  7   64 263 KiB   6.73k
263 KiB 018 TiB
eu-central-1.rgw.meta8   64 384 KiB  1k
384 KiB 018 TiB
eu-central-1.rgw.users.email 9   6440 B   1
   40 B 018 TiB
eu-central-1.rgw.buckets.index  10   64  10 GiB  67.61k
 10 GiB  0.0218 TiB
eu-central-1.rgw.buckets.data   11 2048 264 TiB 138.31M
264 TiB 83.3718 TiB
eu-central-1.rgw.buckets.non-ec 12   64 297 MiB  11.32k
297 MiB 018 TiB
eu-central-1.rgw.usage  13   64 536 MiB  32
536 MiB 018 TiB
eu-msg-1.rgw.control56   64 0 B   8
0 B 018 TiB
eu-msg-1.rgw.data.root  57   64  72 KiB 227
 72 KiB 018 TiB
eu-msg-1.rgw.gc 58   64 300 KiB  32
300 KiB 018 TiB
eu-msg-1.rgw.log59   64 835 KiB 242
835 KiB 018 TiB
eu-msg-1.rgw.users.uid  60   64  56 KiB 104
 56 KiB 018 TiB
eu-msg-1.rgw.usage  61   64  37 MiB  25
 37 MiB 018 TiB
eu-msg-1.rgw.users.keys 62   64 3.8 KiB  97
3.8 KiB 018 TiB
eu-msg-1.rgw.meta   63   64 607 KiB   1.60k
607 KiB 018 TiB
eu-msg-1.rgw.buckets.index  64   64  71 MiB 119
 71 MiB 018 TiB
eu-msg-1.rgw.users.email65   64 0 B   0
0 B 018 TiB
eu-msg-1.rgw.buckets.data   66   64 2.9 TiB   1.16M
2.9 TiB  5.3018 TiB
eu-msg-1.rgw.buckets.non-ec 67   64 2.2 MiB 354
2.2 MiB 018 TiB
default.rgw.control 69   32 0 B   8
0 B 018 TiB
default.rgw.data.root   70   32 0 B   0
0 B 018 TiB
default.rgw.gc  71   32 0 B   0
0 B 018 TiB
default.rgw.log 72   32 0 B   0
0 B 018 TiB
default.rgw.users.uid   73   32 0 B   0
0 B 018 TiB
fra-1.rgw.control   74   32 0 B   8
0 B 018 TiB
fra-1.rgw.meta  75   32 0 B   0
0 B 018 TiB
fra-1.rgw.log   76   3250 B  28
   50 B 018 TiB


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: should I increase the amount of PGs?

2021-03-30 Thread Boris Behrens
I raised the backfillfull_ratio to 0.91 to see what happens; now I am
waiting. Some OSDs were around 89-91%, some are around 50-60%.
The pgp_num has been at 1946 for one week. I think this will solve itself
once the cluster becomes a bit more tidy.
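
(For reference, that was done with something like: ceph osd set-backfillfull-ratio 0.91)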

Am Di., 30. März 2021 um 15:23 Uhr schrieb Dan van der Ster <
d...@vanderster.com>:

> You started with 1024 PGs, and are splitting to 2048.
> Currently there are 1946 PGs used .. so it is nearly there at the goal.
>
> You need to watch that value 1946 and see if it increases slowly. If
> it does not increase, then those backfill_toofull PGs are probably
> splitting PGs, and they are blocked by not having enough free space.
>
> To solve that free space problem, you could either increase the
> backfillfull_ratio like we discussed earlier, or add capacity.
> I prefer the former, if the OSDs are just over the 90% default limit.
>
> -- dan
>
> On Tue, Mar 30, 2021 at 3:18 PM Boris Behrens  wrote:
> >
> > The output from ceph osd pool ls detail tell me nothing, except that the
> pgp_num is not where it should be. Can you help me to read the output? How
> do I estimate how long the split will take?
> >
> > [root@s3db1 ~]# ceph status
> >   cluster:
> > id: dca79fff-ffd0-58f4-1cff-82a2feea05f4
> > health: HEALTH_WARN
> > noscrub,nodeep-scrub flag(s) set
> > 10 backfillfull osd(s)
> > 19 nearfull osd(s)
> > 37 pool(s) backfillfull
> > BlueFS spillover detected on 1 OSD(s)
> > 13 large omap objects
> > Low space hindering backfill (add storage if this doesn't
> resolve itself): 234 pgs backfill_toofull
> > ...
> >   data:
> > pools:   37 pools, 4032 pgs
> > objects: 121.40M objects, 199 TiB
> > usage:   627 TiB used, 169 TiB / 795 TiB avail
> > pgs: 45263471/364213596 objects misplaced (12.428%)
> >  3719 active+clean
> >  209  active+remapped+backfill_wait+backfill_toofull
> >  59   active+remapped+backfill_wait
> >  24   active+remapped+backfill_toofull
> >  20   active+remapped+backfilling
> >  1active+remapped+forced_backfill+backfill_toofull
> >
> >   io:
> > client:   8.4 MiB/s rd, 127 MiB/s wr, 208 op/s rd, 163 op/s wr
> > recovery: 276 MiB/s, 164 objects/s
> >
> > [root@s3db1 ~]# ceph osd pool ls detail
> > ...
> > pool 10 'eu-central-1.rgw.buckets.index' replicated size 3 min_size 1
> crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
> last_change 320966 lfor 0/193276/306366 flags hashpspool,backfillfull
> stripe_width 0 application rgw
> > pool 11 'eu-central-1.rgw.buckets.data' replicated size 3 min_size 2
> crush_rule 0 object_hash rjenkins pg_num 2048 pgp_num 1946 pgp_num_target
> 2048 autoscale_mode warn last_change 320966 lfor 0/263549/317774 flags
> hashpspool,backfillfull stripe_width 0 application rgw
> > ...
> >
> > Am Di., 30. März 2021 um 15:07 Uhr schrieb Dan van der Ster <
> d...@vanderster.com>:
> >>
> >> It would be safe to turn off the balancer, yes go ahead.
> >>
> >> To know if adding more hardware will help, we need to see how much
> >> longer this current splitting should take. This will help:
> >>
> >> ceph status
> >> ceph osd pool ls detail
> >>
> >> -- dan
> >>
> >> On Tue, Mar 30, 2021 at 3:00 PM Boris Behrens  wrote:
> >> >
> >> > I would think due to splitting, because the balancer doesn't refuses
> it's work, because to many misplaced objects.
> >> > I also think to turn it off for now, so it doesn't begin it's work at
> 5% missplaced objects.
> >> >
> >> > Would adding more hardware help? We wanted to insert another OSD node
> with 7x8TB disks anyway, but postponed it due to the rebalancing.
> >> >
> >> > Am Di., 30. März 2021 um 14:23 Uhr schrieb Dan van der Ster <
> d...@vanderster.com>:
> >> >>
> >> >> Are those PGs backfilling due to splitting or due to balancing?
> >> >> If it's the former, I don't think there's a way to pause them with
> >> >> upmap or any other trick.
> >> >>
> >> >> -- dan
> >> >>
> >> >> On Tue, Mar 30, 2021 at 2:07 PM Boris Behrens  wrote:
> >> >> >
> >> >> > One week later the ceph is still balancing.
> >> >> > What worries me like hel

[ceph-users] Re: should I increase the amount of PGs?

2021-03-30 Thread Boris Behrens
The output from ceph osd pool ls detail tells me nothing, except that the
pgp_num is not where it should be. Can you help me to read the output? How
do I estimate how long the split will take?

[root@s3db1 ~]# ceph status
  cluster:
id: dca79fff-ffd0-58f4-1cff-82a2feea05f4
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
10 backfillfull osd(s)
19 nearfull osd(s)
37 pool(s) backfillfull
BlueFS spillover detected on 1 OSD(s)
13 large omap objects
Low space hindering backfill (add storage if this doesn't
resolve itself): 234 pgs backfill_toofull
...
  data:
pools:   37 pools, 4032 pgs
objects: 121.40M objects, 199 TiB
usage:   627 TiB used, 169 TiB / 795 TiB avail
pgs: 45263471/364213596 objects misplaced (12.428%)
 3719 active+clean
 209  active+remapped+backfill_wait+backfill_toofull
 59   active+remapped+backfill_wait
 24   active+remapped+backfill_toofull
 20   active+remapped+backfilling
 1active+remapped+forced_backfill+backfill_toofull

  io:
client:   8.4 MiB/s rd, 127 MiB/s wr, 208 op/s rd, 163 op/s wr
recovery: 276 MiB/s, 164 objects/s

[root@s3db1 ~]# ceph osd pool ls detail
...
pool 10 'eu-central-1.rgw.buckets.index' replicated size 3 min_size 1
crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
last_change 320966 lfor 0/193276/306366 flags hashpspool,backfillfull
stripe_width 0 application rgw
pool 11 'eu-central-1.rgw.buckets.data' replicated size 3 min_size 2
crush_rule 0 object_hash rjenkins pg_num 2048 pgp_num 1946 pgp_num_target
2048 autoscale_mode warn last_change 320966 lfor 0/263549/317774 flags
hashpspool,backfillfull stripe_width 0 application rgw
...

Am Di., 30. März 2021 um 15:07 Uhr schrieb Dan van der Ster <
d...@vanderster.com>:

> It would be safe to turn off the balancer, yes go ahead.
>
> To know if adding more hardware will help, we need to see how much
> longer this current splitting should take. This will help:
>
> ceph status
> ceph osd pool ls detail
>
> -- dan
>
> On Tue, Mar 30, 2021 at 3:00 PM Boris Behrens  wrote:
> >
> > I would think due to splitting, because the balancer doesn't refuses
> it's work, because to many misplaced objects.
> > I also think to turn it off for now, so it doesn't begin it's work at 5%
> missplaced objects.
> >
> > Would adding more hardware help? We wanted to insert another OSD node
> with 7x8TB disks anyway, but postponed it due to the rebalancing.
> >
> > Am Di., 30. März 2021 um 14:23 Uhr schrieb Dan van der Ster <
> d...@vanderster.com>:
> >>
> >> Are those PGs backfilling due to splitting or due to balancing?
> >> If it's the former, I don't think there's a way to pause them with
> >> upmap or any other trick.
> >>
> >> -- dan
> >>
> >> On Tue, Mar 30, 2021 at 2:07 PM Boris Behrens  wrote:
> >> >
> >> > One week later the ceph is still balancing.
> >> > What worries me like hell is the %USE on a lot of those OSDs. Does
> ceph
> >> > resolv this on it's own? We are currently down to 5TB space in the
> cluster.
> >> > Rebalancing single OSDs doesn't work well and it increases the
> "missplaced
> >> > objects".
> >> >
> >> > I thought about letting upmap do some rebalancing. Anyone know if
> this is a
> >> > good idea? Or if I should bite my nails an wait as I am the headache
> of my
> >> > life.
> >> > [root@s3db1 ~]# ceph osd getmap -o om; osdmaptool om --upmap out.txt
> >> > --upmap-pool eu-central-1.rgw.buckets.data --upmap-max 10; cat out.txt
> >> > got osdmap epoch 321975
> >> > osdmaptool: osdmap file 'om'
> >> > writing upmap command output to: out.txt
> >> > checking for upmap cleanups
> >> > upmap, max-count 10, max deviation 5
> >> >  limiting to pools eu-central-1.rgw.buckets.data ([11])
> >> > pools eu-central-1.rgw.buckets.data
> >> > prepared 10/10 changes
> >> > ceph osd rm-pg-upmap-items 11.209
> >> > ceph osd rm-pg-upmap-items 11.253
> >> > ceph osd pg-upmap-items 11.7f 79 88
> >> > ceph osd pg-upmap-items 11.fc 53 31 105 78
> >> > ceph osd pg-upmap-items 11.1d8 84 50
> >> > ceph osd pg-upmap-items 11.47f 94 86
> >> > ceph osd pg-upmap-items 11.49c 44 71
> >> > ceph osd pg-upmap-items 11.553 74 50
> >> > ceph osd pg-upmap-items 11.6c3 66 63
> >> > ceph osd pg-u

[ceph-users] Re: forceful remap PGs

2021-03-30 Thread Boris Behrens
I reweighted the OSD to 0.0 and then forced the backfilling.

How long does it take for ceph to free up space? It looks like it was doing
this, but it could also be the "backup cleanup job" that removed images
from the buckets.

Am Di., 30. März 2021 um 14:41 Uhr schrieb Stefan Kooman :

> On 3/30/21 12:55 PM, Boris Behrens wrote:
> > I just move one PG away from the OSD, but the diskspace will not get
> freed.
>
> How did you move? I would suggest you use upmap:
>
> ceph osd pg-upmap-items
> Invalid command: missing required parameter pgid(<pgid>)
> osd pg-upmap-items <pgid> <id|osd.id> [<id|osd.id>...] :  set
> pg_upmap_items mapping <pgid>:{<id> to <id>, [...]} (developers only)
>
>
> So you specify which PG has to move to which OSD.
>
> > Do I need to do something to clean obsolete objects from the osd?
>
> No. The OSD will trim PG data that is not needed anymore.
>
> Gr. Stefan
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: should I increase the amount of PGs?

2021-03-30 Thread Boris Behrens
I would think due to splitting, because the balancer refuses to do its
work when there are too many misplaced objects.
I am also thinking of turning it off for now, so it doesn't start its work
at 5% misplaced objects.

Would adding more hardware help? We wanted to add another OSD node with
7x8TB disks anyway, but postponed it due to the rebalancing.

Am Di., 30. März 2021 um 14:23 Uhr schrieb Dan van der Ster <
d...@vanderster.com>:

> Are those PGs backfilling due to splitting or due to balancing?
> If it's the former, I don't think there's a way to pause them with
> upmap or any other trick.
>
> -- dan
>
> On Tue, Mar 30, 2021 at 2:07 PM Boris Behrens  wrote:
> >
> > One week later the ceph is still balancing.
> > What worries me like hell is the %USE on a lot of those OSDs. Does ceph
> > resolv this on it's own? We are currently down to 5TB space in the
> cluster.
> > Rebalancing single OSDs doesn't work well and it increases the
> "missplaced
> > objects".
> >
> > I thought about letting upmap do some rebalancing. Anyone know if this
> is a
> > good idea? Or if I should bite my nails an wait as I am the headache of
> my
> > life.
> > [root@s3db1 ~]# ceph osd getmap -o om; osdmaptool om --upmap out.txt
> > --upmap-pool eu-central-1.rgw.buckets.data --upmap-max 10; cat out.txt
> > got osdmap epoch 321975
> > osdmaptool: osdmap file 'om'
> > writing upmap command output to: out.txt
> > checking for upmap cleanups
> > upmap, max-count 10, max deviation 5
> >  limiting to pools eu-central-1.rgw.buckets.data ([11])
> > pools eu-central-1.rgw.buckets.data
> > prepared 10/10 changes
> > ceph osd rm-pg-upmap-items 11.209
> > ceph osd rm-pg-upmap-items 11.253
> > ceph osd pg-upmap-items 11.7f 79 88
> > ceph osd pg-upmap-items 11.fc 53 31 105 78
> > ceph osd pg-upmap-items 11.1d8 84 50
> > ceph osd pg-upmap-items 11.47f 94 86
> > ceph osd pg-upmap-items 11.49c 44 71
> > ceph osd pg-upmap-items 11.553 74 50
> > ceph osd pg-upmap-items 11.6c3 66 63
> > ceph osd pg-upmap-items 11.7ad 43 50
> >
> > ID  CLASS WEIGHTREWEIGHT SIZERAW USE DATA OMAP META
> >  AVAIL%USE  VAR  PGS STATUS TYPE NAME
> >  -1   795.42548- 795 TiB 626 TiB  587 TiB   82 GiB 1.4 TiB
> 170
> > TiB 78.64 1.00   -root default
> >  56   hdd   7.32619  1.0 7.3 TiB 6.4 TiB  6.4 TiB  684 MiB  16 GiB
> 910
> > GiB 87.87 1.12 129 up osd.56
> >  67   hdd   7.27739  1.0 7.3 TiB 6.4 TiB  6.4 TiB  582 MiB  16 GiB
> 865
> > GiB 88.40 1.12 115 up osd.67
> >  79   hdd   3.63689  1.0 3.6 TiB 3.2 TiB  432 GiB  1.9 GiB 0 B
> 432
> > GiB 88.40 1.12  63 up osd.79
> >  53   hdd   7.32619  1.0 7.3 TiB 6.5 TiB  6.4 TiB  971 MiB  22 GiB
> 864
> > GiB 88.48 1.13 114 up osd.53
> >  51   hdd   7.27739  1.0 7.3 TiB 6.5 TiB  6.4 TiB  734 MiB  15 GiB
> 837
> > GiB 88.77 1.13 120 up osd.51
> >  73   hdd  14.55269  1.0  15 TiB  13 TiB   13 TiB  1.8 GiB  39 GiB
> 1.6
> > TiB 88.97 1.13 246 up osd.73
> >  55   hdd   7.32619  1.0 7.3 TiB 6.5 TiB  6.5 TiB  259 MiB  15 GiB
> 825
> > GiB 89.01 1.13 118 up osd.55
> >  70   hdd   7.27739  1.0 7.3 TiB 6.5 TiB  6.5 TiB  291 MiB  16 GiB
> 787
> > GiB 89.44 1.14 119 up osd.70
> >  42   hdd   3.73630  1.0 3.7 TiB 3.4 TiB  3.3 TiB  685 MiB 8.2 GiB
> 374
> > GiB 90.23 1.15  60 up osd.42
> >  94   hdd   3.63869  1.0 3.6 TiB 3.3 TiB  3.3 TiB  132 MiB 7.7 GiB
> 345
> > GiB 90.75 1.15  64 up osd.94
> >  25   hdd   3.73630  1.0 3.7 TiB 3.4 TiB  3.3 TiB  3.2 MiB 8.1 GiB
> 352
> > GiB 90.79 1.15  53 up osd.25
> >  31   hdd   7.32619  1.0 7.3 TiB 6.7 TiB  6.6 TiB  223 MiB  15 GiB
> 690
> > GiB 90.80 1.15 117 up osd.31
> >  84   hdd   7.52150  1.0 7.5 TiB 6.8 TiB  6.6 TiB  159 MiB  16 GiB
> 699
> > GiB 90.93 1.16 121 up osd.84
> >  82   hdd   3.63689  1.0 3.6 TiB 3.3 TiB  332 GiB  1.0 GiB 0 B
> 332
> > GiB 91.08 1.16  59 up osd.82
> >  89   hdd   7.52150  1.0 7.5 TiB 6.9 TiB  6.6 TiB  400 MiB  15 GiB
> 670
> > GiB 91.29 1.16 126 up osd.89
> >  33   hdd   3.73630  1.0 3.7 TiB 3.4 TiB  3.3 TiB  382 MiB 8.6 GiB
> 327
> > GiB 91.46 1.16  66 up osd.33
> >  90   hdd   7.52150  1.0 7.5 TiB 6.9 TiB  6.6 TiB  338 MiB  15 GiB
> 658
> > GiB 91.46 1.16 112 up osd.90
> > 105   hdd   3.6

[ceph-users] Re: should I increase the amount of PGs?

2021-03-30 Thread Boris Behrens
One week later the ceph is still balancing.
What worries me like hell is the %USE on a lot of those OSDs. Does ceph
resolve this on its own? We are currently down to 5TB space in the cluster.
Rebalancing single OSDs doesn't work well and it increases the "misplaced
objects".

I thought about letting upmap do some rebalancing. Does anyone know if this
is a good idea? Or should I bite my nails and wait, as this is giving me the
headache of my life.
[root@s3db1 ~]# ceph osd getmap -o om; osdmaptool om --upmap out.txt
--upmap-pool eu-central-1.rgw.buckets.data --upmap-max 10; cat out.txt
got osdmap epoch 321975
osdmaptool: osdmap file 'om'
writing upmap command output to: out.txt
checking for upmap cleanups
upmap, max-count 10, max deviation 5
 limiting to pools eu-central-1.rgw.buckets.data ([11])
pools eu-central-1.rgw.buckets.data
prepared 10/10 changes
ceph osd rm-pg-upmap-items 11.209
ceph osd rm-pg-upmap-items 11.253
ceph osd pg-upmap-items 11.7f 79 88
ceph osd pg-upmap-items 11.fc 53 31 105 78
ceph osd pg-upmap-items 11.1d8 84 50
ceph osd pg-upmap-items 11.47f 94 86
ceph osd pg-upmap-items 11.49c 44 71
ceph osd pg-upmap-items 11.553 74 50
ceph osd pg-upmap-items 11.6c3 66 63
ceph osd pg-upmap-items 11.7ad 43 50
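
(If I decide to apply them, the generated out.txt can simply be executed,
e.g. "bash out.txt", since it only contains ceph osd pg-upmap-items and
rm-pg-upmap-items commands.)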

ID  CLASS WEIGHTREWEIGHT SIZERAW USE DATA OMAP META
 AVAIL%USE  VAR  PGS STATUS TYPE NAME
 -1   795.42548- 795 TiB 626 TiB  587 TiB   82 GiB 1.4 TiB  170
TiB 78.64 1.00   -root default
 56   hdd   7.32619  1.0 7.3 TiB 6.4 TiB  6.4 TiB  684 MiB  16 GiB  910
GiB 87.87 1.12 129 up osd.56
 67   hdd   7.27739  1.0 7.3 TiB 6.4 TiB  6.4 TiB  582 MiB  16 GiB  865
GiB 88.40 1.12 115 up osd.67
 79   hdd   3.63689  1.0 3.6 TiB 3.2 TiB  432 GiB  1.9 GiB 0 B  432
GiB 88.40 1.12  63 up osd.79
 53   hdd   7.32619  1.0 7.3 TiB 6.5 TiB  6.4 TiB  971 MiB  22 GiB  864
GiB 88.48 1.13 114 up osd.53
 51   hdd   7.27739  1.0 7.3 TiB 6.5 TiB  6.4 TiB  734 MiB  15 GiB  837
GiB 88.77 1.13 120 up osd.51
 73   hdd  14.55269  1.0  15 TiB  13 TiB   13 TiB  1.8 GiB  39 GiB  1.6
TiB 88.97 1.13 246 up osd.73
 55   hdd   7.32619  1.0 7.3 TiB 6.5 TiB  6.5 TiB  259 MiB  15 GiB  825
GiB 89.01 1.13 118 up osd.55
 70   hdd   7.27739  1.0 7.3 TiB 6.5 TiB  6.5 TiB  291 MiB  16 GiB  787
GiB 89.44 1.14 119 up osd.70
 42   hdd   3.73630  1.0 3.7 TiB 3.4 TiB  3.3 TiB  685 MiB 8.2 GiB  374
GiB 90.23 1.15  60 up osd.42
 94   hdd   3.63869  1.0 3.6 TiB 3.3 TiB  3.3 TiB  132 MiB 7.7 GiB  345
GiB 90.75 1.15  64 up osd.94
 25   hdd   3.73630  1.0 3.7 TiB 3.4 TiB  3.3 TiB  3.2 MiB 8.1 GiB  352
GiB 90.79 1.15  53 up osd.25
 31   hdd   7.32619  1.0 7.3 TiB 6.7 TiB  6.6 TiB  223 MiB  15 GiB  690
GiB 90.80 1.15 117 up osd.31
 84   hdd   7.52150  1.0 7.5 TiB 6.8 TiB  6.6 TiB  159 MiB  16 GiB  699
GiB 90.93 1.16 121 up osd.84
 82   hdd   3.63689  1.0 3.6 TiB 3.3 TiB  332 GiB  1.0 GiB 0 B  332
GiB 91.08 1.16  59 up osd.82
 89   hdd   7.52150  1.0 7.5 TiB 6.9 TiB  6.6 TiB  400 MiB  15 GiB  670
GiB 91.29 1.16 126 up osd.89
 33   hdd   3.73630  1.0 3.7 TiB 3.4 TiB  3.3 TiB  382 MiB 8.6 GiB  327
GiB 91.46 1.16  66 up osd.33
 90   hdd   7.52150  1.0 7.5 TiB 6.9 TiB  6.6 TiB  338 MiB  15 GiB  658
GiB 91.46 1.16 112 up osd.90
105   hdd   3.63869  0.8 3.6 TiB 3.3 TiB  3.3 TiB  206 MiB 8.1 GiB  301
GiB 91.91 1.17  56 up osd.105
 66   hdd   7.27739  0.95000 7.3 TiB 6.7 TiB  6.7 TiB  322 MiB  16 GiB  548
GiB 92.64 1.18 121 up osd.66
 46   hdd   7.27739  1.0 7.3 TiB 6.8 TiB  6.7 TiB  316 MiB  16 GiB  536
GiB 92.81 1.18 119 up osd.46

On Tue, 23 Mar 2021 at 19:59, Boris Behrens wrote:

> Good point. Thanks for the hint. I changed it for all OSDs from 5 to 1
> *crossing finger*
>
> On Tue, 23 Mar 2021 at 19:45, Dan van der Ster <d...@vanderster.com> wrote:
>
>> I see. When splitting PGs, the OSDs will temporarily increase their used
>> space to make room for the new PGs.
>> When going from 1024->2048 PGs, that means that half of the objects from
>> each PG will be copied to a new PG, and then the previous PGs will have
>> those objects deleted.
>>
>> Make sure osd_max_backfills is set to 1, so that not too many PGs are
>> moving concurrently.
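For reference, osd_max_backfills can be changed at runtime; a sketch, either
form should do, and the injectargs variant takes effect immediately on the
running daemons:

# persist the new default in the cluster configuration (Mimic and later)
ceph config set osd osd_max_backfills 1
# or push it into all running OSDs right away
ceph tell 'osd.*' injectargs '--osd-max-backfills 1'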
>>
>>
>>
>> On Tue, Mar 23, 2021, 7:39 PM Boris Behrens  wrote:
>>
>>> Thank you.
>>> Currently I do not have any full OSDs (all <90%) but I keep this in mind.
>>> What worries me is the ever-increasing %USE metric (it went up from
>>> around 72% to 75% in three hours). It looks like a lot of data is coming
>>> in (even though barely any new data arrives at the moment), but I think
>>> this might
>&

[ceph-users] Re: forceful remap PGs

2021-03-30 Thread Boris Behrens
I just moved one PG away from the OSD, but the disk space does not get freed.
Do I need to do something to clean obsolete objects from the OSD?
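What I am checking so far (just a sketch, osd.105 and the pg id are only
examples): as far as I know the old copy is removed automatically once the PG
has finished backfilling to its new OSD, so I am watching whether the PG is
still listed on the full OSD at all:

# is the PG still present on / mapped to the full OSD?
ceph pg ls-by-osd osd.105
# and has its backfill actually completed? (see recovery_state in the output)
ceph pg 11.7ad query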

On Tue, 30 Mar 2021 at 11:47, Boris Behrens wrote:

> Hi,
> I have a couple OSDs that currently get a lot of data, and are running
> towards 95% fillrate.
>
> I would like to forcefully remap some PGs (they are around 100GB) to more
> empty OSDs and drop them from the full OSDs. I know this would lead to
> degraded objects, but I am not sure how long the cluster will stay in a
> state where it can allocate objects.
>
> OSD.105 grew from around 85% to 92% in the last 4 hours.
>
> This is the current state
>   cluster:
> id: dca79fff-ffd0-58f4-1cff-82a2feea05f4
> health: HEALTH_WARN
> noscrub,nodeep-scrub flag(s) set
> 9 backfillfull osd(s)
> 19 nearfull osd(s)
> 37 pool(s) backfillfull
> BlueFS spillover detected on 1 OSD(s)
> 13 large omap objects
> Low space hindering backfill (add storage if this doesn't
> resolve itself): 248 pgs backfill_toofull
> Degraded data redundancy: 18115/362288820 objects degraded
> (0.005%), 1 pg degraded, 1 pg undersized
>
>   services:
> mon: 3 daemons, quorum ceph-s3-mon1,ceph-s3-mon2,ceph-s3-mon3 (age 6d)
> mgr: ceph-mgr2(active, since 6d), standbys: ceph-mgr3, ceph-mgr1
> mds:  3 up:standby
> osd: 110 osds: 110 up (since 4d), 110 in (since 6d); 324 remapped pgs
>  flags noscrub,nodeep-scrub
> rgw: 4 daemons active (admin, eu-central-1, eu-msg-1, eu-secure-1)
>
>   task status:
>
>   data:
> pools:   37 pools, 4032 pgs
> objects: 120.76M objects, 197 TiB
> usage:   620 TiB used, 176 TiB / 795 TiB avail
> pgs: 18115/362288820 objects degraded (0.005%)
>  47144186/362288820 objects misplaced (13.013%)
>  3708 active+clean
>  241  active+remapped+backfill_wait+backfill_toofull
>  63   active+remapped+backfill_wait
>  11   active+remapped+backfilling
>  6active+remapped+backfill_toofull
>  1active+remapped+backfilling+forced_backfill
>  1active+remapped+forced_backfill+backfill_toofull
>  1active+undersized+degraded+remapped+backfilling
>
>   io:
> client:   23 MiB/s rd, 252 MiB/s wr, 347 op/s rd, 381 op/s wr
> recovery: 194 MiB/s, 112 objects/s
> ---
> ID  CLASS  WEIGHT  REWEIGHT  SIZE  RAW USE  DATA  OMAP  META
>  AVAIL  %USE  VAR  PGS  STATUS  TYPE NAME
>  -1        795.42548  -  795 TiB 620 TiB 582 TiB   82 GiB 1.4 TiB  176
> TiB 77.90 1.00   -  root default
>  84   hdd   7.52150  1.0 7.5 TiB 6.8 TiB 6.5 TiB  158 MiB  15 GiB  764
> GiB 90.07 1.16 121 up osd.84
>  79   hdd   3.63689  1.0 3.6 TiB 3.3 TiB 367 GiB  1.9 GiB 0 B  367
> GiB 90.15 1.16  64 up osd.79
>  70   hdd   7.27739  1.0 7.3 TiB 6.6 TiB 6.5 TiB  268 MiB  15 GiB  730
> GiB 90.20 1.16 121 up osd.70
>  82   hdd   3.63689  1.0 3.6 TiB 3.3 TiB 364 GiB  1.1 GiB 0 B  364
> GiB 90.23 1.16  59 up osd.82
>  89   hdd   7.52150  1.0 7.5 TiB 6.8 TiB 6.6 TiB  395 MiB  16 GiB  735
> GiB 90.45 1.16 126 up osd.89
>  90   hdd   7.52150  1.0 7.5 TiB 6.8 TiB 6.6 TiB  338 MiB  15 GiB  723
> GiB 90.62 1.16 112 up osd.90
>  33   hdd   3.73630  1.0 3.7 TiB 3.4 TiB 3.3 TiB  382 MiB 8.6 GiB  358
> GiB 90.64 1.16  66 up osd.33
>  66   hdd   7.27739  0.95000 7.3 TiB 6.7 TiB 6.7 TiB  313 MiB  16 GiB  605
> GiB 91.88 1.18 122 up osd.66
>  46   hdd   7.27739  1.0 7.3 TiB 6.7 TiB 6.7 TiB  312 MiB  16 GiB  601
> GiB 91.93 1.18 119 up osd.46
> 105   hdd   3.63869  0.8 3.6 TiB 3.4 TiB 3.4 TiB  206 MiB 8.1 GiB  281
> GiB 92.45 1.19  58 up osd.105
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] forceful remap PGs

2021-03-30 Thread Boris Behrens
Hi,
I have a couple OSDs that currently get a lot of data, and are running
towards 95% fillrate.

I would like to forcefully remap some PGs (they are around 100GB) to more
empty OSDs and drop them from the full OSDs. I know this would lead to
degraded objects, but I am not sure how long the cluster will stay in a
state where it can allocate objects.
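One way to do this without marking anything out is pg-upmap-items, which pins
a single PG to a different OSD (a sketch, the pg and OSD ids are only
examples; this marks objects as misplaced rather than degraded):

# move pg 11.1d8 away from the full osd.84 onto the emptier osd.50
ceph osd pg-upmap-items 11.1d8 84 50
# drop the explicit mapping again later
ceph osd rm-pg-upmap-items 11.1d8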

OSD.105 grew from around 85% to 92% in the last 4 hours.

This is the current state
  cluster:
id: dca79fff-ffd0-58f4-1cff-82a2feea05f4
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
9 backfillfull osd(s)
19 nearfull osd(s)
37 pool(s) backfillfull
BlueFS spillover detected on 1 OSD(s)
13 large omap objects
Low space hindering backfill (add storage if this doesn't
resolve itself): 248 pgs backfill_toofull
Degraded data redundancy: 18115/362288820 objects degraded
(0.005%), 1 pg degraded, 1 pg undersized

  services:
mon: 3 daemons, quorum ceph-s3-mon1,ceph-s3-mon2,ceph-s3-mon3 (age 6d)
mgr: ceph-mgr2(active, since 6d), standbys: ceph-mgr3, ceph-mgr1
mds:  3 up:standby
osd: 110 osds: 110 up (since 4d), 110 in (since 6d); 324 remapped pgs
 flags noscrub,nodeep-scrub
rgw: 4 daemons active (admin, eu-central-1, eu-msg-1, eu-secure-1)

  task status:

  data:
pools:   37 pools, 4032 pgs
objects: 120.76M objects, 197 TiB
usage:   620 TiB used, 176 TiB / 795 TiB avail
pgs: 18115/362288820 objects degraded (0.005%)
 47144186/362288820 objects misplaced (13.013%)
 3708 active+clean
 241  active+remapped+backfill_wait+backfill_toofull
 63   active+remapped+backfill_wait
 11   active+remapped+backfilling
 6active+remapped+backfill_toofull
 1active+remapped+backfilling+forced_backfill
 1active+remapped+forced_backfill+backfill_toofull
 1active+undersized+degraded+remapped+backfilling

  io:
client:   23 MiB/s rd, 252 MiB/s wr, 347 op/s rd, 381 op/s wr
recovery: 194 MiB/s, 112 objects/s
---
ID  CLASS  WEIGHT  REWEIGHT  SIZE  RAW USE  DATA  OMAP  META  AVAIL
   %USE  VAR  PGS  STATUS  TYPE NAME
 -1        795.42548  -  795 TiB 620 TiB 582 TiB   82 GiB 1.4 TiB  176
TiB 77.90 1.00   -  root default
 84   hdd   7.52150  1.0 7.5 TiB 6.8 TiB 6.5 TiB  158 MiB  15 GiB  764
GiB 90.07 1.16 121 up osd.84
 79   hdd   3.63689  1.0 3.6 TiB 3.3 TiB 367 GiB  1.9 GiB 0 B  367
GiB 90.15 1.16  64 up osd.79
 70   hdd   7.27739  1.0 7.3 TiB 6.6 TiB 6.5 TiB  268 MiB  15 GiB  730
GiB 90.20 1.16 121 up osd.70
 82   hdd   3.63689  1.0 3.6 TiB 3.3 TiB 364 GiB  1.1 GiB 0 B  364
GiB 90.23 1.16  59 up osd.82
 89   hdd   7.52150  1.0 7.5 TiB 6.8 TiB 6.6 TiB  395 MiB  16 GiB  735
GiB 90.45 1.16 126 up osd.89
 90   hdd   7.52150  1.0 7.5 TiB 6.8 TiB 6.6 TiB  338 MiB  15 GiB  723
GiB 90.62 1.16 112 up osd.90
 33   hdd   3.73630  1.0 3.7 TiB 3.4 TiB 3.3 TiB  382 MiB 8.6 GiB  358
GiB 90.64 1.16  66 up osd.33
 66   hdd   7.27739  0.95000 7.3 TiB 6.7 TiB 6.7 TiB  313 MiB  16 GiB  605
GiB 91.88 1.18 122 up osd.66
 46   hdd   7.27739  1.0 7.3 TiB 6.7 TiB 6.7 TiB  312 MiB  16 GiB  601
GiB 91.93 1.18 119 up osd.46
105   hdd   3.63869  0.8 3.6 TiB 3.4 TiB 3.4 TiB  206 MiB 8.1 GiB  281
GiB 92.45 1.19  58 up osd.105

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

