[ceph-users] Re: Configuring rgw connection timeouts

2022-11-16 Thread Casey Bodley
hi Thilo, you can find a 'request_timeout_ms' frontend option
documented in https://docs.ceph.com/en/quincy/radosgw/frontends/
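
a minimal sketch of where that option ends up, with the section name,
port and timeout value as placeholders only (the value is in
milliseconds):

  [client.rgw.gateway-node1]
  rgw_frontends = beast port=8080 request_timeout_ms=300000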

On Wed, Nov 16, 2022 at 12:32 PM Thilo-Alexander Ginkel
 wrote:
>
> Hi there,
>
> we are using Ceph Quincy's rgw S3 API to retrieve one file ("GET") over a
> longer time period (i.e., reads alternate with periods of no activity).
>
> Eventually the connection is closed by the rgw before the file has been
> completely read.
>
> Is there a way to increase the read (?) timeout to keep the connection
> alive despite the intermittent read inactivity?
>
> Thanks & kind regards,
> Thilo
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Configuring rgw connection timeouts

2022-11-17 Thread Casey Bodley
it doesn't look like cephadm supports extra frontend options during
deployment. but these are stored as part of the `rgw_frontends` config
option, so you can use a command like 'ceph config set' after
deployment to add request_timeout_ms
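
for example, a minimal sketch (the config section, port, timeout value
and service name below are placeholders, and cephadm may have set a
more specific per-daemon value that would take precedence, so check
what is there first):

  ceph config dump | grep rgw_frontends
  ceph config set client.rgw.<service_id> rgw_frontends \
      'beast port=8080 request_timeout_ms=300000'
  ceph orch restart rgw.<service_id>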

On Thu, Nov 17, 2022 at 11:18 AM Thilo-Alexander Ginkel
 wrote:
>
> Hi Casey,
>
> one followup question: We are using cephadm to deploy our Ceph cluster. How 
> would we configure the timeout setting using a service spec through cephadm?
>
> Thanks,
> Thilo

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: failure resharding radosgw bucket

2022-11-23 Thread Casey Bodley
hi Jan,

On Wed, Nov 23, 2022 at 12:45 PM Jan Horstmann  wrote:
>
> Hi list,
> I am completely lost trying to reshard a radosgw bucket which fails
> with the error:
>
> process_single_logshard: Error during resharding bucket
> 68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs:(2) No such file or
> directory
>
> But let me start from the beginning. We are running a ceph cluster
> version 15.2.17. Recently we received a health warning because of
> "large omap objects". So I grepped through the logs to get more
> information about the object and then mapped that to a radosgw bucket
> instance ([1]).
> I believe this should normally be handled by dynamic resharding of the
> bucket, which has already been done 23 times for this bucket ([2]).
> For recent resharding tries the radosgw is logging the error mentioned
> at the beginning. I tried to reshard manually by following the process
> in [3], but that consequently leads to the same error.
> When running the reshard with debug options ( --debug-rgw=20 --debug-
> ms=1) I can get some additional insight on where exactly the failure
> occurs:
>
> 2022-11-23T10:41:20.754+ 7f58cf9d2080  1 --
> 10.38.128.3:0/1221656497 -->
> [v2:10.38.128.6:6880/44286,v1:10.38.128.6:6881/44286] --
> osd_op(unknown.0.0:46 5.6 5:66924383:reshard::reshard.05:head
> [call rgw.reshard_get in=149b] snapc 0=[]
> ondisk+read+known_if_redirected e44374) v8 -- 0x56092dd46a10 con
> 0x56092dcfd7a0
> 2022-11-23T10:41:20.754+ 7f58bb889700  1 --
> 10.38.128.3:0/1221656497 <== osd.210 v2:10.38.128.6:6880/44286 4 
> osd_op_reply(46 reshard.05 [call] v0'0 uv1180019 ondisk = -2
> ((2) No such file or directory)) v8  162+0+0 (crc 0 0 0)
> 0x7f58b00dc020 con 0x56092dcfd7a0
>
>
> I am not sure how to interpret this and how to debug this any further.
> Of course I can provide the full output if that helps.
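
for reference, the reshard queue and per-bucket reshard state can be
inspected directly; a minimal sketch (pool, tenant and bucket names are
placeholders for this zone's defaults):

  radosgw-admin reshard list
  radosgw-admin reshard status --bucket <tenant>/<bucket>
  # the queue objects live in the log pool under the "reshard" namespace
  rados -p default.rgw.log --namespace reshard ls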
>
> Thanks and regards,
> Jan
>
> [1]
> root@ceph-mon1:~# grep -r 'Large omap object found. Object'
> /var/log/ceph/ceph.log
> 2022-11-15T14:47:28.900679+ osd.47 (osd.47) 10890 : cluster [WRN]
> Large omap object found. Object: 3:9660022b:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.9:head PG: 3.d4400669 (3.29) Key count:
> 336457 Size (bytes): 117560231
> 2022-11-17T04:51:43.593811+ osd.50 (osd.50) 90 : cluster [WRN]
> Large omap object found. Object: 3:0de49b75:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.10:head PG: 3.aed927b0 (3.30) Key count:
> 205346 Size (bytes): 71669614
> 2022-11-18T02:55:07.182419+ osd.47 (osd.47) 10917 : cluster [WRN]
> Large omap object found. Object: 3:9660022b:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.9:head PG: 3.d4400669 (3.29) Key count:
> 449776 Size (bytes): 157310435
> 2022-11-19T09:56:47.630679+ osd.29 (osd.29) 114 : cluster [WRN]
> Large omap object found. Object: 3:61ad76c5:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.12:head PG: 3.a36eb586 (3.6) Key count:
> 213843 Size (bytes): 74703544
> 2022-11-20T13:04:39.979349+ osd.72 (osd.72) 83 : cluster [WRN]
> Large omap object found. Object: 3:2b3227e7:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.22:head PG: 3.e7e44cd4 (3.14) Key count:
> 326676 Size (bytes): 114453145
> 2022-11-21T02:53:32.410698+ osd.50 (osd.50) 151 : cluster [WRN]
> Large omap object found. Object: 3:0de49b75:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.10:head PG: 3.aed927b0 (3.30) Key count:
> 216764 Size (bytes): 75674839
> 2022-11-22T18:04:09.757825+ osd.47 (osd.47) 10964 : cluster [WRN]
> Large omap object found. Object: 3:9660022b:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.9:head PG: 3.d4400669 (3.29) Key count:
> 449776 Size (bytes): 157310435
> 2022-11-23T00:44:55.316254+ osd.29 (osd.29) 163 : cluster [WRN]
> Large omap object found. Object: 3:61ad76c5:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.12:head PG: 3.a36eb586 (3.6) Key count:
> 213843 Size (bytes): 74703544
> 2022-11-23T09:10:07.842425+ osd.55 (osd.55) 13968 : cluster [WRN]
> Large omap object found. Object: 3:3fa378c9:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.20:head PG: 3.931ec5fc (3.3c) Key count:
> 219204 Size (bytes): 76509687
> 2022-11-23T09:11:15.516973+ osd.72 (osd.72) 112 : cluster [WRN]
> Large omap object found. Object: 3:2b3227e7:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.22:head PG: 3.e7e44cd4 (3.14) Key count:
> 326676 Size (bytes): 114453145
> root@ceph-mon1:~# radosgw-admin metadata list "bucket.instance" | grep
> ee3fa6a3-4af3-4ac2-86c2-d2c374080b54.63073818.19
> "68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs:ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19",
>
> [2]
> root@ceph-mon1:~# radosgw-admin bucket stats --bucket
> 68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs
> {
> "bucket": "snapshotnfs",
> "num_shards": 23,
> "tenant": "68ddc61c613a4e3096ca8c349ee37f56",
> "zonegroup": "bf22bf53-c135-450b-946f-97e16d1bc326",
> "plac

[ceph-users] Re: 16.2.11 pacific QE validation status

2022-12-20 Thread Casey Bodley
thanks Yuri, rgw approved based on today's results from
https://pulpito.ceph.com/yuriw-2022-12-20_15:27:49-rgw-pacific_16.2.11_RC2-distro-default-smithi/

On Mon, Dec 19, 2022 at 12:08 PM Yuri Weinstein  wrote:

> If you look at the pacific 16.2.8 QE validation history (
> https://tracker.ceph.com/issues/55356), we had pacific-x, nautilus-x, and
> pacific-p2p all green with one exception (
> https://tracker.ceph.com/issues/51652)
>
> Now we see so many failures in this point release with references to old
> issues.
>
> Is there anything we can fix to make them less "red"?
>
> Thx
> YuriW
>
> On Thu, Dec 15, 2022 at 2:56 PM Laura Flores  wrote:
>
>> I reviewed the upgrade runs:
>>
>>
>> https://pulpito.ceph.com/yuriw-2022-12-13_15:57:57-upgrade:nautilus-x-pacific_16.2.11_RC-distro-default-smithi/
>>
>> https://pulpito.ceph.com/yuriw-2022-12-13_21:47:46-upgrade:nautilus-x-pacific_16.2.11_RC-distro-default-smithi/
>>
>> https://pulpito.ceph.com/yuriw-2022-12-13_15:58:18-upgrade:octopus-x-pacific_16.2.11_RC-distro-default-smithi/
>>
>> https://pulpito.ceph.com/yuriw-2022-12-14_15:41:10-upgrade:octopus-x-pacific_16.2.11_RC-distro-default-smithi/
>>
>> Failures:
>>   1. https://tracker.ceph.com/issues/50618 -- known bug assigned to
>> Ilya; assuming it's not a big deal since it's been around for over a year
>>
>> Details:
>>   1. qemu_xfstests_luks1 failed on xfstest 168 - Ceph - RBD
>>
>>
>>
>> https://pulpito.ceph.com/yuriw-2022-12-13_15:58:24-upgrade:pacific-p2p-pacific_16.2.11_RC-distro-default-smithi/
>>
>> https://pulpito.ceph.com/yuriw-2022-12-14_15:40:37-upgrade:pacific-p2p-pacific_16.2.11_RC-distro-default-smithi/
>>
>> Failures, unrelated:
>>   1. https://tracker.ceph.com/issues/58223 -- new failure reported by me
>> 7 days ago; seems infrastructure related and not regression-related
>>   2. https://tracker.ceph.com/issues/52590 -- closed by Casey; must not
>> be of importance
>>   3. https://tracker.ceph.com/issues/58289 -- new failure raised by me
>> today; seems related to other "wait_for_recovery" failures, which are
>> generally not cause for concern since they're so infrequent.
>>   4. https://tracker.ceph.com/issues/51652 -- known bug from over a year
>> ago
>>
>> Details:
>>   1. failure on `sudo fuser -v /var/lib/dpkg/lock-frontend` -
>> Infrastructure
>>   2. "[ FAILED ] CmpOmap.cmp_vals_u64_invalid_default" in
>> upgrade:pacific-p2p-pacific - Ceph - RGW
>>   3. "AssertionError: wait_for_recovery: failed before timeout expired"
>> from down pg in pacific-p2p-pacific - Ceph - RADOS
>>   4. heartbeat timeouts on filestore OSDs while deleting objects in
>> upgrade:pacific-p2p-pacific - Ceph - RADOS
>>
>> On Thu, Dec 15, 2022 at 4:34 PM Brad Hubbard  wrote:
>>
>>> On Fri, Dec 16, 2022 at 3:15 AM Yuri Weinstein 
>>> wrote:
>>> >
>>> > Details of this release are summarized here:
>>> >
>>> > https://tracker.ceph.com/issues/58257#note-1
>>> > Release Notes - TBD
>>> >
>>> > Seeking approvals for:
>>> >
>>> > rados - Neha (https://github.com/ceph/ceph/pull/49431 is still being
>>> > tested and will be merged soon)
>>> > rook - Sébastien Han
>>> > cephadm - Adam
>>> > dashboard - Ernesto
>>> > rgw - Casey (rgw will be rerun on the latest SHA1)
>>> > rbd - Ilya, Deepika
>>> > krbd - Ilya, Deepika
>>> > fs - Venky, Patrick
>>> > upgrade/nautilus-x (pacific) - Neha, Laura
>>> > upgrade/octopus-x (pacific) - Neha, Laura
>>> > upgrade/pacific-p2p - Neha - Neha, Laura
>>> > powercycle - Brad
>>>
>>> The failure here is due to fallout from the recent lab issues and was
>>> fixed in main by https://github.com/ceph/ceph/pull/49021 I'm waiting
>>> to see if there are plans to backport this to pacific and quincy since
>>> that will be needed.
>>>
>>> > ceph-volume - Guillaume, Adam K
>>> >
>>> > Thx
>>> > YuriW
>>> >
>>> > ___
>>> > Dev mailing list -- d...@ceph.io
>>> > To unsubscribe send an email to dev-le...@ceph.io
>>>
>>>
>>>
>>> --
>>> Cheers,
>>> Brad
>>>
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
>>
>> --
>>
>> Laura Flores
>>
>> She/Her/Hers
>>
>> Software Engineer, Ceph Storage
>>
>> Red Hat Inc. 
>>
>> Chicago, IL
>>
>> lflo...@redhat.com
>> M: +17087388804
>> 
>> 
>>
>> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.11 pacific QE validation status

2023-01-20 Thread Casey Bodley
On Fri, Jan 20, 2023 at 11:39 AM Yuri Weinstein  wrote:
>
> The overall progress on this release is looking much better and if we
> can approve it we can plan to publish it early next week.
>
> Still seeking approvals
>
> rados - Neha, Laura
> rook - Sébastien Han
> cephadm - Adam
> dashboard - Ernesto
> rgw - Casey

+1 rgw still approved

> rbd - Ilya (full rbd run in progress now)
> krbd - Ilya
> fs - Venky, Patrick
> upgrade/nautilus-x (pacific) - passed thx Adam Kraitman!
> upgrade/octopus-x (pacific) - almost passed, still running 1 job
> upgrade/pacific-p2p - Neha (same as in 16.2.8)
> powercycle - Brad (see new SELinux denials)
>
> On Tue, Jan 17, 2023 at 10:45 AM Yuri Weinstein  wrote:
> >
> > OK I will rerun failed jobs filtering rhel in
> >
> > Thx!
> >
> > On Tue, Jan 17, 2023 at 10:43 AM Adam Kraitman  wrote:
> > >
> > > Hey the satellite issue was fixed
> > >
> > > Thanks
> > >
> > > On Tue, Jan 17, 2023 at 7:43 PM Laura Flores  wrote:
> > >>
> > >> This was my summary of rados failures. There was nothing new or amiss,
> > >> although it is important to note that runs were done with filtering out
> > >> rhel 8.
> > >>
> > >> I will leave it to Neha for final approval.
> > >>
> > >> Failures:
> > >> 1. https://tracker.ceph.com/issues/58258
> > >> 2. https://tracker.ceph.com/issues/58146
> > >> 3. https://tracker.ceph.com/issues/58458
> > >> 4. https://tracker.ceph.com/issues/57303
> > >> 5. https://tracker.ceph.com/issues/54071
> > >>
> > >> Details:
> > >> 1. rook: kubelet fails from connection refused - Ceph - Orchestrator
> > >> 2. test_cephadm.sh: Error: Error initializing source docker://
> > >> quay.ceph.io/ceph-ci/ceph:master - Ceph - Orchestrator
> > >> 3. qa/workunits/post-file.sh: postf...@drop.ceph.com: Permission 
> > >> denied
> > >> - Ceph
> > >> 4. rados/cephadm: Failed to fetch package version from
> > >> https://shaman.ceph.com/api/search/?status=ready&project=ceph&flavor=default&distros=ubuntu%2F22.04%2Fx86_64&sha1=b34ca7d1c2becd6090874ccda56ef4cd8dc64bf7
> > >> - Ceph - Orchestrator
> > >> 5. rados/cephadm/osds: Invalid command: missing required parameter
> > >> hostname() - Ceph - Orchestrator
> > >>
> > >> On Tue, Jan 17, 2023 at 9:48 AM Yuri Weinstein  
> > >> wrote:
> > >>
> > >> > Please see the test results on the rebased RC 6.6 in this comment:
> > >> >
> > >> > https://tracker.ceph.com/issues/58257#note-2
> > >> >
> > >> > We're still having infrastructure issues making testing difficult.
> > >> > Therefore all reruns were done excluding the rhel 8 distro
> > >> > ('--filter-out rhel_8')
> > >> >
> > >> > Also, the upgrades failed and Adam is looking into this.
> > >> >
> > >> > Seeking new approvals
> > >> >
> > >> > rados - Neha, Laura
> > >> > rook - Sébastien Han
> > >> > cephadm - Adam
> > >> > dashboard - Ernesto
> > >> > rgw - Casey
> > >> > rbd - Ilya
> > >> > krbd - Ilya
> > >> > fs - Venky, Patrick
> > >> > upgrade/nautilus-x (pacific) - Adam Kraitman
> > >> > upgrade/octopus-x (pacific) - Adam Kraitman
> > >> > upgrade/pacific-p2p - Neha - Adam Kraitman
> > >> > powercycle - Brad
> > >> >
> > >> > Thx
> > >> >
> > >> > On Fri, Jan 6, 2023 at 8:37 AM Yuri Weinstein  
> > >> > wrote:
> > >> > >
> > >> > > Happy New Year all!
> > >> > >
> > >> > > This release remains to be in "progress"/"on hold" status as we are
> > >> > > sorting all infrastructure-related issues.
> > >> > >
> > >> > > Unless I hear objections, I suggest doing a full rebase/retest QE
> > >> > > cycle (adding PRs merged lately) since it's taking much longer than
> > >> > > anticipated when sepia is back online.
> > >> > >
> > >> > > Objections?
> > >> > >
> > >> > > Thx
> > >> > > YuriW
> > >> > >
> > >> > > On Thu, Dec 15, 2022 at 9:14 AM Yuri Weinstein 
> > >> > wrote:
> > >> > > >
> > >> > > > Details of this release are summarized here:
> > >> > > >
> > >> > > > https://tracker.ceph.com/issues/58257#note-1
> > >> > > > Release Notes - TBD
> > >> > > >
> > >> > > > Seeking approvals for:
> > >> > > >
> > >> > > > rados - Neha (https://github.com/ceph/ceph/pull/49431 is still 
> > >> > > > being
> > >> > > > tested and will be merged soon)
> > >> > > > rook - Sébastien Han
> > >> > > > cephadm - Adam
> > >> > > > dashboard - Ernesto
> > >> > > > rgw - Casey (rgw will be rerun on the latest SHA1)
> > >> > > > rbd - Ilya, Deepika
> > >> > > > krbd - Ilya, Deepika
> > >> > > > fs - Venky, Patrick
> > >> > > > upgrade/nautilus-x (pacific) - Neha, Laura
> > >> > > > upgrade/octopus-x (pacific) - Neha, Laura
> > >> > > > upgrade/pacific-p2p - Neha - Neha, Laura
> > >> > > > powercycle - Brad
> > >> > > > ceph-volume - Guillaume, Adam K
> > >> > > >
> > >> > > > Thx
> > >> > > > YuriW
> > >> > ___
> > >> > Dev mailing list -- d...@ceph.io
> > >> > To unsubscribe send an email to dev-le...@ceph.io
> > >> >
> > >>
> > >>
> > >> --
> > >>
> > >> Laura Flores
> > >>
> > >> She/Her/Hers
> > >>
> > >> Software Engineer,

[ceph-users] CLT meeting summary 2023-02-01

2023-02-01 Thread Casey Bodley
distro testing for reef
* https://github.com/ceph/ceph/pull/49443 adds centos9 and ubuntu22 to
supported distros
* centos9 blocked by teuthology bug https://tracker.ceph.com/issues/58491
  - lsb_release command no longer exists, use /etc/os-release instead
  - ceph stopped depending on lsb_release in 2021 with
https://github.com/ceph/ceph/pull/42770
* ubuntu22 not blocked by teuthology, but the new python version
breaks most of the rgw tests

can we drop centos8 or ubuntu20 support for reef?
* we usually support the latest centos and two ubuntu LTSs
* users need an upgrade path that doesn't require OS and ceph upgrade
at the same time
* we might be able to drop centos8 support for Reef by adding centos9
support to Quincy
* python versioning issues make longer-term support of older distros
problematic. related work:
  - https://github.com/ceph/ceph/pull/41979
  - https://github.com/ceph/ceph/pull/47501

ondisk format changes in minor releases
* https://github.com/ceph/ceph/pull/48915 introduced some BlueFS log
changes in 16.2.11 which makes it incompatible with previous Pacific
releases. Hence no downgrade is permitted any more.
  - doc text tracked in https://tracker.ceph.com/issues/58625
* how do we prevent these issues in the future?
  - better testing of mixed-version rgw/mds/mgr/etc

infrastructure update
* a planned network outage yesterday still affecting LRC
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrate a bucket from replicated pool to ec pool

2023-02-11 Thread Casey Bodley
hi Boris,

On Sat, Feb 11, 2023 at 7:07 AM Boris Behrens  wrote:
>
> Hi,
> we use rgw as our backup storage, and it basically holds only compressed
> rbd snapshots.
> I would love to move these out of the replicated into a ec pool.
>
> I've read that I can set a default placement target for a user (
> https://docs.ceph.com/en/octopus/radosgw/placement/). What does happen to
> the existing user data?

changes to the user's default placement target/storage class don't
apply to existing buckets, only newly-created ones. a bucket's default
placement target/storage class can't be changed after creation

>
> How do I move the existing data to the new pool?

you might add the EC pool as a new storage class in the existing
placement target, and use lifecycle transitions to move the objects.
but the bucket's default storage class would still be replicated, so
new uploads would go there unless the client adds a
x-amz-storage-class header to override it. if you want to change those
defaults, you'd need to create a new bucket and copy the objects over
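
a minimal sketch of that first option, following the documented
placement/storage-class procedure (zonegroup, zone, storage-class and
pool names are placeholders):

  radosgw-admin zonegroup placement add --rgw-zonegroup default \
      --placement-id default-placement --storage-class EC_STANDARD
  radosgw-admin zone placement add --rgw-zone default \
      --placement-id default-placement --storage-class EC_STANDARD \
      --data-pool default.rgw.buckets.data.ec
  radosgw-admin period update --commit   # only if a realm/period is configured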

> Does it somehow interfere with ongoing data upload (it is one internal
> user, with 800 buckets which constantly get new data and old data removed)?

lifecycle transitions would be transparent to the user, but migrating
to new buckets would not

>
> Cheers
>  Boris
>
> ps: Can't wait to see some of you at the cephalocon :)
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrate a bucket from replicated pool to ec pool

2023-02-13 Thread Casey Bodley
On Mon, Feb 13, 2023 at 4:31 AM Boris Behrens  wrote:
>
> Hi Casey,
>
>> changes to the user's default placement target/storage class don't
>> apply to existing buckets, only newly-created ones. a bucket's default
>> placement target/storage class can't be changed after creation
>
>
> so I can easily update the placement rules for this user and can migrate 
> existing buckets one at a time. Very cool. Thanks
>
>>
>> you might add the EC pool as a new storage class in the existing
>> placement target, and use lifecycle transitions to move the objects.
>> but the bucket's default storage class would still be replicated, so
>> new uploads would go there unless the client adds a
>> x-amz-storage-class header to override it. if you want to change those
>> defaults, you'd need to create a new bucket and copy the objects over
>
>
> Can you link me to documentation. It might be the monday, but I do not 
> understand that totally.

https://docs.ceph.com/en/octopus/radosgw/placement/#adding-a-storage-class
should cover the addition of a new storage class for your EC pool
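
once the storage class exists, a lifecycle rule can transition objects
into it; a minimal sketch with the aws cli (endpoint, bucket,
storage-class name and the Days value are placeholders):

  # lc.json:
  {
    "Rules": [
      { "ID": "move-to-ec", "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "Transitions": [ { "Days": 1, "StorageClass": "EC_STANDARD" } ] }
    ]
  }

  aws --endpoint-url http://<rgw-endpoint> s3api \
      put-bucket-lifecycle-configuration --bucket <bucket> \
      --lifecycle-configuration file://lc.json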

>
> Do you know how much more CPU/RAM EC takes, and when (putting, reading, 
> deleting objects, recovering OSD failure)?

i don't have any data on that myself. maybe others on the list can share theirs?

>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im 
> groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [RGW - octopus] too many omapkeys on versioned bucket

2023-02-13 Thread Casey Bodley
On Mon, Feb 13, 2023 at 8:41 AM Boris Behrens  wrote:
>
> I've tried it the other way around and let cat give out all escaped chars
> and the did the grep:
>
> # cat -A omapkeys_list | grep -aFn '/'
> 9844:/$
> 9845:/^@v913^@$
> 88010:M-^@1000_/^@$
> 128981:M-^@1001_/$
>
> Did anyone ever saw something like this?
>
> Am Mo., 13. Feb. 2023 um 14:31 Uhr schrieb Boris Behrens :
>
> > So here is some more weirdness:
> > I've piped a list of all omapkeys into a file: (redacted customer data
> > with placeholders in <>)
> >
> > # grep -aFn '//' omapkeys_list
> > 9844://
> > 9845://v913
> > 88010:�1000_//
> > 128981:�1001_//
> >
> > # grep -aFn '/'
> > omapkeys_list
> > 
> >
> > # vim omapkeys_list +88010 (copy pasted from terminal)
> > <80>1000_//^@
> >
> > Any idea what this is?
> >
> > Am Mo., 13. Feb. 2023 um 13:57 Uhr schrieb Boris Behrens :
> >
> >> Hi,
> >> I have one bucket that showed up with a large omap warning, but the
> >> amount of objects in the bucket, does not align with the amount of omap
> >> keys. The bucket is sharded to get rid of the "large omapkeys" warning.
> >>
> >> I've counted all the omapkeys of one bucket and it came up with 33.383.622
> >> (rados -p INDEXPOOL listomapkeys INDEXOBJECT | wc -l)
> >> I've checked the amount of actual rados objects and it came up with
> >> 17.095.877
> >> (rados -p DATAPOOL ls | grep BUCKETMARKER | wc -l)
> >> I've checked the bucket index and it came up with 16.738.482
> >> (radosgw-admin bi list --bucket BUCKET | grep -F '"idx":' | wc -l)
> >>
> >> I have tried to fix it with
> >> radosgw-admin bucket check --check-objects --fix --bucket BUCKET
> >> but this did not change anything.
> >>
> >> Is this a known bug or might there be something else going on. How can I
> >> investigate further?
> >>
> >> Cheers
> >>  Boris
> >> --
> >> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> >> groüen Saal.
> >>
> >
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> > groüen Saal.
> >
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

hi Boris,

the bucket index is more complicated for versioned buckets than normal
ones. i wrote a high-level summary of this in
https://docs.ceph.com/en/latest/dev/radosgw/bucket_index/#s3-object-versioning

each object version may have additional keys starting with 1000_ and
1001_. the keys starting with 1000_ are sorted by time (most recent
version first), and the 1001_ keys correspond to the ‘olh' entry. the
output of `radosgw-admin bi list` should distinguish between these
index entry types using the names "plain", "instance", and "olh"

it's hard to tell from your email whether there's anything wrong, but
i hope this helps with your debugging
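
for example, a minimal sketch of counting the index entries by type
from that output (the bucket name is a placeholder, and it assumes the
listing is small enough to dump to a file):

  radosgw-admin bi list --bucket <bucket> > bi.json
  for t in plain instance olh; do
      printf '%s: ' "$t"; grep -c "\"type\": \"$t\"" bi.json
  done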
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OpenSSL in librados

2023-02-26 Thread Casey Bodley
On Sun, Feb 26, 2023 at 8:20 AM Ilya Dryomov  wrote:
>
> On Sun, Feb 26, 2023 at 2:15 PM Patrick Schlangen  
> wrote:
> >
> > Hi Ilya,
> >
> > > Am 26.02.2023 um 14:05 schrieb Ilya Dryomov :
> > >
> > > Isn't OpenSSL 1.0 long out of support?  I'm not sure if extending
> > > librados API to support a workaround for something that went EOL over
> > > three years ago is worth it.
> >
> > fair point. However, as long as ceph still supports compiling against 
> > OpenSSL 1.0 and has special code paths to initialize OpenSSL for versions 
> > <= 1.0, I think this should be fixed. The other option would be to remove 
> > OpenSSL 1.0 support completely.
> >
> > What do you think?
>
> Removing OpenSSL 1.0 support is fine with me but it would need a wider
> discussion.  I'm CCing the development list.
>
> Thanks,
>
> Ilya
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

if librados still works with openssl 1.0 when you're not using it
elsewhere in the process, i don't see a compelling reason to break
that. maybe just add a #warning about it to librados.h?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CompleteMultipartUploadResult has empty ETag response

2023-02-28 Thread Casey Bodley
On Tue, Feb 28, 2023 at 8:19 AM Lars Dunemark  wrote:
>
> Hi,
>
> I notice that CompleteMultipartUploadResult does return an empty ETag
> field when completing an multipart upload in v17.2.3.
>
> I haven't had the possibility to verify from which version this changed
> and can't find in the changelog that it should be fixed in newer version.
>
> The response looks like:
>
> 
> <CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
>   <Location>s3.myceph.com/test-bucket/test.file</Location>
>   <Bucket>test-bucket</Bucket>
>   <Key>test.file</Key>
>   <ETag></ETag>
> </CompleteMultipartUploadResult>
>
> I have found a old issue that is closed around 9 years ago with the same
> issue so I guess that this has been fixed before.
> https://tracker.ceph.com/issues/6830 
>
> It looks like my account to the tracker is still not activated so I
> can't create or comment on the issue.

thanks Lars, i've opened https://tracker.ceph.com/issues/58879 to
track the regression

>
> Best regards,
> Lars Dunemark
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-22 Thread Casey Bodley
On Tue, Mar 21, 2023 at 4:06 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The reruns were in the queue for 4 days because of some slowness issues.
> The core team (Neha, Radek, Laura, and others) are trying to narrow
> down the root cause.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> the core)
> rgw - Casey

there were some java_s3test failures related to
https://tracker.ceph.com/issues/58554. i've added the fix to
https://github.com/ceph/java_s3tests/commits/ceph-quincy, so a rerun
should resolve those failures
there were also some 'Failed to fetch package version' failures in the
rerun that warranted another rerun anyway

there's also an urgent priority bug fix in
https://github.com/ceph/ceph/pull/50625 that i'd really like to add to
this release; sorry for the late notice

> fs - Venky (the fs suite has an unusually high amount of failed jobs,
> any reason to suspect it in the observed slowness?)
> orch - Adam King
> rbd - Ilya
> krbd - Ilya
> upgrade/octopus-x - Laura is looking into failures
> upgrade/pacific-x - Laura is looking into failures
> upgrade/quincy-p2p - Laura is looking into failures
> client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> is looking into it
> powercycle - Brad
> ceph-volume - needs a rerun on merged
> https://github.com/ceph/ceph-ansible/pull/7409
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Also, share any findings or hypotheses about the slowness in the
> execution of the suite.
>
> Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> RC release - pending major suites approvals.
>
> Thx
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-03-23 Thread Casey Bodley
hi Ernesto and lists,

> [1] https://github.com/ceph/ceph/pull/47501

are we planning to backport this to quincy so we can support centos 9
there? enabling that upgrade path on centos 9 was one of the
conditions for dropping centos 8 support in reef, which i'm still keen
to do

if not, can we find another resolution to
https://tracker.ceph.com/issues/58832? as i understand it, all of
those python packages exist in centos 8. do we know why they were
dropped for centos 9? have we looked into making those available in
epel? (cc Ken and Kaleb)

On Fri, Sep 2, 2022 at 12:01 PM Ernesto Puerta  wrote:
>
> Hi Kevin,
>
>>
>> Isn't this one of the reasons containers were pushed, so that the packaging 
>> isn't as big a deal?
>
>
> Yes, but the Ceph community has a strong commitment to provide distro 
> packages for those users who are not interested in moving to containers.
>
>> Is it the continued push to support lots of distros without using containers 
>> that is the problem?
>
>
> If not a problem, it definitely makes it more challenging. Compiled 
> components often sort this out by statically linking deps whose packages are 
> not widely available in distros. The approach we're proposing here would be 
> the closest equivalent to static linking for interpreted code (bundling).
>
> Thanks for sharing your questions!
>
> Kind regards,
> Ernesto
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-23 Thread Casey Bodley
On Wed, Mar 22, 2023 at 9:27 AM Casey Bodley  wrote:
>
> On Tue, Mar 21, 2023 at 4:06 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/59070#note-1
> > Release Notes - TBD
> >
> > The reruns were in the queue for 4 days because of some slowness issues.
> > The core team (Neha, Radek, Laura, and others) are trying to narrow
> > down the root cause.
> >
> > Seeking approvals/reviews for:
> >
> > rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> > and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> > the core)
> > rgw - Casey
>
> there were some java_s3test failures related to
> https://tracker.ceph.com/issues/58554. i've added the fix to
> https://github.com/ceph/java_s3tests/commits/ceph-quincy, so a rerun
> should resolve those failures
> there were also some 'Failed to fetch package version' failures in the
> rerun that warranted another rerun anyway
>
> there's also an urgent priority bug fix in
> https://github.com/ceph/ceph/pull/50625 that i'd really like to add to
> this release; sorry for the late notice

this fix merged, so rgw is now approved. thanks Yuri

>
> > fs - Venky (the fs suite has an unusually high amount of failed jobs,
> > any reason to suspect it in the observed slowness?)
> > orch - Adam King
> > rbd - Ilya
> > krbd - Ilya
> > upgrade/octopus-x - Laura is looking into failures
> > upgrade/pacific-x - Laura is looking into failures
> > upgrade/quincy-p2p - Laura is looking into failures
> > client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> > is looking into it
> > powercycle - Brad
> > ceph-volume - needs a rerun on merged
> > https://github.com/ceph/ceph-ansible/pull/7409
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Also, share any findings or hypotheses about the slowness in the
> > execution of the suite.
> >
> > Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> > RC release - pending major suites approvals.
> >
> > Thx
> > YuriW
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-27 Thread Casey Bodley
On Fri, Mar 24, 2023 at 3:46 PM Yuri Weinstein  wrote:
>
> Details of this release are updated here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The slowness we experienced seemed to be self-cured.
> Neha, Radek, and Laura please provide any findings if you have them.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (rerun on Build 2 with
> PRs merged on top of quincy-release)
> rgw - Casey (rerun on Build 2 with PRs merged on top of quincy-release)

rgw approved

> fs - Venky
>
> upgrade/octopus-x - Neha, Laura (package issue Adam Kraitman any updates?)
> upgrade/pacific-x - Neha, Laura, Ilya see 
> https://tracker.ceph.com/issues/58914
> upgrade/quincy-p2p - Neha, Laura
> client-upgrade-octopus-quincy-quincy - Neha, Laura (package issue Adam
> Kraitman any updates?)
> powercycle - Brad
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> RC release - pending major suites approvals.
>
> On Tue, Mar 21, 2023 at 1:04 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/59070#note-1
> > Release Notes - TBD
> >
> > The reruns were in the queue for 4 days because of some slowness issues.
> > The core team (Neha, Radek, Laura, and others) are trying to narrow
> > down the root cause.
> >
> > Seeking approvals/reviews for:
> >
> > rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> > and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> > the core)
> > rgw - Casey
> > fs - Venky (the fs suite has an unusually high amount of failed jobs,
> > any reason to suspect it in the observed slowness?)
> > orch - Adam King
> > rbd - Ilya
> > krbd - Ilya
> > upgrade/octopus-x - Laura is looking into failures
> > upgrade/pacific-x - Laura is looking into failures
> > upgrade/quincy-p2p - Laura is looking into failures
> > client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> > is looking into it
> > powercycle - Brad
> > ceph-volume - needs a rerun on merged
> > https://github.com/ceph/ceph-ansible/pull/7409
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Also, share any findings or hypotheses about the slowness in the
> > execution of the suite.
> >
> > Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> > RC release - pending major suites approvals.
> >
> > Thx
> > YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-03-27 Thread Casey Bodley
i would hope that packaging for epel9 would be relatively easy, given
that the epel8 packages already exist. as a first step, we'd need to
build a full list of the missing packages. the tracker issue only
complains about python3-asyncssh python3-pecan and python3-routes, but
some of their dependencies may be missing too

On Mon, Mar 27, 2023 at 3:06 PM Ken Dreyer  wrote:
>
> I hope we don't backport such a big change to Quincy. That will have a
> large impact on how we build in restricted environments with no
> internet access.
>
> We could get the missing packages into EPEL.
>
> - Ken
>
> On Fri, Mar 24, 2023 at 7:32 AM Ernesto Puerta  wrote:
> >
> > Hi Casey,
> >
> > The original idea was to leave this to Reef alone, but given that the 
> > CentOS 9 Quincy release is also blocked by missing Python packages, I think 
> > that it'd make sense to backport it.
> >
> > I'm coordinating with Pere (in CC) to expedite this. We may need help to 
> > troubleshoot Shaman/rpmbuild issues. Who would be the best one to help with 
> > that?
> >
> > Regarding your last question, I don't know who's the maintainer of those 
> > packages in EPEL. There's this BZ (https://bugzilla.redhat.com/2166620) 
> > requesting that specific package, but that's only one out of the dozen of 
> > missing packages (plus transitive dependencies)...
> >
> > Kind Regards,
> > Ernesto
> >
> >
> > On Thu, Mar 23, 2023 at 2:19 PM Casey Bodley  wrote:
> >>
> >> hi Ernesto and lists,
> >>
> >> > [1] https://github.com/ceph/ceph/pull/47501
> >>
> >> are we planning to backport this to quincy so we can support centos 9
> >> there? enabling that upgrade path on centos 9 was one of the
> >> conditions for dropping centos 8 support in reef, which i'm still keen
> >> to do
> >>
> >> if not, can we find another resolution to
> >> https://tracker.ceph.com/issues/58832? as i understand it, all of
> >> those python packages exist in centos 8. do we know why they were
> >> dropped for centos 9? have we looked into making those available in
> >> epel? (cc Ken and Kaleb)
> >>
> >> On Fri, Sep 2, 2022 at 12:01 PM Ernesto Puerta  wrote:
> >> >
> >> > Hi Kevin,
> >> >
> >> >>
> >> >> Isn't this one of the reasons containers were pushed, so that the 
> >> >> packaging isn't as big a deal?
> >> >
> >> >
> >> > Yes, but the Ceph community has a strong commitment to provide distro 
> >> > packages for those users who are not interested in moving to containers.
> >> >
> >> >> Is it the continued push to support lots of distros without using 
> >> >> containers that is the problem?
> >> >
> >> >
> >> > If not a problem, it definitely makes it more challenging. Compiled 
> >> > components often sort this out by statically linking deps whose packages 
> >> > are not widely available in distros. The approach we're proposing here 
> >> > would be the closest equivalent to static linking for interpreted code 
> >> > (bundling).
> >> >
> >> > Thanks for sharing your questions!
> >> >
> >> > Kind regards,
> >> > Ernesto
> >> > ___
> >> > Dev mailing list -- d...@ceph.io
> >> > To unsubscribe send an email to dev-le...@ceph.io
> >>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW don't use .rgw.root multisite configuration

2023-04-11 Thread Casey Bodley
there's a rgw_period_root_pool option for the period objects too. but
it shouldn't be necessary to override any of these

On Sun, Apr 9, 2023 at 11:26 PM  wrote:
>
> Up :)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph 17.2.6 and iam roles (pr#48030)

2023-04-11 Thread Casey Bodley
On Tue, Apr 11, 2023 at 3:19 PM Christopher Durham  wrote:
>
>
> Hi,
> I see that this PR: https://github.com/ceph/ceph/pull/48030
> made it into ceph 17.2.6, as per the change log  at: 
> https://docs.ceph.com/en/latest/releases/quincy/  That's great.
> But my scenario is as follows:
> I have two clusters set up as multisite. Because of  the lack of replication 
> for IAM roles, we have set things up so that roles on the primary 'manually' 
> get replicated to the secondary site via a python script. Thus, if I create a 
> role on the primary, add/delete users or buckets from said role, the role, 
> including the AssumeRolePolicyDocument and policies, gets pushed to the 
> replicated site. This has served us well for three years.
> With the advent of this fix, what should I do before I upgrade to 17.2.6 
> (currently on 17.2.5, rocky 8)
>
> I know that in my situation, roles of the same name have different RoleIDs on 
> the two sites. What should I do before I upgrade? Possibilities that *could* 
> happen if i dont rectify things as we upgrade:
> 1. The different RoleIDs lead to two roles of the same name on the replicated 
> site, perhaps with the system unable to address/look at/modify either
> 2. Roles just don't get repiicated to the second site

no replication would happen until the metadata changes again on the
primary zone. once that gets triggered, the role metadata would
probably fail to sync due to the name conflicts

>
> or other similar situations, all of which I want to avoid.
> Perhaps the safest thing to do is to remove all roles on the secondary site, 
> upgrade, and then force a replication of roles (How would I *force* that for 
> iAM roles if it is the correct answer?)

this removal will probably be necessary to avoid those conflicts. once
that's done, you can force a metadata full sync on the secondary zone
by running 'radosgw-admin metadata sync init' there, then restarting
its gateways. this will have to resync all of the bucket and user
metadata as well
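
a minimal sketch of that sequence on the secondary zone (the daemon
name is a placeholder; adjust to however your gateways are managed):

  radosgw-admin metadata sync init
  systemctl restart ceph-radosgw@rgw.<instance>
  # then watch it run through full sync:
  radosgw-admin metadata sync status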

> Here is the original bug report:
>
> https://tracker.ceph.com/issues/57364
> Thanks!
> -Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph 17.2.6 and iam roles (pr#48030)

2023-04-11 Thread Casey Bodley
On Tue, Apr 11, 2023 at 3:53 PM Casey Bodley  wrote:
>
> On Tue, Apr 11, 2023 at 3:19 PM Christopher Durham  wrote:
> >
> >
> > Hi,
> > I see that this PR: https://github.com/ceph/ceph/pull/48030
> > made it into ceph 17.2.6, as per the change log  at: 
> > https://docs.ceph.com/en/latest/releases/quincy/  That's great.
> > But my scenario is as follows:
> > I have two clusters set up as multisite. Because of  the lack of 
> > replication for IAM roles, we have set things up so that roles on the 
> > primary 'manually' get replicated to the secondary site via a python 
> > script. Thus, if I create a role on the primary, add/delete users or 
> > buckets from said role, the role, including the AssumeRolePolicyDocument 
> > and policies, gets pushed to the replicated site. This has served us well 
> > for three years.
> > With the advent of this fix, what should I do before I upgrade to 17.2.6 
> > (currently on 17.2.5, rocky 8)
> >
> > I know that in my situation, roles of the same name have different RoleIDs 
> > on the two sites. What should I do before I upgrade? Possibilities that 
> > *could* happen if i dont rectify things as we upgrade:
> > 1. The different RoleIDs lead to two roles of the same name on the 
> > replicated site, perhaps with the system unable to address/look at/modify 
> > either
> > 2. Roles just don't get repiicated to the second site
>
> no replication would happen until the metadata changes again on the
> primary zone. once that gets triggered, the role metadata would
> probably fail to sync due to the name conflicts
>
> >
> > or other similar situations, all of which I want to avoid.
> > Perhaps the safest thing to do is to remove all roles on the secondary 
> > site, upgrade, and then force a replication of roles (How would I *force* 
> > that for iAM roles if it is the correct answer?)
>
> this removal will probably be necessary to avoid those conflicts. once
> that's done, you can force a metadata full sync on the secondary zone
> by running 'radosgw-admin metadata sync init' there, then restarting
> its gateways. this will have to resync all of the bucket and user
> metadata as well

p.s. don't use the DeleteRole rest api on the secondary zone after
upgrading, as the request would get forwarded to the primary zone and
delete it there too. you can use 'radosgw-admin role delete' on the
secondary instead

>
> > Here is the original bug report:
> >
> > https://tracker.ceph.com/issues/57364
> > Thanks!
> > -Chris
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Rados gateway data-pool replacement.

2023-04-19 Thread Casey Bodley
On Wed, Apr 19, 2023 at 5:13 AM Gaël THEROND  wrote:
>
> Hi everyone, quick question regarding radosgw zone data-pool.
>
> I’m currently planning to migrate an old data-pool that was created with
> inappropriate failure-domain to a newly created pool with appropriate
> failure-domain.
>
> If I’m doing something like:
> radosgw-admin zone modify —rgw-zone default —data-pool 
>
> Will data from the old pool be migrated to the new one or do I need to do
> something else to migrate those data out of the old pool?

radosgw won't migrate anything. you'll need to use rados tools to do
that first. make sure you stop all radosgws in the meantime so it
doesn't write more objects to the old data pool
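
one possible sketch with the rados tools (pool names are placeholders,
and `rados cppool` has caveats, e.g. around snapshots and user
versions, so verify it on your release before relying on it):

  # with every radosgw stopped:
  rados cppool <old-data-pool> <new-data-pool>
  radosgw-admin zone modify --rgw-zone default --data-pool <new-data-pool>
  radosgw-admin period update --commit   # if a realm/period is configured
  # then start the radosgw daemons again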

> I’ve read a lot
> of mail archive with peoples willing to do that but I can’t get a clear
> answer from those archives.
>
> I’m running on the nautilus release, if it ever helps.
>
> Thanks a lot!
>
> PS: This mail is a redo of the old one as I’m not sure the former one
> worked (missing tags).
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy user metadata constantly changing versions on multisite slave with radosgw roles

2023-04-20 Thread Casey Bodley
On Wed, Apr 19, 2023 at 7:55 PM Christopher Durham  wrote:
>
> Hi,
>
> I am using 17.2.6 on rocky linux for both the master and the slave site
> I noticed that:
> radosgw-admin sync status
> often shows that the metadata sync is behind a minute or two on the slave. 
> This didn't make sense, as the metadata isn't changing as far as I know.
> radosgw-admin mdlog list
>
> (on slave) showed me that there were user changes in metadata very often. 
> After doing a little research, here is a scenario I was able to develop:
>
> 1. user continually writes to a bucket he owns, pointing his aws cli (for 
> this test) to the master side endpoint as specified in  ~/.aws/config
>
> while this is running, do the following on the slave:
> radosgw-admin metadata get user:
> This always shows the same result for the user, no changes. The data gets to 
> the slave side bucket.
>
> 2. Restart the continual copy, but this time use a role that the user is a 
> member of via profile in ~/.aws/credentials and .~/aws/config, again writing 
> to the master endpoint as specified in ~/.aws/config
>
> aws --profile  s3 cp  s3:///file
> where profile is set up to use a role definiiton. The data gets to the bucket 
> on both sides. I do not have access if I do not use the role  profille (to 
> confirm I set it up right) However, while doing this second test, if I 
> continually do:
> radosgw-admin metadata get user:
> on the slave, I see a definite increase in versions. Here is a section of the 
> json output:
>
> "key": "user:",
> "ver": {   "tag": "somestring",   "ver:" 12145}
> the 12145 value increases over and over again, and the mtime value in the 
> json output increases too based on the current date. (not shown here).  The 
> same value, when queried on the master side, remains 1, and the mtime value 
> is the date the user was created or last changed by an admin. If I write a 
> file only once, the vers value increases by 1 too, but not sure if the 
> increase in vers is ncessarily 1:1 with the number of writes. This seems to 
> be the source of my continual metadata lag.  Am I missing something? I 
> suspect that this has been happening for awhile and not specific to 17.2.6 as 
> I just upgraded and the ver value is over 12000 for the user that I 
> discovered. (I used a python script to sync roles between master and slave 
> prior to 17.2.6. Now roles are replicated in 17.2.6).
> -Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

thanks Chris,

it looks like AssumeRole is writing to the user metadata
unnecessarily. i opened https://tracker.ceph.com/issues/59495 to track
this
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can I delete rgw log entries?

2023-04-20 Thread Casey Bodley
On Sun, Apr 16, 2023 at 11:47 PM Richard Bade  wrote:
>
> Hi Everyone,
> I've been having trouble finding an answer to this question. Basically
> I'm wanting to know if stuff in the .log pool is actively used for
> anything or if it's just logs that can be deleted.
> In particular I was wondering about sync logs.
> In my particular situation I have had some tests of zone sync setup,
> but now I've removed the secondary zone and pools. My primary zone is
> filled with thousands of logs like this:
> data_log.71
> data.full-sync.index.e2cf2c3e-7870-4fc4-8ab9-d78a17263b4f.47
> meta.full-sync.index.7
> datalog.sync-status.shard.e2cf2c3e-7870-4fc4-8ab9-d78a17263b4f.13
> bucket.sync-status.f3113d30-ecd3-4873-8537-aa006e54b884:{bucketname}:default.623958784.455
>
> I assume that because I'm not doing any sync anymore I can delete all
> the sync related logs? Is anyone able to confirm this?

yes
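
if the secondary zone and its sync configuration really are gone, a
hedged sketch of cleaning those objects up (the pool name is a
placeholder for your zone's log pool; double-check nothing is syncing
before deleting anything):

  rados -p default.rgw.log ls | \
      grep -E '^(data_log|datalog\.sync-status|bucket\.sync-status|meta\.full-sync|data\.full-sync)' | \
      while read -r obj; do rados -p default.rgw.log rm "$obj"; done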

> What about if the sync is running? Are these being written and read
> from and therefore must be left alone?

right. while a multisite configuration is operating, the replication
logs will be trimmed in the background. in addition to the replication
logs, the log pool also contains sync status objects. these track the
progress of replication, and removing those objects would generally
cause sync to start over from the beginning

> It seems like these are more of a status than just a log and that
> deleting them might confuse the sync process. If so, does that mean
> that the log pool is not just output that can be removed as needed?
> Are there perhaps other things in there that need to stay?

the log pool is used by several subsystems like multisite sync,
garbage collection, bucket notifications, and lifecycle. those
features won't work reliably if you delete their rados objects

>
> Regards,
> Richard
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team meeting minutes - 2023 April 26

2023-04-26 Thread Casey Bodley
# ceph windows tests
PR check will be made required once regressions are fixed
windows build currently depends on gcc11 which limits use of c++20
features. investigating newer gcc or clang toolchain

# 16.2.13 release
final testing in progress

# prometheus metric regressions
https://tracker.ceph.com/issues/59505
related to previous discussion on 4/12 about quincy backports
integration test coverage needed for ceph-exporter and the mgr module

# lab update
centos/rhel tests were failing due to problematic mirrorlists
fixed in https://github.com/ceph/ceph-cm-ansible/pull/731
more sanity checks in progress at
https://github.com/ceph/ceph-cm-ansible/pull/733

# cephalocon feedback
dev summit etherpads: https://pad.ceph.com/p/cephalocon-dev-summit-2023
collect more notes here: https://pad.ceph.com/p/cephalocon-2023-brainstorm

request for dev-focused longer term discussion
could have specific user-focused and dev-focused sessions
dense conference, hard to fit everything in 3 days
could have longer component updates during conf, with time for questions
perhaps 3 days of conf, dev-specific discussions a day before (no cfp,
one big room, then option for breakout), user-feedback sessions during
the normal con
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-04-26 Thread Casey Bodley
are there any volunteers willing to help make these python packages
available upstream?

On Tue, Mar 28, 2023 at 5:34 AM Ernesto Puerta  wrote:
>
> Hey Ken,
>
> This change doesn't not involve any further internet access other than the 
> already required for the "make dist" stage (e.g.: npm packages). That said, 
> where feasible, I also prefer to keep the current approach for a minor 
> version.
>
> Kind Regards,
> Ernesto
>
>
> On Mon, Mar 27, 2023 at 9:06 PM Ken Dreyer  wrote:
>>
>> I hope we don't backport such a big change to Quincy. That will have a
>> large impact on how we build in restricted environments with no
>> internet access.
>>
>> We could get the missing packages into EPEL.
>>
>> - Ken
>>
>> On Fri, Mar 24, 2023 at 7:32 AM Ernesto Puerta  wrote:
>> >
>> > Hi Casey,
>> >
>> > The original idea was to leave this to Reef alone, but given that the 
>> > CentOS 9 Quincy release is also blocked by missing Python packages, I 
>> > think that it'd make sense to backport it.
>> >
>> > I'm coordinating with Pere (in CC) to expedite this. We may need help to 
>> > troubleshoot Shaman/rpmbuild issues. Who would be the best one to help 
>> > with that?
>> >
>> > Regarding your last question, I don't know who's the maintainer of those 
>> > packages in EPEL. There's this BZ (https://bugzilla.redhat.com/2166620) 
>> > requesting that specific package, but that's only one out of the dozen of 
>> > missing packages (plus transitive dependencies)...
>> >
>> > Kind Regards,
>> > Ernesto
>> >
>> >
>> > On Thu, Mar 23, 2023 at 2:19 PM Casey Bodley  wrote:
>> >>
>> >> hi Ernesto and lists,
>> >>
>> >> > [1] https://github.com/ceph/ceph/pull/47501
>> >>
>> >> are we planning to backport this to quincy so we can support centos 9
>> >> there? enabling that upgrade path on centos 9 was one of the
>> >> conditions for dropping centos 8 support in reef, which i'm still keen
>> >> to do
>> >>
>> >> if not, can we find another resolution to
>> >> https://tracker.ceph.com/issues/58832? as i understand it, all of
>> >> those python packages exist in centos 8. do we know why they were
>> >> dropped for centos 9? have we looked into making those available in
>> >> epel? (cc Ken and Kaleb)
>> >>
>> >> On Fri, Sep 2, 2022 at 12:01 PM Ernesto Puerta  
>> >> wrote:
>> >> >
>> >> > Hi Kevin,
>> >> >
>> >> >>
>> >> >> Isn't this one of the reasons containers were pushed, so that the 
>> >> >> packaging isn't as big a deal?
>> >> >
>> >> >
>> >> > Yes, but the Ceph community has a strong commitment to provide distro 
>> >> > packages for those users who are not interested in moving to containers.
>> >> >
>> >> >> Is it the continued push to support lots of distros without using 
>> >> >> containers that is the problem?
>> >> >
>> >> >
>> >> > If not a problem, it definitely makes it more challenging. Compiled 
>> >> > components often sort this out by statically linking deps whose 
>> >> > packages are not widely available in distros. The approach we're 
>> >> > proposing here would be the closest equivalent to static linking for 
>> >> > interpreted code (bundling).
>> >> >
>> >> > Thanks for sharing your questions!
>> >> >
>> >> > Kind regards,
>> >> > Ernesto
>> >> > ___
>> >> > Dev mailing list -- d...@ceph.io
>> >> > To unsubscribe send an email to dev-le...@ceph.io
>> >>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Radosgw multisite replication issues

2023-04-27 Thread Casey Bodley
On Thu, Apr 27, 2023 at 11:36 AM Tarrago, Eli (RIS-BCT)
 wrote:
>
> After working on this issue for a bit.
> The active plan is to fail over master, to the “west” dc. Perform a realm 
> pull from the west so that it forces the failover to occur. Then have the 
> “east” DC, then pull the realm data back. Hopefully will get both sides back 
> in sync..
>
> My concern with this approach is both sides are “active”, meaning the client 
> has been writing data to both endpoints. Will this cause an issue where 
> “west” will have data that the metadata does not have record of, and then 
> delete the data?

no object data would be deleted as a result of metadata failover issues, no

>
> Thanks
>
> From: Tarrago, Eli (RIS-BCT) 
> Date: Thursday, April 20, 2023 at 3:13 PM
> To: Ceph Users 
> Subject: Radosgw multisite replication issues
> Good Afternoon,
>
> I am experiencing an issue where east-1 is no longer able to replicate from 
> west-1, however, after a realm pull, west-1 is now able to replicate from 
> east-1.
>
> In other words:
> West <- Can Replicate <- East
> West -> Cannot Replicate -> East
>
> After confirming the access and secret keys are identical on both sides, I 
> restarted all radosgw services.
>
> Here is the current status of the cluster below.
>
> Thank you for your help,
>
> Eli Tarrago
>
>
> root@east01:~# radosgw-admin zone get
> {
> "id": "ddd66ab8-0417-46ee-a53b-043352a63f93",
> "name": "rgw-east",
> "domain_root": "rgw-east.rgw.meta:root",
> "control_pool": "rgw-east.rgw.control",
> "gc_pool": "rgw-east.rgw.log:gc",
> "lc_pool": "rgw-east.rgw.log:lc",
> "log_pool": "rgw-east.rgw.log",
> "intent_log_pool": "rgw-east.rgw.log:intent",
> "usage_log_pool": "rgw-east.rgw.log:usage",
> "roles_pool": "rgw-east.rgw.meta:roles",
> "reshard_pool": "rgw-east.rgw.log:reshard",
> "user_keys_pool": "rgw-east.rgw.meta:users.keys",
> "user_email_pool": "rgw-east.rgw.meta:users.email",
> "user_swift_pool": "rgw-east.rgw.meta:users.swift",
> "user_uid_pool": "rgw-east.rgw.meta:users.uid",
> "otp_pool": "rgw-east.rgw.otp",
> "system_key": {
> "access_key": "PW",
> "secret_key": "H6"
> },
> "placement_pools": [
> {
> "key": "default-placement",
> "val": {
> "index_pool": "rgw-east.rgw.buckets.index",
> "storage_classes": {
> "STANDARD": {
> "data_pool": "rgw-east.rgw.buckets.data"
> }
> },
> "data_extra_pool": "rgw-east.rgw.buckets.non-ec",
> "index_type": 0
> }
> }
> ],
> "realm_id": "98e0e391-16fb-48da-80a5-08437fd81789",
> "notif_pool": "rgw-east.rgw.log:notif"
> }
>
> root@west01:~# radosgw-admin zone get
> {
>"id": "b2a4a31c-1505-4fdc-b2e0-ea07d9463da1",
> "name": "rgw-west",
> "domain_root": "rgw-west.rgw.meta:root",
> "control_pool": "rgw-west.rgw.control",
> "gc_pool": "rgw-west.rgw.log:gc",
> "lc_pool": "rgw-west.rgw.log:lc",
> "log_pool": "rgw-west.rgw.log",
> "intent_log_pool": "rgw-west.rgw.log:intent",
> "usage_log_pool": "rgw-west.rgw.log:usage",
> "roles_pool": "rgw-west.rgw.meta:roles",
> "reshard_pool": "rgw-west.rgw.log:reshard",
> "user_keys_pool": "rgw-west.rgw.meta:users.keys",
> "user_email_pool": "rgw-west.rgw.meta:users.email",
> "user_swift_pool": "rgw-west.rgw.meta:users.swift",
> "user_uid_pool": "rgw-west.rgw.meta:users.uid",
> "otp_pool": "rgw-west.rgw.otp",
> "system_key": {
> "access_key": "PxxW",
> "secret_key": "Hxx6"
> },
> "placement_pools": [
> {
> "key": "default-placement",
> "val": {
> "index_pool": "rgw-west.rgw.buckets.index",
> "storage_classes": {
> "STANDARD": {
> "data_pool": "rgw-west.rgw.buckets.data"
> }
> },
> "data_extra_pool": "rgw-west.rgw.buckets.non-ec",
> "index_type": 0
> }
> }
> ],
> "realm_id": "98e0e391-16fb-48da-80a5-08437fd81789",
> "notif_pool": "rgw-west.rgw.log:notif"
> east01:~# radosgw-admin metadata sync status
> {
> "sync_status": {
> "info": {
> "status": "init",
> "num_shards": 0,
> "period": "",
> "realm_epoch": 0
> },
> "markers": []
> },
> "full_sync": {
> "total": 0,
> "complete": 0
> }
> }
>
> west01:~#  radosgw-admin metadata sync status
> {
> "sync_status": {
> "info": {
> "status": "sync",
> "num_shards": 64,
> "period": "44b6b308-e2d8-4835-8518-c90447e7b55c",
> "realm_epoch": 3
> },
> "markers": [
>  

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Casey Bodley
On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59542#note-1
> Release Notes - TBD
>
> Seeking approvals for:
>
> smoke - Radek, Laura
> rados - Radek, Laura
>   rook - Sébastien Han
>   cephadm - Adam K
>   dashboard - Ernesto
>
> rgw - Casey

rgw approved

> rbd - Ilya
> krbd - Ilya
> fs - Venky, Patrick
> upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> upgrade/pacific-p2p - Laura
> powercycle - Brad (SELinux denials)
> ceph-volume - Guillaume, Adam K
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-08 Thread Casey Bodley
On Sun, May 7, 2023 at 5:25 PM Yuri Weinstein  wrote:
>
> All PRs were cherry-picked and the new RC1 build is:
>
> https://shaman.ceph.com/builds/ceph/pacific-release/8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/
>
> Rados, fs and rgw were rerun and results are summarized here:
> https://tracker.ceph.com/issues/59542#note-1
>
> Seeking final approvals:
>
> rados - Radek
> fs - Venky
> rgw - Casey

rgw approved, thanks

>
> On Fri, May 5, 2023 at 8:27 AM Yuri Weinstein  wrote:
> >
> > I got verbal approvals for the listed PRs:
> >
> > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > https://github.com/ceph/ceph/pull/51344  -- Venky approved
> > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > https://github.com/ceph/ceph/pull/50894  -- Radek approved
> >
> > Suites rados and fs will need to be retested on updates pacific-release 
> > branch.
> >
> >
> > On Thu, May 4, 2023 at 9:13 AM Yuri Weinstein  wrote:
> > >
> > > In summary:
> > >
> > > Release Notes:  https://github.com/ceph/ceph/pull/51301
> > >
> > > We plan to finish this release next week and we have the following PRs
> > > planned to be added:
> > >
> > > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > > https://github.com/ceph/ceph/pull/51344  -- Venky in progress
> > > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > > https://github.com/ceph/ceph/pull/50894  -- Radek in progress
> > >
> > > As soon as these PRs are finalized, I will cherry-pick them and
> > > rebuild "pacific-release" and rerun appropriate suites.
> > >
> > > On Thu, May 4, 2023 at 9:07 AM Radoslaw Zarzynski  
> > > wrote:
> > > >
> > > > If we get some time, I would like to include:
> > > >
> > > >   https://github.com/ceph/ceph/pull/50894.
> > > >
> > > > Regards,
> > > > Radek
> > > >
> > > > On Thu, May 4, 2023 at 5:56 PM Venky Shankar  
> > > > wrote:
> > > > >
> > > > > Hi Yuri,
> > > > >
> > > > > On Wed, May 3, 2023 at 7:10 PM Venky Shankar  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein  
> > > > > > wrote:
> > > > > > >
> > > > > > > Venky, I did plan to cherry-pick this PR if you approve this 
> > > > > > > (this PR
> > > > > > > was used for a rerun)
> > > > > >
> > > > > > OK. The fs suite failure is being looked into
> > > > > > (https://tracker.ceph.com/issues/59626).
> > > > >
> > > > > Fix is being tracked by
> > > > >
> > > > > https://github.com/ceph/ceph/pull/51344
> > > > >
> > > > > Once ready, it needs to be included in 16.2.13 and would require a fs
> > > > > suite re-run (although re-renning the failed tests should suffice,
> > > > > however, I'm a bit inclined in putting it through the fs suite).
> > > > >
> > > > > >
> > > > > > >
> > > > > > > On Tue, May 2, 2023 at 7:51 AM Venky Shankar 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Yuri,
> > > > > > > >
> > > > > > > > On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Details of this release are summarized here:
> > > > > > > > >
> > > > > > > > > https://tracker.ceph.com/issues/59542#note-1
> > > > > > > > > Release Notes - TBD
> > > > > > > > >
> > > > > > > > > Seeking approvals for:
> > > > > > > > >
> > > > > > > > > smoke - Radek, Laura
> > > > > > > > > rados - Radek, Laura
> > > > > > > > >   rook - Sébastien Han
> > > > > > > > >   cephadm - Adam K
> > > > > > > > >   dashboard - Ernesto
> > > > > > > > >
> > > > > > > > > rgw - Casey
> > > > > > > > > rbd - Ilya
> > > > > > > > > krbd - Ilya
> > > > > > > > > fs - Venky, Patrick
> > > > > > > >
> > > > > > > > There are a couple of new failures which are qa/test related - 
> > > > > > > > I'll
> > > > > > > > have a look at those (they _do not_ look serious).
> > > > > > > >
> > > > > > > > Also, Yuri, do you plan to merge
> > > > > > > >
> > > > > > > > https://github.com/ceph/ceph/pull/51232
> > > > > > > >
> > > > > > > > into the pacific-release branch although it's tagged with one 
> > > > > > > > of your
> > > > > > > > other pacific runs?
> > > > > > > >
> > > > > > > > > upgrade/octopus-x (pacific) - Laura (look the same as in 
> > > > > > > > > 16.2.8)
> > > > > > > > > upgrade/pacific-p2p - Laura
> > > > > > > > > powercycle - Brad (SELinux denials)
> > > > > > > > > ceph-volume - Guillaume, Adam K
> > > > > > > > >
> > > > > > > > > Thx
> > > > > > > > > YuriW
> > > > > > > > > ___
> > > > > > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Cheers,
> > > > > > > > Venky
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Cheers,
> > > > > > Venky
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Cheers,
> > > > > Venky
> > > > > ___
> > > > > Dev mailing list -- d...@ceph.io
> >

[ceph-users] Re: Radosgw multisite replication issues

2023-05-11 Thread Casey Bodley
ansfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f20857f2700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5

these errors would correspond to GetObject requests, and show up as
's3:get_obj' in the radosgw log


> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f2092ffd700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f2080fe9700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f20817ea700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f208b7fe700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f20867f4700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f2086ff5700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f2085ff3700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f20827ec700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
>
>
> From: Casey Bodley 
> Date: Thursday, April 27, 2023 at 12:37 PM
> To: Tarrago, Eli (RIS-BCT) 
> Cc: Ceph Users 
> Subject: Re: [ceph-users] Re: Radosgw multisite replication issues
> *** External email: use caution ***
>
>
>
> On Thu, Apr 27, 2023 at 11:36 AM Tarrago, Eli (RIS-BCT)
>  wrote:
> >
> > After working on this issue for a bit.
> > The active plan is to fail over master, to the “west” dc. Perform a realm 
> > pull from the west so that it forces the failover to occur. Then have the 
> > “east” DC, then pull the realm data back. Hopefully will get both sides 
> > back in sync..
> >
> > My concern with this approach is both sides are “active”, meaning the 
> > client has been writing data to both endpoints. Will this cause an issue 
> > where “west” will have data that the metadata does not have record of, and 
> > then delete the data?
>
> no object data would be deleted as a result of metadata failover issues, no
>
> >
> > Thanks
> >
> > From: Tarrago, Eli (RIS-BCT) 
> > Date: Thursday, April 20, 2023 at 3:13 PM
> > To: Ceph Users 
> > Subject: Radosgw multisite replication issues
> > Good Afternoon,
> >
> > I am experiencing an issue where east-1 is no longer able to replicate from 
> > west-1, however, after a realm pull, west-1 is now able to replicate from 
> > east-1.
> >
> > In other words:
> > West <- Can Replicate <- East
> > West -> Cannot Replicate -> East
> >
> > After confirming the access and secret keys are identical on both sides, I 
> > restarted all radosgw services.
> >
> > Here is the current status of the cluster below.
> >
> > Thank you for your help,
> >
> > Eli Tarrago
> >
> >
> > root@east01:~# radosgw-admin zone get
> > {
> > "id": "ddd66ab8-0417-46ee-a53b-043352a63f93",
> > "name": "rgw-east",
> > "domain_root": "rgw-east.rgw.meta:root",
> > "control_pool": "rgw-east.rgw.control",
> > "gc_pool": "rgw-east.rgw.log:gc",
> > "lc_pool": "rgw-east.rgw.log:lc",
> > "log_pool": "rgw-east.rgw.log",
> > "intent_log_pool": "rgw-east.rgw.log:intent",
> > "usage_log_pool": "

[ceph-users] Re: multisite sync and multipart uploads

2023-05-11 Thread Casey Bodley
sync doesn't distinguish between multipart and regular object uploads.
once a multipart upload completes, sync will replicate it as a single
object using an s3 GetObject request

replicating the parts individually would have some benefits. for
example, when sync retries are necessary, we might only have to resend
one part instead of the entire object. but it's far simpler to
replicate objects in a single atomic step
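
if you want to spot-check a replicated multipart object, comparing the
ETag and size reported by each zone is usually enough. a minimal sketch,
assuming the aws cli and placeholder endpoints/bucket/key:

  # run against each zone's endpoint; ETag and ContentLength should match
  aws --endpoint-url http://east.example.com:8080 s3api head-object --bucket mybucket --key bigfile
  aws --endpoint-url http://west.example.com:8080 s3api head-object --bucket mybucket --key bigfile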

On Thu, May 11, 2023 at 1:07 PM Yixin Jin  wrote:
>
> Hi guys,
>
> With Quincy release, does anyone know how multisite sync deals with multipart 
> uploads? I mean those part objects of some incomplete multipart uploads. Are 
> those objects also sync-ed over either with full-sync or incremental sync? I 
> did a quick experiment and notice that these objects are not sync-ed over. Is 
> it intentional or is there a defect of it?
>
> Thanks,
> Yixin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how to enable multisite resharding feature?

2023-05-17 Thread Casey Bodley
i'm afraid that feature will be new in the reef release. multisite
resharding isn't supported on quincy
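
for reference, once a cluster is on reef the documented sequence looks
roughly like the sketch below (zone/zonegroup names are placeholders
based on your example; double-check the reef multisite docs first, since
the zonegroup-level enable requires every zone in the group to support
the feature):

  # enable the feature on each zone, then on the zonegroup, committing the period
  radosgw-admin zone modify --rgw-zone=sel --enable-feature=resharding
  radosgw-admin period update --commit
  radosgw-admin zonegroup modify --rgw-zonegroup=default --enable-feature=resharding
  radosgw-admin period update --commit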

On Wed, May 17, 2023 at 11:56 AM Alexander Mamonov  wrote:
>
> https://docs.ceph.com/en/latest/radosgw/multisite/#feature-resharding
> When I try this I get:
> root@ceph-m-02:~# radosgw-admin zone modify --rgw-zone=sel 
> --enable-feature=resharding
> ERROR: invalid flag --enable-feature=resharding
> root@ceph-m-02:~# ceph version
> ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Creating a bucket with bucket constructor in Ceph v16.2.7

2023-05-18 Thread Casey Bodley
On Wed, May 17, 2023 at 11:13 PM Ramin Najjarbashi
 wrote:
>
> Hi
>
> I'm currently using Ceph version 16.2.7 and facing an issue with bucket
> creation in a multi-zone configuration. My setup includes two zone groups:
>
> ZG1 (Master) and ZG2, with one zone in each zone group (zone-1 in ZG1 and
> zone-2 in ZG2).
>
> The objective is to create buckets in a specific zone group (ZG2) using the
> bucket constructor.
> However, despite setting the desired zone group (abrak) in the request, the
> bucket is still being created in the master zone group (ZG1).
> I have defined the following endpoint pattern for each zone group:
>
> s3.{zg}.mydomain.com
>
> I am using the s3cmd client to interact with the Ceph cluster. I have
> ensured that I provide the necessary endpoint and region information while
> executing the bucket creation command. Despite my efforts, the bucket
> consistently gets created in ZG1 instead of ZG2.

this is expected behavior for the metadata consistency model. all
metadata gets created on the metadata master zone first, and syncs to
all other zones in the realm from there. so your buckets will be
visible to every zonegroup

however, ZG2 is still the 'bucket location', and its object data
should only reside in ZG2's zones. any s3 requests on that bucket sent
to ZG1 will get redirected to ZG2 and serviced there

if you don't want any metadata shared between the two zonegroups, you
can put them in separate realms. but that includes user metadata as
well
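
as a quick sanity check that ZG2 still owns the bucket even though its
metadata is visible everywhere, something like the sketch below should
do (bucket name taken from your example; the exact output fields can
vary by release):

  # 'zonegroup' is reported as the owning zonegroup's id
  radosgw-admin bucket stats --bucket=test-zg2 | grep -E '"zonegroup"|"placement_rule"'

  # or via s3: the reported Location is the owning zonegroup's api_name
  s3cmd info s3://test-zg2 | grep Location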

>
> - Ceph Version: 16.2.7
> - Zone Group 1 (ZG1) Endpoint: http://s3.zonegroup1.mydomain.com
> - Zone Group 2 (ZG2) Endpoint: http://s3.zonegroup2.mydomain.com
> - Desired Bucket Creation Region: zg2-api-name
>
>  have reviewed the Ceph documentation and made necessary configuration
> changes, but I have not been able to achieve the desired result.I kindly
> request your assistance in understanding why the bucket constructor is not
> honoring the specified region and always defaults to ZG1. I would greatly
> appreciate any insights, recommendations, or potential solutions to resolve
> this issue.
>
>  Thank you for your time and support.
>
> -
> Here are the details of my setup:
> -
>
> ```sh
> s3cmd --region zg2-api-name mb s3://test-zg2s3cmd info
> s3://test-zg2s3://test-zg2/ (bucket):
>Location:  zg2-api-name
>Payer: BucketOwner
>Expiration Rule: none
>Policy:none
>CORS:  none
>ACL:   development: FULL_CONTROL
> ```
>
> this is my config file:
>
> ```ini
> [default]
> access_key = 
> secret_key = 
> host_base = s3.zonegroup1.mydomain.com
> host_bucket = s3.%(location)s.mydomain.com
> #host_bucket = %(bucket)s.s3.zonegroup1.mydomain.com
> #host_bucket = s3.%(location)s.mydomain.com
> #host_bucket = s3.%(region)s.mydomain.com
> bucket_location = zg1-api-name
> use_https = False
> ```
>
>
> Zonegroup configuration for the `zonegroup1` region:
>
> ```json
> {
> "id": "fb3f818a-ca9b-4b12-b431-7cdcd80006d",
> "name": "zg1-api-name",
> "api_name": "zg1-api-name",
> "is_master": "false",
> "endpoints": [
> "http://s3.zonegroup1.mydomain.com";,
> ],
> "hostnames": [
> "s3.zonegroup1.mydomain.com",
> ],
> "hostnames_s3website": [
> "s3-website.zonegroup1.mydomain.com",
> ],
> "master_zone": "at2-stg-zone",
> "zones": [
> {
> "id": "at2-stg-zone",
> "name": "at2-stg-zone",
> "endpoints": [
> "http://s3.zonegroup1.mydomain.com";
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 11,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": [],
> "storage_classes": [
> "STANDARD"
> ]
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "fa2f8194-4a9d-4b98-b411-9cdcd1e5506a",
> "sync_policy": {
> "groups": []
> }
> }
> ```
>
> Zonegroup configuration for the `zonegroup2` region:
>
> ```json
> {
> "id": "a513d60c-44a2-4289-a23d-b7a511be6ee4",
> "name": "zg2-api-name",
> "api_name": "zg2-api-name",
> "is_master": "false",
> "endpoints": [
> "http://s3.zonegroup2.mydomain.com";
> ],
> "hostnames": [
> "s3.zonegroup2.mydomain.com"
> ],
> "hostnames_s3website": [],
> "master_zone": "zonegroup2-sh-1",
> "zones": [
> {
> "id": "zonegroup2-sh-1",
> "name": "zonegroup2-sh-1",
> "endpoints": [
> "http://s3.zonegroup2.mydomain.com";
> ],
> "log_meta": "false",
> "log_data": "false",

[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-05-18 Thread Casey Bodley
thanks Ken! using copr sounds like a great way to unblock testing for
reef until everything lands in epel

for the teuthology part, i raised a pull request against teuthology's
install task to add support for copr repositories
(https://github.com/ceph/teuthology/pull/1844) and updated my ceph pr
that adds centos9 as a supported distro
(https://github.com/ceph/ceph/pull/50441) to enable that

i tested that combination in the rgw suite, and all of the packages
were installed successfully:
http://qa-proxy.ceph.com/teuthology/cbodley-2023-05-18_13:12:32-rgw:verify-main-distro-default-smithi/7277538/teuthology.log

for reference, the teuthology-suite command line for that test was:
$ teuthology-suite -s rgw:verify -m smithi --ceph-repo
https://github.com/ceph/ceph.git -S
fb28670387326ed3faf2b9cefac018ca68093364 --suite-repo
https://github.com/cbodley/ceph.git --suite-branch
wip-qa-distros-centos9 --teuthology-branch wip-install-copr -p 75
--limit 1 --seed 0 --filter centos_latest

On Wed, May 17, 2023 at 3:12 PM Ken Dreyer  wrote:
>
> Originally we had about a hundred packages in
> https://copr.fedorainfracloud.org/coprs/ceph/el9/ before they were
> wiped out in rhbz#2143742. I went back over the list of outstanding
> deps today. EPEL lacks only five packages now. I've built those into
> the Copr today.
>
> You can enable it with "dnf copr enable -y ceph/el9" . I think we
> should add this command to the container Dockerfile, Teuthology tasks,
> install-deps.sh, or whatever needs to run on el9 that is missing these
> packages.
>
> These tickets track moving the final five builds from the Copr into EPEL9:
>
> python-asyncssh - https://bugzilla.redhat.com/2196046
> python-pecan - https://bugzilla.redhat.com/2196045
> python-routes - https://bugzilla.redhat.com/2166620
> python-repoze-lru - no BZ yet
> python-logutils - provide karma here:
> https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2023-6baae8389d
>
> I was interested to see almost all of these are already in progress .
> That final one (logutils) should go to EPEL's stable repo in a week
> (faster with karma).
>
> - Ken
>
>
>
>
> On Wed, Apr 26, 2023 at 11:00 AM Casey Bodley  wrote:
> >
> > are there any volunteers willing to help make these python packages
> > available upstream?
> >
> > On Tue, Mar 28, 2023 at 5:34 AM Ernesto Puerta  wrote:
> > >
> > > Hey Ken,
> > >
> > > This change doesn't not involve any further internet access other than 
> > > the already required for the "make dist" stage (e.g.: npm packages). That 
> > > said, where feasible, I also prefer to keep the current approach for a 
> > > minor version.
> > >
> > > Kind Regards,
> > > Ernesto
> > >
> > >
> > > On Mon, Mar 27, 2023 at 9:06 PM Ken Dreyer  wrote:
> > >>
> > >> I hope we don't backport such a big change to Quincy. That will have a
> > >> large impact on how we build in restricted environments with no
> > >> internet access.
> > >>
> > >> We could get the missing packages into EPEL.
> > >>
> > >> - Ken
> > >>
> > >> On Fri, Mar 24, 2023 at 7:32 AM Ernesto Puerta  
> > >> wrote:
> > >> >
> > >> > Hi Casey,
> > >> >
> > >> > The original idea was to leave this to Reef alone, but given that the 
> > >> > CentOS 9 Quincy release is also blocked by missing Python packages, I 
> > >> > think that it'd make sense to backport it.
> > >> >
> > >> > I'm coordinating with Pere (in CC) to expedite this. We may need help 
> > >> > to troubleshoot Shaman/rpmbuild issues. Who would be the best one to 
> > >> > help with that?
> > >> >
> > >> > Regarding your last question, I don't know who's the maintainer of 
> > >> > those packages in EPEL. There's this BZ 
> > >> > (https://bugzilla.redhat.com/2166620) requesting that specific 
> > >> > package, but that's only one out of the dozen of missing packages 
> > >> > (plus transitive dependencies)...
> > >> >
> > >> > Kind Regards,
> > >> > Ernesto
> > >> >
> > >> >
> > >> > On Thu, Mar 23, 2023 at 2:19 PM Casey Bodley  
> > >> > wrote:
> > >> >>
> > >> >> hi Ernesto and lists,
> > >> >>
> > >> >> > [1] https://github.com/ceph/ceph/pull/47501
> > >> >>

[ceph-users] Re: Encryption per user Howto

2023-05-22 Thread Casey Bodley
rgw supports the 3 flavors of S3 Server-Side Encryption, along with
the PutBucketEncryption api for per-bucket default encryption. you can
find the docs in https://docs.ceph.com/en/quincy/radosgw/encryption/
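
as a rough sketch of the per-bucket default approach with the aws cli
(endpoint and bucket name are placeholders, and SSE-S3 assumes you have
already configured a key store such as vault as described in those
docs):

  # set a default encryption rule on the bucket (SSE-S3 / AES256)
  aws --endpoint-url http://rgw.example.com:8080 s3api put-bucket-encryption \
    --bucket mybucket \
    --server-side-encryption-configuration \
    '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'

  # verify
  aws --endpoint-url http://rgw.example.com:8080 s3api get-bucket-encryption --bucket mybucket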

On Mon, May 22, 2023 at 10:49 AM huxia...@horebdata.cn
 wrote:
>
> Dear Alexander,
>
> Thanks a lot for the helpful comments and insights. For CephFS and RGW,
> per-user encryption seems to be daunting and complex.
>
> What about encryption on the server side without the per-user
> requirement? Would it be relatively easy to achieve, and how?
>
> best regards,
>
> Samuel
>
>
>
>
>
> huxia...@horebdata.cn
>
> From: Alexander E. Patrakov
> Date: 2023-05-21 15:44
> To: huxia...@horebdata.cn
> CC: ceph-users
> Subject: Re: [ceph-users] Encryption per user Howto
> Hello Samuel,
>
> On Sun, May 21, 2023 at 3:48 PM huxia...@horebdata.cn
>  wrote:
> >
> > Dear Ceph folks,
> >
> > Recently one of our clients approached us with a request for encryption per
> > user, i.e. using an individual encryption key for each user to encrypt
> > files and the object store.
> >
> > Does anyone know (or have experience) how to do with CephFS and Ceph RGW?
>
> For CephFS, this is unachievable.
>
> For RGW, please use Vault for storing encryption keys. Don't forget
> about the proper high-availability setup. Use an AppRole to manage
> tokens. Use Vault Agent as a proxy that adds the token to requests
> issued by RGWs. Then create a bucket for each user and set the
> encryption policy for this bucket using the PutBucketEncryption API
> that is available through AWS CLI. Either SSE-S3 or SSE-KMS will work
> for you. SSE-S3 is easier to manage. Each object will then be
> encrypted using a different key derived from its name and a per-bucket
> master key which never leaves Vault.
>
> Note that users will be able to create additional buckets by
> themselves, and they won't be encrypted, so tell them either not to do
> that or to encrypt the new buckets similarly.
>
> --
> Alexander E. Patrakov
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Important: RGW multisite bug may silently corrupt encrypted objects on replication

2023-05-26 Thread Casey Bodley
Our downstream QE team recently observed an md5 mismatch of replicated
objects when testing rgw's server-side encryption in multisite. This
corruption is specific to s3 multipart uploads, and only affects the
replicated copy - the original object remains intact. The bug likely
affects Ceph releases all the way back to Luminous where server-side
encryption was first introduced.

To expand on the cause of this corruption: Encryption of multipart
uploads requires special handling around the part boundaries, because
each part is uploaded and encrypted separately. In multisite, objects
are replicated in their encrypted form, and multipart uploads are
replicated as a single part. As a result, the replicated copy loses
its knowledge about the original part boundaries required to decrypt
the data correctly.

We don't have a fix yet, but we're tracking it in
https://tracker.ceph.com/issues/46062. The fix will only modify the
replication logic, so won't repair any objects that have already
replicated incorrectly. We'll need to develop a radosgw-admin command
to search for affected objects and reschedule their replication.

In the meantime, I can only advise multisite users to avoid using
encryption for multipart uploads. If you'd like to scan your cluster
for existing encrypted multipart uploads, you can identify them with a
s3 HeadObject request. The response would include an
x-amz-server-side-encryption header, and the ETag header value (with
the surrounding double quotes removed) would be longer than 32 characters
(multipart ETags have the special form "<md5>-<part count>"). Take care
not to delete the
corrupted replicas, because an active-active multisite configuration
would go on to delete the original copy.
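
if it helps, a minimal scan along those lines could look like the bash
sketch below (endpoint and bucket are placeholders; it assumes the aws
cli and jq, and it only checks the x-amz-server-side-encryption header,
so SSE-C objects would need a different check):

  #!/usr/bin/env bash
  # report encrypted multipart uploads in one bucket
  endpoint=http://rgw.example.com:8080
  bucket=mybucket

  aws --endpoint-url "$endpoint" s3api list-objects-v2 --bucket "$bucket" \
      --query 'Contents[].Key' --output json | jq -r '.[]?' |
  while IFS= read -r key; do
      head=$(aws --endpoint-url "$endpoint" s3api head-object \
                 --bucket "$bucket" --key "$key" 2>/dev/null) || continue
      sse=$(jq -r '.ServerSideEncryption // empty' <<< "$head")
      etag=$(jq -r '.ETag' <<< "$head" | tr -d '"')
      # multipart ETags have the form <md5>-<part count>, so they contain a dash
      if [[ -n "$sse" && "$etag" == *-* ]]; then
          echo "encrypted multipart upload: s3://$bucket/$key"
      fi
  done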
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Important: RGW multisite bug may silently corrupt encrypted objects on replication

2023-05-30 Thread Casey Bodley
On Tue, May 30, 2023 at 8:22 AM Tobias Urdin  wrote:
>
> Hello Casey,
>
> Thanks for the information!
>
> Can you please confirm that this is only an issue when using 
> “rgw_crypt_default_encryption_key”
> config opt that says “testing only” in the documentation [1] to enable 
> encryption and not when using
> Barbican or Vault as KMS or using SSE-C with the S3 API?

unfortunately, all flavors of server-side encryption (SSE-C, SSE-KMS,
SSE-S3, and rgw_crypt_default_encryption_key) are affected by this
bug, as they share the same encryption logic. the main difference is
where they get the key

>
> [1] 
> https://docs.ceph.com/en/quincy/radosgw/encryption/#automatic-encryption-for-testing-only
>
> > On 26 May 2023, at 22:45, Casey Bodley  wrote:
> >
> > Our downstream QE team recently observed an md5 mismatch of replicated
> > objects when testing rgw's server-side encryption in multisite. This
> > corruption is specific to s3 multipart uploads, and only affects the
> > replicated copy - the original object remains intact. The bug likely
> > affects Ceph releases all the way back to Luminous where server-side
> > encryption was first introduced.
> >
> > To expand on the cause of this corruption: Encryption of multipart
> > uploads requires special handling around the part boundaries, because
> > each part is uploaded and encrypted separately. In multisite, objects
> > are replicated in their encrypted form, and multipart uploads are
> > replicated as a single part. As a result, the replicated copy loses
> > its knowledge about the original part boundaries required to decrypt
> > the data correctly.
> >
> > We don't have a fix yet, but we're tracking it in
> > https://tracker.ceph.com/issues/46062. The fix will only modify the
> > replication logic, so won't repair any objects that have already
> > replicated incorrectly. We'll need to develop a radosgw-admin command
> > to search for affected objects and reschedule their replication.
> >
> > In the meantime, I can only advise multisite users to avoid using
> > encryption for multipart uploads. If you'd like to scan your cluster
> > for existing encrypted multipart uploads, you can identify them with a
> > s3 HeadObject request. The response would include an
> > x-amz-server-side-encryption header, and the ETag header value (with
> > the surrounding double quotes removed) would be longer than 32 characters
> > (multipart ETags have the special form "<md5>-<part count>"). Take care
> > not to delete the
> > corrupted replicas, because an active-active multisite configuration
> > would go on to delete the original copy.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Important: RGW multisite bug may silently corrupt encrypted objects on replication

2023-05-31 Thread Casey Bodley
On Wed, May 31, 2023 at 7:24 AM Tobias Urdin  wrote:
>
> Hello Casey,
>
> Understood, thanks!
>
> That means that the original copy in the site that it was uploaded to is still
> safe as long as that copy is not removed, and no underlying changes below
> RadosGW in the Ceph storage could corrupt the original copy?

right, the original multipart upload remains intact and can be
decrypted successfully

as i noted above, take care not to delete or modify any replicas that
were corrupted. replication is bidirectional by default, so those
changes would sync back and delete/overwrite the original copy

>
> Best regards
> Tobias
>
> On 30 May 2023, at 14:48, Casey Bodley  wrote:
>
> On Tue, May 30, 2023 at 8:22 AM Tobias Urdin  wrote:
>
> Hello Casey,
>
> Thanks for the information!
>
> Can you please confirm that this is only an issue when using 
> “rgw_crypt_default_encryption_key”
> config opt that says “testing only” in the documentation [1] to enable 
> encryption and not when using
> Barbican or Vault as KMS or using SSE-C with the S3 API?
>
> unfortunately, all flavors of server-side encryption (SSE-C, SSE-KMS,
> SSE-S3, and rgw_crypt_default_encryption_key) are affected by this
> bug, as they share the same encryption logic. the main difference is
> where they get the key
>
>
> [1] 
> https://docs.ceph.com/en/quincy/radosgw/encryption/#automatic-encryption-for-testing-only
>
> On 26 May 2023, at 22:45, Casey Bodley  wrote:
>
> Our downstream QE team recently observed an md5 mismatch of replicated
> objects when testing rgw's server-side encryption in multisite. This
> corruption is specific to s3 multipart uploads, and only affects the
> replicated copy - the original object remains intact. The bug likely
> affects Ceph releases all the way back to Luminous where server-side
> encryption was first introduced.
>
> To expand on the cause of this corruption: Encryption of multipart
> uploads requires special handling around the part boundaries, because
> each part is uploaded and encrypted separately. In multisite, objects
> are replicated in their encrypted form, and multipart uploads are
> replicated as a single part. As a result, the replicated copy loses
> its knowledge about the original part boundaries required to decrypt
> the data correctly.
>
> We don't have a fix yet, but we're tracking it in
> https://tracker.ceph.com/issues/46062. The fix will only modify the
> replication logic, so won't repair any objects that have already
> replicated incorrectly. We'll need to develop a radosgw-admin command
> to search for affected objects and reschedule their replication.
>
> In the meantime, I can only advise multisite users to avoid using
> encryption for multipart uploads. If you'd like to scan your cluster
> for existing encrypted multipart uploads, you can identify them with a
> s3 HeadObject request. The response would include an
> x-amz-server-side-encryption header, and the ETag header value (with
> the surrounding double quotes removed) would be longer than 32 characters
> (multipart ETags have the special form "<md5>-<part count>"). Take care
> not to delete the
> corrupted replicas, because an active-active multisite configuration
> would go on to delete the original copy.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: all buckets mtime = "0.000000" after upgrade to 17.2.6

2023-05-31 Thread Casey Bodley
thanks for the report. this regression was already fixed in
https://tracker.ceph.com/issues/58932 and will be in the next quincy
point release

On Wed, May 31, 2023 at 10:46 AM  wrote:
>
> I was running on 17.2.5 since October, and just upgraded to 17.2.6, and now 
> the "mtime" property on all my buckets is 0.00.
>
> On all previous versions going back to Nautilus this wasn't an issue, and we 
> do like to have that value present. radosgw-admin has no quick way to get the 
> last object in the bucket.
>
> Here's my tracker submission:
> https://tracker.ceph.com/issues/61264#change-239348
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef v18.1.0 QE Validation status

2023-05-31 Thread Casey Bodley
On Tue, May 30, 2023 at 12:54 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/61515#note-1
> Release Notes - TBD
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to
> merge https://github.com/ceph/ceph/pull/51788 for
> the core)
> rgw - Casey

the rgw suite had several new test_rgw_throttle.sh failures that i
haven't seen before:

qa/workunits/rgw/test_rgw_throttle.sh: line 3: ceph_test_rgw_throttle:
command not found

those only show up on rhel8 jobs, and none of your later reef runs fail this way

Yuri, is it possible that the suite-branch was mixed up somehow? the
ceph "sha1: be098f4642e7d4bbdc3f418c5ad703e23d1e9fe0" didn't match the
workunit "sha1: 4a02f3f496d9039326c49bf1fbe140388cd2f619"

> fs - Venky
> orch - Adam King
> rbd - Ilya
> krbd - Ilya
> upgrade/octopus-x - deprecated
> upgrade/pacific-x - known issues, Ilya, Laura?
> upgrade/reef-p2p - N/A
> clients upgrades - not run yet
> powercycle - Brad
> ceph-volume - in progress
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> gibba upgrade was done and will need to be done again this week.
> LRC upgrade TBD
>
> TIA
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef v18.1.0 QE Validation status

2023-06-01 Thread Casey Bodley
thanks Yuri,

i'm happy to approve rgw based on the latest run in
https://pulpito.ceph.com/yuriw-2023-05-31_19:25:20-rgw-reef-release-distro-default-smithi/.
there are still some failures that we're tracking, but nothing that
should block the rc

On Wed, May 31, 2023 at 3:22 PM Yuri Weinstein  wrote:
>
> Casey
>
> I will rerun rgw and we will see.
> Stay tuned.
>
> On Wed, May 31, 2023 at 10:27 AM Casey Bodley  wrote:
> >
> > On Tue, May 30, 2023 at 12:54 PM Yuri Weinstein  wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/61515#note-1
> > > Release Notes - TBD
> > >
> > > Seeking approvals/reviews for:
> > >
> > > rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to
> > > merge https://github.com/ceph/ceph/pull/51788 for
> > > the core)
> > > rgw - Casey
> >
> > the rgw suite had several new test_rgw_throttle.sh failures that i
> > haven't seen before:
> >
> > qa/workunits/rgw/test_rgw_throttle.sh: line 3: ceph_test_rgw_throttle:
> > command not found
> >
> > those only show up on rhel8 jobs, and none of your later reef runs fail 
> > this way
> >
> > Yuri, is it possible that the suite-branch was mixed up somehow? the
> > ceph "sha1: be098f4642e7d4bbdc3f418c5ad703e23d1e9fe0" didn't match the
> > workunit "sha1: 4a02f3f496d9039326c49bf1fbe140388cd2f619"
> >
> > > fs - Venky
> > > orch - Adam King
> > > rbd - Ilya
> > > krbd - Ilya
> > > upgrade/octopus-x - deprecated
> > > upgrade/pacific-x - known issues, Ilya, Laura?
> > > upgrade/reef-p2p - N/A
> > > clients upgrades - not run yet
> > > powercycle - Brad
> > > ceph-volume - in progress
> > >
> > > Please reply to this email with approval and/or trackers of known
> > > issues/PRs to address them.
> > >
> > > gibba upgrade was done and will need to be done again this week.
> > > LRC upgrade TBD
> > >
> > > TIA
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW striping configuration.

2023-06-13 Thread Casey Bodley
radosgw's object striping does not repeat, so there is no concept of
'stripe width'. rgw_obj_stripe_size just controls the maximum size of
each rados object, so the 'stripe count' is essentially just the total
s3 object size divided by rgw_obj_stripe_size
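
as a rough worked example, assuming the default rgw_obj_stripe_size of
4 MiB (the head object is additionally capped by rgw_max_chunk_size, so
treat this as an approximation):

  # approximate number of rados objects behind a 1 GiB s3 object
  object_size=$((1024 * 1024 * 1024))
  stripe_size=$((4 * 1024 * 1024))
  echo $(( (object_size + stripe_size - 1) / stripe_size ))   # -> 256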

On Tue, Jun 13, 2023 at 10:22 AM Teja A  wrote:
>
> Hello
>
> I am working on an application that uses rados gateway to store objects
> onto a ceph cluster. I am currently working on optimizing the latency for
> storing/retrieving objects from the cluster.
>
> My goal to improve read/write latencies is to have RGW write/read multiple
> rados objects in parallel as described here
>  -
> "Significant write performance occurs when the client writes the stripe
> units to their corresponding objects in parallel.". Just like with RAID0,
> by having a large number of rados objects that each radosgw object gets
> mapped to, we can achieve lower latencies as we are not bound by the
> throughput of a single disk.
>
> That documentation suggests that we can configure the stripe count as well
> as stripe width which would let us indirectly control how many rados
> objects each radosgw object gets mapped to. I want to be able to change
> these parameters and run benchmarks against my pools.
>
> The key parameter I am therefore interested in controlling is the stripe
> count (i.e. the number of distinct objects each radosgw object is mapped
> to). More specifically, in the diagram
> 
> attached to those docs, I see that the stripe_count is 4 (4 rados objects
> being written to for a single RGW object). I want to be able to experiment
> with varying numbers for that stripe_count.
>
> I am having trouble figuring out what configuration parameters exist in
> radosgw that lets me control this. I see that there is a stripe_width
> 
> and
> a rgw_max_chunk_size
> 
> but
> did not find anything for stripe_count. Configuring the stripe_width alone
> is not sufficient as I would need to set the stripe_unit size as well to
> get the desired number for stripe_count, but I did not find either.
>
> Am I understanding that correctly? If so, can someone please point me to
> where and how this configuration should be set?
>
> I appreciate your help!
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW accessing real source IP address of a client (e.g. in S3 bucket policies)

2023-06-15 Thread Casey Bodley
On Thu, Jun 15, 2023 at 7:23 AM Christian Rohmann
 wrote:
>
> Hello Ceph-Users,
>
> context or motivation of my question is S3 bucket policies and other
> cases using the source IP address as condition.
>
> I was wondering if and how RadosGW is able to access the source IP
> address of clients if receiving their connections via a loadbalancer /
> reverse proxy like HAProxy.
> So naturally that is where the connection originates from in that case,
> rendering a policy based on IP addresses useless.
>
> Depending on whether the connection balanced as HTTP or TCP there are
> two ways to carry information about the actual source:
>
>   * In case of HTTP via headers like "X-Forwarded-For". This is
> apparently supported only for logging the source in the "rgw ops log" ([1])?
> Or is this info used also when evaluating the source IP condition within
> a bucket policy?

yes, the aws:SourceIp condition key does use the value from
X-Forwarded-For when present

>
>   * In case of TCP loadbalancing, there is the proxy protocol v2. This
> unfortunately seems not even supposed by the BEAST library which RGW uses.
>  I opened feature requests ...
>
>   ** https://tracker.ceph.com/issues/59422
>   ** https://github.com/chriskohlhoff/asio/issues/1091
>   ** https://github.com/boostorg/beast/issues/2484
>
> but there is no outcome yet.
>
>
> Regards
>
>
> Christian
>
>
> [1]
> https://docs.ceph.com/en/quincy/radosgw/config-ref/#confval-rgw_remote_addr_param
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW accessing real source IP address of a client (e.g. in S3 bucket policies)

2023-06-16 Thread Casey Bodley
On Fri, Jun 16, 2023 at 2:55 AM Christian Rohmann
 wrote:
>
> On 15/06/2023 15:46, Casey Bodley wrote:
>
>   * In case of HTTP via headers like "X-Forwarded-For". This is
> apparently supported only for logging the source in the "rgw ops log" ([1])?
> Or is this info used also when evaluating the source IP condition within
> a bucket policy?
>
> yes, the aws:SourceIp condition key does use the value from
> X-Forwarded-For when present
>
> I have an HAProxy in front of the RGWs which has
>
> "option forwardfor" set  to add the "X-Forwarded-For" header.
>
> Then the RGWs have  "rgw remote addr param = http_x_forwarded_for" set,
> according to 
> https://docs.ceph.com/en/quincy/radosgw/config-ref/#confval-rgw_remote_addr_param
>
> and I also see remote_addr properly logged within the rgw ops log.
>
>
>
> But when applying a bucket policy with aws:SourceIp it seems to only work if 
> I set the internal IP of the HAProxy instance, not the public IP of the 
> client.
> So the actual remote address is NOT used in my case.
>
>
> Did I miss any config setting anywhere?
>
>
>
>
> Regards and thanks for your help
>
>
> Christian
>
>

your 'rgw remote addr param' config looks right. with that same
config, i was able to set a bucket policy that denied access based on
that X-Forwarded-For header:

$ cat bucketpolicy.json
{
"Version": "2012-10-17",
"Id": "S3PolicyId1",
"Statement": [
{
"Sid": "IPAllow",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::testbucket",
"arn:aws:s3:::testbucket/*"
],
"Condition": {
"IpAddress": {
"aws:SourceIp": "127.0.0.1"
}
}
}
]
}
$ s3cmd mb s3://testbucket
$ s3cmd setpolicy bucketpolicy.json s3://testbucket
$ s3cmd --add-header=X-Forwarded-For:127.0.0.2 put bucketpolicy.json s3://testbucket
upload: 'bucketpolicy.json' -> 's3://testbucket/bucketpolicy.json'  [1 of 1]
 489 of 489   100% in    0s    42.95 KB/s  done
$ s3cmd --add-header=X-Forwarded-For:127.0.0.1 put bucketpolicy.json s3://testbucket
upload: 'bucketpolicy.json' -> 's3://testbucket/bucketpolicy.json'  [1 of 1]
 489 of 489   100% in    0s    11.08 KB/s  done
ERROR: S3 error: 403 (AccessDenied)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Starting v17.2.5 RGW SSE with default key (likely others) no longer works

2023-06-19 Thread Casey Bodley
On Sat, Jun 17, 2023 at 1:11 PM Jayanth Reddy
 wrote:
>
> Hello Folks,
>
> I've been experimenting with RGW encryption and found this out.
> Focusing on Quincy and Reef dev: for SSE (any method) to work, transit
> has to be encrypted end to end; however, if there is a proxy, then [1]
> can be used to tell RGW that SSL is being terminated. As per docs, RGW can
> still continue to accept SSE if rgw_crypt_require_ssl is set to false as an
> overriding item for the requirement of encryption in transit. Below are my
> observations.
>
> Until v17.2.3 (
> quay.io/ceph/ceph@sha256:43f6e905f3e34abe4adbc9042b9d6f6b625dee8fa8d93c2bae53fa9b61c3df1a),
> setting the same key as in [2], would show the object unreadable when
> copied using
> # rados -p default.rgw.buckets.data get
> 03c2ef32-b7c8-4e18-8e0c-ebac10a42f10.17254.1_file.plain file.enc
> The object would be unreadable. The original object is in plain text.
> Ofcourse, with rgw_crypt_require_ssl to false or [1]
>
> However, starting with v17.2.4 onwards and even until my recent testing
> with reef-dev (18.0.0-4353-g1e3835ab
> 1e3835abb2d19ce6ac4149c260ef804f1041d751)
> When I try getting the same object onto the disk using rados command, the
> object (contains plain text) would still be readable.
>
> Has something changed since v17.2.4? I'll also test with Pacific and let
> you know. Not sure if it affects other SSE mechanisms as well.
>
> [1]
> https://docs.ceph.com/en/quincy/radosgw/config-ref/#confval-rgw_trust_forwarded_https
> [2]
> https://docs.ceph.com/en/quincy/radosgw/encryption/#automatic-encryption-for-testing-only
>
> Thanks,
> Jayanth Reddy
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

hi Jayanth,

17.2.4 coincides with backports of the SSE-S3 and PutBucketEncryption
features. those changes include a regression where the
rgw_crypt_default_encryption_key configurable no longer applies. you
can track the fix for this in https://tracker.ceph.com/issues/61473
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: header_limit in AsioFrontend class

2023-06-19 Thread Casey Bodley
On Sat, Jun 17, 2023 at 8:37 AM Vahideh Alinouri
 wrote:
>
> Dear Ceph Users,
>
> I am writing to request a backport of the changes related to the
> AsioFrontend class, specifically regarding the header_limit value.
>
> In the Pacific release of Ceph, the header_limit value in the
> AsioFrontend class was set to 4096. Since the Quincy release, a
> configurable option has been introduced to set the header_limit value,
> with a default of 16384.
>
> I would greatly appreciate it if someone from the Ceph development
> team backport this change to the older version.
>
> Best regards,
> Vahideh Alinouri
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

hi Vahideh, i've prepared that pacific backport. you can follow its
progress in https://tracker.ceph.com/issues/61728
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw new zonegroup hammers master with metadata sync

2023-06-20 Thread Casey Bodley
hi Boris,

we've been investigating reports of excessive polling from metadata
sync. i just opened https://tracker.ceph.com/issues/61743 to track
this. restarting the secondary zone radosgws should help as a
temporary workaround
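
for example, on a cephadm deployment that would look something like the
following (the service name is a placeholder; use whatever 'ceph orch ls
rgw' reports for the secondary zone's gateways):

  ceph orch ls rgw                # find the rgw service name
  ceph orch restart rgw.dc3       # restart the secondary zone's gateways

  # on a package-based host it would be the usual systemd unit instead, e.g.
  # systemctl restart ceph-radosgw@rgw.$(hostname -s)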

On Tue, Jun 20, 2023 at 5:57 AM Boris Behrens  wrote:
>
> Hi,
> yesterday I added a new zonegroup and it looks like it seems to cycle over
> the same requests over and over again.
>
> In the log of the main zone I see these requests:
> 2023-06-20T09:48:37.979+ 7f8941fb3700  1 beast: 0x7f8a602f3700:
> fd00:2380:0:24::136 - - [2023-06-20T09:48:37.979941+] "GET
> /admin/log?type=metadata&id=62&period=e8fc96f1-ae86-4dc1-b432-470b0772fded&max-entries=100&&rgwx-zonegroup=b39392eb-75f8-47f0-b4f3-7d3882930b26
> HTTP/1.1" 200 44 - - -
>
> Only thing that changes is the &id.
>
> We have two other zonegroups that are configured identical (ceph.conf and
> period) and these don;t seem to spam the main rgw.
>
> root@host:~# radosgw-admin sync status
>   realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (main)
>   zonegroup b39392eb-75f8-47f0-b4f3-7d3882930b26 (dc3)
>zone 96f5eca9-425b-4194-a152-86e310e91ddb (dc3)
>   metadata sync syncing
> full sync: 0/64 shards
> incremental sync: 64/64 shards
> metadata is caught up with master
>
> root@host:~# radosgw-admin period get
> {
> "id": "e8fc96f1-ae86-4dc1-b432-470b0772fded",
> "epoch": 92,
> "predecessor_uuid": "5349ac85-3d6d-4088-993f-7a1d4be3835a",
> "sync_status": [
> "",
> ...
> ""
> ],
> "period_map": {
> "id": "e8fc96f1-ae86-4dc1-b432-470b0772fded",
> "zonegroups": [
> {
> "id": "b39392eb-75f8-47f0-b4f3-7d3882930b26",
> "name": "dc3",
> "api_name": "dc3",
> "is_master": "false",
> "endpoints": [
> ],
> "hostnames": [
> ],
> "hostnames_s3website": [
> ],
> "master_zone": "96f5eca9-425b-4194-a152-86e310e91ddb",
> "zones": [
> {
> "id": "96f5eca9-425b-4194-a152-86e310e91ddb",
> "name": "dc3",
> "endpoints": [
> ],
> "log_meta": "false",
> "log_data": "false",
> "bucket_index_max_shards": 11,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": [],
> "storage_classes": [
> "STANDARD"
> ]
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "5d6f2ea4-b84a-459b-bce2-bccac338b3ef",
> "sync_policy": {
> "groups": []
> }
> },
> ...
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Removing the encryption: (essentially decrypt) encrypted RGW objects

2023-06-22 Thread Casey Bodley
hi Jayanth,

i don't know that we have a supported way to do this. the
s3-compatible method would be to copy the object onto itself without
requesting server-side encryption. however, this wouldn't prevent
default encryption if rgw_crypt_default_encryption_key was still
enabled. furthermore, rgw has not implemented support for copying
encrypted objects, so this would fail for other forms of server-side
encryption too. this has been tracked in
https://tracker.ceph.com/issues/23264
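
for the record, the s3-compatible copy-in-place would look something
like the sketch below (endpoint/bucket/key are placeholders; as noted
above it is expected to fail on rgw today for encrypted sources, and s3
requires replacing the metadata when source and destination are the
same object):

  aws --endpoint-url http://rgw.example.com:8080 s3api copy-object \
      --bucket mybucket --key mykey \
      --copy-source mybucket/mykey \
      --metadata-directive REPLACE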

On Sat, Jun 17, 2023 at 12:13 PM Jayanth Reddy
 wrote:
>
> Hello Users,
> We've a big cluster (Quincy) with almost 1.7 billion RGW objects, and we've
> enabled SSE on as per
> https://docs.ceph.com/en/quincy/radosgw/encryption/#automatic-encryption-for-testing-only
> (yes, we've chosen this insecure method to store the key)
> We're now in the process of implementing RGW multisite, but stuck due to
> https://tracker.ceph.com/issues/46062 and list at
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/PQW66JJ5DCRTH5XFGTRESF3XXTOSIWFF/#43RHLUVFYNSDLZPXXPZSSXEDX34KWGJX
>
> Was wondering if there is a way to decrypt the objects in-place with the
> applied symmetric key. I tried to remove
> the rgw_crypt_default_encryption_key from the mon configuration database
> (on a test cluster), but as expected, RGW daemons throw 500 server errors
> as it can not work on encrypted objects.
>
> There is a PR being worked on about introducing the command option at
> https://github.com/ceph/ceph/pull/51842 but it appears it takes some time
> to be merged.
>
> Cheers,
> Jayanth Reddy
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [multisite] The purpose of zonegroup

2023-06-30 Thread Casey Bodley
you're correct that the distinction is between metadata and data;
metadata like users and buckets will replicate to all zonegroups,
while object data only replicates within a single zonegroup. any given
bucket is 'owned' by the zonegroup that creates it (or overridden by
the LocationConstraint on creation). requests for data in that bucket
sent to other zonegroups should redirect to the zonegroup where it
resides

the ability to create multiple zonegroups can be useful in cases where
you want some isolation for the datasets, but a shared namespace of
users and buckets. you may have several connected sites sharing
storage, but only require a single backup for purposes of disaster
recovery. there it could make sense to create several zonegroups with
only two zones each to avoid replicating all objects to all zones

in other cases, it could make more sense to isolate things in separate
realms with a single zonegroup each. zonegroups just provide some
flexibility to control the isolation of data and metadata separately
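
a quick way to see which zonegroup owns a given bucket is the s3
GetBucketLocation call, which rgw should answer with the owning
zonegroup's api_name (endpoint and bucket are placeholders):

  aws --endpoint-url http://rgw.example.com:8080 s3api get-bucket-location --bucket mybucket
  # -> { "LocationConstraint": "zg2" }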

On Thu, Jun 29, 2023 at 5:48 PM Yixin Jin  wrote:
>
> Hi folks,
> In the multisite environment, we can get one realm that contains multiple 
> zonegroups, each in turn can have multiple zones. However, the purpose of 
> zonegroup isn't clear to me. It seems that when a user is created, its 
> metadata is synced to all zones within the same realm, regardless whether 
> they are in different zonegroups or not. The same happens to buckets. 
> Therefore, what is the purpose of having zonegroups? Wouldn't it be easier to 
> just have realm and zones?
> Thanks,Yixin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [multisite] The purpose of zonegroup

2023-06-30 Thread Casey Bodley
cc Zac, who has been working on multisite docs in
https://tracker.ceph.com/issues/58632

On Fri, Jun 30, 2023 at 11:37 AM Alexander E. Patrakov
 wrote:
>
> Thanks! This is something that should be copy-pasted at the top of
> https://docs.ceph.com/en/latest/radosgw/multisite/
>
> Actually, I reported a documentation bug for something very similar.
>
> On Fri, Jun 30, 2023 at 11:30 PM Casey Bodley  wrote:
> >
> > you're correct that the distinction is between metadata and data;
> > metadata like users and buckets will replicate to all zonegroups,
> > while object data only replicates within a single zonegroup. any given
> > bucket is 'owned' by the zonegroup that creates it (or overridden by
> > the LocationConstraint on creation). requests for data in that bucket
> > sent to other zonegroups should redirect to the zonegroup where it
> > resides
> >
> > the ability to create multiple zonegroups can be useful in cases where
> > you want some isolation for the datasets, but a shared namespace of
> > users and buckets. you may have several connected sites sharing
> > storage, but only require a single backup for purposes of disaster
> > recovery. there it could make sense to create several zonegroups with
> > only two zones each to avoid replicating all objects to all zones
> >
> > in other cases, it could make more sense to isolate things in separate
> > realms with a single zonegroup each. zonegroups just provide some
> > flexibility to control the isolation of data and metadata separately
> >
> > On Thu, Jun 29, 2023 at 5:48 PM Yixin Jin  wrote:
> > >
> > > Hi folks,
> > > In the multisite environment, we can get one realm that contains multiple 
> > > zonegroups, each in turn can have multiple zones. However, the purpose of 
> > > zonegroup isn't clear to me. It seems that when a user is created, its 
> > > metadata is synced to all zones within the same realm, regardless whether 
> > > they are in different zonegroups or not. The same happens to buckets. 
> > > Therefore, what is the purpose of having zonegroups? Wouldn't it be 
> > > easier to just have realm and zones?
> > > Thanks,Yixin
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Alexander E. Patrakov
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Get bucket placement target

2023-07-03 Thread Casey Bodley
On Mon, Jul 3, 2023 at 6:52 AM mahnoosh shahidi  wrote:
>
> I think this part of the doc shows that LocationConstraint can override the
> placement and I can change the placement target with this field.
>
> When creating a bucket with the S3 protocol, a placement target can be
> > provided as part of the LocationConstraint to override the default
> > placement targets from the user and zonegroup.
>
>
>  I just want to get the value that I had set in the create bucket request.

thanks Mahnoosh, i opened a feature request for this at
https://tracker.ceph.com/issues/61887

>
> Best Regards,
> Mahnoosh
>
> On Mon, Jul 3, 2023 at 1:19 PM Konstantin Shalygin  wrote:
>
> > Hi,
> >
> > On 3 Jul 2023, at 12:23, mahnoosh shahidi  wrote:
> >
> > So clients can not get the value which they set in the LocationConstraint
> > field in the create bucket request as in this doc
> > ?
> >
> >
> > LocationConstraint in this case is an AZ [1], not the placement in Ceph
> > (OSD pool, compression settings)
> >
> >
> > [1] https://docs.openstack.org/neutron/rocky/admin/config-az.html
> > k
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [multisite] The purpose of zonegroup

2023-07-05 Thread Casey Bodley
thanks Yixin,

On Tue, Jul 4, 2023 at 1:20 PM Yixin Jin  wrote:
>
>  Hi Casey,
> Thanks a lot for the clarification. I feel that zonegroups made great sense 
> at the beginning, when the multisite feature was conceived and (I suspect) zones 
> always synced from all other zones within a zonegroup. However, once 
> "sync_from" was introduced and the sync policy later enhanced the 
> granularity of control over data sync, it seems not much advantage is 
> left for zonegroups.

both sync_from and sync policy do offer finer-grained control over
which zones sync from which, but they can't represent a bucket's
'residency' the way that the zonegroup-based LocationConstraint does.
by redirecting requests to the bucket's resident zonegroup, the goal
is to present a single eventually-consistent set of objects per
bucket. while features like sync_from and bucket replication policy do
complicate this picture, i think these concepts of residency and
redirects are important to make sense of s3's LocationConstraint. but
perhaps the sync policy model could be extended to take over the
zonegroup's role here?

> Both "sync_from" and sync policy could be moved up to realm level while the 
> isolation of datasets can still be maintained. On the other hand, if some new 
> features are introduced to enable some isolation of metadata within the same 
> realm, probably at zonegroup level, its usefulness may be more justified.
> Regards,
> Yixin
>
> On Friday, June 30, 2023 at 11:29:16 a.m. EDT, Casey Bodley 
>  wrote:
>
>  you're correct that the distinction is between metadata and data;
> metadata like users and buckets will replicate to all zonegroups,
> while object data only replicates within a single zonegroup. any given
> bucket is 'owned' by the zonegroup that creates it (or overridden by
> the LocationConstraint on creation). requests for data in that bucket
> sent to other zonegroups should redirect to the zonegroup where it
> resides
>
> the ability to create multiple zonegroups can be useful in cases where
> you want some isolation for the datasets, but a shared namespace of
> users and buckets. you may have several connected sites sharing
> storage, but only require a single backup for purposes of disaster
> recovery. there it could make sense to create several zonegroups with
> only two zones each to avoid replicating all objects to all zones
>
> in other cases, it could make more sense to isolate things in separate
> realms with a single zonegroup each. zonegroups just provide some
> flexibility to control the isolation of data and metadata separately
>
> On Thu, Jun 29, 2023 at 5:48 PM Yixin Jin  wrote:
> >
> > Hi folks,
> > In the multisite environment, we can get one realm that contains multiple 
> > zonegroups, each in turn can have multiple zones. However, the purpose of 
> > zonegroup isn't clear to me. It seems that when a user is created, its 
> > metadata is synced to all zones within the same realm, regardless whether 
> > they are in different zonegroups or not. The same happens to buckets. 
> > Therefore, what is the purpose of having zonegroups? Wouldn't it be easier 
> > to just have realm and zones?
> > Thanks,Yixin
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW dynamic resharding blocks write ops

2023-07-07 Thread Casey Bodley
while a bucket is resharding, rgw will retry several times internally
to apply the write before returning an error to the client. while most
buckets can be resharded within seconds, very large buckets may hit
these timeouts. any other cause of slow osd ops could also have that
effect. it can be helpful to pre-shard very large buckets to avoid
these resharding delays
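
for example, something like this (an untested sketch; the bucket name and
shard count are placeholders - aim well above num_objects /
rgw_max_objs_per_shard so dynamic resharding won't need to kick in later):

  radosgw-admin bucket limit check   # object counts and fill_status per bucket
  radosgw-admin bucket reshard --bucket=mybucket --num-shards=101
  # dynamic resharding can also be disabled entirely, at the cost of manual upkeep:
  ceph config set client.rgw rgw_dynamic_resharding false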

can you tell which error code was returned to the client there? it
should be a retryable error, and many http clients have retry logic to
prevent these errors from reaching the application

On Fri, Jul 7, 2023 at 6:35 AM Eugen Block  wrote:
>
> Hi *,
> last week I successfully upgraded a customer cluster from Nautilus to
> Pacific, no real issues, their main use is RGW. A couple of hours
> after most of the OSDs were upgraded (the RGWs were not yet) their
> application software reported an error, it couldn't write to a bucket.
> This error occured again two days ago, in the RGW logs I found the
> relevant messages that resharding was happening at that time. I'm
> aware that this is nothing unusual, but I can't find anything helpful on
> how to prevent this except for deactivating dynamic resharding and
> then manually do it during maintenance windows. We don't know yet if
> there's really data missing after the bucket access has recovered or
> not, that still needs to be investigated. Since Nautilus already had
> dynamic resharding enabled, I wonder if they were just lucky until
> now, for example resharding happened while no data was being written
> to the buckets. Or if resharding just didn't happen until then, I have
> no access to the cluster so I don't have any bucket stats available
> right now. I found this thread [1] about an approach how to prevent
> blocked IO but it's from 2019 and I don't know how far that got.
>
> There are many users/operators on this list who use RGW more than me,
> how do you deal with this? Are your clients better prepared for these
> events? Any comments are appreciated!
>
> Thanks,
> Eugen
>
> [1]
> https://lists.ceph.io/hyperkitty/list/d...@ceph.io/thread/NG56XXAM5A4JONT4BGPQAZUTJAYMOSZ2/
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph quota qustion

2023-07-10 Thread Casey Bodley
On Mon, Jul 10, 2023 at 10:40 AM  wrote:
>
> Hi,
>
> yes, this is incomplete multiparts problem.
>
> Then, how does an admin delete the incomplete multipart objects?
> I mean:
> 1. Can an admin find the incomplete jobs and incomplete multipart objects?
> 2. If the first question is possible, can an admin delete all of those jobs or 
> objects at once?

you don't need to be an admin to do these things, they're part of the
S3 API. this cleanup can be automated using a bucket lifecycle policy
[1]. for manual cleanup with aws cli, for example, you can use 'aws
s3api list-multipart-uploads' [2] to discover all of the incomplete
uploads in a given bucket, and 'aws s3api abort-multipart-upload' [3]
to abort each of them. in addition to specifying the
AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables, you'll
also need to point --endpoint-url at a running radosgw

[1] 
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html
[2] 
https://docs.aws.amazon.com/cli/latest/reference/s3api/list-multipart-uploads.html
[3] 
https://docs.aws.amazon.com/cli/latest/reference/s3api/abort-multipart-upload.html
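
for example, a rough (untested) sketch that aborts every incomplete upload
in one bucket; the endpoint, bucket name and credentials are placeholders:

  export AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=...
  rgw=http://rgw.example.com:8080
  aws --endpoint-url $rgw s3api list-multipart-uploads --bucket mybucket \
      --query 'Uploads[].[Key,UploadId]' --output text |
  while read -r key upload_id; do
    aws --endpoint-url $rgw s3api abort-multipart-upload \
        --bucket mybucket --key "$key" --upload-id "$upload_id"
  done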

> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Mgr/Dashboard Python dependencies: a new approach

2023-07-14 Thread Casey Bodley
On Wed, May 17, 2023 at 3:12 PM Ken Dreyer  wrote:
>
> Originally we had about a hundred packages in
> https://copr.fedorainfracloud.org/coprs/ceph/el9/ before they were
> wiped out in rhbz#2143742. I went back over the list of outstanding
> deps today. EPEL lacks only five packages now. I've built those into
> the Copr today.
>
> You can enable it with "dnf copr enable -y ceph/el9" . I think we
> should add this command to the container Dockerfile, Teuthology tasks,
> install-deps.sh, or whatever needs to run on el9 that is missing these
> packages.
>
> These tickets track moving the final five builds from the Copr into EPEL9:
>
> python-asyncssh - https://bugzilla.redhat.com/2196046

this one just moved to ON_QA

> python-pecan - https://bugzilla.redhat.com/2196045
> python-routes - https://bugzilla.redhat.com/2166620

pecan and routes are resolved

> python-repoze-lru - no BZ yet

Ken, do you know if there's any progress on this one?

> python-logutils - provide karma here:
> https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2023-6baae8389d

this one was resolved, https://bugzilla.redhat.com/show_bug.cgi?id=2196790

>
> I was interested to see almost all of these are already in progress .
> That final one (logutils) should go to EPEL's stable repo in a week
> (faster with karma).
>
> - Ken
>
>
>
>
> On Wed, Apr 26, 2023 at 11:00 AM Casey Bodley  wrote:
> >
> > are there any volunteers willing to help make these python packages
> > available upstream?
> >
> > On Tue, Mar 28, 2023 at 5:34 AM Ernesto Puerta  wrote:
> > >
> > > Hey Ken,
> > >
> > > This change doesn't not involve any further internet access other than 
> > > the already required for the "make dist" stage (e.g.: npm packages). That 
> > > said, where feasible, I also prefer to keep the current approach for a 
> > > minor version.
> > >
> > > Kind Regards,
> > > Ernesto
> > >
> > >
> > > On Mon, Mar 27, 2023 at 9:06 PM Ken Dreyer  wrote:
> > >>
> > >> I hope we don't backport such a big change to Quincy. That will have a
> > >> large impact on how we build in restricted environments with no
> > >> internet access.
> > >>
> > >> We could get the missing packages into EPEL.
> > >>
> > >> - Ken
> > >>
> > >> On Fri, Mar 24, 2023 at 7:32 AM Ernesto Puerta  
> > >> wrote:
> > >> >
> > >> > Hi Casey,
> > >> >
> > >> > The original idea was to leave this to Reef alone, but given that the 
> > >> > CentOS 9 Quincy release is also blocked by missing Python packages, I 
> > >> > think that it'd make sense to backport it.
> > >> >
> > >> > I'm coordinating with Pere (in CC) to expedite this. We may need help 
> > >> > to troubleshoot Shaman/rpmbuild issues. Who would be the best one to 
> > >> > help with that?
> > >> >
> > >> > Regarding your last question, I don't know who's the maintainer of 
> > >> > those packages in EPEL. There's this BZ 
> > >> > (https://bugzilla.redhat.com/2166620) requesting that specific 
> > >> > package, but that's only one out of the dozen of missing packages 
> > >> > (plus transitive dependencies)...
> > >> >
> > >> > Kind Regards,
> > >> > Ernesto
> > >> >
> > >> >
> > >> > On Thu, Mar 23, 2023 at 2:19 PM Casey Bodley  
> > >> > wrote:
> > >> >>
> > >> >> hi Ernesto and lists,
> > >> >>
> > >> >> > [1] https://github.com/ceph/ceph/pull/47501
> > >> >>
> > >> >> are we planning to backport this to quincy so we can support centos 9
> > >> >> there? enabling that upgrade path on centos 9 was one of the
> > >> >> conditions for dropping centos 8 support in reef, which i'm still keen
> > >> >> to do
> > >> >>
> > >> >> if not, can we find another resolution to
> > >> >> https://tracker.ceph.com/issues/58832? as i understand it, all of
> > >> >> those python packages exist in centos 8. do we know why they were
> > >> >> dropped for centos 9? have we looked into making those available in
> > >> >> epel? (cc Ken and Kaleb)
> > >> >>

[ceph-users] Ceph Leadership Team Meeting, 2023-07-26 Minutes

2023-07-26 Thread Casey Bodley
Welcome to Aviv Caro as new Ceph NVMe-oF lead

Reef status:
* reef 18.1.3 built, gibba cluster upgraded, plan to publish this week
* https://pad.ceph.com/p/reef_final_blockers all resolved except for
bookworm builds https://tracker.ceph.com/issues/61845
* only blockers will merge to reef so the release matches final rc

Planning for distribution updates earlier in release process:
* centos 9 testing wasn't enabled for reef until very late
-- partly because of missing python dependencies
-- required fixes to test suites of every component so we couldn't
merge until everything was fixed
* also applies to major dependencies like boost and rocksdb
-- boost upgrade on main disrupted testing on other release branches
-- build containerization in CI would help a lot here. discussion
continues tomorrow in Ceph Infrastructure meeting

Improving the documentation/procedure for deploying a vstart cluster:
* including installation of dependencies and compilation
-- add test coverage on fresh distros to verify that all required
dependencies are installed
* README.md will be the canonical guide

CDS concluded yesterday:
* recordings at
https://ceph.io/en/community/events/2023/ceph-developer-summit-squid/
* component leads to update ceph backlog on trello
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ref v18.2.0 QE Validation status

2023-07-31 Thread Casey Bodley
On Sun, Jul 30, 2023 at 11:46 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/62231#note-1
>
> Seeking approvals/reviews for:
>
> smoke - Laura, Radek
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey

the pacific upgrade test failed because it tried to use centos 9
packages. i reran the upgrade tests using
https://github.com/ceph/ceph/pull/52710 as the suite-branch and they
passed in 
https://pulpito.ceph.com/cbodley-2023-07-31_14:53:56-rgw:upgrade-main-distro-default-smithi/

rgw approved

> fs - Venky
> orch - Adam King
> rbd - Ilya
> krbd - Ilya
> upgrade-clients:client-upgrade* - in progress
> powercycle - Brad
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> bookworm distro support is an outstanding issue.
>
> TIA
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ref v18.2.0 QE Validation status

2023-07-31 Thread Casey Bodley
On Mon, Jul 31, 2023 at 11:38 AM Yuri Weinstein  wrote:
>
> Thx Casey
>
> If you agree I will merge https://github.com/ceph/ceph/pull/52710
> ?

yes please

>
> On Mon, Jul 31, 2023 at 8:34 AM Casey Bodley  wrote:
> >
> > On Sun, Jul 30, 2023 at 11:46 AM Yuri Weinstein  wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/62231#note-1
> > >
> > > Seeking approvals/reviews for:
> > >
> > > smoke - Laura, Radek
> > > rados - Neha, Radek, Travis, Ernesto, Adam King
> > > rgw - Casey
> >
> > the pacific upgrade test failed because it tried to use centos 9
> > packages. i reran the upgrade tests using
> > https://github.com/ceph/ceph/pull/52710 as the suite-branch and they
> > passed in 
> > https://pulpito.ceph.com/cbodley-2023-07-31_14:53:56-rgw:upgrade-main-distro-default-smithi/
> >
> > rgw approved
> >
> > > fs - Venky
> > > orch - Adam King
> > > rbd - Ilya
> > > krbd - Ilya
> > > upgrade-clients:client-upgrade* - in progress
> > > powercycle - Brad
> > >
> > > Please reply to this email with approval and/or trackers of known
> > > issues/PRs to address them.
> > >
> > > bookworm distro support is an outstanding issue.
> > >
> > > TIA
> > > YuriW
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [ceph v16.2.10] radosgw crash

2023-08-16 Thread Casey Bodley
thanks Louis,

that looks like the same backtrace as
https://tracker.ceph.com/issues/61763. that issue has been on 'Need
More Info' because all of the rgw logging was disabled there. are you
able to share some more log output to help us figure this out?

under "--- begin dump of recent events ---", it dumps the last 10000
lines of log output. i would love to see the final ~100 lines of that
output leading up to the crash (with any potentially-sensitive
information hidden)
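
for example, to capture that level of detail (a rough sketch assuming a
cephadm deployment with file logging enabled; the fsid and daemon id are
placeholders):

  ceph config set client.rgw debug_rgw 20
  ceph config set client.rgw debug_ms 1
  # after reproducing the crash, grab the lines leading up to it:
  grep -B 100 'Caught signal' /var/log/ceph/<fsid>/ceph-client.rgw.<id>.log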

On Tue, Aug 15, 2023 at 10:30 PM Louis Koo  wrote:
>
> 2023-08-15T18:15:55.356+ 7f7916ef3700 -1 *** Caught signal (Aborted) **
>  in thread 7f7916ef3700 thread_name:radosgw
>
>  ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific 
> (stable)
>  1: /lib64/libpthread.so.0(+0x12ce0) [0x7f79da065ce0]
>  2: gsignal()
>  3: abort()
>  4: /lib64/libstdc++.so.6(+0x9009b) [0x7f79d905809b]
>  5: /lib64/libstdc++.so.6(+0x9653c) [0x7f79d905e53c]
>  6: /lib64/libstdc++.so.6(+0x96597) [0x7f79d905e597]
>  7: /lib64/libstdc++.so.6(+0x9652e) [0x7f79d905e52e]
>  8: (spawn::detail::continuation_context::resume()+0x87) [0x7f79e4b70a17]
>  9: 
> (boost::asio::detail::executor_op  (*)(), boost::asio::strand >, 
> std::shared_lock
>  > >, std::tuple std::shared_lock
>  > > > >, std::allocator, 
> boost::asio::detail::scheduler_operation>::do_complete(void*, 
> boost::asio::detail::scheduler_operation*, boost::system::error_code const&, 
> unsigned long)+0x25a) [0x7f79e4b77b6a]
>  10: 
> (boost::asio::detail::strand_executor_service::invoker  const>::operator()()+0x8d) [0x7f79e4b7f93d]
>  11: 
> (boost::asio::detail::executor_op  const>, boost::asio::detail::recycling_allocator boost::asio::detail::thread_info_base::default_tag>, 
> boost::asio::detail::scheduler_operation>::do_complete(void*, 
> boost::asio::detail::scheduler_operation*, boost::system::error_code const&, 
> unsigned long)+0x96) [0x7f79e4b7fca6]
>  12: (boost::asio::detail::scheduler::run(boost::system::error_code&)+0x4f2) 
> [0x7f79e4b73ad2]
>  13: /lib64/libradosgw.so.2(+0x430376) [0x7f79e4b56376]
>  14: /lib64/libstdc++.so.6(+0xc2ba3) [0x7f79d908aba3]
>  15: /lib64/libpthread.so.0(+0x81ca) [0x7f79da05b1ca]
>  16: clone()
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
>
> --- begin dump of recent events ---
>  -> 2023-08-15T18:14:25.987+ 7f79186f6700  2 req 11141123620750988595 
> 0.00106s s3:get_obj init permissions
>  -9998> 2023-08-15T18:14:25.987+ 7f79186f6700  2 req 11141123620750988595 
> 0.00106s s3:get_obj recalculating target
>  -9997> 2023-08-15T18:14:25.987+ 7f79186f6700  2 req 11141123620750988595 
> 0.00106s s3:get_obj reading permissions
>  -9996> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
> 0.00213s s3:get_obj init op
>  -9995> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
> 0.00213s s3:get_obj verifying op mask
>  -9994> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
> 0.00213s s3:get_obj verifying op permissions
>  -9993> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
> 0.00213s s3:get_obj Searching permissions for 
> identity=rgw::auth::SysReqApplier -> 
> rgw::auth::LocalApplier(acct_user=suliang, acct_name=suliang, subuser=, 
> perm_mask=15, is_admin=0) mask=49
>  -9992> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
> 0.00213s s3:get_obj Searching permissions for uid=suliang
>  -9991> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
> 0.00213s s3:get_obj Found permission: 15
>  -9990> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
> 0.00213s s3:get_obj Searching permissions for group=1 mask=49
>  -9989> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
> 0.00213s s3:get_obj Permissions for group not found
>  -9988> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
> 0.00213s s3:get_obj Searching permissions for group=2 mask=49
>  -9987> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
> 0.00213s s3:get_obj Permissions for group not found
>  -9986> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
> 0.00213s s3:get_obj -- Getting permissions done for 
> identity=rgw::auth::SysReqApplier -> 
> rgw::auth::LocalApplier(acct_user=suliang, acct_name=suliang, subuser=, 
> perm_mask=15, is_admin=0), owner=suliang, perm=1
>  -9985> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
> 0.00213s s3:get_obj verifying op params
>  -9984> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
> 0.00213s s3:get_obj pre-executing
>  -9983> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
> 0.00213s s3:get_obj executing
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Check allocated RGW bucket/object size after enabling Bluestore compression

2023-08-17 Thread Casey Bodley
On Thu, Aug 17, 2023 at 12:14 PM  wrote:
>
> Hello,
>
> Yes, I can see that there are metrics to check the size of the compressed 
> data stored in a pool with ceph df detail (relevant columns are USED COMPR 
> and UNDER COMPR)
>
> Also the size of compressed data can be checked on osd level using perf dump 
> (relevant values "bluestore_compressed_allocated": and 
> "bluestore_compressed_original")
>
> I would like to see the size of the compressed data per S3 bucket and not 
> only per pool or osd.
>
> Is that even possible ?

bluestore's compression is at a different layer than rgw's
compression: https://docs.ceph.com/en/reef/radosgw/compression/

'radosgw-admin bucket stats' can report on the latter (size vs.
size_utilized), but i don't believe rgw has any visibility into
bluestore's compression
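
for example (an untested sketch; the bucket name is a placeholder and field
names can vary a bit between releases):

  # rgw-level compression, per bucket:
  radosgw-admin bucket stats --bucket=mybucket | \
      jq '.usage."rgw.main" | {size, size_utilized}'
  # bluestore-level compression is only visible per pool / per osd:
  ceph df detail    # USED COMPR / UNDER COMPR columns
  ceph daemon osd.0 perf dump | \
      jq '.bluestore | {bluestore_compressed_allocated, bluestore_compressed_original}'
  # (the 'ceph daemon' command has to run on the host of that osd)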

>
> Thanks
> Yosr
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Rados object transformation

2023-08-23 Thread Casey Bodley
you could potentially create a cls_crypt object class that exposes
functions like crypt_read() and crypt_write() to do this. but your
application would have to use cls_crypt for all reads/writes instead
of the normal librados read/write operations. would that work for you?

On Wed, Aug 23, 2023 at 4:43 PM Yixin Jin  wrote:
>
> Hi folks,
> Is it possible through object classes to transform object content? For 
> example, I'd like this transformer to change the content of the object when 
> it is read and when it is written. In this way, I can potentially encrypt the 
> object content in storage without the need to make ceph/osd to do 
> encryption/decryption. It could be taken care of by the object class itself.
> Thanks,Yixin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-24 Thread Casey Bodley
On Wed, Aug 23, 2023 at 10:41 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/62527#note-1
> Release Notes - TBD
>
> Seeking approvals for:
>
> smoke - Venky
> rados - Radek, Laura
>   rook - Sébastien Han
>   cephadm - Adam K
>   dashboard - Ernesto
>
> rgw - Casey

rgw approved

> rbd - Ilya
> krbd - Ilya
> fs - Venky, Patrick
>
> upgrade/pacific-p2p - Laura
> powercycle - Brad (SELinux denials)
>
>
> Thx
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: openstack rgw swift -- reef vs quincy

2023-09-18 Thread Casey Bodley
thanks Shashi, this regression is tracked in
https://tracker.ceph.com/issues/62771. we're testing a fix

On Sat, Sep 16, 2023 at 7:32 PM Shashi Dahal  wrote:
>
> Hi All,
>
> We have 3 openstack clusters, each with their  own ceph.  The openstack
> versions are identical( using openstack-ansible) and all rgw-keystone
> related configs are also the same.
>
> The only difference is the ceph version: one is pacific, one is quincy, while
> the other (new) one is reef.
>
> The issue with reef is:
>
> Horizon >> Object Storage >> Containers >> Create New Container
> In storage-policy , there is nothing in reef   vs   default-placement in
> quincy and pacific.
> without any policy selected ( due to the form being blank), the "submit"
> button to create the container is disabled.
>
> via openstack cli, we are able to create the container, and once created,
> we can use horizon to upload/download images etc.  When doing container
> show ( in the cli/horizon) it shows that the policy is default-placement.
>
> Can someone guide us on how to troubleshoot and correct this?
>
> --
> Cheers,
> Shashi
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: millions of hex 80 0_0000 omap keys in single index shard for single bucket

2023-09-20 Thread Casey Bodley
these keys starting with "<80>0_" appear to be replication log entries
for multisite. can you confirm that this is a multisite setup? is the
'bucket sync status' mostly caught up on each zone? in a healthy
multisite configuration, these log entries would eventually get
trimmed automatically

On Wed, Sep 20, 2023 at 7:08 PM Christopher Durham  wrote:
>
> I am using ceph 17.2.6 on Rocky 8.
> I have a system that started giving me large omap object warnings.
>
> I tracked this down to a specific index shard for a single s3 bucket.
>
> rados -p  listomapkeys .dir..bucketid.nn.shardid
> shows over 3 million keys for that shard. There are only about 2
> million objects in the entire bucket according to a listing of the bucket
> and radosgw-admin bucket stats --bucket bucketname. No other shard
> has anywhere near this many index objects. Perhaps it should be noted that 
> this
> shard is the highest numbered shard for this bucket. For a bucket with
> 16 shards, this is shard 15.
>
> If I look at the list of omapkeys generated, there are *many*
> beginning with "<80>0_", almost the entire set of the three + million
> keys in the shard. These are index objects in the so-called 'ugly' namespace. 
> The rest of the omapkeys appear to be normal.
>
> The 0_ after the <80> indicates some sort of 'bucket log index' according 
> to src/cls/rgw/cls_rgw.cc.
> However, using some sed magic previously discussed here, I ran:
>
> rados -p  getomapval .dir..bucketid.nn.shardid 
> --omap-key-file /tmp/key.txt
>
> Where /tmp/key.txt contains only the funny <80>0_ key name without a 
> newline
>
> The output of this shows, in a hex dump, the object name to which the index
> refers, which was at one time a valid object.
>
> However, that object no longer exists in the bucket, and based on expiration 
> policy, was
> previously deleted. Let's say, in the hex dump, that the object was:
>
> foo/bar/baz/object1.bin
>
> The prefix foo/bar/baz/ used to have 32 objects, say 
> foo/bar/baz/{object1.bin, object2.bin, ... }
> An s3api listing shows that those objects no longer exist (and that is OK, as 
> they  were previously deleted).
> BUT, now, there is a weirdo object left in the bucket:
>
> foo/bar/baz/ <- with the slash at the end, and it is an object not a PRE 
> (fix).
>
> All objects under foo/ have a 3 day lifecycle expiration. If I wait(at most) 
> 3 days, the weirdo object with '/'
> at the end will be deleted, or I can delete it manually using aws s3api. But 
> either way, the log index
> objects, <80>0_ remain.
>
> The bucket in question is heavily used. But with over 3 million of these 
> <80>0_ objects (and growing)
> in a single shard, I am currently at a loss as to what to do or how to stop 
> this from occuring.
> I've poked around at a few other buckets, and I found a few others that have 
> this problem, but not enough to cause a large omap warning. (A few hundred 
> <80>0_000 index objects in a shard), nowhere near enough to cause the 
> large omap warning that led me to this post.
>
> Any ideas?
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: millions of hex 80 0_0000 omap keys in single index shard for single bucket

2023-09-21 Thread Casey Bodley
On Thu, Sep 21, 2023 at 12:21 PM Christopher Durham  wrote:
>
>
> Hi Casey,
>
> This is indeed a multisite setup. The other side shows that for
>
> # radosgw-admin sync status
>
> the oldest incremental change not applied is about a minute old, and that stays 
> consistent over a number of minutes, with the oldest unapplied change always a 
> minute or two old.
>
> However:
>
> # radosgw-admin bucket sync status --bucket bucket-in-question
>
> shows a number of shards always behind, although it varies.
>
> The number of objects on each side in that bucket is close, and  to this 
> point I have attributed that to the replication lag.
>
> One thing that came to mind is that the code that writes to say 
> foo/bar/baz/objects ...
>
> will often delete the objects quickly after creating them. Perhaps the 
> replication doesn't occur to
> the other side before they are deleted? Would that perhaps contribute to this?

sync should handle object deletion just fine. it'll see '404 Not
Found' errors when it tries to replicate them, and just continue on to
the next object. that shouldn't cause bucket sync to get stuck

>
> Not sure how this relates to the objects ending in '/' though, although they 
> are in the same prefix hierarchy.
>
> To get out of this situation, what do I need to do:
>
> 1. radosgw-admin bucket sync init --bucket bucket-in-question on both sides?

'bucket sync init' clears the bucket's sync status, but nothing would
trigger rgw to restart the sync on it. you could try 'bucket sync run'
instead, though it's not especially reliable until the reef release so
you may need to rerun the command several times before it catches up
completely. once the bucket sync catches up, the source zone's bilog
entries would be eligible for automatic trimming

> 2. manually delete the 0_ objects in rados? (yuk).

you can use the 'bilog trim' command on a bucket to delete its log
entries, but i'd only consider doing that if you're satisfied that all
of the objects you care about have already replicated
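
roughly, the sequence would look like this (an untested sketch; the bucket
name is a placeholder - run it on the zone that is behind, and only trim
once you're happy everything has replicated):

  radosgw-admin bucket sync run --bucket=mybucket     # may need several passes before reef
  radosgw-admin bucket sync status --bucket=mybucket  # re-check which shards are behind
  radosgw-admin bilog list --bucket=mybucket | head   # inspect the remaining log entries
  radosgw-admin bilog trim --bucket=mybucket          # then drop them (also accepts
                                                      # --start-marker/--end-marker)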

>
> I've done #1 before when I had the other side of a multi site down for awhile 
> before. I have not had that happen in the current situation (link down 
> between sites).
>
> Thanks for anything you or others can offer.

for rgw multisite users in particular, i highly recommend trying out
the reef release. in addition to multisite resharding support, we made
a lot of improvements to multisite stability/reliability that we won't
be able to backport to pacific/quincy

>
> -Chris
>
>
> On Wednesday, September 20, 2023 at 07:33:07 PM MDT, Casey Bodley 
>  wrote:
>
>
> these keys starting with "<80>0_" appear to be replication log entries
> for multisite. can you confirm that this is a multisite setup? is the
> 'bucket sync status' mostly caught up on each zone? in a healthy
> multisite configuration, these log entries would eventually get
> trimmed automatically
>
> On Wed, Sep 20, 2023 at 7:08 PM Christopher Durham  wrote:
> >
> > I am using ceph 17.2.6 on Rocky 8.
> > I have a system that started giving me large omap object warnings.
> >
> > I tracked this down to a specific index shard for a single s3 bucket.
> >
> > rados -p  listomapkeys .dir..bucketid.nn.shardid
> > shows over 3 million keys for that shard. There are only about 2
> > million objects in the entire bucket according to a listing of the bucket
> > and radosgw-admin bucket stats --bucket bucketname. No other shard
> > has anywhere near this many index objects. Perhaps it should be noted that 
> > this
> > shard is the highest numbered shard for this bucket. For a bucket with
> > 16 shards, this is shard 15.
> >
> > If I look at the list of omapkeys generated, there are *many*
> > beginning with "<80>0_", almost the entire set of the three + million
> > keys in the shard. These are index objects in the so-called 'ugly' 
> > namespace. The rest of the omapkeys appear to be normal.
> >
> > The 0_ after the <80> indicates some sort of 'bucket log index' 
> > according to src/cls/rgw/cls_rgw.cc.
> > However, using some sed magic previously discussed here, I ran:
> >
> > rados -p  getomapval .dir..bucketid.nn.shardid 
> > --omap-key-file /tmp/key.txt
> >
> > Where /tmp/key.txt contains only the funny <80>0_ key name without a 
> > newline
> >
> > The output of this shows, in a hex dump, the object name to which the index
> > refers, which was at one time a valid object.
> >
> > However, that object no longer e

[ceph-users] Re: S3website range requests - possible issue

2023-09-22 Thread Casey Bodley
hey Ondrej,

thanks for creating the tracker issue
https://tracker.ceph.com/issues/62938. i added a comment there, and
opened a fix in https://github.com/ceph/ceph/pull/53602 for the only
issue i was able to identify

On Wed, Sep 20, 2023 at 9:20 PM Ondřej Kukla  wrote:
>
> I was checking the tracker again and I found already fixed issue that seems 
> to be connected with this issue.
>
> https://tracker.ceph.com/issues/44508
>
> Here is the PR that fixes it https://github.com/ceph/ceph/pull/33807
>
> What I’m still not understanding is why this is only happening when using 
> s3website api.
>
> Is there someone who could shed some light on this?
>
> Regards,
>
> Ondrej
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: S3website range requests - possible issue

2023-09-22 Thread Casey Bodley
that first "read 0~4194304" is probably what i fixed in
https://github.com/ceph/ceph/pull/53602, but it's hard to tell from
osd log where these osd ops are coming from. why are there several
[read 1~10] requests after that? the rgw log would be more useful for
debugging, with --debug-rgw=20 and --debug-ms=1 to show the osd
ops/replies

On Fri, Sep 22, 2023 at 4:00 PM Ondřej Kukla  wrote:
>
> Hello Casey,
>
> Thanks a lot for that.
>
> I forgot to mention in my previous message that I was able to trigger 
> the prefetch with the header bytes=1-10
>
> You can see the the read 1~10 in the osd logs I’ve sent here - 
> https://pastebin.com/nGQw4ugd
>
> Which is weird, as it seems it is not the same as what you were able to replicate.
>
> Ondrej
>
> On 22. 9. 2023, at 21:52, Casey Bodley  wrote:
>
> hey Ondrej,
>
> thanks for creating the tracker issue
> https://tracker.ceph.com/issues/62938. i added a comment there, and
> opened a fix in https://github.com/ceph/ceph/pull/53602 for the only
> issue i was able to identify
>
> On Wed, Sep 20, 2023 at 9:20 PM Ondřej Kukla  wrote:
>
>
> I was checking the tracker again and I found already fixed issue that seems 
> to be connected with this issue.
>
> https://tracker.ceph.com/issues/44508
>
> Here is the PR that fixes it https://github.com/ceph/ceph/pull/33807
>
> What I’m still not understanding is why this is only happening when using 
> s3website api.
>
> Is there someone who could shed some light on this?
>
> Regards,
>
> Ondrej
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw: strong consistency for (bucket) policy settings?

2023-09-22 Thread Casey Bodley
each radosgw does maintain its own cache for certain metadata like
users and buckets. when one radosgw writes to a metadata object, it
broadcasts a notification (using rados watch/notify) to other radosgws
to update/invalidate their caches. the initiating radosgw waits for
all watch/notify responses before responding to the client. this way a
given client sees read-after-write consistency even if they read from
a different radosgw

On Fri, Sep 22, 2023 at 5:53 PM Matthias Ferdinand  wrote:
>
> On Tue, Sep 12, 2023 at 07:13:13PM +0200, Matthias Ferdinand wrote:
> > On Mon, Sep 11, 2023 at 02:37:59PM -0400, Matt Benjamin wrote:
> > > Yes, it's also strongly consistent.  It's also last writer wins, though, 
> > > so
> > > two clients somehow permitted to contend for updating policy could
> > > overwrite each other's changes, just as with objects.
>
> this would be a tremendous administrative bonus, but could also be a
> caching/performance problem.
>
> Amazon explicitly says they have eventual consistency for caching
> reasons:
> https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_general.html#troubleshoot_general_eventual-consistency
>
> For Dell ECS I don't seem to find it mentioned in their docs, but they
> too are eventually consistent.
>
> I guess the bucket policies in Ceph get written to special rados
> objects (strongly consistent by design), but how are rgw daemons
> notified about these updates for immediate effect? Or do rgw daemons
> re-read the bucket policy for each and every request to this bucket?
>
> thanks in advance
> Matthias
>
> > > On Mon, Sep 11, 2023 at 2:21 PM Matthias Ferdinand 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > while I don't currently use rgw, I still am curious about consistency
> > > > guarantees.
> > > >
> > > > Usually, S3 has strong read-after-write consistency guarantees (for
> > > > requests that do not overlap). According to
> > > > https://docs.ceph.com/en/latest/dev/radosgw/bucket_index/
> > > > in Ceph this is also true for per-object ACLs.
> > > >
> > > > Is there also a strong consistency guarantee for (bucket) policies? The
> > > > documentation at
> > > > https://docs.ceph.com/en/latest/radosgw/bucketpolicy/
> > > > apparently does not say anything about this.
> > > >
> > > > How would multiple rgw instances synchronize a policy change? Is this
> > > > effective immediate with strong consistency or is there some propagation
> > > > delay (hopefully on with some upper bound)?
> > > >
> > > >
> > > > Best regards
> > > > Matthias
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > > >
> > >
> > > --
> > >
> > > Matt Benjamin
> > > Red Hat, Inc.
> > > 315 West Huron Street, Suite 140A
> > > Ann Arbor, Michigan 48103
> > >
> > > http://www.redhat.com/en/technologies/storage
> > >
> > > tel.  734-821-5101
> > > fax.  734-769-8938
> > > cel.  734-216-5309
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw: strong consistency for (bucket) policy settings?

2023-09-25 Thread Casey Bodley
On Sat, Sep 23, 2023 at 5:05 AM Matthias Ferdinand  wrote:
>
> On Fri, Sep 22, 2023 at 06:09:57PM -0400, Casey Bodley wrote:
> > each radosgw does maintain its own cache for certain metadata like
> > users and buckets. when one radosgw writes to a metadata object, it
> > broadcasts a notification (using rados watch/notify) to other radosgws
> > to update/invalidate their caches. the initiating radosgw waits for
> > all watch/notify responses before responding to the client. this way a
> > given client sees read-after-write consistency even if they read from
> > a different radosgw
>
>
> Very nice indeed. Does it completely eliminate any time window of
> incoherent behaviour among rgw daemons (one rgw applying old policy to
> requests, some other rgw already applying new policy), or will it just
> be a very short window?

this model only guarantees a strict ordering for the client that
writes. before we respond to the write request, there's a window where
other racing clients may either see the old or new bucket metadata

>
> thanks
> Matthias
>
> >
> > On Fri, Sep 22, 2023 at 5:53 PM Matthias Ferdinand  
> > wrote:
> > >
> > > On Tue, Sep 12, 2023 at 07:13:13PM +0200, Matthias Ferdinand wrote:
> > > > On Mon, Sep 11, 2023 at 02:37:59PM -0400, Matt Benjamin wrote:
> > > > > Yes, it's also strongly consistent.  It's also last writer wins, 
> > > > > though, so
> > > > > two clients somehow permitted to contend for updating policy could
> > > > > overwrite each other's changes, just as with objects.
> > >
> > > this would be a tremendous administrative bonus, but could also be a
> > > caching/performance problem.
> > >
> > > Amazon explicitly says they have eventual consistency for caching
> > > reasons:
> > > https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_general.html#troubleshoot_general_eventual-consistency
> > >
> > > For Dell ECS I don't seem to find it mentioned in their docs, but they
> > > too are eventually consistent.
> > >
> > > I guess the bucket policies in Ceph get written to special rados
> > > objects (strongly consistent by design), but how are rgw daemons
> > > notified about these updates for immediate effect? Or do rgw daemons
> > > re-read the bucket policy for each and every request to this bucket?
> > >
> > > thanks in advance
> > > Matthias
> > >
> > > > > On Mon, Sep 11, 2023 at 2:21 PM Matthias Ferdinand 
> > > > > 
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > while I don't currently use rgw, I still am curious about 
> > > > > > consistency
> > > > > > guarantees.
> > > > > >
> > > > > > Usually, S3 has strong read-after-write consistency guarantees (for
> > > > > > requests that do not overlap). According to
> > > > > > https://docs.ceph.com/en/latest/dev/radosgw/bucket_index/
> > > > > > in Ceph this is also true for per-object ACLs.
> > > > > >
> > > > > > Is there also a strong consistency guarantee for (bucket) policies? 
> > > > > > The
> > > > > > documentation at
> > > > > > https://docs.ceph.com/en/latest/radosgw/bucketpolicy/
> > > > > > apparently does not say anything about this.
> > > > > >
> > > > > > How would multiple rgw instances synchronize a policy change? Is 
> > > > > > this
> > > > > > effective immediate with strong consistency or is there some 
> > > > > > propagation
> > > > > > delay (hopefully on with some upper bound)?
> > > > > >
> > > > > >
> > > > > > Best regards
> > > > > > Matthias
> > > > > > ___
> > > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Matt Benjamin
> > > > > Red Hat, Inc.
> > > > > 315 West Huron Street, Suite 140A
> > > > > Ann Arbor, Michigan 48103
> > > > >
> > > > > http://www.redhat.com/en/technologies/storage
> > > > >
> > > > > tel.  734-821-5101
> > > > > fax.  734-769-8938
> > > > > cel.  734-216-5309
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Casey Bodley
On Tue, Oct 3, 2023 at 9:06 AM Thomas Bennett  wrote:
>
> Hi Jonas,
>
> Thanks :) that solved my issue.
>
> It would seem to me that this is heading towards something that the S3
> clients should paginate, but I couldn't find any documentation on how to
> paginate bucket listings.

the s3 ListBuckets API
(https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListBuckets.html)
doesn't support pagination, so there's no way for clients to do that

but rgw itself should be able to paginate over the 'chunks' to return
more than rgw_list_buckets_max_chunk entries in a single ListBuckets
request. i opened a bug report for this at
https://tracker.ceph.com/issues/63080

> All the information points to paginating object
> listing - which makes sense.
>
> Just for completeness of this thread:
>
> The rgw parameters are found at: Quincy radosgw config ref
> 
>
> I ran the following command to update the parameter for all running rgw
> daemons:
> ceph config set client.rgw rgw_list_buckets_max_chunk 1
>
> And then confirmed the running daemons were configured:
> ceph daemon /var/run/ceph/ceph-client.rgw.xxx.xxx.asok config show | grep
> rgw_list_buckets_max_chunk
> "rgw_list_buckets_max_chunk": "1",
>
> Kind regards,
> Tom
>
> On Tue, 3 Oct 2023 at 13:30, Jonas Nemeiksis  wrote:
>
> > Hi,
> >
> > You should increase these default settings:
> >
> > rgw_list_buckets_max_chunk // for buckets
> > rgw_max_listing_results // for objects
> >
> > On Tue, Oct 3, 2023 at 12:59 PM Thomas Bennett  wrote:
> >
> >> Hi,
> >>
> >> I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more than
> >> 1000 buckets.
> >>
> >> When the client tries to list all their buckets using s3cmd, rclone and
> >> python boto3, they all three only ever return the first 1000 bucket names.
> >> I can confirm the buckets are all there (and more than 1000) by checking
> >> with the radosgw-admin command.
> >>
> >> Have I missed a pagination limit for listing user buckets in the rados
> >> gateway?
> >>
> >> Thanks,
> >> Tom
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >
> >
> > --
> > Jonas
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Next quincy point release 17.2.7

2023-10-05 Thread Casey Bodley
thanks Tobias, i see that https://github.com/ceph/ceph/pull/53414 had
a ton of test failures that don't look related. i'm working with Yuri
to reschedule them

On Thu, Oct 5, 2023 at 2:05 AM Tobias Urdin  wrote:
>
> Hello Yuri,
>
> On the RGW side I would very much like to get this [1] patch in that release
> that is already merged in reef [2] and pacific [3].
>
> Perhaps Casey can approve and merge that so you can bring it into your
> testing.
>
> Thanks!
>
> [1] https://github.com/ceph/ceph/pull/53414
> [2] https://github.com/ceph/ceph/pull/53413
> [3] https://github.com/ceph/ceph/pull/53416
>
> On 4 Oct 2023, at 22:57, Yuri Weinstein  wrote:
>
> Hello
>
> We are getting very close to the next Quincy point release 17.2.7
>
> Here is the list of must-have PRs https://pad.ceph.com/p/quincy_17.2.7_prs
> We will start the release testing/review/approval process as soon as
> all PRs from this list are merged.
>
> If you see something missing please speak up and the dev leads will
> make a decision on including it in this release.
>
> TIA
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [RGW] Is there a way for a user to change is secret key or create other keys ?

2023-10-09 Thread Casey Bodley
On Mon, Oct 9, 2023 at 9:16 AM Gilles Mocellin
 wrote:
>
> Hello Cephers,
>
> I was using Ceph with OpenStack, and users could add, remove credentials
> with `openstack ec2 credentials` commands.
> But, we are moving our Object Storage service to a new cluster, and
> didn't want to tie it with OpenStack.
>
> Is there a way to have a bit of self service for Rados Gateway, at least
> for creating, deleting, and changing S3 keys?
>
> It does not seem to be part of S3 APIs.

right, user/role/key management is part of the IAM service in AWS, not
S3. IAM exposes APIs like
https://docs.aws.amazon.com/IAM/latest/APIReference/API_CreateAccessKey.html,
etc

radosgw supports some of the IAM APIs related to roles and role/user
policy, but not the ones for self-service user/key management. i'd
love to add those eventually once we have an s3 'account' feature to
base them on, but development there has been slow
(https://github.com/ceph/ceph/pull/46373 tracks the most recent
progress)

i'd agree that the radosgw admin APIs aren't a great fit because
they're targeted at admins, rather than delegating self-service
features to end users

> It's certainly doable with Ceph RGW admin API, but with which tool that
> a standard user can use ?
>
> The Ceph Dashboard does not seem a good idea. Roles are global, nothing
> that can be scoped to a tenant.
>
> Some S3 browsers exist (https://github.com/nimbis/s3commander), but
> never with some management like changing S3 keys.
> Certainly because it's not in the "standard" S3 API.
>
> Perhaps Ceph can provide a client-side dashboard, which can be exposed
> externally, aside the actual admin dashboard, which will stay inside ?
>
> Regards,
> --
> Gilles
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Copying big objects (>5GB) doesn't work after upgrade to Quincy on S3

2023-10-10 Thread Casey Bodley
hi Arvydas,

it looks like this change corresponds to
https://tracker.ceph.com/issues/48322 and
https://github.com/ceph/ceph/pull/38234. the intent was to enforce the
same limitation as AWS S3 and force clients to use multipart copy
instead. this limit is controlled by the config option
rgw_max_put_size which defaults to 5G. the same option controls other
operations like Put/PostObject, so i wouldn't recommend raising it as
a workaround for copy

this change really should have been mentioned in the release notes -
apologies for that omission
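
for reference, a server-side multipart copy with aws cli looks roughly like
this (an untested sketch; the endpoint, bucket/object names, ranges and
etags are placeholders, and every part except the last must be between
5MiB and 5GiB):

  rgw=http://rgw.example.com:8080
  aws --endpoint-url $rgw s3api create-multipart-upload --bucket dst --key big.bin
  # note the UploadId in the response, then copy the source in ranges:
  aws --endpoint-url $rgw s3api upload-part-copy --bucket dst --key big.bin \
      --upload-id UPLOADID --part-number 1 --copy-source src/big.bin \
      --copy-source-range bytes=0-4294967295
  aws --endpoint-url $rgw s3api upload-part-copy --bucket dst --key big.bin \
      --upload-id UPLOADID --part-number 2 --copy-source src/big.bin \
      --copy-source-range bytes=4294967296-6442450943
  # finally assemble the parts using the etags from the upload-part-copy responses:
  aws --endpoint-url $rgw s3api complete-multipart-upload --bucket dst --key big.bin \
      --upload-id UPLOADID \
      --multipart-upload 'Parts=[{ETag="etag1",PartNumber=1},{ETag="etag2",PartNumber=2}]'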

On Tue, Oct 10, 2023 at 10:58 AM Arvydas Opulskis  wrote:
>
> Hi all,
>
> after upgrading our cluster from Nautilus -> Pacific -> Quincy we noticed
> we can't copy bigger objects anymore via S3.
>
> An error we get:
> "Aws::S3::Errors::EntityTooLarge (Aws::S3::Errors::EntityTooLarge)"
>
> After some tests we have following findings:
> * Problems start for objects bigger than 5 GB (multipart limit)
> * Issue starts after upgrading to Quincy (17.2.6). In latest Pacific
> (16.2.13) it works fine.
> * For Quincy it works ok with AWS S3 CLI "cp" command, but doesn't work
> using AWS Ruby3 SDK client with copy_object command.
> * For Pacific setup both clients work ok
> * From the RGW logs it seems like the AWS S3 CLI client handles multipart copying
> "under the hood", so it is successful.
>
> It is stated in the AWS documentation that for uploads (and copies) of files
> bigger than 5GB we should use the multipart API for AWS S3. For some reason it
> worked for years in Ceph and stopped working after the Quincy release, and I
> couldn't find anything in the release notes addressing this change.
>
> So, is this change permanent and should be considered as bug fix?
>
> Both Pacific and Quincy clusters were running on Rocky 8.6 OS, using Beast
> frontend.
>
> Arvydas
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nothing provides libthrift-0.14.0.so()(64bit)

2023-10-10 Thread Casey Bodley
we're tracking this in https://tracker.ceph.com/issues/61882. my
understanding is that we're just waiting for the next quincy point
release builds to resolve this

On Tue, Oct 10, 2023 at 11:07 AM Graham Derryberry
 wrote:
>
> I have just started adding a ceph client on a rocky 9 system to our ceph
> cluster (we're on quincy 17.2.6) and just discovered that epel 9 now
> provides thrift-0.15.0-2.el9 not thrift-0.14.0-7.el9 as of June 21 2023.
> So the Nothing provides libthrift-0.14.0.so()(64bit) error has returned!
> Recommendations?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Casey Bodley
hey Tim,

your changes to rgw_admin_entry probably aren't taking effect on the
running radosgws. you'd need to restart them in order to set up the
new route
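
for example, something along these lines (an untested sketch assuming a
cephadm deployment; the service name is a placeholder, and the exact
rgw_enable_apis list depends on your release - the important entry is
'admin'):

  ceph config set client.rgw rgw_admin_entry admin
  ceph config set client.rgw rgw_enable_apis 's3, s3website, swift, swift_auth, admin'
  ceph orch restart rgw.myrgw
  # a 405 response here means the admin route is registered:
  curl -s -o /dev/null -w '%{http_code}\n' http://rgw-host:8080/admin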

there also seems to be some confusion about the need for a bucket
named 'default'. radosgw just routes requests with paths starting with
'/{rgw_admin_entry}' to a separate set of admin-related rest apis.
otherwise they fall back to the s3 api, which treats '/foo' as a
request for bucket foo - that's why you see NoSuchBucket errors when
it's misconfigured

also note that, because of how these apis are nested,
rgw_admin_entry='default' would prevent users from creating and
operating on a bucket named 'default'
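
as a quick check after the restart (hostname, port and the 'default' entry
point are placeholders for whatever rgw_admin_entry is set to), a bare GET
should return a 405 once the admin route is wired up, as Ondřej notes
below; a NoSuchBucket error means the request is still falling through to
the s3 api:

$ curl -i http://x.y.z:8080/default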

On Tue, Oct 17, 2023 at 7:03 AM Tim Holloway  wrote:
>
> Thank you, Ondřej!
>
> Yes, I set the admin entry set to "default". It's just the latest
> result of failed attempts ("admin" didn't work for me either). I did
> say there were some horrors in there!
>
> If I got your sample URL pattern right, the results of a GET on
> "http://x.y.z/default"; return 404, NoSuchBucket. If that means that I
> didn't properly set rgw_enable_apis, then I probably don't know how to
> set it right.
>
>Best Regards,
>   Tim
>
> On Tue, 2023-10-17 at 08:35 +0200, Ondřej Kukla wrote:
> > Hello Tim,
> >
> > I was also struggling with this when I was configuring the object
> > gateway for the first time.
> >
> > There are a few things that you should check to make sure the
> > dashboard would work.
> >
> > 1. You need to have the admin api enabled on all rgws with the
> > rgw_enable_apis option. (As far as I know you are not able to force
> > the dashboard to use one rgw instance)
> > 2. It seems that you have the rgw_admin_entry set to a non-default
> > value - the default is admin but it seems that you have "default" (by
> > the name of the bucket). Make sure that you have this also set on all
> > rgws.
> >
> > You can confirm that both of these settings are set properly by
> > sending GET request to ${rgw-ip}:${port}/${rgw_admin_entry}
> > “default" in your case -> it should return 405 Method Not Supported
> >
> > Btw there is actually no bucket that you would be able to see in the
> > administration. It’s just an abstraction on the rgw.
> >
> > Reagards,
> >
> > Ondrej
> >
> > > On 16. 10. 2023, at 22:00, Tim Holloway  wrote:
> > >
> > > First, an abject apology for the horrors I'm about to unveil. I
> > > made a
> > > cold migration from GlusterFS to Ceph a few months back, so it was
> > > a
> > > learn-/screwup/-as-you-go affair.
> > >
> > > For reasons of presumed compatibility with some of my older
> > > servers, I
> > > started with Ceph Octopus. Unfortunately, Octopus seems to have
> > > been a
> > > nexus of transitions from older Ceph organization and management to
> > > a
> > > newer (cephadm) system combined with a relocation of many ceph
> > > resources and compounded by stale bits of documentation (notably
> > > some
> > > references to SysV procedures and an obsolete installer that
> > > doesn't
> > > even come with Octopus).
> > >
> > > A far bigger problem was a known issue where actions would be
> > > scheduled
> > > but never executed if the system was even slightly dirty. And of
> > > course, since my system was hopelessly dirty, that was a major
> > > issue.
> > > Finally I took a risk and bumped up to Pacific, where that issue no
> > > longer exists. I won't say that I'm 100% clean even now, but at
> > > least
> > > the remaining crud is in areas where it cannot do any harm.
> > > Presumably.
> > >
> > > Given that, the only bar now remaining to total joy has been my
> > > inability to connect via the Ceph Dashboard to the Object Gateway.
> > >
> > > This seems to be an oft-reported problem, but generally referenced
> > > relative to higher-level administrative interfaces like Kubernetes
> > > and
> > > rook. I'm interfacing more directly, however. Regardless, the error
> > > reported is notably familiar:
> > >
> > > [quote]
> > > The Object Gateway Service is not configured
> > > Error connecting to Object Gateway: RGW REST API failed request
> > > with
> > > status code 404
> > > (b'{"Code":"NoSuchBucket","Message":"","BucketName":"default","Requ
> > > estI
> > > d":"tx00' b'000dd0c65b8bda685b4-00652d8e0f-5e3a9b-
> > > default","HostId":"5e3a9b-default-defa' b'ult"}')
> > > Please consult the documentation on how to configure and enable the
> > > Object Gateway management functionality.
> > > [/quote]
> > >
> > > In point of fact, what this REALLY means in my case is that the
> > > bucket
> > > that is supposed to contain the necessary information for the
> > > dashboard
> > > and rgw to communicate has not been created. Presumably that
> > > SHOULD have
> > > been done by the "ceph dashboard set-rgw-credentials" command, but
> > > apparently isn't, because the default zone has no buckets at all,
> > > much
> > > less one named "default".
> > >
> > > By way of reference, the dashboard is definitely trying to interact
> > > with the rgw cont

[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Casey Bodley
On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey

rgw approved, thanks!

> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Casey Bodley
you're right that many docs still mention ceph.conf, after the mimic
release added a centralized config database to ceph-mon. you can read
about the mon-based 'ceph config' commands in
https://docs.ceph.com/en/reef/rados/configuration/ceph-conf/#commands

to modify rgw_admin_entry for all radosgw instances, you'd use a command like:

$ ceph config set client.rgw rgw_admin_entry admin

then restart radosgws because they only read that value on startup
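
for example, with a cephadm/orchestrator deployment (the service name
below is a placeholder - 'ceph orch ls rgw' shows the real one):

$ ceph orch ls rgw
$ ceph orch restart rgw.myrgw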

On Tue, Oct 17, 2023 at 9:54 AM Tim Holloway  wrote:
>
> Thanks, Casey!
>
> I'm not really certain where to set this option. While Ceph is very
> well-behaved once you know what to do, the nature of Internet-based
> documentation (and occasionally incompletely-updated manuals) is that
> stale information is often given equal weight to the obsolete
> information. It's a problem I had as support for JavaServer Faces, in
> fact. I spent literally years correcting people who'd got their
> examples from obsoleted sources.
>
> If I was to concoct a "Really, Really Newbies Intro to Ceph" I think
> that the two most fundamental items explained would be "Ceph as
> traditional services" versus "Ceph as Containerized services" (As far
> as I can tell, both are still viable but containerization - at least
> for me - is a preferable approach). And the ceph.conf file versus
> storing operational parameters within Ceph entities (e.g. buckets or
> pseudo-buckets like RGW is doing). While lots of stuff still reference
> ceph.conf for configuration, I'm feeling like it's actually no longer
> authoritative for some options, may be an alternative source for others
> (with which source has priority being unclear) and stuff that Ceph no
> longer even looks at because it has moved on.
>
> Such is my plight.
>
> I have no problem with making the administrative interface look
> "bucket-like". Or for that matter, having the RGW report it as a
> (missing) bucket if it isn't configured. But knowing where to inject
> the magic that activates that interface eludes me and whether to do it
> directly on the RGW container host (and how) or on my master host is
> totally unclear to me. It doesn't help that this is an item that has
> multiple values, not just on/off or that by default the docs seem to
> imply it should be already preset to standard values out of the box.
>
>Thanks,
>   Tim
>
> On Tue, 2023-10-17 at 09:11 -0400, Casey Bodley wrote:
> > hey Tim,
> >
> > your changes to rgw_admin_entry probably aren't taking effect on the
> > running radosgws. you'd need to restart them in order to set up the
> > new route
> >
> > there also seems to be some confusion about the need for a bucket
> > named 'default'. radosgw just routes requests with paths starting
> > with
> > '/{rgw_admin_entry}' to a separate set of admin-related rest apis.
> > otherwise they fall back to the s3 api, which treats '/foo' as a
> > request for bucket foo - that's why you see NoSuchBucket errors when
> > it's misconfigured
> >
> > also note that, because of how these apis are nested,
> > rgw_admin_entry='default' would prevent users from creating and
> > operating on a bucket named 'default'
> >
> > On Tue, Oct 17, 2023 at 7:03 AM Tim Holloway 
> > wrote:
> > >
> > > Thank you, Ondřej!
> > >
> > > Yes, I set the admin entry set to "default". It's just the latest
> > > result of failed attempts ("admin" didn't work for me either). I
> > > did
> > > say there were some horrors in there!
> > >
> > > If I got your sample URL pattern right, the results of a GET on
> > > "http://x.y.z/default"; return 404, NoSuchBucket. If that means that
> > > I
> > > didn't properly set rgw_enable_apis, then I probably don't know how
> > > to
> > > set it right.
> > >
> > >Best Regards,
> > >   Tim
> > >
> > > On Tue, 2023-10-17 at 08:35 +0200, Ondřej Kukla wrote:
> > > > Hello Tim,
> > > >
> > > > I was also struggling with this when I was configuring the object
> > > > gateway for the first time.
> > > >
> > > > There are a few things that you should check to make sure the
> > > > dashboard would work.
> > > >
> > > > 1. You need to have the admin api enabled on all rgws with the
> > > > rgw_enable_apis option. (As far as I know you are not able to
> > > > force
> > > >

[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-18 Thread Casey Bodley
On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve

sorry, missed this part

these point-to-point upgrade tests are failing because they're running
s3-tests against older quincy releases that don't have fixes for the
bugs they're testing. we don't maintain separate tests for each point
release, so we can't expect these upgrade tests to pass in general

specifically:
test_post_object_wrong_bucket is failing because it requires the
17.2.7 fix from https://github.com/ceph/ceph/pull/53757
test_set_bucket_tagging is failing because it requires the 17.2.7 fix
from https://github.com/ceph/ceph/pull/50103

so the rgw failures are expected, but i can't tell whether they're
masking other important upgrade test coverage

>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Modify user op status=-125

2023-10-24 Thread Casey Bodley
errno 125 is ECANCELED, which is the code we use when we detect a
racing write. so it sounds like something else is modifying that user
at the same time. does it eventually succeed if you retry?

On Tue, Oct 24, 2023 at 9:21 AM mahnoosh shahidi
 wrote:
>
> Hi all,
>
> I couldn't understand what the status -125 means from the docs. I'm
> getting 500 response status code when I call rgw admin APIs and the only
> log in the rgw log files is as follows.
>
> s3:get_obj recalculating target
> initializing for trans_id =
> tx0aa90f570fb8281cf-006537bf9e-84395fa-default
> s3:get_obj reading permissions
> getting op 1
> s3:put_obj verifying requester
> s3:put_obj normalizing buckets and tenants
> s3:put_obj init permissions
> s3:put_obj recalculating target
> s3:put_obj reading permissions
> s3:put_obj init op
> s3:put_obj verifying op mask
> s3:put_obj verifying op permissions
> s3:put_obj verifying op params
> s3:put_obj pre-executing
> s3:put_obj executing
> :modify_user completing
> WARNING: set_req_state_err err_no=125 resorting to 500
> :modify_user op status=-125
> :modify_user http status=500
> == req done req=0x7f3f85a78620 op status=-125 http_status=500
> latency=0.076000459s ==
>
> Can anyone explain what this error means and why it's happening?
>
> Best Regards,
> Mahnoosh
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Modify user op status=-125

2023-10-24 Thread Casey Bodley
i don't suppose you're using sts roles with AssumeRole?
https://tracker.ceph.com/issues/59495 tracks a bug where each
AssumeRole request was writing to the user metadata unnecessarily,
which would race with your admin api requests

On Tue, Oct 24, 2023 at 9:56 AM mahnoosh shahidi
 wrote:
>
> Thanks Casey for your explanation,
>
> Yes it succeeded eventually. Sometimes after about 100 retries. It's odd that 
> it stays in a racing condition for that long.
>
> Best Regards,
> Mahnoosh
>
> On Tue, Oct 24, 2023 at 5:17 PM Casey Bodley  wrote:
>>
>> errno 125 is ECANCELED, which is the code we use when we detect a
>> racing write. so it sounds like something else is modifying that user
>> at the same time. does it eventually succeed if you retry?
>>
>> On Tue, Oct 24, 2023 at 9:21 AM mahnoosh shahidi
>>  wrote:
>> >
>> > Hi all,
>> >
>> > I couldn't understand what the status -125 means from the docs. I'm
>> > getting 500 response status code when I call rgw admin APIs and the only
>> > log in the rgw log files is as follows.
>> >
>> > s3:get_obj recalculating target
>> > initializing for trans_id =
>> > tx0aa90f570fb8281cf-006537bf9e-84395fa-default
>> > s3:get_obj reading permissions
>> > getting op 1
>> > s3:put_obj verifying requester
>> > s3:put_obj normalizing buckets and tenants
>> > s3:put_obj init permissions
>> > s3:put_obj recalculating target
>> > s3:put_obj reading permissions
>> > s3:put_obj init op
>> > s3:put_obj verifying op mask
>> > s3:put_obj verifying op permissions
>> > s3:put_obj verifying op params
>> > s3:put_obj pre-executing
>> > s3:put_obj executing
>> > :modify_user completing
>> > WARNING: set_req_state_err err_no=125 resorting to 500
>> > :modify_user op status=-125
>> > :modify_user http status=500
>> > == req done req=0x7f3f85a78620 op status=-125 http_status=500
>> > latency=0.076000459s ==
>> >
>> > Can anyone explain what this error means and why it's happening?
>> >
>> > Best Regards,
>> > Mahnoosh
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-10-25 Thread Casey Bodley
if you have an administrative user (created with --admin), you should
be able to use its credentials with awscli to delete or overwrite this
bucket policy
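
a rough sketch of that repair (uid, profile, endpoint and bucket name are
placeholders); if no admin user exists yet, a temporary one can be created
first:

$ radosgw-admin user create --uid=temp-admin --display-name="temp admin" --admin
# put the generated access/secret key into an awscli profile, then:
$ aws --profile temp-admin --endpoint-url http://rgw.example.com:8080 \
    s3api delete-bucket-policy --bucket affected-bucket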

On Wed, Oct 25, 2023 at 4:11 PM Wesley Dillingham  
wrote:
>
> I have a bucket which got injected with bucket policy which locks the
> bucket even to the bucket owner. The bucket now cannot be accessed (even
> get its info or delete bucket policy does not work) I have looked in the
> radosgw-admin command for a way to delete a bucket policy but do not see
> anything. I presume I will need to somehow remove the bucket policy from
> however it is stored in the bucket metadata / omap etc. If anyone can point
> me in the right direction on that I would appreciate it. Thanks
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-10-25 Thread Casey Bodley
On Wed, Oct 25, 2023 at 4:59 PM Wesley Dillingham  
wrote:
>
> Thank you, I am not sure (inherited cluster). I presume such an admin user 
> created after-the-fact would work?

yes

> Is there a good way to discover an admin user other than iterate over all 
> users and retrieve user information? (I presume radosgw-admin user info 
> --uid=" would illustrate such administrative access?

not sure there's an easy way to search existing users, but you could
create a temporary admin user for this repair

>
> Respectfully,
>
> Wes Dillingham
> w...@wesdillingham.com
> LinkedIn
>
>
> On Wed, Oct 25, 2023 at 4:41 PM Casey Bodley  wrote:
>>
>> if you have an administrative user (created with --admin), you should
>> be able to use its credentials with awscli to delete or overwrite this
>> bucket policy
>>
>> On Wed, Oct 25, 2023 at 4:11 PM Wesley Dillingham  
>> wrote:
>> >
>> > I have a bucket which got injected with bucket policy which locks the
>> > bucket even to the bucket owner. The bucket now cannot be accessed (even
>> > get its info or delete bucket policy does not work) I have looked in the
>> > radosgw-admin command for a way to delete a bucket policy but do not see
>> > anything. I presume I will need to somehow remove the bucket policy from
>> > however it is stored in the bucket metadata / omap etc. If anyone can point
>> > me in the right direction on that I would appreciate it. Thanks
>> >
>> > Respectfully,
>> >
>> > *Wes Dillingham*
>> > w...@wesdillingham.com
>> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW access logs with bucket name

2023-10-30 Thread Casey Bodley
another option is to enable the rgw ops log, which includes the bucket
name for each request

the http access log line that's visible at log level 1 follows a known
apache format that users can scrape, so i've resisted adding extra
s3-specific stuff like bucket/object names there. there was some
recent discussion around this in
https://github.com/ceph/ceph/pull/50350, which had originally extended
that access log line
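
a sketch of enabling the ops log to a file (option names as of
quincy/reef; the path is a placeholder, and restarting the radosgws
afterwards is the safe assumption):

$ ceph config set client.rgw rgw_enable_ops_log true
$ ceph config set client.rgw rgw_ops_log_file_path /var/log/ceph/rgw-ops.log
$ ceph config set client.rgw rgw_ops_log_rados false   # skip the log pool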

On Mon, Oct 30, 2023 at 6:03 AM Boris Behrens  wrote:
>
> Hi Dan,
>
> we are currently moving all the logging into lua scripts, so it is not an
> issue anymore for us.
>
> Thanks
>
> ps: the ceph analyzer is really cool. plusplus
>
> > On Sat, Oct 28, 2023 at 22:03, Dan van der Ster <
> dan.vanders...@clyso.com>:
>
> > Hi Boris,
> >
> > I found that you need to use debug_rgw=10 to see the bucket name :-/
> >
> > e.g.
> > 2023-10-28T19:55:42.288+ 7f34dde06700 10 req 3268931155513085118
> > 0.0s s->object=... s->bucket=xyz-bucket-123
> >
> > Did you find a more convenient way in the meantime? I think we should
> > log bucket name at level 1.
> >
> > Cheers, Dan
> >
> > --
> > Dan van der Ster
> > CTO
> >
> > Clyso GmbH
> > p: +49 89 215252722 | a: Vancouver, Canada
> > w: https://clyso.com | e: dan.vanders...@clyso.com
> >
> > Try our Ceph Analyzer: https://analyzer.clyso.com
> >
> > On Thu, Mar 30, 2023 at 4:15 AM Boris Behrens  wrote:
> > >
> > > Sadly not.
> > > I only see the path/query of a request, but not the hostname.
> > > So when a bucket is accessed via hostname (
> > https://bucket.TLD/object?query)
> > > I only see the object and the query (GET /object?query).
> > > When a bucket is accessed via path (https://TLD/bucket/object?query) I
> > can
> > > see also the bucket in the log (GET bucket/object?query)
> > >
> > > On Thu, Mar 30, 2023 at 12:58, Szabo, Istvan (Agoda) <
> > > istvan.sz...@agoda.com>:
> > >
> > > > It has the full url beginning with the bucket name in the beast logs http
> > > > requests, hasn’t it?
> > > >
> > > > Istvan Szabo
> > > > Staff Infrastructure Engineer
> > > > ---
> > > > Agoda Services Co., Ltd.
> > > > e: istvan.sz...@agoda.com
> > > > ---
> > > >
> > > > On 2023. Mar 30., at 17:44, Boris Behrens  wrote:
> > > >
> > > >
> > > > Bringing up that topic again:
> > > > is it possible to log the bucket name in the rgw client logs?
> > > >
> > > > currently I am only to know the bucket name when someone access the
> > bucket
> > > > via https://TLD/bucket/object instead of https://bucket.TLD/object.
> > > >
> > > > On Tue, Jan 3, 2023 at 10:25, Boris Behrens  > >:
> > > >
> > > > Hi,
> > > >
> > > > I am looking forward to move our logs from
> > > >
> > > > /var/log/ceph/ceph-client...log to our logaggregator.
> > > >
> > > >
> > > > Is there a way to have the bucket name in the log file?
> > > >
> > > >
> > > > Or can I write the rgw_enable_ops_log into a file? Maybe I could work
> > with
> > > >
> > > > this.
> > > >
> > > >
> > > > Cheers and happy new year
> > > >
> > > > Boris
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > im
> > > > groüen Saal.
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > > >
> > >
> > >
> > > --
> > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> > > groüen Saal.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Ceph Leadership Team Meeting: 2023-11-1 Minutes

2023-11-01 Thread Casey Bodley
quincy 17.2.7: released!
* major 'dashboard v3' changes causing issues?
https://github.com/ceph/ceph/pull/54250 did not merge for 17.2.7
* planning a retrospective to discuss what kind of changes should go
in minor releases when members of the dashboard team are present

reef 18.2.1:
* most PRs already tested/merged
* possibly start validation next week?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-07 Thread Casey Bodley
On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63443#note-1
>
> Seeking approvals/reviews for:
>
> smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey

rgw results are approved. https://github.com/ceph/ceph/pull/54371
merged to reef but is needed on reef-release

> fs - Venky
> orch - Adam King
> rbd - Ilya
> krbd - Ilya
> upgrade/quincy-x (reef) - Laura PTL
> powercycle - Brad
> perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> TIA
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-07 Thread Casey Bodley
On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
 wrote:
>
> Hello Wesley and Casey,
>
> We've ended up with the same issue and here it appears that even the user 
> with "--admin" isn't able to do anything. We're now unable to figure out if 
> it is due to bucket policies, ACLs or IAM of some sort. I'm seeing these IAM 
> errors in the logs
>
> ```
>
> Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851 
> 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due to 
> Handler error.
>
> Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583 
> 0.0s s3:list_bucket Error reading IAM Policy: Terminate parsing due 
> to Handler error.

it's failing to parse the bucket policy document, but the error
message doesn't say what's wrong with it

disabling rgw_policy_reject_invalid_principals might help if it's
failing on the Principal
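
that option was added in reef (see later in this thread); if it's
available, a one-line sketch of the workaround, followed by a radosgw
restart to be safe:

$ ceph config set client.rgw rgw_policy_reject_invalid_principals false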

> Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583 
> 0.0s s3:list_bucket init_permissions on 
> :window-dev[1d0fa0b4-04eb-48f9-889b-a60de865ccd8.24143.10]) failed, ret=-13
> Nov  7 22:51:40 ceph-feed-05 radosgw[4054570]: req 13293029267332025583 
> 0.0s op->ERRORHANDLER: err_no=-13 new_err_no=-13
>
> ```
>
> Please help what's wrong here. We're in Ceph v17.2.7.
>
> Regards,
> Jayanth
>
> On Thu, Oct 26, 2023 at 7:14 PM Wesley Dillingham  
> wrote:
>>
>> Thank you, this has worked to remove the policy.
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>
>>
>> On Wed, Oct 25, 2023 at 5:10 PM Casey Bodley  wrote:
>>
>> > On Wed, Oct 25, 2023 at 4:59 PM Wesley Dillingham 
>> > wrote:
>> > >
>> > > Thank you, I am not sure (inherited cluster). I presume such an admin
>> > user created after-the-fact would work?
>> >
>> > yes
>> >
>> > > Is there a good way to discover an admin user other than iterate over
>> > all users and retrieve user information? (I presume radosgw-admin user info
>> > --uid=" would illustrate such administrative access?
>> >
>> > not sure there's an easy way to search existing users, but you could
>> > create a temporary admin user for this repair
>> >
>> > >
>> > > Respectfully,
>> > >
>> > > Wes Dillingham
>> > > w...@wesdillingham.com
>> > > LinkedIn
>> > >
>> > >
>> > > On Wed, Oct 25, 2023 at 4:41 PM Casey Bodley  wrote:
>> > >>
>> > >> if you have an administrative user (created with --admin), you should
>> > >> be able to use its credentials with awscli to delete or overwrite this
>> > >> bucket policy
>> > >>
>> > >> On Wed, Oct 25, 2023 at 4:11 PM Wesley Dillingham <
>> > w...@wesdillingham.com> wrote:
>> > >> >
>> > >> > I have a bucket which got injected with bucket policy which locks the
>> > >> > bucket even to the bucket owner. The bucket now cannot be accessed
>> > (even
>> > >> > get its info or delete bucket policy does not work) I have looked in
>> > the
>> > >> > radosgw-admin command for a way to delete a bucket policy but do not
>> > see
>> > >> > anything. I presume I will need to somehow remove the bucket policy
>> > from
>> > >> > however it is stored in the bucket metadata / omap etc. If anyone can
>> > point
>> > >> > me in the right direction on that I would appreciate it. Thanks
>> > >> >
>> > >> > Respectfully,
>> > >> >
>> > >> > *Wes Dillingham*
>> > >> > w...@wesdillingham.com
>> > >> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> > >> > ___
>> > >> > ceph-users mailing list -- ceph-users@ceph.io
>> > >> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> > >> >
>> > >>
>> >
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-08 Thread Casey Bodley
i've opened https://tracker.ceph.com/issues/63485 to allow
admin/system users to override policy parsing errors like this. i'm
not sure yet where this parsing regression was introduced. in reef,
https://github.com/ceph/ceph/pull/49395 added better error messages
here, along with a rgw_policy_reject_invalid_principals option to be
strict about principal names


to remove a bucket policy that fails to parse with "Error reading IAM
Policy", you can follow these steps:

1. find the bucket's instance id using the 'bucket stats' command

$ radosgw-admin bucket stats --bucket {bucketname} | grep id

2. use the rados tool to remove the bucket policy attribute
(user.rgw.iam-policy) from the bucket instance metadata object

$ rados -p default.rgw.meta -N root rmxattr
.bucket.meta.{bucketname}:{bucketid} user.rgw.iam-policy

3. radosgws may be caching the existing bucket metadata and xattrs, so
you'd either need to restart them or clear their metadata caches

$ ceph daemon client.rgw.xyz cache zap

On Wed, Nov 8, 2023 at 9:06 AM Jayanth Reddy  wrote:
>
> Hello Wesley,
> Thank you for the response. I tried the same but ended up with 403.
>
> Regards,
> Jayanth
>
> On Wed, Nov 8, 2023 at 7:34 PM Wesley Dillingham  
> wrote:
>>
>> Jaynath:
>>
>> Just to be clear with the "--admin" user's key's you have attempted to 
>> delete the bucket policy using the following method: 
>> https://docs.aws.amazon.com/cli/latest/reference/s3api/delete-bucket-policy.html
>>
>> This is what worked for me (on a 16.2.14 cluster). I didn't attempt to 
>> interact with the affected bucket in any way other than "aws s3api 
>> delete-bucket-policy"
>>
>> Respectfully,
>>
>> Wes Dillingham
>> w...@wesdillingham.com
>> LinkedIn
>>
>>
>> On Wed, Nov 8, 2023 at 8:30 AM Jayanth Reddy  
>> wrote:
>>>
>>> Hello Casey,
>>>
>>> We're totally stuck at this point and none of the options seem to work. 
>>> Please let us know if there is something in metadata or index to remove 
>>> those applied bucket policies. We downgraded to v17.2.6 and are encountering
>>> the same.
>>>
>>> Regards,
>>> Jayanth
>>>
>>> On Wed, Nov 8, 2023 at 7:14 AM Jayanth Reddy  
>>> wrote:
>>>>
>>>> Hello Casey,
>>>>
>>>> And on further inspection, we identified that there were bucket policies 
>>>> set from the initial days; we were in v16.2.12.
>>>> We upgraded the cluster to v17.2.7 two days ago, and it seems obvious that
>>>> the IAM error logs started the minute the rgw daemon was upgraded from
>>>> v16.2.12 to v17.2.7. It looks like there is some issue with parsing.
>>>>
>>>> I'm thinking of downgrading back to v17.2.6 or earlier; please let me know
>>>> if this is a good option for now.
>>>>
>>>> Thanks,
>>>> Jayanth
>>>> 
>>>> From: Jayanth Reddy 
>>>> Sent: Tuesday, November 7, 2023 11:59:38 PM
>>>> To: Casey Bodley 
>>>> Cc: Wesley Dillingham ; ceph-users 
>>>> ; Adam Emerson 
>>>> Subject: Re: [ceph-users] Re: owner locked out of bucket via bucket policy
>>>>
>>>> Hello Casey,
>>>>
>>>> Thank you for the quick response. I see 
>>>> `rgw_policy_reject_invalid_principals` is not present in v17.2.7. Please 
>>>> let me know.
>>>>
>>>> Regards
>>>> Jayanth
>>>>
>>>> On Tue, Nov 7, 2023 at 11:50 PM Casey Bodley  wrote:
>>>>
>>>> On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
>>>>  wrote:
>>>> >
>>>> > Hello Wesley and Casey,
>>>> >
>>>> > We've ended up with the same issue and here it appears that even the 
>>>> > user with "--admin" isn't able to do anything. We're now unable to 
>>>> > figure out if it is due to bucket policies, ACLs or IAM of some sort. 
>>>> > I'm seeing these IAM errors in the logs
>>>> >
>>>> > ```
>>>> >
>>>> > Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851 
>>>> > 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due 
>>>> > to Handler error.
>>>> >
>>>> > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583 
>>>> > 0.0s s3:list_b

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-09 Thread Casey Bodley
On Wed, Nov 8, 2023 at 11:10 AM Yuri Weinstein  wrote:
>
> We merged 3 PRs and rebuilt "reef-release" (Build 2)
>
> Seeking approvals/reviews for:
>
> smoke - Laura, Radek 2 jobs failed in "objectstore/bluestore" tests
> (see Build 2)
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey reapprove on Build 2

rgw reapproved

> fs - Venky, approve on Build 2
> orch - Adam King
> upgrade/quincy-x (reef) - Laura PTL
> powercycle - Brad (known issues)
>
> We need to close
> https://tracker.ceph.com/issues/63391
> (https://github.com/ceph/ceph/pull/54392) - Travis, Guillaume
> https://tracker.ceph.com/issues/63151 - Adam King do we need anything for 
> this?
>
> On Wed, Nov 8, 2023 at 6:33 AM Travis Nielsen  wrote:
> >
> > Yuri, we need to add this issue as a blocker for 18.2.1. We discovered this 
> > issue after the release of 17.2.7, and don't want to hit the same blocker 
> > in 18.2.1 where some types of OSDs are failing to be created in new 
> > clusters, or failing to start in upgraded clusters.
> > https://tracker.ceph.com/issues/63391
> >
> > Thanks!
> > Travis
> >
> > On Wed, Nov 8, 2023 at 4:41 AM Venky Shankar  wrote:
> >>
> >> Hi Yuri,
> >>
> >> On Wed, Nov 8, 2023 at 2:32 AM Yuri Weinstein  wrote:
> >> >
> >> > 3 PRs above mentioned were merged and I am returning some tests:
> >> > https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
> >> >
> >> > Still seeking approvals:
> >> > smoke - Laura, Radek, Prashant, Venky in progress
> >> > rados - Neha, Radek, Travis, Ernesto, Adam King
> >> > rgw - Casey in progress
> >> > fs - Venky
> >>
> >> There's a failure in the fs suite
> >>
> >> 
> >> https://pulpito.ceph.com/vshankar-2023-11-07_05:14:36-fs-reef-release-distro-default-smithi/7450325/
> >>
> >> Seems to be related to nfs-ganesha. I've reached out to Frank Filz
> >> (#cephfs on ceph slack) to have a look. WIll update as soon as
> >> possible.
> >>
> >> > orch - Adam King
> >> > rbd - Ilya approved
> >> > krbd - Ilya approved
> >> > upgrade/quincy-x (reef) - Laura PTL
> >> > powercycle - Brad
> >> > perf-basic - in progress
> >> >
> >> >
> >> > On Tue, Nov 7, 2023 at 8:38 AM Casey Bodley  wrote:
> >> > >
> >> > > On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein  
> >> > > wrote:
> >> > > >
> >> > > > Details of this release are summarized here:
> >> > > >
> >> > > > https://tracker.ceph.com/issues/63443#note-1
> >> > > >
> >> > > > Seeking approvals/reviews for:
> >> > > >
> >> > > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
> >> > > > rados - Neha, Radek, Travis, Ernesto, Adam King
> >> > > > rgw - Casey
> >> > >
> >> > > rgw results are approved. https://github.com/ceph/ceph/pull/54371
> >> > > merged to reef but is needed on reef-release
> >> > >
> >> > > > fs - Venky
> >> > > > orch - Adam King
> >> > > > rbd - Ilya
> >> > > > krbd - Ilya
> >> > > > upgrade/quincy-x (reef) - Laura PTL
> >> > > > powercycle - Brad
> >> > > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> >> > > >
> >> > > > Please reply to this email with approval and/or trackers of known
> >> > > > issues/PRs to address them.
> >> > > >
> >> > > > TIA
> >> > > > YuriW
> >> > > > ___
> >> > > > ceph-users mailing list -- ceph-users@ceph.io
> >> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> > > >
> >> > >
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >>
> >> --
> >> Cheers,
> >> Venky
> >> ___
> >> Dev mailing list -- d...@ceph.io
> >> To unsubscribe send an email to dev-le...@ceph.io
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW: user modify default_storage_class does not work

2023-11-13 Thread Casey Bodley
my understanding is that default placement is stored at the bucket
level, so changes to the user's default placement only take effect for
newly-created buckets
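
if per-object placement is acceptable, the storage class can also be
requested explicitly on each PUT, independent of the user default (a
boto3 sketch; endpoint and bucket name are placeholders):

import boto3

s3 = boto3.resource('s3', endpoint_url='http://rgw.example.com:8080')
s3.Object('testbucket', 'testdefault-object').put(
    Body=b'0' * 1000,
    StorageClass='COLD')   # maps to the x-amz-storage-class header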

On Sun, Nov 12, 2023 at 9:48 PM Huy Nguyen  wrote:
>
> Hi community,
> I'm using Ceph version 16.2.13. I tried to set default_storage_class but
> it seems like it didn't work.
>
> Here is steps I did:
> I already had a storage class named COLD, then I modified the user
> default_storage_class like this:
> radosgw-admin user modify --uid testuser --placement-id default-placement 
> --storage-class COLD
>
> after that, user info shows it correctly:
> radosgw-admin user info --uid testuser
> {
> ...
> "op_mask": "read, write, delete",
> "default_placement": "default-placement",
> "default_storage_class": "COLD",
> ...
>
> Then I put a file using boto3, without specifying any storage class:
> s3.Object(bucket_name, 'testdefault-object').put(Body="0"*1000)
>
> But the object still goes into the STANDARD storage class. I don't know if
> this is a bug or if I missed something.
>
> Thanks
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help on rgw metrics (was rgw_user_counters_cache)

2024-01-31 Thread Casey Bodley
On Wed, Jan 31, 2024 at 3:43 AM garcetto  wrote:
>
> good morning,
>   i was struggling trying to understand why i cannot find this setting on
> my reef version, is it because it is only in the latest dev ceph version
> and not before?

that's right, this new feature will be part of the squid release. we
don't plan to backport it to reef

>
> https://docs.ceph.com/en/latest/radosgw/metrics/#user-bucket-counter-caches
>
> Reef gives 404
> https://docs.ceph.com/en/reef/radosgw/metrics/
>
> thank you!
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-01-31 Thread Casey Bodley
On Mon, Jan 29, 2024 at 4:39 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/64151#note-1
>
> Seeking approvals/reviews for:
>
> rados - Radek, Laura, Travis, Ernesto, Adam King
> rgw - Casey

rgw approved, thanks

> fs - Venky
> rbd - Ilya
> krbd - in progress
>
> upgrade/nautilus-x (pacific) - Casey PTL (ragweed tests failed)
> upgrade/octopus-x (pacific) - Casey PTL (ragweed tests failed)
>
> upgrade/pacific-x (quincy) - in progress
> upgrade/pacific-p2p - Ilya PTL (maybe rbd related?)
>
> ceph-volume - Guillaume
>
> TIA
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Casey Bodley
On Fri, Feb 2, 2024 at 11:21 AM Chris Palmer  wrote:
>
> Hi Matthew
>
> AFAIK the upgrade from quincy/deb11 to reef/deb12 is not possible:
>
>   * The packaging problem you can work around, and a fix is pending
>   * You have to upgrade both the OS and Ceph in one step
>   * The MGR will not run under deb12 due to the PyO3 lack of support for
> subinterpreters.
>
> If you do attempt an upgrade, you will end up stuck with a partially
> upgraded cluster. The MONs will be on deb12/reef and cannot be
> downgraded, and the MGR will be stuck on deb11/quincy, We have a test
> cluster in that state with no way forward or back.
>
> I fear the MGR problem will spread as time goes on and PyO3 updates
> occur. And it's not good that it can silently corrupt in the existing
> apparently-working installations.
>
> No-one has picked up issue 64213 that I raised yet.
>
> I'm tempted to raise another issue for qa : the debian 12 package cannot
> have been tested as it just won't work either as an upgrade or a new
> install.

you're right that the debian packages don't get tested:

https://docs.ceph.com/en/reef/start/os-recommendations/#platforms

>
> Regards, Chris
>
>
> On 02/02/2024 14:40, Matthew Darwin wrote:
> > Chris,
> >
> > Thanks for all the investigations you are doing here. We're on
> > quincy/debian11.  Is there any working path at this point to
> > reef/debian12?  Ideally I want to go in two steps.  Upgrade ceph first
> > or upgrade debian first, then do the upgrade to the other one. Most of
> > our infra is already upgraded to debian 12, except ceph.
> >
> > On 2024-01-29 07:27, Chris Palmer wrote:
> >> I have logged this as https://tracker.ceph.com/issues/64213
> >>
> >> On 16/01/2024 14:18, DERUMIER, Alexandre wrote:
> >>> Hi,
> >>>
> > ImportError: PyO3 modules may only be initialized once per
> > interpreter
> > process
> >
> > and ceph -s reports "Module 'dashboard' has failed dependency: PyO3
> > modules may only be initialized once per interpreter process
> >>> We have the same problem on proxmox8 (based on debian12) with ceph
> >>> quincy or reef.
> >>>
> >>> It seem to be related to python version on debian12
> >>>
> >>> (we have no fix for this currently)
> >>>
> >>>
> >>>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-08 Thread Casey Bodley
thanks, i've created https://tracker.ceph.com/issues/64360 to track
these backports to pacific/quincy/reef

On Thu, Feb 8, 2024 at 7:50 AM Stefan Kooman  wrote:
>
> Hi,
>
> Is this PR: https://github.com/ceph/ceph/pull/54918 included as well?
>
> You definitely want to build the Ubuntu / debian packages with the
> proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_.
>
> Thanks,
>
> Gr. Stefan
>
> P.s. Kudos to Mark Nelson for figuring it out / testing.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to solve data fixity

2024-02-09 Thread Casey Bodley
i've cc'ed Matt who's working on the s3 object integrity feature
https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html,
where rgw compares the generated checksum with the client's on ingest,
then stores it with the object so clients can read it back for later
integrity checks. you can track the progress in
https://tracker.ceph.com/issues/63951
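
in the meantime, a minimal boto3 sketch of the pattern discussed below
(client computes the hash, stores it in user metadata on upload, and
re-checks it after download); endpoint, bucket, key and file name are
placeholders:

import boto3
import hashlib

s3 = boto3.client('s3', endpoint_url='http://rgw.example.com:8080')

with open('archive.tar', 'rb') as f:
    data = f.read()
digest = hashlib.sha256(data).hexdigest()

# stored on the object as x-amz-meta-sha256
s3.put_object(Bucket='repo', Key='archive.tar', Body=data,
              Metadata={'sha256': digest})

# later, on download, recompute and compare
obj = s3.get_object(Bucket='repo', Key='archive.tar')
body = obj['Body'].read()
assert hashlib.sha256(body).hexdigest() == obj['Metadata']['sha256']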

On Fri, Feb 9, 2024 at 8:49 AM Josh Baergen  wrote:
>
> MPU etags are an MD5-of-MD5s, FWIW. If the user knows how the parts are
> uploaded then the etag can be used to verify contents, both just after upload and
> then at download time (both need to be validated if you want end-to-end
> validation - but then you're trusting the system to not change the etag
> underneath you).
>
> Josh
>
> On Fri, Feb 9, 2024, 6:16 a.m. Michal Strnad 
> wrote:
>
> > Thank you for your response.
> >
> > We have already done some Lua scripting in the past, and it wasn't
> > entirely enjoyable :-), but we may have to do it again. Scrubbing is
> > still enabled, and turning it off definitely won't be an option.
> > However, due to the project requirements, it would be great if
> > Ceph could, on upload completion, initiate and compute hash (
> > md5, sha256) and store it to object's metadata, so that user later
> > could validate if the downloaded data are correct.
> >
> > We can't use the Etag for that, as it does not contain an md5 in the
> > case of a multipart upload.
> >
> > Michal
> >
> >
> > On 2/9/24 13:53, Anthony D'Atri wrote:
> > > You could use Lua scripting perhaps to do this at ingest, but I'm very
> > curious about scrubs -- you have them turned off completely?
> > >
> > >
> > >> On Feb 9, 2024, at 04:18, Michal Strnad 
> > wrote:
> > >>
> > >> Hi all!
> > >>
> > >> In the context of a repository-type project, we need to address a
> > situation where we cannot use periodic checks in Ceph (scrubbing) due to
> > the project's nature. Instead, we need the ability to write a checksum into
> > the metadata of the uploaded file via API. In this context, we are not
> > concerned about individual file parts, but rather the file as a whole.
> > Users will calculate the checksum and write it. Based on this hash, we
> > should be able to trigger a check of the given files. We are aware that
> > tools like s3cmd can write MD5 hashes to file metadata, but is there a more
> > general approach? Does anyone have experience with this, or can you suggest
> > a tool that can accomplish this?
> > >>
> > >> Thx
> > >> Michal
> > >> ___
> > >> ceph-users mailing list -- ceph-users@ceph.io
> > >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-21 Thread Casey Bodley
On Tue, Feb 20, 2024 at 10:58 AM Yuri Weinstein  wrote:
>
> We have restarted QE validation after fixing issues and merging several PRs.
> The new Build 3 (rebase of pacific) tests are summarized in the same
> note (see Build 3 runs) https://tracker.ceph.com/issues/64151#note-1
>
> Seeking approvals:
>
> rados - Radek, Junior, Travis, Ernesto, Adam King
> rgw - Casey

rgw approved

> fs - Venky
> rbd - Ilya
> krbd - Ilya
>
> upgrade/octopus-x (pacific) - Adam King, Casey PTL
>
> upgrade/pacific-p2p - Casey PTL

Yuri and i managed to get a green run here, approved

>
> ceph-volume - Guillaume, fixed by
> https://github.com/ceph/ceph/pull/55658 retesting
>
> On Thu, Feb 8, 2024 at 8:43 AM Casey Bodley  wrote:
> >
> > thanks, i've created https://tracker.ceph.com/issues/64360 to track
> > these backports to pacific/quincy/reef
> >
> > On Thu, Feb 8, 2024 at 7:50 AM Stefan Kooman  wrote:
> > >
> > > Hi,
> > >
> > > Is this PR: https://github.com/ceph/ceph/pull/54918 included as well?
> > >
> > > You definitely want to build the Ubuntu / debian packages with the
> > > proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_.
> > >
> > > Thanks,
> > >
> > > Gr. Stefan
> > >
> > > P.s. Kudos to Mark Nelson for figuring it out / testing.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting: 2024-2-21 Minutes

2024-02-21 Thread Casey Bodley
Estimate on release timeline for 17.2.8?
- after pacific 16.2.15 and reef 18.2.2 hotfix
(https://tracker.ceph.com/issues/64339,
https://tracker.ceph.com/issues/64406)

Estimate on release timeline for 19.2.0?
- target April, depending on testing and RCs
- Testing plan for Squid beyond dev freeze (regression and upgrade
tests, performance tests, RCs)

Can we fix old.ceph.com?
- continued discussion about the need to revive the pg calc tool

T release name?
- please add and vote for suggestions in https://pad.ceph.com/p/t
- need name before we can open "t kickoff" pr
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: list topic shows endpoint url and username e password

2024-02-23 Thread Casey Bodley
thanks Giada, i see that you created
https://tracker.ceph.com/issues/64547 for this

unfortunately, this topic metadata doesn't really have a permission
model at all. topics are shared across the entire tenant, and all
users have access to read/overwrite those topics

a lot of work was done for https://tracker.ceph.com/issues/62727 to
add topic ownership and permission policy, and those changes will be
in the squid release

i've cc'ed Yuval and Krunal who worked on that - could these changes
be reasonably backported to quincy and reef?

On Fri, Feb 23, 2024 at 9:59 AM Giada Malatesta
 wrote:
>
> Hello everyone,
>
> we are facing a problem regarding the topic operations used to send
> notifications, particularly when using the amqp protocol.
>
> We are using Ceph version 18.2.1. We have created a topic by giving all
> needed information as attributes, including the push-endpoint (in our case
> a rabbit endpoint that is used to collect notification messages). Then
> we have configured all the buckets in our Ceph cluster so that it is
> possible to send notifications when changes occur.
>
> The problem particularly regards the list_topic operation: we noticed
> that any authenticated user is able to get a full list of the created
> topics and, with them, all of their information, including the endpoint
> (and so the username, password, IP and port when using
> boto3.set_stream_logger()), which is not good for our goal since we do
> not want the users to know implementation details.
>
> Is there a possibility to solve this problem? Any help would be useful.
>
> Thanks and best regards.
>
> GM.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Hanging request in S3

2024-03-06 Thread Casey Bodley
hey Christian, i'm guessing this relates to
https://tracker.ceph.com/issues/63373 which tracks a deadlock in s3
DeleteObjects requests when multisite is enabled.
rgw_multi_obj_del_max_aio can be set to 1 as a workaround until the
reef backport lands
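
for example:

$ ceph config set client.rgw rgw_multi_obj_del_max_aio 1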

On Wed, Mar 6, 2024 at 2:41 PM Christian Kugler  wrote:
>
> Hi,
>
> I am having some trouble with some S3 requests and I am at a loss.
>
> After upgrading to reef a couple of weeks ago some requests get stuck and
> never
> return. The two Ceph clusters are set up to sync the S3 realm
> bidirectionally.
> The bucket has 479 shards (dynamic resharding) at the moment.
>
> Putting an object (/etc/services) into the bucket via s3cmd works, and
> deleting
> it works as well. So I know it is not just the entire bucket that is somehow
> faulty.
>
> When I try to delete a specific prefix, the request for listing all
> objects
> never comes back. In the example below I only included the request in
> question
> which I aborted with ^C.
>
> $ s3cmd rm -r
> s3://sql20/pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/ -d
> [...snip...]
> DEBUG: Canonical Request:
> GET
> /sql20/
> prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
> host:[...snip...]
> x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> x-amz-date:20240306T183435Z
>
> host;x-amz-content-sha256;x-amz-date
> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> --
> DEBUG: signature-v4 headers: {'x-amz-date': '20240306T183435Z',
> 'Authorization': 'AWS4-HMAC-SHA256
> Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7',
> 'x-amz-content-sha256':
> 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'}
> DEBUG: Processing request, please wait...
> DEBUG: get_hostname(sql20): [...snip...]
> DEBUG: ConnMan.get(): re-using connection: [...snip...]#1
> DEBUG: format_uri():
> /sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
> DEBUG: Sending request method_string='GET',
> uri='/sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F',
> headers={'x-amz-date': '20240306T183435Z', 'Authorization':
> 'AWS4-HMAC-SHA256
> Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7',
> 'x-amz-content-sha256':
> 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'},
> body=(0 bytes)
> ^CDEBUG: Response:
> {}
> See ya!
>
> The request did not show up normally in the logs so I set debug_rgw=20 and
> debug_ms=20 via ceph config set.
>
> I tried to isolate the request and looked for its request id:
> 13321243250692796422
> The following is a grep for the request id:
>
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket verifying op params
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket pre-executing
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket check rate limiting
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket executing
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket list_objects_ordered: starting attempt 1
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket cls_bucket_list_ordered: request from each of 479 shard(s)
> for 8 entries to get 1001 total entries
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: currently processing
> pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz
> from shard 437
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
> obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz
> state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: skipping
> pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz[]
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: currently processing
> pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz
> from shard 202
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
> obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz
> state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: skippin

[ceph-users] Re: Disable signature url in ceph rgw

2024-03-07 Thread Casey Bodley
anything we can do to narrow down the policy issue here? any of the
Principal, Action, Resource, or Condition matches could be failing
here. you might try replacing each with a wildcard, one at a time,
until you see the policy take effect
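
for example, taking the first statement from the simplified policy quoted
below and swapping its Principal for a wildcard (a debugging sketch only,
not something to leave in place) - if the presigned request is then
subject to the policy, the Principal match is the part that's failing:

{
  "Sid": "username%%%read",
  "Effect": "Allow",
  "Principal": "*",
  "Action": ["s3:ListBucket", "s3:ListBucketVersions",
             "s3:GetObject", "s3:GetObjectVersion"],
  "Resource": ["arn:aws:s3:::userbucket", "arn:aws:s3:::userbucket/*"],
  "Condition": {"IpAddress": {"aws:SourceIp": ["redacted"]}}
}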

On Wed, Dec 13, 2023 at 5:04 AM Marc Singer  wrote:
>
> Hi
>
> As my attachment is very messy, I cleaned it up and provide a much
> simpler version for your tests below.
> These policies seem to get ignored when the URL is presigned.
>
> {
> "Version":"2012-10-17",
> "Id":"userbucket%%%policy",
> "Statement":[
>{
>   "Sid":"username%%%read",
>   "Effect":"Allow",
>   "Principal":{
>  "AWS":"arn:aws:iam:::user/username"
>   },
>   "Action":[
>  "s3:ListBucket",
>  "s3:ListBucketVersions",
>  "s3:GetObject",
>  "s3:GetObjectVersion"
>   ],
>   "Resource":[
>  "arn:aws:s3:::userbucket",
>  "arn:aws:s3:::userbucket/*"
>   ],
>   "Condition":{
>  "IpAddress":{
> "aws:SourceIp":[
>"redacted"
> ]
>  }
>   }
>},
>{
>   "Sid":"username%%%write",
>   "Effect":"Allow",
>   "Principal":{
>  "AWS":"arn:aws:iam:::user/username"
>   },
>   "Action":[
>  "s3:PutObject",
>  "s3:DeleteObject",
>  "s3:DeleteObjectVersion",
>  "s3:ListBucketMultipartUploads",
>  "s3:ListMultipartUploadParts",
>  "s3:AbortMultipartUpload"
>   ],
>   "Resource":[
>  "arn:aws:s3:::userbucket",
>  "arn:aws:s3:::userbucket/*"
>   ],
>   "Condition":{
>  "IpAddress":{
> "aws:SourceIp":[
>"redacted"
> ]
>  }
>   }
>},
>{
>   "Sid":"username%%%policy_control",
>   "Effect":"Deny",
>   "Principal":{
>  "AWS":"arn:aws:iam:::user/username"
>   },
>   "Action":[
>  "s3:PutObjectAcl",
>  "s3:GetObjectAcl",
>  "s3:PutBucketAcl",
>  "s3:GetBucketPolicy",
>  "s3:DeleteBucketPolicy",
>  "s3:PutBucketPolicy"
>   ],
>   "Resource":[
>  "arn:aws:s3:::userbucket",
>  "arn:aws:s3:::userbucket/*"
>   ]
>}
> ]
> }
>
> Thanks and yours sincerely
>
> Marc Singer
>
> On 2023-12-12 10:24, Marc Singer wrote:
> > Hi
> >
> > First, all requests with presigned URLs should be restricted.
> >
> > This is how the request is blocked with the nginx sidecar (it's just a
> > simple parameter in the URL that is forbidden):
> >
> > if ($arg_Signature) {
> >     return 403 'Signature parameter forbidden';
> > }
> >
> > Our bucket policies are created automatically by a custom
> > microservice. You will find an example in the attachment, taken from a
> > random "managed" bucket. These buckets are affected by the issue.
> >
> > There is a policy that stops users from changing the policy.
> >
> > I might have made a mistake when redacting, replacing a user with the
> > same values.
> >
> > Thank you and have a great day
> >
> > Marc
> >
> > On 12/9/23 00:37, Robin H. Johnson wrote:
> >> On Fri, Dec 08, 2023 at 10:41:59AM +0100,marc@singer.services  wrote:
> >>> Hi Ceph users
> >>>
> >>> We are using Ceph Pacific (16) in this specific deployment.
> >>>
> >>> In our use case we do not want our users to be able to generate
> >>> signature v4 URLs because they bypass the policies that we set on
> >>> buckets (e.g IP restrictions).
> >>> Currently we have a sidecar reverse proxy running that filters
> >>> requests with signature URL specific request parameters.
> >>> This is obviously not very efficient and we are looking to replace
> >>> this somehow in the future.
> >>>
> >>> 1. Is there an option in RGW to disable this signed URLs (e.g
> >>> returning status 403)?
> >>> 2. If not is this planned or would it make sense to add it as a
> >>> configuration option?
> >>> 3. Or is the behaviour of not respecting bucket policies in RGW with
> >>> signature v4 URLs a bug and they should be actually applied?
> >> Trying to clarify your ask:
> >> - you want ALL requests, including presigned URLs, to be subject to
> >> the
> >>IP restrictions encoded in your bucket policy?
> >>e.g. auth (signature AND IP-list)
> >>
> >> That should be possible with bucket policy.
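
To make Robin's "signature AND IP-list" point concrete (illustration only,
not from the original mails; bucket name and CIDR are placeholders, and
whether the Pacific RGW honours it for presigned URLs is exactly what this
thread is trying to establish), the usual way to express it is an explicit
Deny whose NotIpAddress condition catches requests from outside the allowed
range, regardless of how the request was signed:

    # Hedged sketch: one Deny statement that matches any principal whose
    # request does not come from the allowed range. Placeholders throughout.
    import json

    import boto3

    deny_outside_range = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-requests-from-outside-allowed-range",
            "Effect": "Deny",
            "Principal": {"AWS": ["*"]},
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::userbucket",
                "arn:aws:s3:::userbucket/*",
            ],
            "Condition": {
                "NotIpAddress": {"aws:SourceIp": ["198.51.100.0/24"]}
            },
        }],
    }

    s3 = boto3.client(
        's3',
        endpoint_url='https://rgw.example.com',
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
    )
    s3.put_bucket_policy(Bucket='userbucket',
                         Policy=json.dumps(deny_outside_range))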
> >>
> >> Can you post the current bucket policy that you have? (redact with
> >> distinct values the IPs, userids, bucket name, any paths, but
> >> otherwise
> >> keep it complete).
> >>
> >> You cannot fundamentally stop anybody from generating presigned URLs,
> >> because that's purely a client-side operation. Generating presigned
> >> URLs
> >> requires an access key and secret key, a

[ceph-users] v17.2.7 Quincy now supports Ubuntu 22.04 (Jammy Jellyfish)

2024-03-29 Thread Casey Bodley
Ubuntu 22.04 packages are now available for the 17.2.7 Quincy release.

The upcoming Squid release will not support Ubuntu 20.04 (Focal
Fossa). Ubuntu users planning to upgrade from Quincy to Squid will
first need to perform a distro upgrade to 22.04.

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-17.2.7.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: b12291d110049b2f35e32e0de30d70e9a4c060d2
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-03 Thread Casey Bodley
On Wed, Apr 3, 2024 at 11:58 AM Lorenz Bausch  wrote:
>
> Hi everybody,
>
> we upgraded our containerized Red Hat Pacific cluster to the latest
> Quincy release (Community Edition).

i'm afraid this is not an upgrade path that we try to test or support.
Red Hat makes its own decisions about what to backport into its
releases. my understanding is that Red Hat's pacific-based 5.3 release
includes all of the rgw multisite resharding changes which were not
introduced upstream until the Reef release. this includes changes to
data formats that an upstream Quincy release would not understand. in
this case, you might have more luck upgrading to Reef?

> The upgrade itself went fine, the cluster is HEALTH_OK, all daemons run
> the upgraded version:
>
>  %< 
> $ ceph -s
>cluster:
>  id: 68675a58-cf09-4ebd-949c-b9fcc4f2264e
>  health: HEALTH_OK
>
>services:
>  mon: 5 daemons, quorum node02,node03,node04,node05,node01 (age 25h)
>  mgr: node03.ztlair(active, since 25h), standbys: node01.koymku,
> node04.uvxgvp, node02.znqnhg, node05.iifmpc
>  osd: 408 osds: 408 up (since 22h), 408 in (since 7d)
>  rgw: 19 daemons active (19 hosts, 1 zones)
>
>data:
>  pools:   11 pools, 8481 pgs
>  objects: 236.99M objects, 544 TiB
>  usage:   1.6 PiB used, 838 TiB / 2.4 PiB avail
>  pgs: 8385 active+clean
>   79   active+clean+scrubbing+deep
>   17   active+clean+scrubbing
>
>io:
>  client:   42 MiB/s rd, 439 MiB/s wr, 2.15k op/s rd, 1.64k op/s wr
>
> ---
>
> $ ceph versions | jq .overall
> {
>"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> (stable)": 437
> }
>  >% 
>
> After all the daemons were upgraded we started noticing some RGW buckets
> which are inaccessible.
> s3cmd failed with NoSuchKey:
>
>  %< 
> $ s3cmd la -l
> ERROR: S3 error: 404 (NoSuchKey)
>  >% 
>
> The buckets still exist according to "radosgw-admin bucket list".
> Out of the ~600 buckets, 13 buckets are inaccessible at the moment:
>
>  %< 
> $ radosgw-admin bucket radoslist --tenant xy --uid xy --bucket xy
> 2024-04-03T12:13:40.607+0200 7f0dbf4c4680  0 int
> RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*,
> RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const
> string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*,
> rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter):
> CLSRGWIssueBucketList for
> xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
> 2024-04-03T12:13:40.609+0200 7f0dbf4c4680  0 int
> RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*,
> RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const
> string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*,
> rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter):
> CLSRGWIssueBucketList for
> xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
>  >% 
>
> The affected buckets are comparatively large, around 4 - 7 TB,
> but not all buckets of that size are affected.
>
> Using "rados -p rgw.buckets.data ls" it seems like all the objects are
> still there,
> although "rados -p rgw.buckets.data get objectname -" only prints
> unusable (?) binary data,
> even for objects of intact buckets.
>
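A raw "rados get" of an RGW head object is not expected to reproduce the
S3 object: the S3-level metadata (manifest, ACL, etag) lives in rados
xattrs, and the payload is typically striped across additional tail
objects. A hedged python-rados sketch to at least confirm a head object
carries the usual RGW xattrs (pool and object names below are
placeholders, not taken from this cluster):

    # Hedged sketch: inspect size and xattrs of one head object; expect
    # xattrs such as user.rgw.manifest, user.rgw.acl, user.rgw.etag.
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('default.rgw.buckets.data')  # placeholder
        oid = '<bucket marker>_<object key>'                    # placeholder
        size, mtime = ioctx.stat(oid)
        print(f'{oid}: {size} bytes, mtime {mtime}')
        for name, value in ioctx.get_xattrs(oid):
            print(name, len(value), 'bytes')
        ioctx.close()
    finally:
        cluster.shutdown()
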
> Overall we're facing around 60 TB of customer data which are just gone
> at the moment.
> Is there a way to recover from this situation, or to further narrow down
> the root cause of the problem?
>
> Kind regards,
> Lorenz
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-03 Thread Casey Bodley
to expand on this diagnosis: with multisite resharding, we changed how
buckets name/locate their bucket index shard objects. any buckets that
were resharded under this Red Hat pacific release would be using the
new object names. after upgrading to the Quincy release, rgw would
look at the wrong object names when trying to list those buckets. 404
NoSuchKey is the response i would expect in that case
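
For anyone who wants to see which index shard objects a given bucket
instance actually has before deciding on an upgrade path, a hedged
python-rados sketch along these lines can help. The marker below is the
bucket instance id from the error messages above; the index pool name is a
placeholder, and the id for other buckets can be taken from
"radosgw-admin bucket stats --bucket=<name>":

    # Hedged sketch: list the bucket index shard objects for one bucket
    # instance to see how many there are and how they are named
    # (traditionally ".dir.<bucket instance id>.<shard>").
    import rados

    MARKER = '6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3'
    INDEX_POOL = 'default.rgw.buckets.index'  # placeholder

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(INDEX_POOL)
        shards = sorted(obj.key for obj in ioctx.list_objects()
                        if obj.key.startswith('.dir.' + MARKER))
        print(f'{len(shards)} index shard objects for marker {MARKER}')
        for key in shards:
            print(' ', key)
        ioctx.close()
    finally:
        cluster.shutdown()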

On Wed, Apr 3, 2024 at 12:20 PM Casey Bodley  wrote:
>
> On Wed, Apr 3, 2024 at 11:58 AM Lorenz Bausch  wrote:
> >
> > Hi everybody,
> >
> > we upgraded our containerized Red Hat Pacific cluster to the latest
> > Quincy release (Community Edition).
>
> i'm afraid this is not an upgrade path that we try to test or support.
> Red Hat makes its own decisions about what to backport into its
> releases. my understanding is that Red Hat's pacific-based 5.3 release
> includes all of the rgw multisite resharding changes which were not
> introduced upstream until the Reef release. this includes changes to
> data formats that an upstream Quincy release would not understand. in
> this case, you might have more luck upgrading to Reef?
>
> > [...]
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

