[ceph-users] Re: RGW Bucket Notifications and MultiPart Uploads

2022-07-18 Thread Yuval Lifshitz
Hi Mark,
It is in quincy but wasn't backported to pacific yet.
I can do this backport, but I'm not sure when the next pacific release will be.
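
If you want to check yourself from a ceph.git checkout whether a given fix
landed in a release, one rough way is to ask git which tags contain the commit
(the hash below is a placeholder for the commit referenced in the tracker/PR):

    git fetch --tags
    git tag --contains <commit-sha> | grep '^v16\.'
    # any v16.2.x tag in the output means the commit is in that pacific release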

Yuval

On Tue, Jul 19, 2022 at 5:04 AM Mark Selby  wrote:

> I am trying to use RGW Bucket Notifications to trigger events on object
> creation and have run into a bit of an issue when multipart uploads come into
> play for large objects.
>
>
>
> With a small object only a single notification is generated ->
> ObjectCreated:Put
>
>
>
> When a multipart upload is performed a string of Notifications are sent:
>
> ObjectCreated:Post
>
> ObjectCreated:Put
>
> ObjectCreated:Put
>
> ObjectCreated:Put
>
> …
>
> ObjectCreated:CompleteMultipartUpload
>
>
>
> I can ignore the Post, but all of the Put notifications look the same as a
> single-part upload message, even though the object will not actually be
> created until the CompleteMultipartUpload notification happens.
>
>
>
> There is https://tracker.ceph.com/issues/51520, which seems to fix this
> issue – I cannot tell whether it was actually backported. Does anyone know
> if it was?
>
>
>
> Thanks!
>
>
>
> --
>
> Mark Selby
>
> Sr Linux Administrator, The Voleon Group
>
> mse...@voleon.com
>
>
>
>  This email is subject to important conditions and disclosures that are
> listed on this web page: https://voleon.com/disclaimer/.
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW Bucket Notifications and MultiPart Uploads

2022-07-18 Thread Mark Selby
I am trying to use RGW Bucket Notifications to trigger events on object 
creation and have run into a bit of an issue when multipart uploads come into play 
for large objects.

 

With a small object only a single notification is generated -> ObjectCreated:Put

 

When a multipart upload is performed a string of Notifications are sent:

ObjectCreated:Post

ObjectCreated:Put

ObjectCreated:Put

ObjectCreated:Put

…

ObjectCreated:CompleteMultipartUpload

 

I can ignore the Post, but all of the Put notifications look the same as a 
single-part upload message, even though the object will not actually be created 
until the CompleteMultipartUpload notification happens.
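
One consumer-side workaround I have been considering (just a sketch; it assumes
the S3-style JSON payload that RGW sends, the AWS CLI pointed at the RGW
endpoint, and placeholder names like notification.json and RGW_ENDPOINT) is to
ignore the Post and, for any other ObjectCreated event, HEAD the object before
acting on it, since the HEAD only succeeds once the object actually exists:

    #!/bin/bash
    event=$(jq -r '.Records[0].eventName'       notification.json)
    bucket=$(jq -r '.Records[0].s3.bucket.name' notification.json)
    key=$(jq -r '.Records[0].s3.object.key'     notification.json)

    case "$event" in
      ObjectCreated:Post) : ;;   # multipart initiation, ignore
      ObjectCreated:*)
        # during an in-flight multipart upload this HEAD fails;
        # after CompleteMultipartUpload (or a plain single-part Put) it succeeds
        if aws --endpoint-url "$RGW_ENDPOINT" s3api head-object \
               --bucket "$bucket" --key "$key" >/dev/null 2>&1; then
          echo "object $bucket/$key is available"
        fi
        ;;
    esac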

 

There is https://tracker.ceph.com/issues/51520, which seems to fix this 
issue – I cannot tell whether it was actually backported. Does anyone know 
if it was?

 

Thanks!

 

-- 

Mark Selby

Sr Linux Administrator, The Voleon Group

mse...@voleon.com 

 

 This email is subject to important conditions and disclosures that are listed 
on this web page: https://voleon.com/disclaimer/.

 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph User + Dev Monthly July Meetup

2022-07-18 Thread Neha Ojha
Hi everyone,

This month's Ceph User + Dev Monthly meetup is on July 21, 14:00-15:00
UTC. Please add topics to the agenda:
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes.

Hope to see you there!

Thanks,
Neha

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mgr service restarted by package install?

2022-07-18 Thread Matthias Ferdinand
On Mon, Jul 18, 2022 at 09:27:37AM +0200, Dan van der Ster wrote:
> Hi,
> 
> It probably wasn't restarted by the package, but the mgr itself
> respawned because the set of enabled modules changed.
> E.g. this happens when upgrading from octopus to pacific, just after
> the pacific mons get a quorum:

At that point the running mons were still on octopus.

I restored the test cluster to its previous state and ran the updates
again:

root@mceph05:~# grep ceph-mgr:  /var/log/dpkg.log
2022-07-18 12:53:48 upgrade ceph-mgr:amd64 15.2.16-0ubuntu0.20.04.1 17.2.1-1focal
2022-07-18 12:53:48 status half-configured ceph-mgr:amd64 15.2.16-0ubuntu0.20.04.1
2022-07-18 12:53:48 status unpacked ceph-mgr:amd64 15.2.16-0ubuntu0.20.04.1
2022-07-18 12:53:48 status half-installed ceph-mgr:amd64 15.2.16-0ubuntu0.20.04.1
2022-07-18 12:53:49 status unpacked ceph-mgr:amd64 17.2.1-1focal
2022-07-18 12:54:41 configure ceph-mgr:amd64 17.2.1-1focal
2022-07-18 12:54:41 status unpacked ceph-mgr:amd64 17.2.1-1focal
2022-07-18 12:54:41 status half-configured ceph-mgr:amd64 17.2.1-1focal
2022-07-18 12:54:42 status installed ceph-mgr:amd64 17.2.1-1focal


At 12:54:42, something restarted the ceph-mgr systemd unit:

root@mceph05:~# journalctl -xe -u ceph-mgr@mceph05 | sed -ne '/^Jul 18 12:5/,$p' | head -n 20
Jul 18 12:54:42 mceph05 systemd[1]: Stopping Ceph cluster manager daemon...
-- Subject: A stop job for unit ceph-mgr@mceph05.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A stop job for unit ceph-mgr@mceph05.service has begun execution.
-- 
-- The job identifier is 1758.
Jul 18 12:54:42 mceph05 systemd[1]: ceph-mgr@mceph05.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit ceph-mgr@mceph05.service has successfully entered the 'dead' state.
Jul 18 12:54:42 mceph05 systemd[1]: Stopped Ceph cluster manager daemon.
-- Subject: A stop job for unit ceph-mgr@mceph05.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A stop job for unit ceph-mgr@mceph05.service has finished.


Looks more like a service restart at install time than a ceph-mgr self-respawn.
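
For the record, the maintainer scripts that dpkg ran during the upgrade can be
inspected directly, which should settle whether the package itself issues a
restart (paths below are the usual Debian/Ubuntu locations):

    # look for service restarts in the ceph-mgr maintainer scripts
    grep -nE 'deb-systemd-invoke|invoke-rc.d|systemctl' \
        /var/lib/dpkg/info/ceph-mgr.postinst \
        /var/lib/dpkg/info/ceph-mgr.prerm 2>/dev/null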


Regards
Matthias


> 
> 2022-07-13T11:43:41.308+0200 7f71c0c86700  1 mgr handle_mgr_map
> respawning because set of enabled modules changed!
> 
> Cheers, Dan
> 
> 
> On Sat, Jul 16, 2022 at 4:34 PM Matthias Ferdinand  
> wrote:
> >
> > Hi,
> >
> > while updating a test cluster (Ubuntu 20.04) from octopus (ubuntu repos)
> > to quincy (ceph repos), I noticed that mgr service gets restarted during
> > package install.
> >
> > Right after package install (no manual restarts yet) on 3 combined
> > mon/mgr hosts:
> >
> > # ceph versions
> > {
> > "mon": {
> > "ceph version 15.2.16 
> > (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)": 3
> > },
> > "mgr": {
> > "ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) 
> > quincy (stable)": 3
> > },
> > "osd": {
> > "ceph version 15.2.16 
> > (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)": 8
> > },
> > "mds": {},
> > "overall": {
> > "ceph version 15.2.16 
> > (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)": 11,
> > "ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) 
> > quincy (stable)": 3
> > }
> > }
> >
> >
> > Not sure how problematic this is, but AFAIK it was claimed that ceph
> > package installs would not restart ceph services by themselves.
> >
> >
> > Regards
> > Matthias Ferdinand
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PGs stuck deep-scrubbing for weeks - 16.2.9

2022-07-18 Thread Wesley Dillingham
Yes, these seem consistent with what we are experiencing. We have
definitely toggled the noscrub flags in various scenarios in the recent
past. Thanks for tracking this down and fixing it.
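
For anyone else checking their clusters, a rough way to list PGs sitting in a
deep-scrub state and their acting primary (a sketch only; it greps the plain
ceph pg dump output, and the first OSD in the acting set reported by
ceph pg map is the primary):

    ceph pg dump pgs 2>/dev/null | awk '/scrubbing[+]deep/ {print $1}' | \
    while read pg; do
        ceph pg map "$pg"    # first OSD in the "acting" list is the primary
    done
    # restarting that primary OSD got scrubs moving again for us, e.g.:
    #   systemctl restart ceph-osd@<id>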

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Fri, Jul 15, 2022 at 10:16 PM David Orman  wrote:

> Apologies, backport link should be:
> https://github.com/ceph/ceph/pull/46845
>
> On Fri, Jul 15, 2022 at 9:14 PM David Orman  wrote:
>
>> I think you may have hit the same bug we encountered. Cory submitted a
>> fix, see if it fits what you've encountered:
>>
>> https://github.com/ceph/ceph/pull/46727 (backport to Pacific here:
>> https://github.com/ceph/ceph/pull/46877 )
>> https://tracker.ceph.com/issues/54172
>>
>> On Fri, Jul 15, 2022 at 8:52 AM Wesley Dillingham 
>> wrote:
>>
>>> We have two clusters one 14.2.22 -> 16.2.7 -> 16.2.9
>>>
>>> Another 16.2.7 -> 16.2.9
>>>
>>> Both with a multi-disk layout (spinner block / ssd block.db) and both running
>>> CephFS, around 600 OSDs each with a combo of rep-3 and 8+3 EC data pools. There
>>> are examples of stuck scrubbing PGs from all of the pools.
>>>
>>> They have generally been behind on scrubbing, which we attributed simply to
>>> being large disks (10TB) with a heavy write load and the OSDs just having
>>> trouble keeping up. On closer inspection it appears we have many PGs that
>>> have been lodged in a deep scrubbing state, on one cluster for 2 weeks and
>>> on another for 7 weeks. Wondering if others have been experiencing anything
>>> similar. The only example of PGs being stuck scrubbing I have seen in the
>>> past has been related to the snaptrim PG state, but we aren't doing anything
>>> with snapshots in these new clusters.
>>>
>>> Granted, my cluster has been warning me with "pgs not deep-scrubbed in time"
>>> and it's on me for not looking more closely into why. Perhaps a separate
>>> warning of "PG stuck scrubbing for greater than 24 hours" or similar might
>>> be helpful to an operator.
>>>
>>> In any case I was able to get scrubs proceeding again by restarting the
>>> primary OSD daemon in the PGs which were stuck. Will monitor closely for
>>> additional stuck scrubs.
>>>
>>>
>>> Respectfully,
>>>
>>> *Wes Dillingham*
>>> w...@wesdillingham.com
>>> LinkedIn 
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Haproxy error for rgw service

2022-07-18 Thread Robert Reihs
Hi everyone,
I have a problem with the haproxy settings for the rgw service. I specified
the service in the service specification:
---
service_type: rgw
service_id: rgw
placement:
  count: 3
  label: "rgw"
---
service_type: ingress
service_id: rgw.rgw
placement:
  count: 3
  label: "ingress"
spec:
  backend_service: rgw.rgw
  virtual_ip: :::404::dd:ff:10/64
  virtual_interface_networks: :::404/64
  frontend_port: 8998
  monitor_port: 8999

The keepalived services are all started. Of the haproxy services only one is
started; the other two are in an error state:
systemd[1]: Starting Ceph haproxy.rgw.rgw.fsn1-ceph-01.ulnhyo for
40ddf3a6-36f1-42d2-9bf7-2fd50045e5dc...
podman[3616202]: 2022-07-18 13:03:25.738014313 + UTC m=+0.052607969
container create
25f90c4e26ebf6fc44efe12eae2c6b9d54811bfde744a78f756469e32c3f461f (image=
docker.io/library/haproxy:2.3, name=ceph-40ddf3>
podman[3616202]: 2022-07-18 13:03:25.787788203 + UTC m=+0.102381880
container init
25f90c4e26ebf6fc44efe12eae2c6b9d54811bfde744a78f756469e32c3f461f (image=
docker.io/library/haproxy:2.3, name=ceph-40ddf3a6>
podman[3616202]: 2022-07-18 13:03:25.790577637 + UTC m=+0.105171323
container start
25f90c4e26ebf6fc44efe12eae2c6b9d54811bfde744a78f756469e32c3f461f (image=
docker.io/library/haproxy:2.3, name=ceph-40ddf3a>
bash[3616202]:
25f90c4e26ebf6fc44efe12eae2c6b9d54811bfde744a78f756469e32c3f461f
conmon[3616235]: [NOTICE] 198/130325 (2) : haproxy version is 2.3.20-2c8082e
conmon[3616235]: [NOTICE] 198/130325 (2) : path to executable is
/usr/local/sbin/haproxy
conmon[3616235]: [ALERT] 198/130325 (2) : Starting frontend stats: cannot
bind socket (Cannot assign requested address)
[:::404::dd:ff:10:8999]
conmon[3616235]: [ALERT] 198/130325 (2) : Starting frontend frontend:
cannot bind socket (Cannot assign requested address)
[:::404::dd:ff:10:8998]
conmon[3616235]: [ALERT] 198/130325 (2) : [haproxy.main()] Some protocols
failed to start their listeners! Exiting.

I can access the IP in the browser and get the XML S3 response.
ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
(stable) installed with cephadm.

Any idea where the problem could be?
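
For what it's worth, this is what I plan to check next on one of the failing
nodes (<virtual_ip> stands for the address above; the sysctl name is my
assumption), since haproxy can presumably only bind the VIP on the node that
currently holds it unless non-local binds are allowed:

    ip -6 addr show | grep -F '<virtual_ip>'   # is the VIP on this node right now?
    sysctl net.ipv6.ip_nonlocal_bind           # 0 means binding a non-local address fails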
Thanks
Robert Reihs
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw API issues

2022-07-18 Thread Casey Bodley
there's a shell script in
https://github.com/ceph/ceph/blob/main/examples/rgw_admin_curl.sh.
there are also some client libraries listed in
https://docs.ceph.com/en/pacific/radosgw/adminops/#binding-libraries
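
note that the admin ops endpoints use the same request signing as the S3 API,
so a plain Auth-User/Auth-Key header will usually get you 401/403; the request
needs an AWS-style Authorization header computed from the admin user's
access/secret key. a rough example using the third-party awscurl tool (flag
names as in its README, so double-check them):

    # pip install awscurl   (signs the request with SigV4)
    awscurl --service s3 \
            --access_key <ACCESS_KEY> --secret_key <SECRET_KEY> \
            'https://s3.example.de/admin/info?format=json'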

On Mon, Jul 18, 2022 at 7:06 AM Marcus Müller  wrote:
>
> Thank you! We are running Pacific, that was my issue here.
>
> Can someone share an example of a full API request and answer with curl? I’m 
> still having issues, now getting 401 or 403 answers (but providing Auth-User 
> and Auth-Key).
>
> Regards
> Marcus
>
>
>
> Am 15.07.2022 um 15:23 schrieb Casey Bodley :
>
> are you running quincy? it looks like this '/admin/info' API was new
> to that release
>
> https://docs.ceph.com/en/quincy/radosgw/adminops/#info
>
> On Fri, Jul 15, 2022 at 7:04 AM Marcus Müller  
> wrote:
>
>
> Hi all,
>
> I’ve created a test user on our radosgw to work with the API. I’ve done the 
> following:
>
> ~# radosgw-admin user create --uid=testuser --display-name="testuser"
>
> ~# radosgw-admin caps add --uid=testuser --caps={caps}
>"caps": [
>{
>"type": "amz-cache",
>"perm": "*"
>},
>{
>"type": "bilog",
>"perm": "*"
>},
>{
>"type": "buckets",
>"perm": "*"
>},
>{
>"type": "datalog",
>"perm": "*"
>},
>{
>"type": "mdlog",
>"perm": "*"
>},
>{
>"type": "metadata",
>"perm": "*"
>},
>{
>"type": "oidc-provider",
>"perm": "*"
>},
>{
>"type": "roles",
>"perm": "*"
>},
>{
>"type": "usage",
>"perm": "*"
>},
>{
>"type": "user-policy",
>"perm": "*"
>},
>{
>"type": "users",
>"perm": "*"
>},
>{
>"type": "zone",
>"perm": "*"
>}
>],
>
>
> But for my GET request (with Authorization Header) I only get a "405 - Method 
> not Allowed" answer. This is my request url: 
> https://s3.example.de/admin/info?format=json 
> 
>
> Where is the issue here?
>
>
> Regards,
> Marcus
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw API issues

2022-07-18 Thread Marcus Müller
Thank you! We are running Pacific, that was my issue here.

Can someone share an example of a full API request and answer with curl? I’m 
still having issues, now getting 401 or 403 answers (but providing Auth-User 
and Auth-Key).

Regards
Marcus



> Am 15.07.2022 um 15:23 schrieb Casey Bodley :
> 
> are you running quincy? it looks like this '/admin/info' API was new
> to that release
> 
> https://docs.ceph.com/en/quincy/radosgw/adminops/#info
> 
> On Fri, Jul 15, 2022 at 7:04 AM Marcus Müller  
> wrote:
>> 
>> Hi all,
>> 
>> I’ve created a test user on our radosgw to work with the API. I’ve done the 
>> following:
>> 
>> ~# radosgw-admin user create --uid=testuser --display-name="testuser"
>> 
>> ~# radosgw-admin caps add --uid=testuser --caps={caps}
>>"caps": [
>>{
>>"type": "amz-cache",
>>"perm": "*"
>>},
>>{
>>"type": "bilog",
>>"perm": "*"
>>},
>>{
>>"type": "buckets",
>>"perm": "*"
>>},
>>{
>>"type": "datalog",
>>"perm": "*"
>>},
>>{
>>"type": "mdlog",
>>"perm": "*"
>>},
>>{
>>"type": "metadata",
>>"perm": "*"
>>},
>>{
>>"type": "oidc-provider",
>>"perm": "*"
>>},
>>{
>>"type": "roles",
>>"perm": "*"
>>},
>>{
>>"type": "usage",
>>"perm": "*"
>>},
>>{
>>"type": "user-policy",
>>"perm": "*"
>>},
>>{
>>"type": "users",
>>"perm": "*"
>>},
>>{
>>"type": "zone",
>>"perm": "*"
>>}
>>],
>> 
>> 
>> But for my GET request (with Authorization Header) I only get a "405 - 
>> Method not Allowed" answer. This is my request url: 
>> https://s3.example.de/admin/info?format=json 
>> 
>> 
>> Where is the issue here?
>> 
>> 
>> Regards,
>> Marcus
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph on FreeBSD

2022-07-18 Thread Ilya Dryomov
On Fri, Jul 15, 2022 at 8:49 AM Olivier Nicole  wrote:
>
> Hi,
>
> I would like to try Ceph on FreeBSD (because I mostly use FreeBSD) but
> before I invest too much time in it, it seems that the current version
> of Ceph for FreeBSD is quite old. Is it still being taken care of or
> not?

Adding Willem who was driving the FreeBSD port.

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW error Couldn't init storage provider (RADOS)

2022-07-18 Thread Janne Johansson
No, rgw should have the ability to create its own pools. Check the caps on
the keys used by the rgw daemon.
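
Something like this shows them (exact key names vary with cephadm, so treat it
as a sketch):

    ceph auth ls 2>/dev/null | grep -A 4 '^client.rgw'
    # for pool creation the gateway key usually needs at least:
    #   ceph auth caps client.rgw.<name> mon 'allow rw' osd 'allow rwx'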

Den mån 18 juli 2022 09:59Robert Reihs  skrev:

> Hi,
> I had to manually create the pools; then the service automatically started
> and is now available.
> pools:
> .rgw.root
> default.rgw.log
> default.rgw.control
> default.rgw.meta
> default.rgw.buckets.index
> default.rgw.buckets.data
> default.rgw.buckets.non-ec
>
> Is this normal behavior? Should the error message then be changed? Or is
> this a bug?
> Best
> Robert Reihs
>
>
> On Fri, Jul 15, 2022 at 3:47 PM Robert Reihs 
> wrote:
>
> > Hi,
> > While I have had no luck yet solving the issue, I can add some
> > more information. The system pools ".rgw.root" and "default.rgw.log" are
> > not created. I have created them manually; now there is more log activity,
> > but I am still getting the same error message in the log:
> > rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned
> (34)
> > Numerical result out of range (this can be due to a pool or placement
> group
> > misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded)
> > I can't find the correct pool to create manually.
> > Thanks for any help
> > Best
> > Robert
> >
> > On Tue, Jul 12, 2022 at 5:22 PM Robert Reihs 
> > wrote:
> >
> >> Hi,
> >>
> >> We have a problem with deploying radosgw via cephadm. We have a Ceph cluster
> >> with 3 nodes deployed via cephadm. Pool creation, cephfs and block storage
> >> are working.
> >>
> >> ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
> >> (stable)
> >>
> >> The service specs is like this for the rgw:
> >>
> >> ---
> >>
> >> service_type: rgw
> >>
> >> service_id: rgw
> >>
> >> placement:
> >>
> >>   count: 3
> >>
> >>   label: "rgw"
> >>
> >> ---
> >>
> >> service_type: ingress
> >>
> >> service_id: rgw.rgw
> >>
> >> placement:
> >>
> >>   count: 3
> >>
> >>   label: "ingress"
> >>
> >> spec:
> >>
> >>   backend_service: rgw.rgw
> >>
> >>   virtual_ip: [IPV6]
> >>
> >>   virtual_interface_networks: [IPV6 CIDR]
> >>
> >>   frontend_port: 8080
> >>
> >>   monitor_port: 1967
> >>
> >> The error I get in the logfiles:
> >>
> >> 0 deferred set uid:gid to 167:167 (ceph:ceph)
> >>
> >> 0 ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
> >> (stable), process radosgw, pid 2
> >>
> >> 0 framework: beast
> >>
> >> 0 framework conf key: port, val: 80
> >>
> >> 1 radosgw_Main not setting numa affinity
> >>
> >> 1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
> >>
> >> 1 D3N datacache enabled: 0
> >>
> >> 0 rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned
> >> (34) Numerical result out of range (this can be due to a pool or
> placement
> >> group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd
> >> exceeded)
> >>
> >> 0 rgw main: failed reading realm info: ret -34 (34) Numerical result out
> >> of range
> >>
> >> 0 rgw main: ERROR: failed to start notify service ((34) Numerical result
> >> out of range
> >>
> >> 0 rgw main: ERROR: failed to init services (ret=(34) Numerical result
> out
> >> of range)
> >>
> >> -1 Couldn't init storage provider (RADOS)
> >>
> >> I have for testing set the pg_num and pgp_num to 16 and the
> >> mon_max_pg_per_osd to 1000 and still getting the same error. I have also
> >> tried creating the rgw with ceph command, same error. Pool creation is
> >> working, I created multiple other pools and there was no problem.
> >>
> >> Thanks for any help.
> >>
> >> Best
> >>
> >> Robert
> >>
> >> The 5 failed services are 3 from the rgw and 2 haproxy for the rgw; only
> >> one is running:
> >>
> >> ceph -s
> >>
> >>   cluster:
> >>
> >> id: 40ddf
> >>
> >> health: HEALTH_WARN
> >>
> >> 5 failed cephadm daemon(s)
> >>
> >>
> >>
> >>   services:
> >>
> >> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 (age 4d)
> >>
> >> mgr: ceph-01.hbvyqi(active, since 4d), standbys: ceph-02.pqtxbv
> >>
> >> mds: 1/1 daemons up, 3 standby
> >>
> >> osd: 6 osds: 6 up (since 4d), 6 in (since 4d)
> >>
> >>
> >>
> >>   data:
> >>
> >> volumes: 1/1 healthy
> >>
> >> pools:   5 pools, 65 pgs
> >>
> >> objects: 87 objects, 170 MiB
> >>
> >> usage:   1.4 GiB used, 19 TiB / 19 TiB avail
> >>
> >> pgs: 65 active+clean
> >>
> >>
> >
> > --
> > Robert Reihs
> > Jakobsweg 22
> > 8046 Stattegg
> > AUSTRIA
> >
> > mobile: +43 (664) 51 035 90
> > robert.re...@gmail.com
> >
>
>
> --
> Robert Reihs
> Jakobsweg 22
> 8046 Stattegg
> AUSTRIA
>
> mobile: +43 (664) 51 035 90
> robert.re...@gmail.com
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW error Couldn't init storage provider (RADOS)

2022-07-18 Thread Robert Reihs
Hi,
I had to manually create the pools; then the service automatically started
and is now available.
pools:
.rgw.root
default.rgw.log
default.rgw.control
default.rgw.meta
default.rgw.buckets.index
default.rgw.buckets.data
default.rgw.buckets.non-ec

Is this normal behavior? Should the error message then be changed? Or is
this a bug?
Best
Robert Reihs


On Fri, Jul 15, 2022 at 3:47 PM Robert Reihs  wrote:

> Hi,
> While I have had no luck yet solving the issue, I can add some
> more information. The system pools ".rgw.root" and "default.rgw.log" are
> not created. I have created them manually; now there is more log activity,
> but I am still getting the same error message in the log:
> rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned (34)
> Numerical result out of range (this can be due to a pool or placement group
> misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded)
> I can't find the correct pool to create manually.
> Thanks for any help
> Best
> Robert
>
> On Tue, Jul 12, 2022 at 5:22 PM Robert Reihs 
> wrote:
>
>> Hi,
>>
>> We have a problem with deploying radosgw via cephadm. We have a Ceph cluster
>> with 3 nodes deployed via cephadm. Pool creation, cephfs and block storage
>> are working.
>>
>> ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
>> (stable)
>>
>> The service specs is like this for the rgw:
>>
>> ---
>>
>> service_type: rgw
>>
>> service_id: rgw
>>
>> placement:
>>
>>   count: 3
>>
>>   label: "rgw"
>>
>> ---
>>
>> service_type: ingress
>>
>> service_id: rgw.rgw
>>
>> placement:
>>
>>   count: 3
>>
>>   label: "ingress"
>>
>> spec:
>>
>>   backend_service: rgw.rgw
>>
>>   virtual_ip: [IPV6]
>>
>>   virtual_interface_networks: [IPV6 CIDR]
>>
>>   frontend_port: 8080
>>
>>   monitor_port: 1967
>>
>> The error I get in the logfiles:
>>
>> 0 deferred set uid:gid to 167:167 (ceph:ceph)
>>
>> 0 ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
>> (stable), process radosgw, pid 2
>>
>> 0 framework: beast
>>
>> 0 framework conf key: port, val: 80
>>
>> 1 radosgw_Main not setting numa affinity
>>
>> 1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
>>
>> 1 D3N datacache enabled: 0
>>
>> 0 rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned
>> (34) Numerical result out of range (this can be due to a pool or placement
>> group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd
>> exceeded)
>>
>> 0 rgw main: failed reading realm info: ret -34 (34) Numerical result out
>> of range
>>
>> 0 rgw main: ERROR: failed to start notify service ((34) Numerical result
>> out of range
>>
>> 0 rgw main: ERROR: failed to init services (ret=(34) Numerical result out
>> of range)
>>
>> -1 Couldn't init storage provider (RADOS)
>>
>> I have for testing set the pg_num and pgp_num to 16 and the
>> mon_max_pg_per_osd to 1000 and still getting the same error. I have also
>> tried creating the rgw with ceph command, same error. Pool creation is
>> working, I created multiple other pools and there was no problem.
>>
>> Thanks for any help.
>>
>> Best
>>
>> Robert
>>
>> The 5 failed services are 3 from the rgw and 2 haproxy for the rgw; only
>> one is running:
>>
>> ceph -s
>>
>>   cluster:
>>
>> id: 40ddf
>>
>> health: HEALTH_WARN
>>
>> 5 failed cephadm daemon(s)
>>
>>
>>
>>   services:
>>
>> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 (age 4d)
>>
>> mgr: ceph-01.hbvyqi(active, since 4d), standbys: ceph-02.pqtxbv
>>
>> mds: 1/1 daemons up, 3 standby
>>
>> osd: 6 osds: 6 up (since 4d), 6 in (since 4d)
>>
>>
>>
>>   data:
>>
>> volumes: 1/1 healthy
>>
>> pools:   5 pools, 65 pgs
>>
>> objects: 87 objects, 170 MiB
>>
>> usage:   1.4 GiB used, 19 TiB / 19 TiB avail
>>
>> pgs: 65 active+clean
>>
>>
>
> --
> Robert Reihs
> Jakobsweg 22
> 8046 Stattegg
> AUSTRIA
>
> mobile: +43 (664) 51 035 90
> robert.re...@gmail.com
>


-- 
Robert Reihs
Jakobsweg 22
8046 Stattegg
AUSTRIA

mobile: +43 (664) 51 035 90
robert.re...@gmail.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] access to a pool hangs, only on one node

2022-07-18 Thread Jarett DeAngelis
hi,

I have a 3-node Proxmox Ceph cluster that's been acting up whenever I try to 
get it to do anything with one of the pools (fastwrx) on the cluster.

`rbd pool stats fastwrx` just hangs on one node, but on the other two, responds 
instantaneously.

`ceph -s` looks like this:

root@ibnmajid:~# ceph -s
  cluster:
id: 310af567-1607-402b-bc5d-c62286a129d5
health: HEALTH_WARN
insufficient standby MDS daemons available
 
  services:
mon: 3 daemons, quorum ibnmajid,ganges,riogrande (age 47h)
mgr: riogrande(active, since 47h)
mds: 2/2 daemons up, 1 hot standby
osd: 18 osds: 18 up (since 47h), 18 in (since 47h)
 
  data:
volumes: 2/2 healthy
pools:   7 pools, 1537 pgs
objects: 793.24k objects, 1.9 TiB
usage:   4.1 TiB used, 10 TiB / 14 TiB avail
pgs: 1537 active+clean
 
  io:
client:   1.5 MiB/s rd, 243 KiB/s wr, 3 op/s rd, 19 op/s wr

I don't really know where to begin here. nothing jumps out at me in syslog. 
it's like the rbd client, not even anything involved in serving data on the 
node, is just somehow broken.

ceph status on that node works fine. it appears to be a problem limited to only 
the one pool.
root@ibnmajid:~# ceph status

  cluster:
    id: 310af567-1607-402b-bc5d-c62286a129d5
    health: HEALTH_WARN
            insufficient standby MDS daemons available
 
  services:
    mon: 3 daemons, quorum ibnmajid,ganges,riogrande (age 2d)
    mgr: riogrande(active, since 2d)
    mds: 2/2 daemons up, 1 hot standby
    osd: 18 osds: 18 up (since 2d), 18 in (since 2d)
 
  data:
    volumes: 2/2 healthy
    pools:   7 pools, 1537 pgs
    objects: 793.28k objects, 1.9 TiB
    usage:   4.1 TiB used, 10 TiB / 14 TiB avail
    pgs: 1537 active+clean
 
  io:
    client:   2.3 MiB/s rd, 137 KiB/s wr, 2 op/s rd, 18 op/s wr
if I try a different pool, that works fine on the same node:
root@ibnmajid:~# rbd pool stats largewrx
Total Images: 0
Total Snapshots: 0
Provisioned Size: 0 B
(those statistics are correct, it's not directly in use but rather in use with 
cephfs)
similarly, the FS pools related to fastwrx don't work on this node either, but 
others do:
root@ibnmajid:~# rbd pool stats fastwrxFS_data
^C
root@ibnmajid:~# rbd pool stats fastwrxFS_metadata
^C
root@ibnmajid:~# rbd pool stats largewrxFS_data
Total Images: 0
Total Snapshots: 0
Provisioned Size: 0 B
root@ibnmajid:~# rbd pool stats largewrxFS_metadata
Total Images: 0
Total Snapshots: 0
Provisioned Size: 0 B
root@ibnmajid:~#
on another node, everything returns results instantly, but fastwrxFS is 
definitely in use so I'm not sure why it says this:
root@ganges:~# rbd pool stats fastwrx
Total Images: 17
Total Snapshots: 0
Provisioned Size: 1.3 TiB
root@ganges:~# rbd pool stats fastwrxFS_data
Total Images: 0
Total Snapshots: 0
Provisioned Size: 0 B
root@ganges:~# rbd pool stats fastwrxFS_metadata
Total Images: 0
Total Snapshots: 0
Provisioned Size: 0 B
here's what happens if I try ceph osd pool stats on a "good" node:
root@ganges:~# ceph osd pool stats
pool fastwrx id 9
  client io 0 B/s rd, 105 KiB/s wr, 0 op/s rd, 14 op/s wr

pool largewrx id 10
  nothing is going on

pool fastwrxFS_data id 17
  nothing is going on

pool fastwrxFS_metadata id 18
  client io 852 B/s rd, 1 op/s rd, 0 op/s wr

pool largewrxFS_data id 20
  client io 2.9 MiB/s rd, 2 op/s rd, 0 op/s wr

pool largewrxFS_metadata id 21
  nothing is going on

pool .mgr id 22
  nothing is going on
and on the broken node:
root@ibnmajid:~# ceph osd pool stats
pool fastwrx id 9
  client io 0 B/s rd, 93 KiB/s wr, 0 op/s rd, 5 op/s wr

pool largewrx id 10
  nothing is going on

pool fastwrxFS_data id 17
  nothing is going on

pool fastwrxFS_metadata id 18
  client io 852 B/s rd, 1 op/s rd, 0 op/s wr

pool largewrxFS_data id 20
  client io 1.9 MiB/s rd, 0 op/s rd, 0 op/s wr

pool largewrxFS_metadata id 21
  nothing is going on

pool .mgr id 22
  nothing is going on
so whatever interface that uses seems to interact with the pool fine, I guess.
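
one thing I haven't tried yet is re-running the hanging command with
client-side debugging turned up to see which OSD or operation it stalls on
(these are just the generic ceph debug overrides, so treat this as a sketch):

    rbd --debug-rbd 20 --debug-ms 1 pool stats fastwrx
    # and check which OSDs this node would have to talk to for that pool:
    ceph osd map fastwrx some-object-name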

how do I get started fixing this?

thanks!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mgr service restarted by package install?

2022-07-18 Thread Dan van der Ster
Hi,

It probably wasn't restarted by the package, but the mgr itself
respawned because the set of enabled modules changed.
E.g. this happens when upgrading from octopus to pacific, just after
the pacific mons get a quorum:

2022-07-13T11:43:41.308+0200 7f71c0c86700  1 mgr handle_mgr_map
respawning because set of enabled modules changed!
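
A quick way to tell the two cases apart is to look for that line in the mgr
log around the restart; if it's absent, the restart came from outside the
daemon (adjust the unit instance name to your mgr id):

    journalctl -u ceph-mgr@$(hostname -s) | grep -i 'respawning because set of enabled modules changed'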

Cheers, Dan


On Sat, Jul 16, 2022 at 4:34 PM Matthias Ferdinand  wrote:
>
> Hi,
>
> while updating a test cluster (Ubuntu 20.04) from octopus (ubuntu repos)
> to quincy (ceph repos), I noticed that mgr service gets restarted during
> package install.
>
> Right after package install (no manual restarts yet) on 3 combined
> mon/mgr hosts:
>
> # ceph versions
> {
> "mon": {
> "ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) 
> octopus (stable)": 3
> },
> "mgr": {
> "ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) 
> quincy (stable)": 3
> },
> "osd": {
> "ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) 
> octopus (stable)": 8
> },
> "mds": {},
> "overall": {
> "ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) 
> octopus (stable)": 11,
> "ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) 
> quincy (stable)": 3
> }
> }
>
>
> Not sure how problematic this is, but AFAIK it was claimed that ceph
> package installs would not restart ceph services by themselves.
>
>
> Regards
> Matthias Ferdinand
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow osdmaptool upmap performance

2022-07-18 Thread Dan van der Ster
Hi,

Can you try with the fix for this? https://tracker.ceph.com/issues/54180
(https://github.com/ceph/ceph/pull/44925)

It hasn't been backported to any releases, but we could request that
if it looks useful.
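
Since osdmaptool works entirely offline against a saved osdmap, a patched
build can be tried without any risk to the cluster, e.g.:

    ceph osd getmap -o om        # grab the current osdmap
    time ./osdmaptool om --upmap out.txt \
         --upmap-pool fs.data.mirror.hdd.ec --upmap-deviation 1
    # out.txt contains the pg-upmap-items commands to review before applying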

Cheers, Dan


On Mon, Jul 18, 2022 at 12:44 AM stuart.anderson
 wrote:
>
> I am seeing very long run times for osdmaptool --upmap running Ceph 15.2.16 
> and I am wondering how to speed that up?
>
> When run on a large pool (PG=8k and OSD=725) I am seeing the following run 
> times for,
> # osdmaptool om.hdd --upmap upmap.mirror.hdd.ec.8 --upmap-pool 
> fs.data.mirror.hdd.ec --upmap-max 1000 --upmap-deviation 8
>
> upmap-deviation   time
> ---------------   ---------------------
> 15                6m
> 10                11m
> 9                 12m
> 8                 76m/19m
> 5                 killed after 30 hours
>
> I really want to run with --upmap-deviation 1 to balance this unbalanced 
> pool, which currently ranges from 25-90% utilization on individual OSDs; 
> however, it is not clear that osdmaptool would ever complete.
>
> By comparison another similarly sized pool (PG=4k OSD=1538) in the same 
> cluster only takes a few seconds to run with upmap-deviation=1.
>
> Any suggestions on how to get osdmaptool --upmap to run faster?
>
> Thanks.
>
> ---
> ander...@ligo.caltech.edu
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io