[ceph-users] [ceph v16.2.10] radosgw crash

2023-08-15 Thread Louis Koo
2023-08-15T18:15:55.356+ 7f7916ef3700 -1 *** Caught signal (Aborted) **
 in thread 7f7916ef3700 thread_name:radosgw

 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific 
(stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f79da065ce0]
 2: gsignal()
 3: abort()
 4: /lib64/libstdc++.so.6(+0x9009b) [0x7f79d905809b]
 5: /lib64/libstdc++.so.6(+0x9653c) [0x7f79d905e53c]
 6: /lib64/libstdc++.so.6(+0x96597) [0x7f79d905e597]
 7: /lib64/libstdc++.so.6(+0x9652e) [0x7f79d905e52e]
 8: (spawn::detail::continuation_context::resume()+0x87) [0x7f79e4b70a17]
 9: 
(boost::asio::detail::executor_op >, 
std::shared_lock
 > >, std::tuple
 > > > >, std::allocator, 
boost::asio::detail::scheduler_operation>::do_complete(void*, 
boost::asio::detail::scheduler_operation*, boost::system::error_code const&, 
unsigned long)+0x25a) [0x7f79e4b77b6a]
 10: 
(boost::asio::detail::strand_executor_service::invoker::operator()()+0x8d) [0x7f79e4b7f93d]
 11: 
(boost::asio::detail::executor_op, boost::asio::detail::recycling_allocator, 
boost::asio::detail::scheduler_operation>::do_complete(void*, 
boost::asio::detail::scheduler_operation*, boost::system::error_code const&, 
unsigned long)+0x96) [0x7f79e4b7fca6]
 12: (boost::asio::detail::scheduler::run(boost::system::error_code&)+0x4f2) 
[0x7f79e4b73ad2]
 13: /lib64/libradosgw.so.2(+0x430376) [0x7f79e4b56376]
 14: /lib64/libstdc++.so.6(+0xc2ba3) [0x7f79d908aba3]
 15: /lib64/libpthread.so.0(+0x81ca) [0x7f79da05b1ca]
 16: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.

--- begin dump of recent events ---
 -9999> 2023-08-15T18:14:25.987+ 7f79186f6700  2 req 11141123620750988595 
0.00106s s3:get_obj init permissions
 -9998> 2023-08-15T18:14:25.987+ 7f79186f6700  2 req 11141123620750988595 
0.00106s s3:get_obj recalculating target
 -9997> 2023-08-15T18:14:25.987+ 7f79186f6700  2 req 11141123620750988595 
0.00106s s3:get_obj reading permissions
 -9996> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
0.00213s s3:get_obj init op
 -9995> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
0.00213s s3:get_obj verifying op mask
 -9994> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
0.00213s s3:get_obj verifying op permissions
 -9993> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
0.00213s s3:get_obj Searching permissions for 
identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=suliang, 
acct_name=suliang, subuser=, perm_mask=15, is_admin=0) mask=49
 -9992> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
0.00213s s3:get_obj Searching permissions for uid=suliang
 -9991> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
0.00213s s3:get_obj Found permission: 15
 -9990> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
0.00213s s3:get_obj Searching permissions for group=1 mask=49
 -9989> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
0.00213s s3:get_obj Permissions for group not found
 -9988> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
0.00213s s3:get_obj Searching permissions for group=2 mask=49
 -9987> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
0.00213s s3:get_obj Permissions for group not found
 -9986> 2023-08-15T18:14:25.988+ 7f79186f6700  5 req 11141123620750988595 
0.00213s s3:get_obj -- Getting permissions done for 
identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=suliang, 
acct_name=suliang, subuser=, perm_mask=15, is_admin=0), owner=suliang, perm=1
 -9985> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
0.00213s s3:get_obj verifying op params
 -9984> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
0.00213s s3:get_obj pre-executing
 -9983> 2023-08-15T18:14:25.988+ 7f79186f6700  2 req 11141123620750988595 
0.00213s s3:get_obj executing
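
For triaging a crash like this on a running cluster, a minimal sketch using the 
crash module (the crash ID below is a placeholder, not one taken from this report):

ceph crash ls                    # list crashes the cluster has recorded
ceph crash info <crash-id>       # full metadata and backtrace for one crash
ceph crash archive <crash-id>    # acknowledge it once the details are captured

Interpreting the raw frame offsets any further needs the matching radosgw binary 
or its debuginfo, as the NOTE above says.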
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Tech Talk for August 2023: Making Teuthology Friendly

2023-08-15 Thread Mike Perez
Hi everyone,

Join us tomorrow at 15:00 UTC to hear from our Google Summer of Code and 
Outreachy interns, Devansh Singh and Medhavi Singh, in this next Ceph Tech Talk 
on Making Teuthology Friendly.

https://ceph.io/en/community/tech-talks/

If you want to give a technical presentation for Ceph Tech Talks, please contact me 
directly with a title and description. Thank you!

--
Mike Perez
Community Manager
Ceph Foundation
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CEPHADM_STRAY_DAEMON

2023-08-15 Thread tyler.jurgens
I did find this question:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/XGLIXRMA5YUVG6P2W6WOQVTN4GJMX3GI/#XGLIXRMA5YUVG6P2W6WOQVTN4GJMX3GI

Seems "ceph mgr fail" worked for me in this case.
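
For reference, a minimal sketch of that sequence (daemon names will of course 
differ per cluster):

ceph health detail    # shows which daemon is flagged as CEPHADM_STRAY_DAEMON
ceph mgr fail         # fail over the active mgr so cephadm rebuilds its inventory
ceph orch ps          # confirm the daemon list looks sane again afterwards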
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Multisite s3 website slow period update

2023-08-15 Thread Ondřej Kukla
Hello,

I’ve had a quite unpleasant experience today that I would like to share.

In our setup we use two sets of RGWs: one that has only the s3 and admin APIs, 
and a second set with the s3website and admin APIs. I was changing the global 
quota settings, which meant that I then needed to commit the updated period.
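
Concretely, that flow looks roughly like this, a sketch with illustrative values, 
assuming the usual radosgw-admin global quota workflow:

radosgw-admin global quota set --quota-scope=bucket --max-objects=1000000
radosgw-admin global quota enable --quota-scope=bucket
radosgw-admin period update --commit    # push the new period out to the zones/gateways
radosgw-admin period get                # check which period the gateways currently see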

The first set of s3 RGWs updated without issue, but the s3website RGWs did not. 
They were somehow stuck, and the period update took minutes in some cases, 
meaning service disruption for MINUTES!

When I looked at the RGW logs I found these lines, which seem to show the 
issue.

rgw realm reloader: Pausing frontends for realm update...
req 3935991300378050219 61.212696075s s3:get_obj iterate_obj() failed with -104

We are running on ceph version 17.2.6 
(d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)

Has anyone had a similar issue? Is there something we can do about it? Is 
there an option to update the period machine by machine, or some other way that 
would let us update the period without disrupting the service?

Your help is much appreciated.

Kind regards,

Ondrej
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] User + Dev Monthly Meeting happening next week

2023-08-15 Thread Laura Flores
Hi everyone,

The User + Dev Monthly Meeting is happening next week on Thursday, August
24th @ 2:00 PM UTC at this link:
https://meet.jit.si/ceph-user-dev-monthly

(Note that the date has been rescheduled from the original date, August
17th.)

Please add any topics you'd like to discuss to the agenda:
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes

Thanks,
Laura Flores

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm adoption - service reconfiguration changes container image

2023-08-15 Thread Adam King
You could maybe try running "ceph config set global container_image
quay.io/ceph/ceph:v16.2.9" before running the adoption. It seems it still
thinks it should be deploying mons with the default image
(docker.io/ceph/daemon-base:latest-pacific-devel) for some reason, and maybe
that config option is why.
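
A sketch of how that might look before re-running the adoption (the image tag 
matches the one used in the adoption below; whether this config option is really 
the cause is a guess):

ceph config set global container_image quay.io/ceph/ceph:v16.2.9
ceph config dump | grep container_image    # confirm nothing still points at daemon-base
# then repeat the "ceph orch apply -i -" step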

On Tue, Aug 15, 2023 at 7:29 AM Iain Stott 
wrote:

> Hi Everyone,
>
> We are looking at migrating all our production clusters from ceph-ansible
> to cephadm. We are currently experiencing an issue where reconfiguring a
> service through ceph orch changes the running container image for that
> service. This has led to the mgr services running an earlier version than
> the rest of the cluster, which has caused cephadm/ceph orch to be unable
> to manage services in the cluster.
>
> Does anyone have any advice on this? I cannot find anything in the docs
> that would correct it.
>
> Cheers
> Iain
>
> [root@de1-ceph-mon-ceph-site-a-1 ~]# cephadm --image
> quay.io/ceph/ceph:v16.2.9 adopt --style legacy --name mon.$(hostname -s)
>
> [root@de1-ceph-mon-ceph-site-a-1 ~]# cat ./mon.yaml
> service_type: mon
> service_name: mon
> placement:
>   host_pattern: '*mon*'
> extra_container_args:
> - -v
> - /etc/ceph/ceph.client.admin.keyring:/etc/ceph/ceph.client.admin.keyring
>
>
> [root@de1-ceph-mon-ceph-site-a-1 ~]# cephadm shell -- ceph config get mgr
> mgr/cephadm/container_image_base
> quay.io/ceph/ceph
>
> [root@de1-ceph-mon-ceph-site-a-1 ~]# podman ps -a | grep mon
> 55fe8bc10476  quay.io/ceph/ceph:v16.2.9
>   -n mgr.de1-ceph-m...  5 minutes ago  Up 5 minutes
> ceph-da9e9837-a3cf-4482-9a13-790a721598cd-mgr-de1-ceph-mon-ceph-site-a-1
>
> [root@de1-ceph-mon-ceph-site-a-1 ~]# cat ./mon.yaml | cephadm --image
> quay.io/ceph/ceph:v16.2.9 shell -- ceph orch apply -i -
> Inferring fsid da9e9837-a3cf-4482-9a13-790a721598cd
> Scheduled mon update...
>
> [root@de1-ceph-mon-ceph-site-a-1 ~]# podman ps -a | grep mon
> ecec1d62c719  docker.io/ceph/daemon-base:latest-pacific-devel
>   -n mon.de1-ceph-m...  25 seconds ago  Up 26 seconds
> ceph-da9e9837-a3cf-4482-9a13-790a721598cd-mon-de1-ceph-mon-ceph-site-a-1
>
> [root@de1-ceph-mon-ceph-site-a-2 ~]# ceph versions
> {
> "mon": {
> "ceph version 16.2.5-387-g7282d81d
> (7282d81d2c500b5b0e929c07971b72444c6ac424) pacific (stable)": 3
> },
> "mgr": {
> "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830)
> pacific (stable)": 3
> },
> "osd": {
> "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830)
> pacific (stable)": 8
> },
> "mds": {},
> "rgw": {
> "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830)
> pacific (stable)": 3
> },
> "overall": {
> "ceph version 16.2.5-387-g7282d81d
> (7282d81d2c500b5b0e929c07971b72444c6ac424) pacific (stable)": 3,
> "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830)
> pacific (stable)": 14
> }
> }
>
>
> Iain Stott
> OpenStack Engineer
> iain.st...@thehutgroup.com
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch upgrade stuck between 16.2.7 and 16.2.13

2023-08-15 Thread Adam King
With the log-to-cluster level already on debug, if you do a "ceph mgr fail",
what does cephadm log to the cluster before it reports sleeping? It should
at least be doing something if it's responsive at all. Also, in "ceph orch
ps" and "ceph orch device ls", are the REFRESHED columns reporting that
they've refreshed the info recently (last 10 minutes for daemons, last 30
minutes for devices)?
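
For concreteness, a sketch of those checks on a Pacific cephadm cluster:

ceph mgr fail                     # fail over to a standby mgr
ceph -W cephadm --watch-debug     # watch what the newly active mgr logs for the upgrade
ceph orch ps                      # REFRESHED column should be under ~10 minutes
ceph orch device ls               # REFRESHED column should be under ~30 minutes
ceph orch upgrade status          # still in progress, or paused?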

On Tue, Aug 15, 2023 at 3:46 AM Robert Sander 
wrote:

> Hi,
>
> A healthy 16.2.7 cluster should get an upgrade to 16.2.13.
>
> ceph orch upgrade start --ceph-version 16.2.13
>
> did upgrade MONs, MGRs and 25% of the OSDs and is now stuck.
>
> We tried several "ceph orch upgrade stop" and starts again.
> We "failed" the active MGR but no progress.
> We set the debug logging with "ceph config set mgr
> mgr/cephadm/log_to_cluster_level debug" but it only tells that it starts:
>
> 2023-08-15T09:05:58.548896+0200 mgr.cephmon01 [INF] Upgrade: Started with
> target quay.io/ceph/ceph:v16.2.13
>
> How can we check what is happening (or not happening) here?
> How do we get cephadm to complete the task?
>
> Current status is:
>
> # ceph orch upgrade status
> {
>  "target_image": "quay.io/ceph/ceph:v16.2.13",
>  "in_progress": true,
>  "which": "Upgrading all daemon types on all hosts",
>  "services_complete": [],
>  "progress": "",
>  "message": "",
>  "is_paused": false
> }
>
> # ceph -s
>cluster:
>  id: 3098199a-c7f5-4baf-901c-f178131be6f4
>  health: HEALTH_WARN
>  There are daemons running an older version of ceph
>
>services:
>  mon: 5 daemons, quorum
> cephmon02,cephmon01,cephmon03,cephmon04,cephmon05 (age 4d)
>  mgr: cephmon03(active, since 8d), standbys: cephmon01, cephmon02
>  mds: 2/2 daemons up, 1 standby, 2 hot standby
>  osd: 202 osds: 202 up (since 11d), 202 in (since 13d)
>  rgw: 2 daemons active (2 hosts, 1 zones)
>
>data:
>  volumes: 2/2 healthy
>  pools:   11 pools, 4961 pgs
>  objects: 98.84M objects, 347 TiB
>  usage:   988 TiB used, 1.3 PiB / 2.3 PiB avail
>  pgs: 4942 active+clean
>   19   active+clean+scrubbing+deep
>
>io:
>  client:   89 MiB/s rd, 598 MiB/s wr, 25 op/s rd, 157 op/s wr
>
>progress:
>  Upgrade to quay.io/ceph/ceph:v16.2.13 (0s)
>[]
>
> # ceph versions
> {
>  "mon": {
>  "ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e)
> pacific (stable)": 5
>  },
>  "mgr": {
>  "ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e)
> pacific (stable)": 3
>  },
>  "osd": {
>  "ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e)
> pacific (stable)": 48,
>  "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)
> pacific (stable)": 154
>  },
>  "mds": {
>  "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)
> pacific (stable)": 5
>  },
>  "rgw": {
>  "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)
> pacific (stable)": 2
>  },
>  "overall": {
>  "ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e)
> pacific (stable)": 56,
>  "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)
> pacific (stable)": 161
>  }
> }
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Announcing go-ceph v0.23.0

2023-08-15 Thread Sven Anderson
We are happy to announce another release of the go-ceph API library. This
is a
regular release following our every-two-months release cadence.

https://github.com/ceph/go-ceph/releases/tag/v0.23.0

The library includes bindings that aim to play a similar role to the
"pybind"
python bindings in the ceph tree but for the Go language. The library also
includes additional APIs that can be used to administer cephfs, rbd, and rgw
subsystems.
There are already a few consumers of this library in the wild, including the
ceph-csi project.
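
For anyone pulling the new version into a Go module, a minimal sketch (go-ceph 
uses cgo, so the librados/librbd/libcephfs development packages need to be 
installed before building):

go get github.com/ceph/go-ceph@v0.23.0
go build ./...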

Sven
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD containers lose connectivity after change from Rocky 8.7->9.2

2023-08-15 Thread Dan O'Brien
I recently updated one of the hosts (an older Dell PowerEdge R515) in my Ceph 
Quincy (17.2.6) cluster. I needed to change the IP address, so I removed the 
host from the cluster (gracefully removed OSDs and daemons, then removed the 
host). I also took the opportunity to upgrade the host from Rocky 8.7 to 9.2 
before re-joining it to the cluster with cephadm. I zapped the storage, so for 
all intents and purposes, it should have been a completely clean instance, and 
the re-join went smoothly. I have 2 other hosts (new Dell PowerEdge R450s) using Rocky 9.2 
with no problems. Before the upgrade, the R515 host was well-behaved and 
unremarkable.

Our cluster is connected to our internal network, and has a 10G private network 
used for interconnect between the nodes.

Since the upgrade, the OSDs on the R515 host regularly, after a period of 
minutes to hours (usually a few hours), drop out of the cluster. I can restart 
the OSDs and they immediately reconnect and join the cluster, which returns to 
a HEALTHY state after a short period. The OSD logs show

Aug 15 06:53:57 ceph99.cecnet.gmu.edu ceph-osd[193725]: log_channel(cluster) 
log [WRN] : Monitor daemon marked osd.9 down, but it is still running
Aug 15 06:53:57 ceph99.cecnet.gmu.edu ceph-osd[193725]: log_channel(cluster) 
log [DBG] : map e17993 wrongly marked me down at e17988
Aug 15 06:53:57 ceph99.cecnet.gmu.edu ceph-osd[193725]: osd.9 17993 
start_waiting_for_healthy
Aug 15 06:53:57 ceph99.cecnet.gmu.edu ceph-osd[193725]: osd.9 pg_epoch: 17988 
pg[16.f( v 16315'2765702 (15197'2757704,16315'2765702] 
local-lis/les=17902/17903 n=188 ec=211/211 lis/c=17902/17902 
les/c/f=17903/17903/0 sis=17988 pruub=8.000660896s) [23,18] r=-1 lpr=1798>
Aug 15 06:53:57 ceph99.cecnet.gmu.edu ceph-osd[193725]: osd.9 17993 is_healthy 
false -- only 0/10 up peers (less than 33%)
Aug 15 06:53:57 ceph99.cecnet.gmu.edu ceph-osd[193725]: osd.9 17993 not 
healthy; waiting to boot

The MON logs show

Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.9 reported 
immediately failed by osd.3
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.9 failed 
(root=default,pod=openstack,host=ceph99) (connection refused reported by osd.3)
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.11 reported 
immediately failed by osd.3
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.11 failed 
(root=default,pod=openstack,host=ceph99) (connection refused reported by osd.3)
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.12 reported 
immediately failed by osd.3
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.12 failed 
(root=default,pod=openstack,host=ceph99) (connection refused reported by osd.3)
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.10 reported 
immediately failed by osd.3
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.10 failed 
(root=default,pod=openstack,host=ceph99) (connection refused reported by osd.3)
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.9 reported 
immediately failed by osd.3
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.10 reported 
immediately failed by osd.3
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.11 reported 
immediately failed by osd.3
Aug 15 06:53:53 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.12 reported 
immediately failed by osd.3
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: 
mon.os-storage-1@1(peon).osd e17989 e17989: 26 total, 23 up, 26 in
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: 
mon.os-storage-1@1(peon).osd e17989 _set_new_cache_sizes cache_size:1020054731 
inc_alloc: 339738624 full_alloc: 356515840 kv_alloc: 318767104
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: 15.13 scrub starts
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: Health check 
failed: 4 osds down (OSD_DOWN)
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: Health check 
failed: 2 hosts (4 osds) down (OSD_HOST_DOWN)
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osdmap e17988: 26 
total, 22 up, 26 in
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: osd.11 marked 
itself dead as of e17988
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: from='mgr.16700599 
10.192.126.85:0/2567473893' entity='mgr.os-storage.cecnet.gmu.edu.mouglb' 
cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: pgmap v121406: 374 
pgs: 3 peering, 20 stale+active+clean, 3 active+remapped+backfilling, 348 
active+clean; 2.1 TiB data, 4.1 TiB used, 76 TiB / 80 TiB avail; 102 B/s rd, 
338 KiB/s wr, 6 op/s; 36715/1268774 objects >
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: Health check 
cleared: OSD_HOST_DOWN (was: 2 hosts (4 osds) down)
Aug 15 06:53:54 os-storage-1.cecnet.gmu.edu ceph-mon[4684]: from='mgr.16700599 
10.192.126.85:0/2567473893' 
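
For "connection refused" reports like the ones above, a generic sketch of checks 
on the upgraded host; osd.9 and <fsid> are placeholders:

ss -tlnp | grep ceph-osd                  # are the OSDs listening on both public and cluster networks?
firewall-cmd --list-all                   # OSD/MGR traffic needs TCP 6800-7300 open between hosts
journalctl -u ceph-<fsid>@osd.9 -n 200    # daemon-side view around the time it was marked down
ceph osd tree                             # which OSDs/hosts are currently down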

[ceph-users] Cephadm adoption - service reconfiguration changes container image

2023-08-15 Thread Iain Stott
Hi Everyone,

We are looking at migrating all our production clusters from ceph-ansible to 
cephadm. We are currently experiencing an issue where reconfiguring a service 
through ceph orch changes the running container image for that service. This 
has led to the mgr services running an earlier version than the rest of the 
cluster, which has caused cephadm/ceph orch to be unable to manage services in 
the cluster.

Does anyone have any advice on this? I cannot find anything in the docs that 
would correct it.

Cheers
Iain

[root@de1-ceph-mon-ceph-site-a-1 ~]# cephadm --image quay.io/ceph/ceph:v16.2.9 
adopt --style legacy --name mon.$(hostname -s)

[root@de1-ceph-mon-ceph-site-a-1 ~]# cat ./mon.yaml
service_type: mon
service_name: mon
placement:
  host_pattern: '*mon*'
extra_container_args:
- -v
- /etc/ceph/ceph.client.admin.keyring:/etc/ceph/ceph.client.admin.keyring


[root@de1-ceph-mon-ceph-site-a-1 ~]# cephadm shell -- ceph config get mgr 
mgr/cephadm/container_image_base
quay.io/ceph/ceph

[root@de1-ceph-mon-ceph-site-a-1 ~]# podman ps -a | grep mon
55fe8bc10476  quay.io/ceph/ceph:v16.2.9 
   -n mgr.de1-ceph-m...  5 minutes ago  Up 5 minutes  
ceph-da9e9837-a3cf-4482-9a13-790a721598cd-mgr-de1-ceph-mon-ceph-site-a-1

[root@de1-ceph-mon-ceph-site-a-1 ~]# cat ./mon.yaml | cephadm --image 
quay.io/ceph/ceph:v16.2.9 shell -- ceph orch apply -i -
Inferring fsid da9e9837-a3cf-4482-9a13-790a721598cd
Scheduled mon update...

[root@de1-ceph-mon-ceph-site-a-1 ~]# podman ps -a | grep mon
ecec1d62c719  docker.io/ceph/daemon-base:latest-pacific-devel   
   -n mon.de1-ceph-m...  25 seconds ago  Up 26 seconds  
ceph-da9e9837-a3cf-4482-9a13-790a721598cd-mon-de1-ceph-mon-ceph-site-a-1

[root@de1-ceph-mon-ceph-site-a-2 ~]# ceph versions
{
"mon": {
"ceph version 16.2.5-387-g7282d81d 
(7282d81d2c500b5b0e929c07971b72444c6ac424) pacific (stable)": 3
},
"mgr": {
"ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific 
(stable)": 3
},
"osd": {
"ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific 
(stable)": 8
},
"mds": {},
"rgw": {
"ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific 
(stable)": 3
},
"overall": {
"ceph version 16.2.5-387-g7282d81d 
(7282d81d2c500b5b0e929c07971b72444c6ac424) pacific (stable)": 3,
"ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific 
(stable)": 14
}
}


Iain Stott
OpenStack Engineer
iain.st...@thehutgroup.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v18.2.0 Reef released

2023-08-15 Thread Chris Palmer

I'd like to try reef, but we are on debian 11 (bullseye).
In the ceph repos, there is debian-quincy/bullseye and 
debian-quincy/focal, but under reef there is only focal & jammy.


Is there a reason why there is no reef/bullseye build? I had thought 
that the blocker only affected debian-bookworm builds.


Thanks, Chris
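
A quick way to see which distro builds are actually published, assuming 
download.ceph.com serves a browsable directory index:

curl -s https://download.ceph.com/debian-reef/dists/
curl -s https://download.ceph.com/debian-quincy/dists/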

On 07/08/2023 19:37, Yuri Weinstein wrote:

We're very happy to announce the first stable release of the Reef series.

We express our gratitude to all members of the Ceph community who
contributed by proposing pull requests, testing this release,
providing feedback, and offering valuable suggestions.

Major Changes from Quincy:
- RADOS: RocksDB has been upgraded to version 7.9.2.
- RADOS: There have been significant improvements to RocksDB iteration
overhead and performance.
- RADOS: The perf dump and perf schema commands have been deprecated
in favor of the new counter dump and counter schema commands.
- RADOS: Cache tiering is now deprecated.
- RADOS: A new feature, the "read balancer", is now available, which
allows users to balance primary PGs per pool on their clusters.
- RGW: Bucket resharding is now supported for multi-site configurations.
- RGW: There have been significant improvements to the stability and
consistency of multi-site replication.
- RGW: Compression is now supported for objects uploaded with
Server-Side Encryption.
- Dashboard: There is a new Dashboard page with improved layout.
Active alerts and some important charts are now displayed inside
cards.
- RBD: Support for layered client-side encryption has been added.
- Telemetry: Users can now opt in to participate in a leaderboard in
the telemetry public dashboards.

We encourage you to read the full release notes at
https://ceph.io/en/news/blog/2023/v18-2-0-reef-released/

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-18.2.0.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 5dd24139a1eada541a3bc16b6941c5dde975e26d

Did you know? Every Ceph release is built and tested on resources
funded directly by the non-profit Ceph Foundation.
If you would like to support this and our other efforts, please
consider joining now https://ceph.io/en/foundation/.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch upgrade stuck between 16.2.7 and 16.2.13

2023-08-15 Thread Robert Sander

On 8/15/23 11:16, Curt wrote:
Probably not the issue, but do all your OSD servers have internet 
access? I've had a similar experience when one of our OSD servers' 
default gateway got changed, so it was just waiting to download and took 
a while to time out.


Yes, all nodes can manually pull the image from quay.io.

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch upgrade stuck between 16.2.7 and 16.2.13

2023-08-15 Thread Robert Sander

On 8/15/23 11:02, Eugen Block wrote:


I guess I would start looking on the nodes where it failed to upgrade
OSDs and check out the cephadm.log as well as syslog. Did you see
progress messages in the mgr log for the successfully updated OSDs (or
MON/MGR)?


The issue is that there is no information on which OSD cephadm tries to 
upgrade next. There is no failure reported. It seems to just sit there 
and wait for something.


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch upgrade stuck between 16.2.7 and 16.2.13

2023-08-15 Thread Eugen Block

Hi,

literally minutes before your email popped up in my inbox I had  
announced that I would upgrade our cluster from 16.2.10 to 16.2.13  
tomorrow. Now I'm hesitating. ;-)
I guess I would start looking on the nodes where it failed to upgrade  
OSDs and check out the cephadm.log as well as syslog. Did you see  
progress messages in the mgr log for the successfully updated OSDs (or  
MON/MGR)?
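
On each node that would look something like this (paths are the cephadm defaults):

tail -n 200 /var/log/ceph/cephadm.log        # per-host cephadm activity
cephadm ls | grep -E '"name"|"version"'      # which daemons the host runs, and at what version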


Quoting Robert Sander :


Hi,

A healthy 16.2.7 cluster should get an upgrade to 16.2.13.

ceph orch upgrade start --ceph-version 16.2.13

did upgrade MONs, MGRs and 25% of the OSDs and is now stuck.

We tried several "ceph orch upgrade stop" and starts again.
We "failed" the active MGR but no progress.
We set the debug logging with "ceph config set mgr  
mgr/cephadm/log_to_cluster_level debug" but it only tells that it  
starts:


2023-08-15T09:05:58.548896+0200 mgr.cephmon01 [INF] Upgrade: Started  
with target quay.io/ceph/ceph:v16.2.13


How can we check what is happening (or not happening) here?
How do we get cephadm to complete the task?

Current status is:

# ceph orch upgrade status
{
"target_image": "quay.io/ceph/ceph:v16.2.13",
"in_progress": true,
"which": "Upgrading all daemon types on all hosts",
"services_complete": [],
"progress": "",
"message": "",
"is_paused": false
}

# ceph -s
  cluster:
id: 3098199a-c7f5-4baf-901c-f178131be6f4
health: HEALTH_WARN
There are daemons running an older version of ceph
   services:
mon: 5 daemons, quorum  
cephmon02,cephmon01,cephmon03,cephmon04,cephmon05 (age 4d)

mgr: cephmon03(active, since 8d), standbys: cephmon01, cephmon02
mds: 2/2 daemons up, 1 standby, 2 hot standby
osd: 202 osds: 202 up (since 11d), 202 in (since 13d)
rgw: 2 daemons active (2 hosts, 1 zones)
   data:
volumes: 2/2 healthy
pools:   11 pools, 4961 pgs
objects: 98.84M objects, 347 TiB
usage:   988 TiB used, 1.3 PiB / 2.3 PiB avail
pgs: 4942 active+clean
 19   active+clean+scrubbing+deep
   io:
client:   89 MiB/s rd, 598 MiB/s wr, 25 op/s rd, 157 op/s wr
   progress:
Upgrade to quay.io/ceph/ceph:v16.2.13 (0s)
  []

# ceph versions
{
"mon": {
"ceph version 16.2.13  
(5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)": 5

},
"mgr": {
"ceph version 16.2.13  
(5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)": 3

},
"osd": {
"ceph version 16.2.13  
(5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)": 48,
"ceph version 16.2.7  
(dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 154

},
"mds": {
"ceph version 16.2.7  
(dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 5

},
"rgw": {
"ceph version 16.2.7  
(dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 2

},
"overall": {
"ceph version 16.2.13  
(5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)": 56,
"ceph version 16.2.7  
(dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 161

}
}

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph orch upgrade stuck between 16.2.7 and 16.2.13

2023-08-15 Thread Robert Sander

Hi,

A healthy 16.2.7 cluster should get an upgrade to 16.2.13.

ceph orch upgrade start --ceph-version 16.2.13

did upgrade MONs, MGRs and 25% of the OSDs and is now stuck.

We tried several "ceph orch upgrade stop" and starts again.
We "failed" the active MGR but no progress.
We set the debug logging with "ceph config set mgr mgr/cephadm/log_to_cluster_level 
debug" but it only tells us that it starts:

2023-08-15T09:05:58.548896+0200 mgr.cephmon01 [INF] Upgrade: Started with 
target quay.io/ceph/ceph:v16.2.13

How can we check what is happening (or not happening) here?
How do we get cephadm to complete the task?

Current status is:

# ceph orch upgrade status
{
"target_image": "quay.io/ceph/ceph:v16.2.13",
"in_progress": true,
"which": "Upgrading all daemon types on all hosts",
"services_complete": [],
"progress": "",
"message": "",
"is_paused": false
}

# ceph -s
  cluster:
id: 3098199a-c7f5-4baf-901c-f178131be6f4
health: HEALTH_WARN
There are daemons running an older version of ceph
 
  services:

mon: 5 daemons, quorum cephmon02,cephmon01,cephmon03,cephmon04,cephmon05 
(age 4d)
mgr: cephmon03(active, since 8d), standbys: cephmon01, cephmon02
mds: 2/2 daemons up, 1 standby, 2 hot standby
osd: 202 osds: 202 up (since 11d), 202 in (since 13d)
rgw: 2 daemons active (2 hosts, 1 zones)
 
  data:

volumes: 2/2 healthy
pools:   11 pools, 4961 pgs
objects: 98.84M objects, 347 TiB
usage:   988 TiB used, 1.3 PiB / 2.3 PiB avail
pgs: 4942 active+clean
 19   active+clean+scrubbing+deep
 
  io:

client:   89 MiB/s rd, 598 MiB/s wr, 25 op/s rd, 157 op/s wr
 
  progress:

Upgrade to quay.io/ceph/ceph:v16.2.13 (0s)
  []

# ceph versions
{
"mon": {
"ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific 
(stable)": 5
},
"mgr": {
"ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific 
(stable)": 3
},
"osd": {
"ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific 
(stable)": 48,
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific 
(stable)": 154
},
"mds": {
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific 
(stable)": 5
},
"rgw": {
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific 
(stable)": 2
},
"overall": {
"ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific 
(stable)": 56,
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific 
(stable)": 161
}
}

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io