[ceph-users] Re: ERROR: Distro uos version 20 not supported

2023-05-08 Thread Ben
Get one of the time services up and running, and you will then get through this. The
error message is quite misleading, though.
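For example, something like this (a rough sketch; assuming a systemd-based host and
that chrony is available in the distro's repositories, which I have not verified for uos):

sudo yum install -y chrony        # or apt-get / dnf, depending on the distro
sudo systemctl enable --now chronyd
chronyc tracking                  # confirm the clock is actually synchronizing

After that, re-running the cephadm bootstrap should get past the
"No time sync service is running" check.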

Ben  于2023年4月26日周三 15:07写道:

> Hi,
> This seems not very relevant, since all Ceph components are running in
> containers. Any ideas on how to get past this issue? Any other ideas or
> suggestions on this kind of deployment?
>
> sudo ./cephadm --image 10.21.22.1:5000/ceph:v17.2.5-20230316 --docker
> bootstrap --mon-ip 10.21.22.1 --skip-monitoring-stack
>
> Creating directory /etc/ceph for ceph.conf
>
> Verifying podman|docker is present...
>
> Verifying lvm2 is present...
>
> Verifying time synchronization is in place...
>
> No time sync service is running; checked for ['chrony.service',
> 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service',
> 'ntp.service', 'ntpsec.service', 'openntpd.service']
>
> ERROR: Distro uos version 20 not supported
>
>
> uname -a
>
> Linux  4.19.0-91.82.42.uelc20.x86_64 #1 SMP Sat May 15 13:50:04 CST
> 2021 x86_64 x86_64 x86_64 GNU/Linux
>
> Thank you in advance
> Ben
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: non root deploy ceph 17.2.5 failed

2023-05-08 Thread Ben
Hi, it is uos v20 (with kernel 4.19), one Linux distribution among others.
It shouldn't matter, since cephadm deploys everything in containers by default;
cephadm was pulled via curl from the Quincy branch on GitHub.

I think you could see some sort of error if you removed the
--single-host-defaults parameter.

More investigation shows it looks like a bug in cephadm.
During the deployment procedure,
/tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e.new
is created remotely through a sudo ssh session (so it is owned by root), while
/tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/ is changed to be owned by
the ssh user deployer. The correct thing to do instead would be to change /tmp/var/
to the deployer owner recursively, so that the following scp has the access
permission it needs.
I will see if I have time to wire up a PR to fix it.
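In the meantime, a possible manual workaround (untested sketch; the fsid below is just
the one from this cluster, and "deployer" is the non-root ssh user) would be to fix the
ownership of the staging directory on the target host before retrying:

# run on the target host
sudo mkdir -p /tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9
sudo chown -R deployer:deployer /tmp/var

so that the subsequent scp from the cephadm mgr module can write the
cephadm.*.new file there.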

Thanks for help on this.
Ben


Eugen Block  于2023年5月8日周一 21:01写道:

> Hi,
>
> could you provide some more details about your host OS? Which cephadm
> version is it? I was able to bootstrap a one-node cluster with both
> 17.2.5 and 17.2.6 with a non-root user with no such error on openSUSE
> Leap 15.4:
>
> quincy:~ # rpm -qa | grep cephadm
> cephadm-17.2.6.248+gad656d572cb-lp154.2.1.noarch
>
> deployer@quincy:~> sudo cephadm --image quay.io/ceph/ceph:v17.2.5
> bootstrap --mon-ip 172.17.2.3 --skip-monitoring-stack --ssh-user
> deployer --single-host-defaults
> Verifying ssh connectivity ...
> Adding key to deployer@localhost authorized_keys...
> Verifying podman|docker is present...
> Verifying lvm2 is present...
> Verifying time synchronization is in place...
> Unit chronyd.service is enabled and running
> Repeating the final host check...
> podman (/usr/bin/podman) version 4.4.4 is present
> [...]
> Ceph version: ceph version 17.2.5
> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
> Extracting ceph user uid/gid from container image...
> Creating initial keys...
> Creating initial monmap...
> Creating mon...
> Waiting for mon to start...
> Waiting for mon...
> mon is available
> [...]
> Adding key to deployer@localhost authorized_keys...
> Adding host quincy...
> Deploying mon service with default placement...
> Deploying mgr service with default placement...
> [...]
> Bootstrap complete.
>
> Zitat von Ben :
>
> > Hi,
> >
> > with following command:
> >
> > sudo cephadm  --docker bootstrap --mon-ip 10.1.32.33
> --skip-monitoring-stack
> >   --ssh-user deployer
> > the user deployer has passwordless sudo configuration.
> > I can see the error below:
> >
> > debug 2023-05-04T12:46:43.268+ 7fc5ddc2e700  0 [cephadm ERROR
> > cephadm.ssh] Unable to write
> >
> szhyf-xx1d002-hx15w:/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e:
> > scp:
> >
> /tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e.new:
> > Permission denied
> >
> > Traceback (most recent call last):
> >
> >   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 222, in
> _write_remote_file
> >
> > await asyncssh.scp(f.name, (conn, tmp_path))
> >
> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
> >
> > await source.run(srcpath)
> >
> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
> >
> > self.handle_error(exc)
> >
> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
> > handle_error
> >
> > raise exc from None
> >
> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
> >
> > await self._send_files(path, b'')
> >
> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in
> > _send_files
> >
> > self.handle_error(exc)
> >
> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
> > handle_error
> >
> > raise exc from None
> >
> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in
> > _send_files
> >
> > await self._send_file(srcpath, dstpath, attrs)
> >
> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in
> > _send_file
> >
> > await self._make_cd_request(b'C', attrs, size, srcpath)
> >
> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in
> > _make_cd_request
> >
> > self._fs.basename(path))
> >
> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in
> > make_request
> >
> > raise exc
> >
> > Any ideas on this?
> >
> > Thanks,
> > Ben
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-08 Thread Radoslaw Zarzynski
rados approved.

On Sun, May 7, 2023 at 11:24 PM Yuri Weinstein  wrote:
>
> All PRs were cherry-picked and the new RC1 build is:
>
> https://shaman.ceph.com/builds/ceph/pacific-release/8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/
>
> Rados, fs and rgw were rerun and results are summarized here:
> https://tracker.ceph.com/issues/59542#note-1
>
> Seeking final approvals:
>
> rados - Radek
> fs - Venky
> rgw - Casey
>
> On Fri, May 5, 2023 at 8:27 AM Yuri Weinstein  wrote:
> >
> > I got verbal approvals for the listed PRs:
> >
> > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > https://github.com/ceph/ceph/pull/51344  -- Venky approved
> > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > https://github.com/ceph/ceph/pull/50894  -- Radek approved
> >
> > Suites rados and fs will need to be retested on the updated pacific-release
> > branch.
> >
> >
> > On Thu, May 4, 2023 at 9:13 AM Yuri Weinstein  wrote:
> > >
> > > In summary:
> > >
> > > Release Notes:  https://github.com/ceph/ceph/pull/51301
> > >
> > > We plan to finish this release next week and we have the following PRs
> > > planned to be added:
> > >
> > > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > > https://github.com/ceph/ceph/pull/51344  -- Venky in progress
> > > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > > https://github.com/ceph/ceph/pull/50894  -- Radek in progress
> > >
> > > As soon as these PRs are finalized, I will cherry-pick them and
> > > rebuild "pacific-release" and rerun appropriate suites.
> > >
> > > On Thu, May 4, 2023 at 9:07 AM Radoslaw Zarzynski  
> > > wrote:
> > > >
> > > > If we get some time, I would like to include:
> > > >
> > > >   https://github.com/ceph/ceph/pull/50894.
> > > >
> > > > Regards,
> > > > Radek
> > > >
> > > > On Thu, May 4, 2023 at 5:56 PM Venky Shankar  
> > > > wrote:
> > > > >
> > > > > Hi Yuri,
> > > > >
> > > > > On Wed, May 3, 2023 at 7:10 PM Venky Shankar  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein  
> > > > > > wrote:
> > > > > > >
> > > > > > > Venky, I did plan to cherry-pick this PR if you approve this 
> > > > > > > (this PR
> > > > > > > was used for a rerun)
> > > > > >
> > > > > > OK. The fs suite failure is being looked into
> > > > > > (https://tracker.ceph.com/issues/59626).
> > > > >
> > > > > Fix is being tracked by
> > > > >
> > > > > https://github.com/ceph/ceph/pull/51344
> > > > >
> > > > > Once ready, it needs to be included in 16.2.13 and would require a fs
> > > > > suite re-run (although re-running the failed tests should suffice;
> > > > > however, I'm a bit inclined to put it through the fs suite).
> > > > >
> > > > > >
> > > > > > >
> > > > > > > On Tue, May 2, 2023 at 7:51 AM Venky Shankar 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Yuri,
> > > > > > > >
> > > > > > > > On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Details of this release are summarized here:
> > > > > > > > >
> > > > > > > > > https://tracker.ceph.com/issues/59542#note-1
> > > > > > > > > Release Notes - TBD
> > > > > > > > >
> > > > > > > > > Seeking approvals for:
> > > > > > > > >
> > > > > > > > > smoke - Radek, Laura
> > > > > > > > > rados - Radek, Laura
> > > > > > > > >   rook - Sébastien Han
> > > > > > > > >   cephadm - Adam K
> > > > > > > > >   dashboard - Ernesto
> > > > > > > > >
> > > > > > > > > rgw - Casey
> > > > > > > > > rbd - Ilya
> > > > > > > > > krbd - Ilya
> > > > > > > > > fs - Venky, Patrick
> > > > > > > >
> > > > > > > > There are a couple of new failures which are qa/test related - 
> > > > > > > > I'll
> > > > > > > > have a look at those (they _do not_ look serious).
> > > > > > > >
> > > > > > > > Also, Yuri, do you plan to merge
> > > > > > > >
> > > > > > > > https://github.com/ceph/ceph/pull/51232
> > > > > > > >
> > > > > > > > into the pacific-release branch although it's tagged with one 
> > > > > > > > of your
> > > > > > > > other pacific runs?
> > > > > > > >
> > > > > > > > > upgrade/octopus-x (pacific) - Laura (look the same as in 
> > > > > > > > > 16.2.8)
> > > > > > > > > upgrade/pacific-p2p - Laura
> > > > > > > > > powercycle - Brad (SELinux denials)
> > > > > > > > > ceph-volume - Guillaume, Adam K
> > > > > > > > >
> > > > > > > > > Thx
> > > > > > > > > YuriW
> > > > > > > > > ___
> > > > > > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Cheers,
> > > > > > > > Venky
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Cheers,
> > > > > > Venky
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Cheers,
> > > > > Venky
> > > > > ___
> > > > > Dev mailing list -- d...@ceph.io
> > > > 

[ceph-users] Upgrade Ceph cluster + radosgw from 14.2.18 to latest 15

2023-05-08 Thread viplanghe6
Hi, I want to upgrade my old Ceph cluster + Radosgw from v14 to v15. But I'm
not using cephadm, and I'm not sure how to limit errors as much as possible
during the upgrade process.

Here are my upgrade steps:
Firstly, upgrade from 14.2.18 to 14.2.22 (latest nautilus version)
Then, upgrade it from 14.2.22 to 15.2.17 (latest octopus version)

service restart order:
- restart monitors (sleep ~10)
- restart managers (sleep ~5)
- restart metadata OSDs (sleep ~30)
- restart data OSDs (sleep ~30)
- restart radosgw (sleep ~20)

Is there anything wrong with these steps?
Is 15.2.17 the most stable version of Octopus?
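
For reference, a rough sketch of the checks that could be wrapped around each restart
step above (assuming a package-based install managed with systemd; package manager
and unit names may differ per distro):

ceph osd set noout                    # avoid rebalancing while daemons restart
# on each host, after upgrading the packages:
systemctl restart ceph-mon.target     # then ceph-mgr.target, ceph-osd.target, ...
ceph -s                               # wait for HEALTH_OK / active+clean before the next host
ceph versions                         # confirm which daemons run which version
ceph osd unset noout
# only once *all* OSDs run Octopus:
ceph osd require-osd-release octopus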

Thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pg deep-scrub issue

2023-05-08 Thread Peter
Hi Janne,

Thank you for your response. 

I used the `ceph pg deep-scrub ` command, and all returns point to
osd.166.
I checked the SMART data and syslog on osd.166; the disk is fine.
The number of late deep-scrub PGs is lower now; however, it has been 5 days since my
last post.
I attached the perf dump for that OSD. There are tons of values; do you have
any suggestions for how to read and analyze the values in the report?
Are there any key values worth noticing in the dump result below?
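
One way to pull individual counters out of such a dump (a sketch, assuming jq is
installed; I'm not sure these are the most relevant counters for scrub issues):

ceph daemon osd.166 perf dump | jq '.osd | {op_r_latency, op_w_latency, op_latency}'
ceph daemon osd.166 perf dump | jq '.bluestore | {kv_sync_lat, kv_commit_lat, state_aio_wait_lat}'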


ceph daemon osd.166 perf dump
{
"AsyncMessenger::Worker-0": {
"msgr_recv_messages": 19341158,
"msgr_send_messages": 18796191,
"msgr_recv_bytes": 349806218697,
"msgr_send_bytes": 280579883194,
"msgr_created_connections": 45848,
"msgr_active_connections": 371,
"msgr_running_total_time": 3472.846508390,
"msgr_running_send_time": 1303.229556187,
"msgr_running_recv_time": 2666.103910588,
"msgr_running_fast_dispatch_time": 279.453905222,
"msgr_send_messages_queue_lat": {
"avgcount": 18796198,
"sum": 2186.559714335,
"avgtime": 0.000116329
},
"msgr_handle_ack_lat": {
"avgcount": 17717151,
"sum": 8.286058140,
"avgtime": 0.00467
}
},
"AsyncMessenger::Worker-1": {
"msgr_recv_messages": 15162337,
"msgr_send_messages": 14791166,
"msgr_recv_bytes": 200258563166,
"msgr_send_bytes": 133207293543,
"msgr_created_connections": 44152,
"msgr_active_connections": 377,
"msgr_running_total_time": 2560.268490568,
"msgr_running_send_time": 952.458957466,
"msgr_running_recv_time": 1217.955146530,
"msgr_running_fast_dispatch_time": 232.240019379,
"msgr_send_messages_queue_lat": {
"avgcount": 14791125,
"sum": 1897.607826992,
"avgtime": 0.000128293
},
"msgr_handle_ack_lat": {
"avgcount": 15269688,
"sum": 7.098990800,
"avgtime": 0.00464
}
},
"AsyncMessenger::Worker-2": {
"msgr_recv_messages": 15677023,
"msgr_send_messages": 15358783,
"msgr_recv_bytes": 228508634479,
"msgr_send_bytes": 211006631139,
"msgr_created_connections": 45406,
"msgr_active_connections": 383,
"msgr_running_total_time": 2759.930104879,
"msgr_running_send_time": 1053.707455802,
"msgr_running_recv_time": 4334.833363876,
"msgr_running_fast_dispatch_time": 239.419774153,
"msgr_send_messages_queue_lat": {
"avgcount": 15358747,
"sum": 2015.660278745,
"avgtime": 0.000131238
},
"msgr_handle_ack_lat": {
"avgcount": 16139329,
"sum": 7.595875666,
"avgtime": 0.00470
}
},
"bluefs": {
"gift_bytes": 0,
"reclaim_bytes": 0,
"db_total_bytes": 307090153472,
"db_used_bytes": 7901011968,
"wal_total_bytes": 0,
"wal_used_bytes": 0,
"slow_total_bytes": 400033120256,
"slow_used_bytes": 0,
"num_files": 124,
"log_bytes": 14528512,
"log_compactions": 12,
"logged_bytes": 190578688,
"files_written_wal": 2,
"files_written_sst": 2852,
"bytes_written_wal": 295528103936,
"bytes_written_sst": 121776099328,
"bytes_written_slow": 0,
"max_bytes_wal": 0,
"max_bytes_db": 9753845760,
"max_bytes_slow": 0,
"read_random_count": 2269991,
"read_random_bytes": 135270214328,
"read_random_disk_count": 1190220,
"read_random_disk_bytes": 130232185489,
"read_random_buffer_count": 1086502,
"read_random_buffer_bytes": 5038028839,
"read_count": 333292,
"read_bytes": 13432447950,
"read_prefetch_count": 330425,
"read_prefetch_bytes": 13419652916,
"read_zeros_candidate": 0,
"read_zeros_errors": 0
},
"bluestore": {
"kv_flush_lat": {
"avgcount": 13561020,
"sum": 44.317007002,
"avgtime": 0.03267
},
"kv_commit_lat": {
"avgcount": 13561020,
"sum": 8141.086529679,
"avgtime": 0.000600329
},
"kv_sync_lat": {
"avgcount": 13561020,
"sum": 8185.403536681,
"avgtime": 0.000603597
},
"kv_final_lat": {
"avgcount": 13548150,
"sum": 852.970428455,
"avgtime": 0.62958
},
"state_prepare_lat": {
"avgcount": 18292548,
"sum": 243976.346193841,
"avgtime": 0.013337471
},
"state_aio_wait_lat": {
"avgcount": 18292543,
"sum": 878753.135015570,
"avgtime": 0.048038872
},
"state_io_done_lat": {
"avgcount": 18

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-08 Thread Zakhar Kirpichenko
Don't mean to hijack the thread, but I may be observing something similar
with 16.2.12: OSD performance noticeably peaks after OSD restart and then
gradually reduces over 10-14 days, while commit and apply latencies
increase across the board.
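
One simple way to watch those latencies (a rough sketch; the sort column may need
adjusting, and the header line will sort oddly):

ceph osd perf | sort -n -k 2 | tail -20    # worst commit latencies (ms) at the bottom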

Non-default settings are:

"bluestore_cache_size_hdd": {
"default": "1073741824",
"mon": "4294967296",
"final": "4294967296"
},
"bluestore_cache_size_ssd": {
"default": "3221225472",
"mon": "4294967296",
"final": "4294967296"
},
...
"osd_memory_cache_min": {
"default": "134217728",
"mon": "2147483648",
"final": "2147483648"
},
"osd_memory_target": {
"default": "4294967296",
"mon": "17179869184",
"final": "17179869184"
},
"osd_scrub_sleep": {
"default": 0,
"mon": 0.10001,
"final": 0.10001
},
"rbd_balance_parent_reads": {
"default": false,
"mon": true,
"final": true
},

All other settings are default, the usage is rather simple Openstack / RBD.

I also noticed that OSD cache usage doesn't increase over time (see my
message "Ceph 16.2.12, bluestore cache doesn't seem to be used much" dated
26 April 2023, which received no comments), despite the OSDs being used
rather heavily and there being plenty of host and OSD cache / target memory
available. It may be worth checking whether the available memory is being used in a
good way.
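
A quick way to check that (a sketch; assumes admin socket access on the OSD host,
and that these mempool names are present in 16.2.12):

ceph daemon osd.<id> dump_mempools | jq '.mempool.by_pool | {bluestore_cache_data, bluestore_cache_onode, bluestore_cache_other}'
ceph config get osd.<id> osd_memory_target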

/Z

On Mon, 8 May 2023 at 22:35, Igor Fedotov  wrote:

> Hey Nikola,
>
> On 5/8/2023 10:13 PM, Nikola Ciprich wrote:
> > OK, starting collecting those for all OSDs..
> > I have hour samples of all OSDs perf dumps loaded in DB, so I can easily
> examine,
> > sort, whatever..
> >
> You didn't reset the counters every hour, do you? So having average
> subop_w_latency growing that way means the current values were much
> higher than before.
>
> Curious if subop latencies were growing for every OSD or just a subset
> (may be even just a single one) of them?
>
>
> Next time you reach the bad state please do the following if possible:
>
> - reset perf counters for every OSD
>
> -  leave the cluster running for 10 mins and collect perf counters again.
>
> - Then start restarting OSD one-by-one starting with the worst OSD (in
> terms of subop_w_lat from the prev step). Wouldn't be sufficient to
> reset just a few OSDs before the cluster is back to normal?
>
> >> currently values for avgtime are around 0.0003 for subop_w_lat and
> 0.001-0.002
> >> for op_w_lat
> > OK, so there is no visible trend on op_w_lat, still between 0.001 and
> 0.002
> >
> > subop_w_lat seems to have increased since yesterday though! I see values
> from
> > 0.0004 to as high as 0.001
> >
> > If some other perf data might be interesting, please let me know..
> >
> > During OSD restarts, I noticed strange thing - restarts on first 6
> machines
> > went smooth, but then on another 3, I saw rocksdb logs recovery on all
> SSD
> > OSDs. but first didn't see any mention of daemon crash in ceph -s
> >
> > later, crash info appeared, but only about 3 daemons (in total, at least
> 20
> > of them crashed though)
> >
> > crash report was similar for all three OSDs:
> >
> > [root@nrbphav4a ~]# ceph crash info
> 2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
> > {
> >  "backtrace": [
> >  "/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
> >  "(BlueStore::_txc_create(BlueStore::Collection*,
> BlueStore::OpSequencer*, std::__cxx11::list std::allocator >*, boost::intrusive_ptr)+0x413)
> [0x55a1c9d07c43]",
> >
> "(BlueStore::queue_transactions(boost::intrusive_ptr&,
> std::vector
> >&, boost::intrusive_ptr, ThreadPool::TPHandle*)+0x22b)
> [0x55a1c9d27e9b]",
> >  "(ReplicatedBackend::submit_transaction(hobject_t const&,
> object_stat_sum_t const&, eversion_t const&, std::unique_ptr std::default_delete >&&, eversion_t const&, eversion_t
> const&, std::vector >&&,
> std::optional&, Context*, unsigned long, osd_reqid_t,
> boost::intrusive_ptr)+0x8ad) [0x55a1c9bbcfdd]",
> >  "(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*,
> PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",
> >
> "(PrimaryLogPG::simple_opc_submit(std::unique_ptr std::default_delete >)+0x57) [0x55a1c99d6777]",
> >
> "(PrimaryLogPG::handle_watch_timeout(std::shared_ptr)+0xb73)
> [0x55a1c99da883]",
> >  "/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
> >  "(CommonSafeTimer::timer_thread()+0x11a)
> [0x55a1c9e226aa]",
> >  "/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
> >  "/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
> >  "/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
> >  ],
> >  "ceph_version": "17.2.6",
> >  "crash_id":
> "2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
> >  "entity_name

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-08 Thread Igor Fedotov

Hey Nikola,

On 5/8/2023 10:13 PM, Nikola Ciprich wrote:

OK, starting collecting those for all OSDs..
I have hour samples of all OSDs perf dumps loaded in DB, so I can easily 
examine,
sort, whatever..

You didn't reset the counters every hour, did you? So having the average
subop_w_latency growing that way means the current values were much
higher than before.


Curious whether the subop latencies were growing for every OSD or just a subset
(maybe even just a single one) of them?



Next time you reach the bad state please do the following if possible:

- reset perf counters for every OSD

-  leave the cluster running for 10 mins and collect perf counters again.

- Then start restarting OSDs one by one, starting with the worst OSD (in
terms of subop_w_lat from the previous step). Wouldn't it be sufficient to
reset just a few OSDs before the cluster is back to normal? A rough shell
sketch of the reset/collect part follows below.
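
Something along these lines should do for the reset/collect part (just a sketch; run
it on every OSD host, and adjust the admin socket path for containerized deployments):

# reset the counters on all local OSDs
for sock in /var/run/ceph/ceph-osd.*.asok; do
    ceph daemon $sock perf reset all
done

sleep 600   # let the cluster run for ~10 minutes

# collect fresh dumps and pull out subop_w_latency
for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo -n "$sock "
    ceph daemon $sock perf dump | jq '.osd.subop_w_latency.avgtime'
done

Then sort the result to find the worst OSDs.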



currently values for avgtime are around 0.0003 for subop_w_lat and 0.001-0.002
for op_w_lat

OK, so there is no visible trend on op_w_lat, still between 0.001 and 0.002

subop_w_lat seems to have increased since yesterday though! I see values from
0.0004 to as high as 0.001

If some other perf data might be interesting, please let me know..

During OSD restarts, I noticed strange thing - restarts on first 6 machines
went smooth, but then on another 3, I saw rocksdb logs recovery on all SSD
OSDs. but first didn't see any mention of daemon crash in ceph -s

later, crash info appeared, but only about 3 daemons (in total, at least 20
of them crashed though)

crash report was similar for all three OSDs:

[root@nrbphav4a ~]# ceph crash info 
2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
{
 "backtrace": [
 "/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
 "(BlueStore::_txc_create(BlueStore::Collection*, BlueStore::OpSequencer*, 
std::__cxx11::list >*, 
boost::intrusive_ptr)+0x413) [0x55a1c9d07c43]",
 "(BlueStore::queue_transactions(boost::intrusive_ptr&, 
std::vector >&, 
boost::intrusive_ptr, ThreadPool::TPHandle*)+0x22b) [0x55a1c9d27e9b]",
 "(ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr >&&, eversion_t const&, eversion_t const&, std::vector >&&, std::optional&, Context*, unsigned long, osd_reqid_t, 
boost::intrusive_ptr)+0x8ad) [0x55a1c9bbcfdd]",
 "(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, 
PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",
 "(PrimaryLogPG::simple_opc_submit(std::unique_ptr >)+0x57) [0x55a1c99d6777]",
 "(PrimaryLogPG::handle_watch_timeout(std::shared_ptr)+0xb73) 
[0x55a1c99da883]",
 "/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
 "(CommonSafeTimer::timer_thread()+0x11a) [0x55a1c9e226aa]",
 "/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
 "/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
 "/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
 ],
 "ceph_version": "17.2.6",
 "crash_id": 
"2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
 "entity_name": "osd.98",
 "os_id": "almalinux",
 "os_name": "AlmaLinux",
 "os_version": "9.0 (Emerald Puma)",
 "os_version_id": "9.0",
 "process_name": "ceph-osd",
 "stack_sig": 
"b1a1c5bd45e23382497312202e16cfd7a62df018c6ebf9ded0f3b3ca3c1dfa66",
 "timestamp": "2023-05-08T17:45:47.056675Z",
 "utsname_hostname": "nrbphav4h",
 "utsname_machine": "x86_64",
 "utsname_release": "5.15.90lb9.01",
 "utsname_sysname": "Linux",
 "utsname_version": "#1 SMP Fri Jan 27 15:52:13 CET 2023"
}


I was trying to figure out why this particular 3 nodes could behave differently
and found out from colleagues, that those 3 nodes were added to cluster lately
with direct install of 17.2.5 (others were installed 15.2.16 and later upgraded)

not sure whether this is related to our problem though..

I see very similar crash reported here:https://tracker.ceph.com/issues/56346
so I'm not reporting..

Do you think this might somehow be the cause of the problem? Anything else I 
should
check in perf dumps or elsewhere?


Hmm... I don't know yet. Could you please share the last 20K lines prior to the crash
from, e.g., two sample OSDs?


And the crash isn't permanent; the OSDs are able to start after the
second(?) attempt, aren't they?



with best regards

nik







--
Igor Fedotov
Ceph Lead Developer
--
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263




[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-08 Thread Yuri Weinstein
Josh,

The 16.2.13 RC1 has all approvals and unless I hear any objections, I
will start publishing later today.

On Mon, May 8, 2023 at 12:09 PM Radoslaw Zarzynski  wrote:
>
> rados approved.
>
> On Sun, May 7, 2023 at 11:24 PM Yuri Weinstein  wrote:
> >
> > All PRs were cherry-picked and the new RC1 build is:
> >
> > https://shaman.ceph.com/builds/ceph/pacific-release/8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/
> >
> > Rados, fs and rgw were rerun and results are summarized here:
> > https://tracker.ceph.com/issues/59542#note-1
> >
> > Seeking final approvals:
> >
> > rados - Radek
> > fs - Venky
> > rgw - Casey
> >
> > On Fri, May 5, 2023 at 8:27 AM Yuri Weinstein  wrote:
> > >
> > > I got verbal approvals for the listed PRs:
> > >
> > > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > > https://github.com/ceph/ceph/pull/51344  -- Venky approved
> > > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > > https://github.com/ceph/ceph/pull/50894  -- Radek approved
> > >
> > > Suites rados and fs will need to be retested on the updated pacific-release
> > > branch.
> > >
> > >
> > > On Thu, May 4, 2023 at 9:13 AM Yuri Weinstein  wrote:
> > > >
> > > > In summary:
> > > >
> > > > Release Notes:  https://github.com/ceph/ceph/pull/51301
> > > >
> > > > We plan to finish this release next week and we have the following PRs
> > > > planned to be added:
> > > >
> > > > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > > > https://github.com/ceph/ceph/pull/51344  -- Venky in progress
> > > > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > > > https://github.com/ceph/ceph/pull/50894  -- Radek in progress
> > > >
> > > > As soon as these PRs are finalized, I will cherry-pick them and
> > > > rebuild "pacific-release" and rerun appropriate suites.
> > > >
> > > > On Thu, May 4, 2023 at 9:07 AM Radoslaw Zarzynski  
> > > > wrote:
> > > > >
> > > > > If we get some time, I would like to include:
> > > > >
> > > > >   https://github.com/ceph/ceph/pull/50894.
> > > > >
> > > > > Regards,
> > > > > Radek
> > > > >
> > > > > On Thu, May 4, 2023 at 5:56 PM Venky Shankar  
> > > > > wrote:
> > > > > >
> > > > > > Hi Yuri,
> > > > > >
> > > > > > On Wed, May 3, 2023 at 7:10 PM Venky Shankar  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Venky, I did plan to cherry-pick this PR if you approve this 
> > > > > > > > (this PR
> > > > > > > > was used for a rerun)
> > > > > > >
> > > > > > > OK. The fs suite failure is being looked into
> > > > > > > (https://tracker.ceph.com/issues/59626).
> > > > > >
> > > > > > Fix is being tracked by
> > > > > >
> > > > > > https://github.com/ceph/ceph/pull/51344
> > > > > >
> > > > > > Once ready, it needs to be included in 16.2.13 and would require a 
> > > > > > fs
> > > > > > suite re-run (although re-running the failed tests should suffice;
> > > > > > however, I'm a bit inclined to put it through the fs suite).
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > On Tue, May 2, 2023 at 7:51 AM Venky Shankar 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Hi Yuri,
> > > > > > > > >
> > > > > > > > > On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Details of this release are summarized here:
> > > > > > > > > >
> > > > > > > > > > https://tracker.ceph.com/issues/59542#note-1
> > > > > > > > > > Release Notes - TBD
> > > > > > > > > >
> > > > > > > > > > Seeking approvals for:
> > > > > > > > > >
> > > > > > > > > > smoke - Radek, Laura
> > > > > > > > > > rados - Radek, Laura
> > > > > > > > > >   rook - Sébastien Han
> > > > > > > > > >   cephadm - Adam K
> > > > > > > > > >   dashboard - Ernesto
> > > > > > > > > >
> > > > > > > > > > rgw - Casey
> > > > > > > > > > rbd - Ilya
> > > > > > > > > > krbd - Ilya
> > > > > > > > > > fs - Venky, Patrick
> > > > > > > > >
> > > > > > > > > There are a couple of new failures which are qa/test related 
> > > > > > > > > - I'll
> > > > > > > > > have a look at those (they _do not_ look serious).
> > > > > > > > >
> > > > > > > > > Also, Yuri, do you plan to merge
> > > > > > > > >
> > > > > > > > > https://github.com/ceph/ceph/pull/51232
> > > > > > > > >
> > > > > > > > > into the pacific-release branch although it's tagged with one 
> > > > > > > > > of your
> > > > > > > > > other pacific runs?
> > > > > > > > >
> > > > > > > > > > upgrade/octopus-x (pacific) - Laura (look the same as in 
> > > > > > > > > > 16.2.8)
> > > > > > > > > > upgrade/pacific-p2p - Laura
> > > > > > > > > > powercycle - Brad (SELinux denials)
> > > > > > > > > > ceph-volume - Guillaume, Adam K
> > > > > > > > > >
> > > > > > > > > > Thx
> > > > > > > > > > YuriW
> > > > > > > > > > ___
> > > > > > > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > > > > > > To unsubs

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-08 Thread Nikola Ciprich
Hello Igor,

So I was checking the performance every day since Tuesday. Every day it seemed
to be the same: ~60-70kOPS on random writes from a single VM.
Yesterday it finally dropped to 20kOPS, today to 10kOPS. I also tried with a newly
created volume; the result (after prefill) is the same, so it doesn't make any
difference.

So I reverted all the mentioned options to their defaults and restarted all OSDs.
Performance immediately returned to better values (I suppose this is again
caused by the restart alone).

The good news is that setting osd_fast_shutdown_timeout to 0 really helped with
the OSD crashes during restarts, which speeds things up a lot. But I have some new
crashes, more on this later.

> > I'd suggest to start monitoring perf counters for your osds.
> > op_w_lat/subop_w_lat ones specifically. I presume they raise eventually,
> > don't they?
> OK, starting collecting those for all OSDs..
I have hourly samples of all OSDs' perf dumps loaded in a DB, so I can easily
examine, sort, whatever.


> 
> currently values for avgtime are around 0.0003 for subop_w_lat and 0.001-0.002
> for op_w_lat
OK, so there is no visible trend in op_w_lat, still between 0.001 and 0.002.

subop_w_lat seems to have increased since yesterday though! I see values from
0.0004 to as high as 0.001.

If any other perf data might be interesting, please let me know.

During the OSD restarts, I noticed a strange thing: restarts on the first 6 machines
went smoothly, but then on another 3, I saw rocksdb log recovery on all SSD
OSDs, yet at first didn't see any mention of a daemon crash in ceph -s.

Later, crash info appeared, but only for 3 daemons (in total, at least 20
of them crashed though).

The crash report was similar for all three OSDs:

[root@nrbphav4a ~]# ceph crash info 
2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
{
"backtrace": [
"/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
"(BlueStore::_txc_create(BlueStore::Collection*, 
BlueStore::OpSequencer*, std::__cxx11::list 
>*, boost::intrusive_ptr)+0x413) [0x55a1c9d07c43]",

"(BlueStore::queue_transactions(boost::intrusive_ptr&,
 std::vector >&, 
boost::intrusive_ptr, ThreadPool::TPHandle*)+0x22b) 
[0x55a1c9d27e9b]",
"(ReplicatedBackend::submit_transaction(hobject_t const&, 
object_stat_sum_t const&, eversion_t const&, std::unique_ptr >&&, eversion_t const&, eversion_t const&, 
std::vector >&&, 
std::optional&, Context*, unsigned long, osd_reqid_t, 
boost::intrusive_ptr)+0x8ad) [0x55a1c9bbcfdd]",
"(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, 
PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",

"(PrimaryLogPG::simple_opc_submit(std::unique_ptr >)+0x57) [0x55a1c99d6777]",
"(PrimaryLogPG::handle_watch_timeout(std::shared_ptr)+0xb73) 
[0x55a1c99da883]",
"/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
"(CommonSafeTimer::timer_thread()+0x11a) [0x55a1c9e226aa]",
"/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
"/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
"/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
],
"ceph_version": "17.2.6",
"crash_id": 
"2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
"entity_name": "osd.98",
"os_id": "almalinux",
"os_name": "AlmaLinux",
"os_version": "9.0 (Emerald Puma)",
"os_version_id": "9.0",
"process_name": "ceph-osd",
"stack_sig": 
"b1a1c5bd45e23382497312202e16cfd7a62df018c6ebf9ded0f3b3ca3c1dfa66",
"timestamp": "2023-05-08T17:45:47.056675Z",
"utsname_hostname": "nrbphav4h",
"utsname_machine": "x86_64",
"utsname_release": "5.15.90lb9.01",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Fri Jan 27 15:52:13 CET 2023"
}


I was trying to figure out why these particular 3 nodes could behave differently
and found out from colleagues that those 3 nodes were added to the cluster recently
with a direct install of 17.2.5 (the others were installed with 15.2.16 and later
upgraded).

Not sure whether this is related to our problem though.

I see a very similar crash reported here: https://tracker.ceph.com/issues/56346
so I'm not reporting a new one.

Do you think this might somehow be the cause of the problem? Anything else I
should check in the perf dumps or elsewhere?

with best regards

nik






-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw service fails to start with zone not found

2023-05-08 Thread Adiga, Anantha
Hi Eugene,

I had removed the zone before removing it from the zonegroup. I will check the 
objects and remove the appropriate ones. Thank you. 

As outlined in the thread, after setting the config for the rgw service, they
started OK.

Thank you,
Anantha


-Original Message-
From: Eugen Block  
Sent: Monday, May 8, 2023 10:55 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: rgw service fails to start with zone not found

Hi,
how exactly did you remove the configuration?
Check out the .rgw.root pool, there are different namespaces where the 
corresponding objects are stored.

> rados -p .rgw.root ls --all

You should be able to remove those objects from the pool, but be careful to not 
delete anything you actually need.

Zitat von "Adiga, Anantha" :

> Hi,
>
> An existing multisite configuration was removed.  But the radosgw 
> services still see the old zone name and fail to start.
>
> journalctl -u
> ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca10
> 4ja0201.ninovs
> ...
> May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug
> 2023-05-08T16:10:48.897+ 7f2f47634740  0 deferred set uid:gid to
> 167:167 (ceph:ceph)
> May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug
> 2023-05-08T16:10:48.897+ 7f2f47634740  0 ceph version 17.2.6 
> (d7ff0d10654d2280e08f1ab989> May 08 16:10:48 fl31ca104ja0201 
> bash[3964341]: debug
> 2023-05-08T16:10:48.897+ 7f2f47634740  0 framework: beast May 08 
> 16:10:48 fl31ca104ja0201 bash[3964341]: debug
> 2023-05-08T16:10:48.897+ 7f2f47634740  0 framework conf key:  
> port, val: 8080
> May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug
> 2023-05-08T16:10:48.897+ 7f2f47634740  1 radosgw_Main not setting 
> numa affinity May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug
> 2023-05-08T16:10:48.901+ 7f2f47634740  1 rgw_d3n:  
> rgw_d3n_l1_local_datacache_enabled=0
> May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug
> 2023-05-08T16:10:48.901+ 7f2f47634740  1 D3N datacache enabled: 0 
> May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug
> 2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: could 
> not find zone (fl2site2) May 08 16:10:48 fl31ca104ja0201 
> bash[3964341]: debug
> 2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: failed 
> to start notify service> May 08 16:10:48 fl31ca104ja0201 
> bash[3964341]: debug
> 2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: failed 
> to init services (ret=(> May 08 16:10:48 fl31ca104ja0201 
> bash[3964341]: debug
> 2023-05-08T16:10:48.921+ 7f2f47634740 -1 Couldn't init storage 
> provider (RADOS) May 08 16:10:49 fl31ca104ja0201 systemd[1]:
> ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca10
> 4ja0201.ninovs.service: Main
> pr>
> May 08 16:10:49 fl31ca104ja0201 systemd[1]:  
> ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs.service:
>   
> Failed
>
> Here is the current configuration
>
> root@fl31ca104ja0201:/# radosgw-admin period get {
> "id": "729f7cef-6340-4750-b3ae-9164177c0df3",
> "epoch": 1,
> "predecessor_uuid": "1d124ad5-57b0-41de-8def-823bd40f72aa",
> "sync_status": [],
> "period_map": {
> "id": "729f7cef-6340-4750-b3ae-9164177c0df3",
> "zonegroups": [
> {
> "id": "21b8306c-be43-4567-a0c5-74ab69937535",
> "name": "default",
> "api_name": "default",
> "is_master": "true",
> "endpoints": [],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "ff468492-8a14-414b-a8cf-7c20ab699af3",
> "zones": [
> {
> "id": "ff468492-8a14-414b-a8cf-7c20ab699af3",
> "name": "default",
> "endpoints": [],
> "log_meta": "false",
> "log_data": "false",
> "bucket_index_max_shards": 11,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": [],
> "storage_classes": [
> "STANDARD"
> ]
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "8999e04c-17a4-4150-8845-cecd6672d312",
> "sync_policy": {
> "groups": []
> }
> }
> ],
> "short_zone_ids": [
> {
> "key": "ff468492-8a14-414b-a8cf-7c20ab699af3",
> 

[ceph-users] Re: rgw service fails to start with zone not found

2023-05-08 Thread Adiga, Anantha
Thank you so much.
Here it is. I set them to the current value and  now the  rgw services are up.

Should the configuration variables get set automatically for the gateway
services as part of multisite configuration updates, or should it be a manual
procedure?

ceph config dump | grep client.rgw.default.default
client.rgw.default.default   advanced  rgw_realm   global     *
client.rgw.default.default   advanced  rgw_zone    fl2site2   *
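
For reference, the manual way of setting them would be along these lines (a sketch;
the zone and zonegroup names below come from the period output earlier in the thread,
and the realm name is whatever "radosgw-admin realm list" reports for the cluster):

ceph config set client.rgw.default.default rgw_realm <realm-name>
ceph config set client.rgw.default.default rgw_zonegroup default
ceph config set client.rgw.default.default rgw_zone default
ceph orch restart rgw.default.default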

Thank you,
Anantha

From: Danny Webb 
Sent: Monday, May 8, 2023 10:54 AM
To: Adiga, Anantha ; ceph-users@ceph.io
Subject: Re: rgw service fails to start with zone not found

are the old multisite conf values still in ceph.conf (eg, rgw_zonegroup, 
rgw_zone, rgw_realm)?

From: Adiga, Anantha mailto:anantha.ad...@intel.com>>
Sent: 08 May 2023 18:27
To: ceph-users@ceph.io 
mailto:ceph-users@ceph.io>>
Subject: [ceph-users] rgw service fails to start with zone not found

CAUTION: This email originates from outside THG

Hi,

An existing multisite configuration was removed.  But the radosgw services 
still see the old zone name and fail to start.

journalctl -u 
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs
...
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 deferred set uid:gid to 167:167 
(ceph:ceph)
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 ceph version 17.2.6 
(d7ff0d10654d2280e08f1ab989>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 framework: beast
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 framework conf key: port, val: 8080
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  1 radosgw_Main not setting numa 
affinity
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.901+ 7f2f47634740  1 rgw_d3n: 
rgw_d3n_l1_local_datacache_enabled=0
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.901+ 7f2f47634740  1 D3N datacache enabled: 0
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: could not find 
zone (fl2site2)
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: failed to start 
notify service>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: failed to init 
services (ret=(>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.921+ 7f2f47634740 -1 Couldn't init storage provider 
(RADOS)
May 08 16:10:49 fl31ca104ja0201 systemd[1]: 
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs.service:
 Main pr>
May 08 16:10:49 fl31ca104ja0201 systemd[1]: 
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs.service:
 Failed

Here is the current configuration

root@fl31ca104ja0201:/# radosgw-admin period get
{
"id": "729f7cef-6340-4750-b3ae-9164177c0df3",
"epoch": 1,
"predecessor_uuid": "1d124ad5-57b0-41de-8def-823bd40f72aa",
"sync_status": [],
"period_map": {
"id": "729f7cef-6340-4750-b3ae-9164177c0df3",
"zonegroups": [
{
"id": "21b8306c-be43-4567-a0c5-74ab69937535",
"name": "default",
"api_name": "default",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"zones": [
{
"id": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
   

[ceph-users] Re: rgw service fails to start with zone not found

2023-05-08 Thread Eugen Block

Hi,
how exactly did you remove the configuration?
Check out the .rgw.root pool, there are different namespaces where the  
corresponding objects are stored.


rados -p .rgw.root ls --all

You should be able to remove those objects from the pool, but be  
careful to not delete anything you actually need.
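
For example, something along these lines (a sketch only; list first and double-check
every object name before removing anything):

rados -p .rgw.root ls --all
# --all prints "<namespace> <object>" pairs; a stale zone typically shows up as
# zone_info.<zone-uuid> and zone_names.<zone-name> entries
rados -p .rgw.root rm zone_names.fl2site2
rados -p .rgw.root rm zone_info.<stale-zone-uuid>
# add "-N <namespace>" to the rm commands if the ls output showed a namespace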


Zitat von "Adiga, Anantha" :


Hi,

An existing multisite configuration was removed.  But the radosgw  
services still see the old zone name and fail to start.


journalctl -u  
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs

...
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.897+ 7f2f47634740  0 deferred set uid:gid to  
167:167 (ceph:ceph)
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.897+ 7f2f47634740  0 ceph version 17.2.6  
(d7ff0d10654d2280e08f1ab989>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.897+ 7f2f47634740  0 framework: beast
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.897+ 7f2f47634740  0 framework conf key:  
port, val: 8080
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.897+ 7f2f47634740  1 radosgw_Main not  
setting numa affinity
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.901+ 7f2f47634740  1 rgw_d3n:  
rgw_d3n_l1_local_datacache_enabled=0
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.901+ 7f2f47634740  1 D3N datacache enabled: 0
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: could  
not find zone (fl2site2)
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: failed  
to start notify service>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: failed  
to init services (ret=(>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug  
2023-05-08T16:10:48.921+ 7f2f47634740 -1 Couldn't init storage  
provider (RADOS)
May 08 16:10:49 fl31ca104ja0201 systemd[1]:  
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs.service: Main  
pr>
May 08 16:10:49 fl31ca104ja0201 systemd[1]:  
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs.service:  
Failed


Here is the current configuration

root@fl31ca104ja0201:/# radosgw-admin period get
{
"id": "729f7cef-6340-4750-b3ae-9164177c0df3",
"epoch": 1,
"predecessor_uuid": "1d124ad5-57b0-41de-8def-823bd40f72aa",
"sync_status": [],
"period_map": {
"id": "729f7cef-6340-4750-b3ae-9164177c0df3",
"zonegroups": [
{
"id": "21b8306c-be43-4567-a0c5-74ab69937535",
"name": "default",
"api_name": "default",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"zones": [
{
"id": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "8999e04c-17a4-4150-8845-cecd6672d312",
"sync_policy": {
"groups": []
}
}
],
"short_zone_ids": [
{
"key": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"val": 3883136521
}
]
},
"master_zonegroup": "21b8306c-be43-4567-a0c5-74ab69937535",
"master_zone": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"period_config": {
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"us

[ceph-users] Re: rgw service fails to start with zone not found

2023-05-08 Thread Danny Webb
are the old multisite conf values still in ceph.conf (eg, rgw_zonegroup, 
rgw_zone, rgw_realm)?

From: Adiga, Anantha 
Sent: 08 May 2023 18:27
To: ceph-users@ceph.io 
Subject: [ceph-users] rgw service fails to start with zone not found

CAUTION: This email originates from outside THG

Hi,

An existing multisite configuration was removed.  But the radosgw services 
still see the old zone name and fail to start.

journalctl -u 
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs
...
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 deferred set uid:gid to 167:167 
(ceph:ceph)
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 ceph version 17.2.6 
(d7ff0d10654d2280e08f1ab989>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 framework: beast
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 framework conf key: port, val: 8080
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  1 radosgw_Main not setting numa 
affinity
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.901+ 7f2f47634740  1 rgw_d3n: 
rgw_d3n_l1_local_datacache_enabled=0
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.901+ 7f2f47634740  1 D3N datacache enabled: 0
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: could not find 
zone (fl2site2)
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: failed to start 
notify service>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: failed to init 
services (ret=(>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.921+ 7f2f47634740 -1 Couldn't init storage provider 
(RADOS)
May 08 16:10:49 fl31ca104ja0201 systemd[1]: 
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs.service:
 Main pr>
May 08 16:10:49 fl31ca104ja0201 systemd[1]: 
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs.service:
 Failed

Here is the current configuration

root@fl31ca104ja0201:/# radosgw-admin period get
{
"id": "729f7cef-6340-4750-b3ae-9164177c0df3",
"epoch": 1,
"predecessor_uuid": "1d124ad5-57b0-41de-8def-823bd40f72aa",
"sync_status": [],
"period_map": {
"id": "729f7cef-6340-4750-b3ae-9164177c0df3",
"zonegroups": [
{
"id": "21b8306c-be43-4567-a0c5-74ab69937535",
"name": "default",
"api_name": "default",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"zones": [
{
"id": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "8999e04c-17a4-4150-8845-cecd6672d312",
"sync_policy": {
"groups": []
}
}
],
"short_zone_ids": [
{
"key": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"val": 3883136521
}
]
},
"master_zonegroup": "21b8306c-be43-4567-a0c5-74ab69937535",
"master_zone": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"period_config": {
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_ratelimit": {
"max_read_ops": 0,
  

[ceph-users] rgw service fails to start with zone not found

2023-05-08 Thread Adiga, Anantha
Hi,

An existing multisite configuration was removed.  But the radosgw services 
still see the old zone name and fail to start.

journalctl -u 
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs
...
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 deferred set uid:gid to 167:167 
(ceph:ceph)
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 ceph version 17.2.6 
(d7ff0d10654d2280e08f1ab989>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 framework: beast
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  0 framework conf key: port, val: 8080
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.897+ 7f2f47634740  1 radosgw_Main not setting numa 
affinity
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.901+ 7f2f47634740  1 rgw_d3n: 
rgw_d3n_l1_local_datacache_enabled=0
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.901+ 7f2f47634740  1 D3N datacache enabled: 0
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: could not find 
zone (fl2site2)
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: failed to start 
notify service>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.917+ 7f2f47634740  0 rgw main: ERROR: failed to init 
services (ret=(>
May 08 16:10:48 fl31ca104ja0201 bash[3964341]: debug 
2023-05-08T16:10:48.921+ 7f2f47634740 -1 Couldn't init storage provider 
(RADOS)
May 08 16:10:49 fl31ca104ja0201 systemd[1]: 
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs.service:
 Main pr>
May 08 16:10:49 fl31ca104ja0201 systemd[1]: 
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@rgw.default.default.fl31ca104ja0201.ninovs.service:
 Failed

Here is the current configuration

root@fl31ca104ja0201:/# radosgw-admin period get
{
"id": "729f7cef-6340-4750-b3ae-9164177c0df3",
"epoch": 1,
"predecessor_uuid": "1d124ad5-57b0-41de-8def-823bd40f72aa",
"sync_status": [],
"period_map": {
"id": "729f7cef-6340-4750-b3ae-9164177c0df3",
"zonegroups": [
{
"id": "21b8306c-be43-4567-a0c5-74ab69937535",
"name": "default",
"api_name": "default",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"zones": [
{
"id": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "8999e04c-17a4-4150-8845-cecd6672d312",
"sync_policy": {
"groups": []
}
}
],
"short_zone_ids": [
{
"key": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"val": 3883136521
}
]
},
"master_zonegroup": "21b8306c-be43-4567-a0c5-74ab69937535",
"master_zone": "ff468492-8a14-414b-a8cf-7c20ab699af3",
"period_config": {
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_ratelimit": {
"max_read_ops": 0,
"max_write_ops": 0,
"max_read_bytes": 0,
"max_write_bytes": 0,
"enabled": false
},
"bucket_ratelimit": {
"max_read_ops": 0,
"max_write_ops": 0,
"max_read_bytes": 0,
"max_write_bytes": 0,
"enabled"

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-08 Thread Casey Bodley
On Sun, May 7, 2023 at 5:25 PM Yuri Weinstein  wrote:
>
> All PRs were cherry-picked and the new RC1 build is:
>
> https://shaman.ceph.com/builds/ceph/pacific-release/8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/
>
> Rados, fs and rgw were rerun and results are summarized here:
> https://tracker.ceph.com/issues/59542#note-1
>
> Seeking final approvals:
>
> rados - Radek
> fs - Venky
> rgw - Casey

rgw approved, thanks

>
> On Fri, May 5, 2023 at 8:27 AM Yuri Weinstein  wrote:
> >
> > I got verbal approvals for the listed PRs:
> >
> > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > https://github.com/ceph/ceph/pull/51344  -- Venky approved
> > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > https://github.com/ceph/ceph/pull/50894  -- Radek approved
> >
> > Suites rados and fs will need to be retested on the updated pacific-release
> > branch.
> >
> >
> > On Thu, May 4, 2023 at 9:13 AM Yuri Weinstein  wrote:
> > >
> > > In summary:
> > >
> > > Release Notes:  https://github.com/ceph/ceph/pull/51301
> > >
> > > We plan to finish this release next week and we have the following PRs
> > > planned to be added:
> > >
> > > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > > https://github.com/ceph/ceph/pull/51344  -- Venky in progress
> > > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > > https://github.com/ceph/ceph/pull/50894  -- Radek in progress
> > >
> > > As soon as these PRs are finalized, I will cherry-pick them and
> > > rebuild "pacific-release" and rerun appropriate suites.
> > >
> > > On Thu, May 4, 2023 at 9:07 AM Radoslaw Zarzynski  
> > > wrote:
> > > >
> > > > If we get some time, I would like to include:
> > > >
> > > >   https://github.com/ceph/ceph/pull/50894.
> > > >
> > > > Regards,
> > > > Radek
> > > >
> > > > On Thu, May 4, 2023 at 5:56 PM Venky Shankar  
> > > > wrote:
> > > > >
> > > > > Hi Yuri,
> > > > >
> > > > > On Wed, May 3, 2023 at 7:10 PM Venky Shankar  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein  
> > > > > > wrote:
> > > > > > >
> > > > > > > Venky, I did plan to cherry-pick this PR if you approve this 
> > > > > > > (this PR
> > > > > > > was used for a rerun)
> > > > > >
> > > > > > OK. The fs suite failure is being looked into
> > > > > > (https://tracker.ceph.com/issues/59626).
> > > > >
> > > > > Fix is being tracked by
> > > > >
> > > > > https://github.com/ceph/ceph/pull/51344
> > > > >
> > > > > Once ready, it needs to be included in 16.2.13 and would require a fs
> > > > > suite re-run (although re-running the failed tests should suffice,
> > > > > I'm a bit inclined to put it through the fs suite).
> > > > >
> > > > > >
> > > > > > >
> > > > > > > On Tue, May 2, 2023 at 7:51 AM Venky Shankar 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Yuri,
> > > > > > > >
> > > > > > > > On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Details of this release are summarized here:
> > > > > > > > >
> > > > > > > > > https://tracker.ceph.com/issues/59542#note-1
> > > > > > > > > Release Notes - TBD
> > > > > > > > >
> > > > > > > > > Seeking approvals for:
> > > > > > > > >
> > > > > > > > > smoke - Radek, Laura
> > > > > > > > > rados - Radek, Laura
> > > > > > > > >   rook - Sébastien Han
> > > > > > > > >   cephadm - Adam K
> > > > > > > > >   dashboard - Ernesto
> > > > > > > > >
> > > > > > > > > rgw - Casey
> > > > > > > > > rbd - Ilya
> > > > > > > > > krbd - Ilya
> > > > > > > > > fs - Venky, Patrick
> > > > > > > >
> > > > > > > > There are a couple of new failures which are qa/test related - 
> > > > > > > > I'll
> > > > > > > > have a look at those (they _do not_ look serious).
> > > > > > > >
> > > > > > > > Also, Yuri, do you plan to merge
> > > > > > > >
> > > > > > > > https://github.com/ceph/ceph/pull/51232
> > > > > > > >
> > > > > > > > into the pacific-release branch although it's tagged with one 
> > > > > > > > of your
> > > > > > > > other pacific runs?
> > > > > > > >
> > > > > > > > > upgrade/octopus-x (pacific) - Laura (look the same as in 
> > > > > > > > > 16.2.8)
> > > > > > > > > upgrade/pacific-p2p - Laura
> > > > > > > > > powercycle - Brad (SELinux denials)
> > > > > > > > > ceph-volume - Guillaume, Adam K
> > > > > > > > >
> > > > > > > > > Thx
> > > > > > > > > YuriW
> > > > > > > > > ___
> > > > > > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Cheers,
> > > > > > > > Venky
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Cheers,
> > > > > > Venky
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Cheers,
> > > > > Venky
> > > > > ___
> > > > > Dev mailing list -- d...@ceph.io
> >

[ceph-users] Re: non root deploy ceph 17.2.5 failed

2023-05-08 Thread Eugen Block

Hi,

could you provide some more details about your host OS? Which cephadm  
version is it? I was able to bootstrap a one-node cluster with both  
17.2.5 and 17.2.6 with a non-root user with no such error on openSUSE  
Leap 15.4:


quincy:~ # rpm -qa | grep cephadm
cephadm-17.2.6.248+gad656d572cb-lp154.2.1.noarch

deployer@quincy:~> sudo cephadm --image quay.io/ceph/ceph:v17.2.5  
bootstrap --mon-ip 172.17.2.3 --skip-monitoring-stack --ssh-user  
deployer --single-host-defaults

Verifying ssh connectivity ...
Adding key to deployer@localhost authorized_keys...
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 4.4.4 is present
[...]
Ceph version: ceph version 17.2.5  
(98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)

Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
[...]
Adding key to deployer@localhost authorized_keys...
Adding host quincy...
Deploying mon service with default placement...
Deploying mgr service with default placement...
[...]
Bootstrap complete.
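
A quick post-bootstrap sanity check might look like the following (only a
suggestion, assuming the same single-node setup as above):

sudo cephadm shell -- ceph -s            # overall cluster health
sudo cephadm shell -- ceph orch host ls  # the bootstrap host should be listed
sudo cephadm shell -- ceph orch ps       # mon and mgr daemons should show as running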

Quoting Ben:


Hi,

With the following command:

sudo cephadm --docker bootstrap --mon-ip 10.1.32.33 --skip-monitoring-stack --ssh-user deployer

The user deployer has a passwordless sudo configuration.
I can see the error below:

debug 2023-05-04T12:46:43.268+ 7fc5ddc2e700  0 [cephadm ERROR
cephadm.ssh] Unable to write
szhyf-xx1d002-hx15w:/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e:
scp:
/tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e.new:
Permission denied

Traceback (most recent call last):

  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 222, in _write_remote_file

await asyncssh.scp(f.name, (conn, tmp_path))

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp

await source.run(srcpath)

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run

self.handle_error(exc)

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
handle_error

raise exc from None

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run

await self._send_files(path, b'')

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in
_send_files

self.handle_error(exc)

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
handle_error

raise exc from None

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in
_send_files

await self._send_file(srcpath, dstpath, attrs)

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in
_send_file

await self._make_cd_request(b'C', attrs, size, srcpath)

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in
_make_cd_request

self._fs.basename(path))

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in
make_request

raise exc

Any ideas on this?

Thanks,
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Host offline after performing dnf upgrade on RHEL 8.7 host

2023-05-08 Thread Mevludin Blazevic

Ok, the hosts seem to be online again, but it took quite a long time.
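
If the warning about python3 / memory allocation comes back, a couple of
quick checks might help (a rough suggestion; replace the host name with the
affected host):

ceph cephadm check-host <hostname>               # re-run cephadm's host checks from the orchestrator
ssh <hostname> 'python3 --version && free -h'    # confirm python3 is present and memory is not exhausted
ceph orch host ls                                # the host should no longer be flagged offline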

On 08.05.2023 at 13:22, Mevludin Blazevic wrote:

Hi all,

After I performed a minor RHEL package upgrade (8.7 -> 8.7) on one of our
Ceph hosts, I get a Ceph warning saying that cephadm "Can't communicate with
remote host `...`, possibly because python3 is not installed there:
[Errno 12] Cannot allocate memory", although Python3 is installed. However,
I figured out that these packages were updated:


Installing:
 kernel 4.18.0-425.19.2.el8_7
 kernel-core 4.18.0-425.19.2.el8_7
 kernel-modules 4.18.0-425.19.2.el8_7
Upgrading:
 NetworkManager 1:1.40.0-6.el8_7
 NetworkManager-config-server   1:1.40.0-6.el8_7
 NetworkManager-initscripts-updown  1:1.40.0-6.el8_7
 NetworkManager-libnm   1:1.40.0-6.el8_7
 NetworkManager-team    1:1.40.0-6.el8_7
 NetworkManager-tui 1:1.40.0-6.el8_7
 authselect 1.2.5-2.el8_7
 authselect-compat 1.2.5-2.el8_7
 authselect-libs 1.2.5-2.el8_7
 bpftool 4.18.0-425.19.2.el8_7
 buildah 1:1.27.3-1.module+el8.7.0+17824+66a0202b
 cockpit-podman 53-1.module+el8.7.0+17824+66a0202b
 conmon 3:2.1.4-1.module+el8.7.0+17824+66a0202b
 container-selinux 2:2.189.0-1.module+el8.7.0+17824+66a0202b
 containernetworking-plugins 1:1.1.1-3.module+el8.7.0+17824+66a0202b
 containers-common 2:1-46.module+el8.7.0+17824+66a0202b
 criu 3.15-3.module+el8.7.0+17824+66a0202b
 curl   7.61.1-25.el8_7.3
 dbus   1:1.12.8-23.el8_7.1
 dbus-common    1:1.12.8-23.el8_7.1
 dbus-daemon    1:1.12.8-23.el8_7.1
 dbus-libs  1:1.12.8-23.el8_7.1
 dbus-tools 1:1.12.8-23.el8_7.1
 device-mapper-multipath 0.8.4-28.el8_7.3
 device-mapper-multipath-libs 0.8.4-28.el8_7.3
 dhcp-client 12:4.3.6-48.el8_7.1
 dhcp-common 12:4.3.6-48.el8_7.1
 dhcp-libs 12:4.3.6-48.el8_7.1
 dracut 049-218.git20221019.el8_7
 dracut-config-rescue 049-218.git20221019.el8_7
 dracut-network 049-218.git20221019.el8_7
 dracut-squash 049-218.git20221019.el8_7
 emacs-filesystem 1:26.1-7.el8_7.1
 epel-release 8-19.el8
 expat 2.2.5-10.el8_7.1
 fuse-overlayfs 1.9-1.module+el8.7.0+17824+66a0202b
 gnutls 3.6.16-6.el8_7
 grub2-common 1:2.02-142.el8_7.3
 grub2-pc 1:2.02-142.el8_7.3
 grub2-pc-modules 1:2.02-142.el8_7.3
 grub2-tools 1:2.02-142.el8_7.3
 grub2-tools-efi 1:2.02-142.el8_7.3
 grub2-tools-extra 1:2.02-142.el8_7.3
 grub2-tools-minimal 1:2.02-142.el8_7.3
 insights-client 3.1.7-9.el8_7
 iptables 1.8.4-23.el8_7.1
 iptables-ebtables 1.8.4-23.el8_7.1
 iptables-libs 1.8.4-23.el8_7.1
 iwl100-firmware 39.31.5.1-111.el8_7.1
 iwl1000-firmware 1:39.31.5.1-111.el8_7.1
 iwl105-firmware 18.168.6.1-111.el8_7.1
 iwl135-firmware 18.168.6.1-111.el8_7.1
 iwl2000-firmware 18.168.6.1-111.el8_7.1
 iwl2030-firmware 18.168.6.1-111.el8_7.1
 iwl3160-firmware 1:25.30.13.0-111.el8_7.1
 iwl5000-firmware 8.83.5.1_1-111.el8_7.1
 iwl5150-firmware 8.24.2.2-111.el8_7.1
 iwl6000-firmware 9.221.4.1-111.el8_7.1
 iwl6000g2a-firmware 18.168.6.1-111.el8_7.1
 iwl6000g2b-firmware 18.168.6.1-111.el8_7.1
 iwl6050-firmware 41.28.5.1-111.el8_7.1
 iwl7260-firmware 1:25.30.13.0-111.el8_7.1
 kernel-tools 4.18.0-425.19.2.el8_7
 kernel-tools-libs 4.18.0-425.19.2.el8_7
 kmod-kvdo 6.2.7.17-88.el8_7
 kpartx 0.8.4-28.el8_7.3
 libblkid 2.32.1-39.el8_7
 libcurl 7.61.1-25.el8_7.3
 libertas-usb8388-firmware 2:20220726-111.git150864a4.el8_7
 libfdisk 2.32.1-39.el8_7
 libgcc 8.5.0-16.el8_7
 libgomp 8.5.0-16.el8_7
 libipa_hbac 2.7.3-4.el8_7.3
 libksba 1.3.5-9.el8_7
 libmount 2.32.1-39.el8_7
 libnfsidmap 1:2.3.3-57.el8_7.1
 libslirp 4.4.0-1.module+el8.7.0+17824+66a0202b
 libsmartcols 2.32.1-39.el8_7
 libsmbclient 4.16.4-6.el8_7
 libsolv 0.7.20-4.el8_7
 libsss_autofs 2.7.3-4.el8_7.3
 libsss_certmap 2.7.3-4.el8_7.3
 libsss_idmap 2.7.3-4.el8_7.3
 libsss_nss_idmap 2.7.3-4.el8_7.3
 libsss_sudo 2.7.3-4.el8_7.3
 libstdc++ 8.5.0-16.el8_7
 libtasn1 4.13-4.el8_7
 libuuid 2.32.1-39.el8_7
 libwbclient 4.16.4-6.el8_7
 libxml2 2.9.7-15.el8_7.1
 linux-firmware 20220726-111.git150864a4.el8_7
 nss 3.79.0-11.el8_7
 nss-softokn 3.79.0-11.el8_7
 nss-softokn-freebl 3.79.0-11.el8_7
 nss-sysinit 3.79.0-11.el8_7
 nss-util 3.79.0-11.el8_7
 openssh 8.0p1-17.el8_7
 openssh-clients 8.0p1-17.el8_7
 openssh-server 8.0p1-17.el8_7
 openssl 1:1.1.1k-9.el8_7
 openssl-libs 1:1.1.1k-9.el8_7
 platform-python 3.6.8-48.el8_7.1
 platform-python-setuptools 39.2.0-6.el8_7.1
 podman 3:4.2.0-8.module+el8.7.0+17824+66a0202b
 podman-catatonit 3:4.2.0-8.module+el8.7.0+17824+66a0202b
 python3-libs 3.6.8-48.el8_7.1
 python3-libxml2 2.9.7-15.el8_7.1
 python3-perf 4.18.0-425.19.2.el8_7
 python3-setuptools 39.2.0-6.el8_7.1
 python3-setuptools-wheel 39.2.0-6.el8_7.1
 python3-sssdconfig 2.7.3-4.el8_7.3
 rhc 1:0.2.1-12.el8_7
 rsync 3.1.3-19.el8_7.1
 runc 1:1.1.4-1.module+el8.7.0+17824+66a0202b
 samba-client-libs 4.16.4-6.el8_7
 samba-common 4.16.4-6.el8_7
 samba-common-libs 4.16.4-6.el8_7
 selinux-policy 3.14.3-108.el8_7.2
 se

[ceph-users] Ceph Host offline after performing dnf upgrade on RHEL 8.7 host

2023-05-08 Thread Mevludin Blazevic

Hi all,

After I performed a minor RHEL package upgrade (8.7 -> 8.7) on one of our
Ceph hosts, I get a Ceph warning saying that cephadm "Can't communicate with
remote host `...`, possibly because python3 is not installed there:
[Errno 12] Cannot allocate memory", although Python3 is installed. However,
I figured out that these packages were updated:


Installing:
 kernel 4.18.0-425.19.2.el8_7
 kernel-core 4.18.0-425.19.2.el8_7
 kernel-modules 4.18.0-425.19.2.el8_7
Upgrading:
 NetworkManager 1:1.40.0-6.el8_7
 NetworkManager-config-server   1:1.40.0-6.el8_7
 NetworkManager-initscripts-updown  1:1.40.0-6.el8_7
 NetworkManager-libnm   1:1.40.0-6.el8_7
 NetworkManager-team    1:1.40.0-6.el8_7
 NetworkManager-tui 1:1.40.0-6.el8_7
 authselect 1.2.5-2.el8_7
 authselect-compat 1.2.5-2.el8_7
 authselect-libs 1.2.5-2.el8_7
 bpftool 4.18.0-425.19.2.el8_7
 buildah 1:1.27.3-1.module+el8.7.0+17824+66a0202b
 cockpit-podman 53-1.module+el8.7.0+17824+66a0202b
 conmon 3:2.1.4-1.module+el8.7.0+17824+66a0202b
 container-selinux 2:2.189.0-1.module+el8.7.0+17824+66a0202b
 containernetworking-plugins 1:1.1.1-3.module+el8.7.0+17824+66a0202b
 containers-common 2:1-46.module+el8.7.0+17824+66a0202b
 criu 3.15-3.module+el8.7.0+17824+66a0202b
 curl   7.61.1-25.el8_7.3
 dbus   1:1.12.8-23.el8_7.1
 dbus-common    1:1.12.8-23.el8_7.1
 dbus-daemon    1:1.12.8-23.el8_7.1
 dbus-libs  1:1.12.8-23.el8_7.1
 dbus-tools 1:1.12.8-23.el8_7.1
 device-mapper-multipath 0.8.4-28.el8_7.3
 device-mapper-multipath-libs 0.8.4-28.el8_7.3
 dhcp-client 12:4.3.6-48.el8_7.1
 dhcp-common 12:4.3.6-48.el8_7.1
 dhcp-libs 12:4.3.6-48.el8_7.1
 dracut 049-218.git20221019.el8_7
 dracut-config-rescue 049-218.git20221019.el8_7
 dracut-network 049-218.git20221019.el8_7
 dracut-squash 049-218.git20221019.el8_7
 emacs-filesystem 1:26.1-7.el8_7.1
 epel-release 8-19.el8
 expat 2.2.5-10.el8_7.1
 fuse-overlayfs 1.9-1.module+el8.7.0+17824+66a0202b
 gnutls 3.6.16-6.el8_7
 grub2-common 1:2.02-142.el8_7.3
 grub2-pc 1:2.02-142.el8_7.3
 grub2-pc-modules 1:2.02-142.el8_7.3
 grub2-tools 1:2.02-142.el8_7.3
 grub2-tools-efi 1:2.02-142.el8_7.3
 grub2-tools-extra 1:2.02-142.el8_7.3
 grub2-tools-minimal 1:2.02-142.el8_7.3
 insights-client 3.1.7-9.el8_7
 iptables 1.8.4-23.el8_7.1
 iptables-ebtables 1.8.4-23.el8_7.1
 iptables-libs 1.8.4-23.el8_7.1
 iwl100-firmware 39.31.5.1-111.el8_7.1
 iwl1000-firmware 1:39.31.5.1-111.el8_7.1
 iwl105-firmware 18.168.6.1-111.el8_7.1
 iwl135-firmware 18.168.6.1-111.el8_7.1
 iwl2000-firmware 18.168.6.1-111.el8_7.1
 iwl2030-firmware 18.168.6.1-111.el8_7.1
 iwl3160-firmware 1:25.30.13.0-111.el8_7.1
 iwl5000-firmware 8.83.5.1_1-111.el8_7.1
 iwl5150-firmware 8.24.2.2-111.el8_7.1
 iwl6000-firmware 9.221.4.1-111.el8_7.1
 iwl6000g2a-firmware 18.168.6.1-111.el8_7.1
 iwl6000g2b-firmware 18.168.6.1-111.el8_7.1
 iwl6050-firmware 41.28.5.1-111.el8_7.1
 iwl7260-firmware 1:25.30.13.0-111.el8_7.1
 kernel-tools 4.18.0-425.19.2.el8_7
 kernel-tools-libs 4.18.0-425.19.2.el8_7
 kmod-kvdo 6.2.7.17-88.el8_7
 kpartx 0.8.4-28.el8_7.3
 libblkid 2.32.1-39.el8_7
 libcurl 7.61.1-25.el8_7.3
 libertas-usb8388-firmware 2:20220726-111.git150864a4.el8_7
 libfdisk 2.32.1-39.el8_7
 libgcc 8.5.0-16.el8_7
 libgomp 8.5.0-16.el8_7
 libipa_hbac 2.7.3-4.el8_7.3
 libksba 1.3.5-9.el8_7
 libmount 2.32.1-39.el8_7
 libnfsidmap 1:2.3.3-57.el8_7.1
 libslirp 4.4.0-1.module+el8.7.0+17824+66a0202b
 libsmartcols 2.32.1-39.el8_7
 libsmbclient 4.16.4-6.el8_7
 libsolv 0.7.20-4.el8_7
 libsss_autofs 2.7.3-4.el8_7.3
 libsss_certmap 2.7.3-4.el8_7.3
 libsss_idmap 2.7.3-4.el8_7.3
 libsss_nss_idmap 2.7.3-4.el8_7.3
 libsss_sudo 2.7.3-4.el8_7.3
 libstdc++ 8.5.0-16.el8_7
 libtasn1 4.13-4.el8_7
 libuuid 2.32.1-39.el8_7
 libwbclient 4.16.4-6.el8_7
 libxml2 2.9.7-15.el8_7.1
 linux-firmware 20220726-111.git150864a4.el8_7
 nss 3.79.0-11.el8_7
 nss-softokn 3.79.0-11.el8_7
 nss-softokn-freebl 3.79.0-11.el8_7
 nss-sysinit 3.79.0-11.el8_7
 nss-util 3.79.0-11.el8_7
 openssh 8.0p1-17.el8_7
 openssh-clients 8.0p1-17.el8_7
 openssh-server 8.0p1-17.el8_7
 openssl 1:1.1.1k-9.el8_7
 openssl-libs 1:1.1.1k-9.el8_7
 platform-python 3.6.8-48.el8_7.1
 platform-python-setuptools 39.2.0-6.el8_7.1
 podman 3:4.2.0-8.module+el8.7.0+17824+66a0202b
 podman-catatonit 3:4.2.0-8.module+el8.7.0+17824+66a0202b
 python3-libs 3.6.8-48.el8_7.1
 python3-libxml2 2.9.7-15.el8_7.1
 python3-perf 4.18.0-425.19.2.el8_7
 python3-setuptools 39.2.0-6.el8_7.1
 python3-setuptools-wheel 39.2.0-6.el8_7.1
 python3-sssdconfig 2.7.3-4.el8_7.3
 rhc 1:0.2.1-12.el8_7
 rsync 3.1.3-19.el8_7.1
 runc 1:1.1.4-1.module+el8.7.0+17824+66a0202b
 samba-client-libs 4.16.4-6.el8_7
 samba-common 4.16.4-6.el8_7
 samba-common-libs 4.16.4-6.el8_7
 selinux-policy 3.14.3-108.el8_7.2
 selinux-policy-targeted 3.14.3-108.el8_7.2
 slirp4netns 1.2.0-2.module+el8.7.0+17824+66a0202b
 sos 4.5.1-3.el8
 sqlite 3.26.0-

[ceph-users] Os changed to Ubuntu, device class not shown

2023-05-08 Thread Szabo, Istvan (Agoda)
Hi,

We have an Octopus cluster that we want to move from CentOS to Ubuntu. After
activating all the OSDs, the device class is not shown in ceph osd tree.
However, ceph-volume list still shows the crush device class :/

Should I just add the device class back manually, or is there a better way?
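
A rough sketch of setting it back manually (the class name and OSD ids are
placeholders; use what ceph-volume reports):

ceph osd crush rm-device-class osd.0             # only needed if a stale class is already set
ceph osd crush set-device-class hdd osd.0 osd.1  # apply the class reported by ceph-volume
ceph osd tree                                    # the class column should now be populated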



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-08 Thread Venky Shankar
On Mon, May 8, 2023 at 2:54 AM Yuri Weinstein  wrote:
>
> All PRs were cherry-picked and the new RC1 build is:
>
> https://shaman.ceph.com/builds/ceph/pacific-release/8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/
>
> Rados, fs and rgw were rerun and results are summarized here:
> https://tracker.ceph.com/issues/59542#note-1
>
> Seeking final approvals:
>
> rados - Radek
> fs - Venky

fs approved.

> rgw - Casey
>
> On Fri, May 5, 2023 at 8:27 AM Yuri Weinstein  wrote:
> >
> > I got verbal approvals for the listed PRs:
> >
> > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > https://github.com/ceph/ceph/pull/51344  -- Venky approved
> > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > https://github.com/ceph/ceph/pull/50894  -- Radek approved
> >
> > Suites rados and fs will need to be retested on the updated
> > pacific-release branch.
> >
> >
> > On Thu, May 4, 2023 at 9:13 AM Yuri Weinstein  wrote:
> > >
> > > In summary:
> > >
> > > Release Notes:  https://github.com/ceph/ceph/pull/51301
> > >
> > > We plan to finish this release next week and we have the following PRs
> > > planned to be added:
> > >
> > > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > > https://github.com/ceph/ceph/pull/51344  -- Venky in progress
> > > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > > https://github.com/ceph/ceph/pull/50894  -- Radek in progress
> > >
> > > As soon as these PRs are finalized, I will cherry-pick them and
> > > rebuild "pacific-release" and rerun appropriate suites.
> > >
> > > On Thu, May 4, 2023 at 9:07 AM Radoslaw Zarzynski  
> > > wrote:
> > > >
> > > > If we get some time, I would like to include:
> > > >
> > > >   https://github.com/ceph/ceph/pull/50894.
> > > >
> > > > Regards,
> > > > Radek
> > > >
> > > > On Thu, May 4, 2023 at 5:56 PM Venky Shankar  
> > > > wrote:
> > > > >
> > > > > Hi Yuri,
> > > > >
> > > > > On Wed, May 3, 2023 at 7:10 PM Venky Shankar  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein  
> > > > > > wrote:
> > > > > > >
> > > > > > > Venky, I did plan to cherry-pick this PR if you approve this 
> > > > > > > (this PR
> > > > > > > was used for a rerun)
> > > > > >
> > > > > > OK. The fs suite failure is being looked into
> > > > > > (https://tracker.ceph.com/issues/59626).
> > > > >
> > > > > Fix is being tracked by
> > > > >
> > > > > https://github.com/ceph/ceph/pull/51344
> > > > >
> > > > > Once ready, it needs to be included in 16.2.13 and would require a fs
> > > > > suite re-run (although re-running the failed tests should suffice,
> > > > > I'm a bit inclined to put it through the fs suite).
> > > > >
> > > > > >
> > > > > > >
> > > > > > > On Tue, May 2, 2023 at 7:51 AM Venky Shankar 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Yuri,
> > > > > > > >
> > > > > > > > On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Details of this release are summarized here:
> > > > > > > > >
> > > > > > > > > https://tracker.ceph.com/issues/59542#note-1
> > > > > > > > > Release Notes - TBD
> > > > > > > > >
> > > > > > > > > Seeking approvals for:
> > > > > > > > >
> > > > > > > > > smoke - Radek, Laura
> > > > > > > > > rados - Radek, Laura
> > > > > > > > >   rook - Sébastien Han
> > > > > > > > >   cephadm - Adam K
> > > > > > > > >   dashboard - Ernesto
> > > > > > > > >
> > > > > > > > > rgw - Casey
> > > > > > > > > rbd - Ilya
> > > > > > > > > krbd - Ilya
> > > > > > > > > fs - Venky, Patrick
> > > > > > > >
> > > > > > > > There are a couple of new failures which are qa/test related - 
> > > > > > > > I'll
> > > > > > > > have a look at those (they _do not_ look serious).
> > > > > > > >
> > > > > > > > Also, Yuri, do you plan to merge
> > > > > > > >
> > > > > > > > https://github.com/ceph/ceph/pull/51232
> > > > > > > >
> > > > > > > > into the pacific-release branch although it's tagged with one 
> > > > > > > > of your
> > > > > > > > other pacific runs?
> > > > > > > >
> > > > > > > > > upgrade/octopus-x (pacific) - Laura (look the same as in 
> > > > > > > > > 16.2.8)
> > > > > > > > > upgrade/pacific-p2p - Laura
> > > > > > > > > powercycle - Brad (SELinux denials)
> > > > > > > > > ceph-volume - Guillaume, Adam K
> > > > > > > > >
> > > > > > > > > Thx
> > > > > > > > > YuriW
> > > > > > > > > ___
> > > > > > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Cheers,
> > > > > > > > Venky
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Cheers,
> > > > > > Venky
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Cheers,
> > > > > Venky
> > > > > ___
> > > > > Dev mailing list -- d...@ceph.io
> > > > > T

[ceph-users] non root deploy ceph 17.2.5 failed

2023-05-08 Thread Ben
Hi,

With the following command:

sudo cephadm --docker bootstrap --mon-ip 10.1.32.33 --skip-monitoring-stack --ssh-user deployer

The user deployer has a passwordless sudo configuration.
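
For reference, that passwordless sudo setup is typically something along
these lines (the file path and rule are illustrative, not taken from this
host):

# /etc/sudoers.d/deployer   (validate with: visudo -cf /etc/sudoers.d/deployer)
deployer ALL=(ALL) NOPASSWD: ALL
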
I can see the error below:

debug 2023-05-04T12:46:43.268+ 7fc5ddc2e700  0 [cephadm ERROR
cephadm.ssh] Unable to write
szhyf-xx1d002-hx15w:/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e:
scp:
/tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e.new:
Permission denied

Traceback (most recent call last):

  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 222, in _write_remote_file

await asyncssh.scp(f.name, (conn, tmp_path))

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp

await source.run(srcpath)

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run

self.handle_error(exc)

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
handle_error

raise exc from None

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run

await self._send_files(path, b'')

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in
_send_files

self.handle_error(exc)

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
handle_error

raise exc from None

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in
_send_files

await self._send_file(srcpath, dstpath, attrs)

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in
_send_file

await self._make_cd_request(b'C', attrs, size, srcpath)

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in
_make_cd_request

self._fs.basename(path))

  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in
make_request

raise exc

Any ideas on this?
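
One thing that might be worth checking on the target host (a rough
suggestion; the fsid is the one from the error above, the rest of the path
comes straight from the scp message):

ls -ld /tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9   # who owns the staging path scp writes to?
sudo chown -R deployer /tmp/var   # a possible workaround if the path is root-owned and the ssh user cannot write to it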

Thanks,
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io