[ceph-users] question about OSD onode hits ratio

2023-08-02 Thread Ben
Hi,
We have a cluster that has been running for a while. On the Grafana Ceph
dashboard, I saw an OSD onode hits ratio of 92% when the cluster was just up
and running. After a couple of months it is now at 70%. This is not a good
trend, I think. Just wondering what should be done to stop this trend.
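
A minimal way to look at the numbers behind that ratio (a sketch; the exact
counter names vary a little between releases) would be something like:

  # ceph tell osd.0 perf dump | grep -E 'onode_hits|onode_misses'
  # ceph config get osd.0 osd_memory_target

If misses grow as the cluster fills, the usual knob is raising
osd_memory_target so more onodes stay cached - only if the hosts have the
RAM to spare, e.g.:

  # ceph config set osd osd_memory_target 8589934592   # 8 GiB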

Many thanks,
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ref v18.2.0 QE Validation status

2023-08-02 Thread Brad Hubbard
On Thu, Aug 3, 2023 at 8:31 AM Yuri Weinstein  wrote:

> Updates:
>
> 1. bookworm distro build support
> We will not build bookworm until Debian bug
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030129 is resolved
>
> 2. nfs ganesha
> fixed (Thanks Guillaume and Kaleb)
>
> 3. powercycle failures due to teuthology - fixed (Thanks Brad, Zack,
> Dan).  I am expecting Brad to approve it as the SELinux denials are a
> known issue.
>

Confirmed, definite pass, much better.


>
> smoke approved.
>
> Unless I hear any objections, I see no open issues and will start building
> (jammy focal centos8 centos9 windows)
>
> On Wed, Aug 2, 2023 at 7:56 AM Yuri Weinstein  wrote:
>
>> https://github.com/ceph/ceph/pull/52710 merged
>>
>> smoke:
>> https://tracker.ceph.com/issues/62227 marked resolved
>> https://tracker.ceph.com/issues/62228 in progress
>>
>> Only the smoke and powercycle items are remaining from the test approval
>> standpoint.
>>
>> Here is the quote from Neha's status to the clt summarizing the correct
>> outstanding issues:
>> "
>> 1. bookworm distro build support
>> action item - Neha and Josh will have a discussion with Dan to figure out
>> what help he needs and if at all there is a workaround for the first reef
>> release
>>
>> 2. nfs ganesha
>> action item - figure out with Guillaume and Kaleb if a workaround of
>> pinning to an older release is possible
>>
>> 3. powercycle failures due to teuthology -
>> https://pulpito.ceph.com/yuriw-2023-07-29_14:04:17-powercycle-reef-release-distro-default-smithi/
>>
>> action item - open a tracker and get Zack's thoughts on whether this
>> issue can be classified as infra-only and get powercycle approvals from Brad
>> "
>>
>> On Wed, Aug 2, 2023 at 3:03 AM Radoslaw Zarzynski 
>> wrote:
>>
>>> Final ACK for RADOS.
>>>
>>> On Tue, Aug 1, 2023 at 6:28 PM Laura Flores  wrote:
>>>
 Rados failures are summarized here:
 https://tracker.ceph.com/projects/rados/wiki/REEF#Reef-v1820

 All are known. Will let Radek give the final ack.

 On Tue, Aug 1, 2023 at 9:05 AM Nizamudeen A  wrote:

> dashboard approved! failure is unrelated and tracked via
> https://tracker.ceph.com/issues/58946
>
> Regards,
> Nizam
>
> On Sun, Jul 30, 2023 at 9:16 PM Yuri Weinstein 
> wrote:
>
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/62231#note-1
> >
> > Seeking approvals/reviews for:
> >
> > smoke - Laura, Radek
> > rados - Neha, Radek, Travis, Ernesto, Adam King
> > rgw - Casey
> > fs - Venky
> > orch - Adam King
> > rbd - Ilya
> > krbd - Ilya
> > upgrade-clients:client-upgrade* - in progress
> > powercycle - Brad
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > bookworm distro support is an outstanding issue.
> >
> > TIA
> > YuriW
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


 --

 Laura Flores

 She/Her/Hers

 Software Engineer, Ceph Storage 

 Chicago, IL

 lflo...@ibm.com | lflo...@redhat.com 
 M: +17087388804


 ___
 Dev mailing list -- d...@ceph.io
 To unsubscribe send an email to dev-le...@ceph.io

>>> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>


-- 
Cheers,
Brad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ref v18.2.0 QE Validation status

2023-08-02 Thread Yuri Weinstein
Updates:

1. bookworm distro build support
We will not build bookworm until Debian bug
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030129 is resolved

2. nfs ganesha
fixed (Thanks Guillaume and Kaleb)

3. powercycle failures due to teuthology - fixed (Thanks Brad, Zack, Dan).
I am expecting Brad to approve it as the SELinux denials are a known issue.

smoke approved.

Unless I hear any objections, I see no open issues and will start building
(jammy focal centos8 centos9 windows)

On Wed, Aug 2, 2023 at 7:56 AM Yuri Weinstein  wrote:

> https://github.com/ceph/ceph/pull/52710 merged
>
> smoke:
> https://tracker.ceph.com/issues/62227 marked resolved
> https://tracker.ceph.com/issues/62228 in progress
>
> Only the smoke and powercycle items are remaining from the test approval
> standpoint.
>
> Here is the quote from Neha's status to the clt summarizing the correct
> outstanding issues:
> "
> 1. bookworm distro build support
> action item - Neha and Josh will have a discussion with Dan to figure out
> what help he needs and if at all there is a workaround for the first reef
> release
>
> 2. nfs ganesha
> action item - figure out with Guillaume and Kaleb if a workaround of
> pinning to an older release is possible
>
> 3. powercycle failures due to teuthology -
> https://pulpito.ceph.com/yuriw-2023-07-29_14:04:17-powercycle-reef-release-distro-default-smithi/
>
> action item - open a tracker and get Zack's thoughts on whether this issue
> can be classified as infra-only and get powercycle approvals from Brad
> "
>
> On Wed, Aug 2, 2023 at 3:03 AM Radoslaw Zarzynski 
> wrote:
>
>> Final ACK for RADOS.
>>
>> On Tue, Aug 1, 2023 at 6:28 PM Laura Flores  wrote:
>>
>>> Rados failures are summarized here:
>>> https://tracker.ceph.com/projects/rados/wiki/REEF#Reef-v1820
>>>
>>> All are known. Will let Radek give the final ack.
>>>
>>> On Tue, Aug 1, 2023 at 9:05 AM Nizamudeen A  wrote:
>>>
 dashboard approved! failure is unrelated and tracked via
 https://tracker.ceph.com/issues/58946

 Regards,
 Nizam

 On Sun, Jul 30, 2023 at 9:16 PM Yuri Weinstein 
 wrote:

 > Details of this release are summarized here:
 >
 > https://tracker.ceph.com/issues/62231#note-1
 >
 > Seeking approvals/reviews for:
 >
 > smoke - Laura, Radek
 > rados - Neha, Radek, Travis, Ernesto, Adam King
 > rgw - Casey
 > fs - Venky
 > orch - Adam King
 > rbd - Ilya
 > krbd - Ilya
 > upgrade-clients:client-upgrade* - in progress
 > powercycle - Brad
 >
 > Please reply to this email with approval and/or trackers of known
 > issues/PRs to address them.
 >
 > bookworm distro support is an outstanding issue.
 >
 > TIA
 > YuriW
 > ___
 > ceph-users mailing list -- ceph-users@ceph.io
 > To unsubscribe send an email to ceph-users-le...@ceph.io
 >
 >
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io

>>>
>>>
>>> --
>>>
>>> Laura Flores
>>>
>>> She/Her/Hers
>>>
>>> Software Engineer, Ceph Storage 
>>>
>>> Chicago, IL
>>>
>>> lflo...@ibm.com | lflo...@redhat.com 
>>> M: +17087388804
>>>
>>>
>>> ___
>>> Dev mailing list -- d...@ceph.io
>>> To unsubscribe send an email to dev-le...@ceph.io
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ref v18.2.0 QE Validation status

2023-08-02 Thread Laura Flores
The second smoke issue was also fixed:
https://tracker.ceph.com/issues/57206

In any case, it is a non-blocker.

On Wed, Aug 2, 2023 at 9:56 AM Yuri Weinstein  wrote:

> https://github.com/ceph/ceph/pull/52710 merged
>
> smoke:
> https://tracker.ceph.com/issues/62227 marked resolved
> https://tracker.ceph.com/issues/62228 in progress
>
> Only the smoke and powercycle items are remaining from the test approval
> standpoint.
>
> Here is the quote from Neha's status to the clt summarizing the correct
> outstanding issues:
> "
> 1. bookworm distro build support
> action item - Neha and Josh will have a discussion with Dan to figure out
> what help he needs and if at all there is a workaround for the first reef
> release
>
> 2. nfs ganesha
> action item - figure out with Guillaume and Kaleb if a workaround of
> pinning to an older release is possible
>
> 3. powercycle failures due to teuthology -
> https://pulpito.ceph.com/yuriw-2023-07-29_14:04:17-powercycle-reef-release-distro-default-smithi/
>
> action item - open a tracker and get Zack's thoughts on whether this issue
> can be classified as infra-only and get powercycle approvals from Brad
> "
>
> On Wed, Aug 2, 2023 at 3:03 AM Radoslaw Zarzynski 
> wrote:
>
>> Final ACK for RADOS.
>>
>> On Tue, Aug 1, 2023 at 6:28 PM Laura Flores  wrote:
>>
>>> Rados failures are summarized here:
>>> https://tracker.ceph.com/projects/rados/wiki/REEF#Reef-v1820
>>>
>>> All are known. Will let Radek give the final ack.
>>>
>>> On Tue, Aug 1, 2023 at 9:05 AM Nizamudeen A  wrote:
>>>
 dashboard approved! failure is unrelated and tracked via
 https://tracker.ceph.com/issues/58946

 Regards,
 Nizam

 On Sun, Jul 30, 2023 at 9:16 PM Yuri Weinstein 
 wrote:

 > Details of this release are summarized here:
 >
 > https://tracker.ceph.com/issues/62231#note-1
 >
 > Seeking approvals/reviews for:
 >
 > smoke - Laura, Radek
 > rados - Neha, Radek, Travis, Ernesto, Adam King
 > rgw - Casey
 > fs - Venky
 > orch - Adam King
 > rbd - Ilya
 > krbd - Ilya
 > upgrade-clients:client-upgrade* - in progress
 > powercycle - Brad
 >
 > Please reply to this email with approval and/or trackers of known
 > issues/PRs to address them.
 >
 > bookworm distro support is an outstanding issue.
 >
 > TIA
 > YuriW
 > ___
 > ceph-users mailing list -- ceph-users@ceph.io
 > To unsubscribe send an email to ceph-users-le...@ceph.io
 >
 >
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io

>>>
>>>
>>> --
>>>
>>> Laura Flores
>>>
>>> She/Her/Hers
>>>
>>> Software Engineer, Ceph Storage 
>>>
>>> Chicago, IL
>>>
>>> lflo...@ibm.com | lflo...@redhat.com 
>>> M: +17087388804
>>>
>>>
>>> ___
>>> Dev mailing list -- d...@ceph.io
>>> To unsubscribe send an email to dev-le...@ceph.io
>>>
>>

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] mgr services frequently crash on nodes 2,3,4

2023-08-02 Thread Adiga, Anantha
Hi,

The mgr service crashes frequently on nodes 2, 3 and 4 with the same condition
after the 4th node was added.
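
For reference, the crash module can pull the full report for any of these
(these commands exist in Pacific; the crash id below is the one from the
meta file further down):

  # ceph crash ls
  # ceph crash info 2023-07-31T08:44:20.893344Z_037688ae-266f-4879-932c-2239f4679fd6
  # ceph crash archive-all   # acknowledges the reports once collected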

root@zp3110b001a0104:/# ceph crash stat
19 crashes recorded
16 older than 1 days old:
2023-07-29T03:35:32.006309Z_7b622c2b-a2fc-425a-acb8-dc1673b4c189
2023-07-29T03:35:32.055174Z_a2ee1e23-5f41-4dbe-86ff-643fbf870dc9
2023-07-29T14:34:13.752432Z_39b6a0d9-1bc3-4481-9a14-c92fea6c2710
2023-07-30T03:02:57.510867Z_df595e04-0ac2-4e3d-93be-a7225348ea19
2023-07-30T06:20:09.322530Z_0c2485f8-281c-4440-8b08-89b08a669de4
2023-07-30T10:16:46.798405Z_79082f37-ee08-4a2b-84d1-d96c4026f321
2023-07-30T10:16:46.843441Z_788391d6-3278-48c4-a95b-1934ee3265c1
2023-07-31T02:26:55.903966Z_416a1e94-a8e1-4057-a683-a907faf400a1
2023-07-31T04:40:10.216044Z_bef9d811-4e92-45cd-bcd7-3282962c8dfe
2023-07-31T08:44:20.893344Z_037688ae-266f-4879-932c-2239f4679fd6
2023-07-31T09:22:12.527968Z_f136c93b-7156-4176-a734-66a5a62513a4
2023-07-31T15:22:08.417988Z_b80c6255-5eb3-41dd-b0b1-8bc5b070094f
2023-07-31T23:05:16.589501Z_20ed8ef9-a478-49de-a371-08ea7a9937e5
2023-08-01T01:26:01.911387Z_670f9e3c-7fbe-497f-9f0b-abeaefd8f2b3
2023-08-01T01:51:39.759874Z_ff8206e4-34aa-44fe-82ac-7339e6714bb7
2023-08-01T01:56:21.955706Z_98c86cdd-45ec-47dc-8f0c-2e5e09731db8
7 older than 3 days old:
2023-07-29T03:35:32.006309Z_7b622c2b-a2fc-425a-acb8-dc1673b4c189
2023-07-29T03:35:32.055174Z_a2ee1e23-5f41-4dbe-86ff-643fbf870dc9
2023-07-29T14:34:13.752432Z_39b6a0d9-1bc3-4481-9a14-c92fea6c2710
2023-07-30T03:02:57.510867Z_df595e04-0ac2-4e3d-93be-a7225348ea19
2023-07-30T06:20:09.322530Z_0c2485f8-281c-4440-8b08-89b08a669de4
2023-07-30T10:16:46.798405Z_79082f37-ee08-4a2b-84d1-d96c4026f321
2023-07-30T10:16:46.843441Z_788391d6-3278-48c4-a95b-1934ee3265c1

root@zp3110b001a0104:/var/lib/ceph/8dbfcd81-fee3-49d2-ac0c-e988c8be7178/crash/posted/2023-07-31T08:44:20.893344Z_037688ae-266f-4879-932c-2239f4679fd6#
 cat meta
{
"crash_id": 
"2023-07-31T08:44:20.893344Z_037688ae-266f-4879-932c-2239f4679fd6",
"timestamp": "2023-07-31T08:44:20.893344Z",
"process_name": "ceph-mgr",
"entity_name": "mgr.zp3110b001a0104.tmbkzq",
"ceph_version": "16.2.5",
"utsname_hostname": "zp3110b001a0104",
"utsname_sysname": "Linux",
"utsname_release": "5.4.0-153-generic",
"utsname_version": "#170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023",
"utsname_machine": "x86_64",
"os_name": "CentOS Linux",
"os_id": "centos",
"os_version_id": "8",
"os_version": "8",
"assert_condition": "pending_service_map.epoch > service_map.epoch",
"assert_func": "DaemonServer::got_service_map()::",
"assert_file": 
"/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.5/rpm/el8/BUILD/ceph-16.2.5/src/mgr/DaemonServer.cc",
"assert_line": 2932,
"assert_thread_name": "ms_dispatch",
"assert_msg": 
"/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.5/rpm/el8/BUILD/ceph-16.2.5/src/mgr/DaemonServer.cc:
 In function 'DaemonServer::got_service_map()::' 
thread 7f127440a700 time 
2023-07-31T08:44:20.887150+\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.5/rpm/el8/BUILD/ceph-16.2.5/src/mgr/DaemonServer.cc:
 2932: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
"backtrace": [
"/lib64/libpthread.so.0(+0x12b20) [0x7f127c611b20]",
"gsignal()",
"abort()",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x1a9) [0x7f127da26b75]",
"/usr/lib64/ceph/libceph-common.so.2(+0x276d3e) [0x7f127da26d3e]",
"(DaemonServer::got_service_map()+0xb2d) [0x5625aee23a4d]",
"(Mgr::handle_service_map(boost::intrusive_ptr)+0x1b6) 
[0x5625aee527c6]",
"(Mgr::ms_dispatch2(boost::intrusive_ptr const&)+0x894) 
[0x5625aee55424]",
"(MgrStandby::ms_dispatch2(boost::intrusive_ptr const&)+0xb0) 
[0x5625aee5ec10]",
"(DispatchQueue::entry()+0x126a) [0x7f127dc610ca]",
"(DispatchQueue::DispatchThread::entry()+0x11) [0x7f127dd11591]",
"/lib64/libpthread.so.0(+0x814a) [0x7f127c60714a]",
"clone()"
]
}
root@zp3110b001a0104:/var/lib/ceph/8dbfcd81-fee3-49d2-ac0c-e988c8be7178/crash/posted/2023-07-31T08:44:20.893344Z_037688ae-266f-4879-932c-2239f4679fd6#
 more log
--- begin dump of recent events ---
-> 2023-07-31T08:27:14.084+ 7f126fc01700 10 monclient: 
_send_mon_message to mon.zp3110b001a0104 at v2:XX.XXX.26.4:3300/0
-9998> 

[ceph-users] Re: Disk device path changed - cephadm faild to apply osd service

2023-08-02 Thread Kilian Ries
OK, just tried it and it works as expected ... just dump the yaml, edit it and
apply it again!


Regards ;)


Von: Eugen Block 
Gesendet: Mittwoch, 2. August 2023 12:53:20
An: Kilian Ries
Cc: ceph-users@ceph.io
Betreff: Re: AW: [ceph-users] Re: Disk device path changed - cephadm faild to 
apply osd service

But that could be done easily like this:

service_type: osd
service_id: ssd-db
service_name: osd.ssd-db
placement:
   hosts:
   - storage01
   - storage02
...
spec:
   block_db_size: 64G
   data_devices:
 rotational: 1
   db_devices:
 rotational: 0
   filter_logic: AND
   objectstore: bluestore

Anyway, I would expect that fixing the drivegroup config would fix
your issue, but I'm not sure either.

Zitat von Kilian Ries :

> Yes, I need specific device paths because all HDD / SSD are the same
> size / same vendor etc. I group multiple HDDs with an exclusive SSD
> for caching, for example:
>
>
> spec:
>
>   data_devices:
>
> paths:
>
> - /dev/sdh
>
> - /dev/sdi
>
> - /dev/sdj
>
> - /dev/sdk
>
> - /dev/sdl
>
>   db_devices:
>
> paths:
>
> - /dev/sdf
>
>   filter_logic: AND
>
>   objectstore: bluestore
>
> 
> Von: Eugen Block 
> Gesendet: Mittwoch, 2. August 2023 08:13:41
> An: ceph-users@ceph.io
> Betreff: [ceph-users] Re: Disk device path changed - cephadm faild
> to apply osd service
>
> Do you really need device paths in your configuration? You could use
> other criteria like disk sizes, vendors, rotational flag etc. If you
> really want device paths you'll probably need to ensure they're
> persistent across reboots via udev rules.
>
> Zitat von Kilian Ries :
>
>> Hi,
>>
>>
>> it seems that after reboot / OS update my disk labels / device paths
>> may have changed. Since then I get an error like this:
>>
>>
>>
>> CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.osd-12-22_hdd-2
>>
>>
>>
>> ###
>>
>>
>> RuntimeError: cephadm exited with an error code: 1, stderr:Non-zero
>> exit code 1 from /bin/docker run --rm --ipc=host
>> --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
>> --privileged --group-add=disk --init -e
>> CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9025b6db2e8be9173186e068a79a0da5a54ada
>>  -e NODE_NAME=ceph-osd07.intern -e CEPH_USE_RANDOM_NONCE=1 -e 
>> CEPH_VOLUME_OSDSPEC_AFFINITY=osd-12-22_hdd-2 -e 
>> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v 
>> /var/run/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/run/ceph:z -v 
>> /var/log/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/log/ceph:z -v 
>> /var/lib/ceph/01578d80-6c97-46ba-9327-cb2b13980916/crash:/var/lib/ceph/crash:z
>>  -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v 
>> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v 
>> /tmp/ceph-tmp2cvmr5lf:/etc/ceph/ceph.conf:z -v
>> /tmp/ceph-tmpb38cuw7q:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
>> quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9
>>  025b6db2e8be9173186e068a79a0da5a54ada lvm batch --no-auto /dev/sdm
>> /dev/sdn /dev/sdo /dev/sdp /dev/sdq --db-devices /dev/sdg --yes
>> --no-systemd
>>
>> /bin/docker: stderr Traceback (most recent call last):
>>
>> /bin/docker: stderr   File "/usr/sbin/ceph-volume", line 11, in 
>>
>> /bin/docker: stderr load_entry_point('ceph-volume==1.0.0',
>> 'console_scripts', 'ceph-volume')()
>>
>> /bin/docker: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in
>> __init__
>>
>> /bin/docker: stderr self.main(self.argv)
>>
>> /bin/docker: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line
>> 59, in newfunc
>>
>> /bin/docker: stderr return f(*a, **kw)
>>
>> /bin/docker: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in
>> main
>>
>> /bin/docker: stderr terminal.dispatch(self.mapper, subcommand_args)
>>
>> /bin/docker: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line
>> 194, in dispatch
>>
>> /bin/docker: stderr instance.main()
>>
>> /bin/docker: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py",
>> line 46, in main
>>
>> /bin/docker: stderr terminal.dispatch(self.mapper, self.argv)
>>
>> /bin/docker: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line
>> 192, in dispatch
>>
>> /bin/docker: stderr instance = mapper.get(arg)(argv[count:])
>>
>> /bin/docker: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py",
>> line 348, in __init__
>>
>> /bin/docker: stderr self.args = parser.parse_args(argv)
>>
>> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
>> 1734, in parse_args
>>
>> /bin/docker: stderr args, argv = self.parse_known_args(args, namespace)
>>
>> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
>> 1766, in parse_known_args
>>
>> /bin/docker: stderr namespace, 

[ceph-users] Re: ceph-volume lvm migrate error

2023-08-02 Thread Roland Giesler

Ouch, I got excited too quickly!

On 2023/08/02 21:27, Roland Giesler wrote:

# systemctl start ceph-osd@14

And, voilà!, it did it.

# ls -la /var/lib/ceph/osd/ceph-14/block*
lrwxrwxrwx 1 ceph ceph 50 Dec 25  2022 /var/lib/ceph/osd/ceph-14/block 
-> /dev/mapper/0GVWr9-dQ65-LHcx-y6fD-z7fI-10A9-gVWZkY
lrwxrwxrwx 1 root root 10 Aug  2 21:17 
/var/lib/ceph/osd/ceph-14/block.db -> /dev/dm-20


It crashed!

# systemctl status ceph-osd@14
● ceph-osd@14.service - Ceph object storage daemon osd.14
 Loaded: loaded (/lib/systemd/system/ceph-osd@.service; 
enabled-runtime; vendor preset: enabled)

    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
 └─ceph-after-pve-cluster.conf
 Active: failed (Result: exit-code) since Wed 2023-08-02 21:18:54 
SAST; 10min ago
    Process: 520652 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh 
--cluster ${CLUSTER} --id 14 (code=exited, status=0/SUCCESS)
    Process: 520660 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} 
--id 14 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)

   Main PID: 520660 (code=exited, status=1/FAILURE)
    CPU: 90ms

Aug 02 21:18:54 FT1-NodeC systemd[1]: ceph-osd@14.service: Scheduled 
restart job, restart counter is at 3.
Aug 02 21:18:54 FT1-NodeC systemd[1]: Stopped Ceph object storage daemon 
osd.14.
Aug 02 21:18:54 FT1-NodeC systemd[1]: ceph-osd@14.service: Start request 
repeated too quickly.
Aug 02 21:18:54 FT1-NodeC systemd[1]: ceph-osd@14.service: Failed with 
result 'exit-code'.
Aug 02 21:18:54 FT1-NodeC systemd[1]: Failed to start Ceph object 
storage daemon osd.14.
Aug 02 21:28:49 FT1-NodeC systemd[1]: ceph-osd@14.service: Start request 
repeated too quickly.
Aug 02 21:28:49 FT1-NodeC systemd[1]: ceph-osd@14.service: Failed with 
result 'exit-code'.
Aug 02 21:28:49 FT1-NodeC systemd[1]: Failed to start Ceph object 
storage daemon osd.14.
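
The OSD log usually has the actual failure reason rather than the systemd
status; a sketch, assuming the default non-containerized log paths:

# journalctl -u ceph-osd@14 -n 100 --no-pager
# tail -n 100 /var/log/ceph/ceph-osd.14.log
# systemctl reset-failed ceph-osd@14   # needed before retrying after
                                       # "Start request repeated too quickly"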

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-volume lvm migrate error

2023-08-02 Thread Roland Giesler

On 2023/08/02 13:29, Roland Giesler wrote:


On 2023/08/02 12:53, Igor Fedotov wrote:

Roland,

First of all there are no block.db/block.wal symlinks in OSD folder. 
Which means there are no standalone DB/WAL any more.


That is surprising.  So ceph-volume is not able to extract the DB/WAL 
from an OSD to migrate it, it seems?


I figured out that if one doesn't specify a separate LV for the DB/WAL, 
it is integrated into the data drive.


However, one can create a new DB/WAL for an OSD as follows:

# systemctl stop ceph-osd@14
# ceph-bluestore-tool bluefs-bdev-new-db --path 
/var/lib/ceph/osd/ceph-14 --dev-target 
/dev/NodeC-nvme1/NodeC-nvme-LV-RocksDB1 --bluestore-block-db-size 45G

inferring bluefs devices from bluestore path
DB device added /dev/dm-20
# systemctl start ceph-osd@14

And, voilà!, it did it.

# ls -la /var/lib/ceph/osd/ceph-14/block*
lrwxrwxrwx 1 ceph ceph 50 Dec 25  2022 /var/lib/ceph/osd/ceph-14/block 
-> /dev/mapper/0GVWr9-dQ65-LHcx-y6fD-z7fI-10A9-gVWZkY
lrwxrwxrwx 1 root root 10 Aug  2 21:17 
/var/lib/ceph/osd/ceph-14/block.db -> /dev/dm-20


I'm just checking it out now, to see if there are no errors and that it 
actually does what I think it does.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RHEL / CephFS / Pacific / SELinux unavoidable "relabel inode" error?

2023-08-02 Thread Gregory Farnum
I don't think we've seen this reported before. SELinux gets a hefty
workout from Red Hat with their downstream ODF for OpenShift
(Kubernetes), so it certainly works at a basic level.

SELinux is a fussy beast though, so if you're e.g. mounting CephFS
across RHEL nodes and invoking SELinux against it, any differences
between those nodes (like differing UIDs, or probably lots of other
bits) could result in it looking wrong to them. I'm not even close to
an expert but IIUC, people generally turn off SELinux for their
shared/distributed data.
-Greg
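
A workaround sometimes used with other network filesystems (a sketch only:
it assumes the kernel CephFS client honors the generic SELinux context=
mount option, and the context value here is just a placeholder for whatever
your policy expects) is to pin a single label at mount time so nothing has
to be relabeled:

  # mount -t ceph mon1:6789:/ /mnt/cephfs \
      -o name=myclient,secretfile=/etc/ceph/client.secret,context="system_u:object_r:<your_type_t>:s0"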

On Wed, Aug 2, 2023 at 5:53 AM Harry G Coin  wrote:
>
> Hi!  No matter what I try, using the latest cephfs on an all
> ceph-pacific setup, I've not been able to avoid this error message,
> always similar to this on RHEL family clients:
>
> SELinux: inode=1099954719159 on dev=ceph was found to have an invalid
> context=system_u:object_r:unlabeled_t:s0.  This indicates you may need
> to relabel the inode or the filesystem in question.
>
> What's the answer?
>
>
> Thanks
>
> Harry Coin
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints?

2023-08-02 Thread Eugen Block
It's all covered in the docs [1]. One of the points I already mentioned is
require-osd-release; you should also have BlueStore OSDs and have converted
them to ceph-volume before you can adopt them with cephadm (if you deployed
your cluster pre-Nautilus).


[1]  
https://docs.ceph.com/en/nautilus/releases/nautilus/#upgrading-from-mimic-or-luminous
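
A quick sanity check (a sketch; all of these commands exist in Nautilus and
later) is to confirm that every daemon reports the target version, that the
OSDs are BlueStore, and that the release flag was bumped:

  # ceph versions
  # ceph osd dump | grep require_osd_release
  # ceph osd metadata | grep '"osd_objectstore"' | sort | uniq -c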



Zitat von Marc :



from Ceph perspective it's supported to upgrade from N to P, you can
safely skip O. We have done that on several clusters without any
issues. You just need to make sure that your upgrade to N was
complete.


How do you verify if the upgrade was complete?



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ref v18.2.0 QE Validation status

2023-08-02 Thread Radoslaw Zarzynski
Final ACK for RADOS.

On Tue, Aug 1, 2023 at 6:28 PM Laura Flores  wrote:

> Rados failures are summarized here:
> https://tracker.ceph.com/projects/rados/wiki/REEF#Reef-v1820
>
> All are known. Will let Radek give the final ack.
>
> On Tue, Aug 1, 2023 at 9:05 AM Nizamudeen A  wrote:
>
>> dashboard approved! failure is unrelated and tracked via
>> https://tracker.ceph.com/issues/58946
>>
>> Regards,
>> Nizam
>>
>> On Sun, Jul 30, 2023 at 9:16 PM Yuri Weinstein 
>> wrote:
>>
>> > Details of this release are summarized here:
>> >
>> > https://tracker.ceph.com/issues/62231#note-1
>> >
>> > Seeking approvals/reviews for:
>> >
>> > smoke - Laura, Radek
>> > rados - Neha, Radek, Travis, Ernesto, Adam King
>> > rgw - Casey
>> > fs - Venky
>> > orch - Adam King
>> > rbd - Ilya
>> > krbd - Ilya
>> > upgrade-clients:client-upgrade* - in progress
>> > powercycle - Brad
>> >
>> > Please reply to this email with approval and/or trackers of known
>> > issues/PRs to address them.
>> >
>> > bookworm distro support is an outstanding issue.
>> >
>> > TIA
>> > YuriW
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Luminous Bluestore issues and RGW Multi-site Recovery

2023-08-02 Thread Greg O'Neill
Hi Konstantin,

I dropped the SATA SSDs from /sys/block and rescanned it. Re-running 
`ceph-bluestore-tool fsck` resulted in the same output (unexpected aio error). 
Syslog says the drive is not in write-protect mode; however, SMART says life 
remaining is at 1%. The drives have approximately 42k power-on hours.

$ lsscsi -g
[0:0:0:0]diskATA  ST1000NX0423 TN04  /dev/sda   /dev/sg0 
[1:0:0:0]diskATA  ST1000NX0423 TN04  /dev/sdb   /dev/sg1 
[2:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sdc   /dev/sg2 
[3:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sdd   /dev/sg3 
[4:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sde   /dev/sg4 
[5:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sdf   /dev/sg5 
[6:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sdg   /dev/sg6 
[7:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sdh   /dev/sg7 

$ echo 1 > /sys/class/scsi_generic/sg3/device/delete
$ tail -n 10 /var/log/messages
Jul 31 07:08:03 ceph-osd4 rsyslogd: -- MARK --
Jul 31 07:28:03 ceph-osd4 rsyslogd: -- MARK --
Jul 31 07:48:03 ceph-osd4 rsyslogd: -- MARK --
Jul 31 08:08:03 ceph-osd4 rsyslogd: -- MARK --
Jul 31 08:28:03 ceph-osd4 rsyslogd: -- MARK --
Jul 31 08:48:03 ceph-osd4 rsyslogd: -- MARK --
Jul 31 08:51:41 ceph-osd4 kernel: [144240.695000] sd 3:0:0:0: [sdd] 
Synchronizing SCSI cache
Jul 31 08:51:41 ceph-osd4 kernel: [144240.695063] sd 3:0:0:0: [sdd] Stopping 
disk
Jul 31 08:51:41 ceph-osd4 kernel: [144241.011641] ata4.00: disabled

$ echo "- - -" > /sys/class/scsi_host/host3/scan
$ tail -n 10 /var/log/messages
Jul 31 08:52:48 ceph-osd4 kernel: [144308.293457] ata4.00: Enabling 
discard_zeroes_data
Jul 31 08:52:48 ceph-osd4 kernel: [144308.293480] sd 3:0:0:0: [sdi] 3750748848 
512-byte logical blocks: (1.92 TB/1.75 TiB)
Jul 31 08:52:48 ceph-osd4 kernel: [144308.293481] sd 3:0:0:0: [sdi] 4096-byte 
physical blocks
Jul 31 08:52:48 ceph-osd4 kernel: [144308.293486] sd 3:0:0:0: [sdi] Write 
Protect is off
Jul 31 08:52:48 ceph-osd4 kernel: [144308.293493] sd 3:0:0:0: [sdi] Write 
cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 31 08:52:48 ceph-osd4 kernel: [144308.293517] sd 3:0:0:0: Attached scsi 
generic sg3 type 0
Jul 31 08:52:48 ceph-osd4 kernel: [144308.293670] ata4.00: Enabling 
discard_zeroes_data
Jul 31 08:52:48 ceph-osd4 kernel: [144308.297332]  sdi: sdi1
Jul 31 08:52:48 ceph-osd4 kernel: [144308.297898] ata4.00: Enabling 
discard_zeroes_data
Jul 31 08:52:48 ceph-osd4 kernel: [144308.297928] sd 3:0:0:0: [sdi] Attached 
SCSI removable disk

$ lsscsi -g
[0:0:0:0]diskATA  ST1000NX0423 TN04  /dev/sda   /dev/sg0 
[1:0:0:0]diskATA  ST1000NX0423 TN04  /dev/sdb   /dev/sg1 
[2:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sdc   /dev/sg2 
[3:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sdi   /dev/sg3 
[4:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sde   /dev/sg4 
[5:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sdf   /dev/sg5 
[6:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sdg   /dev/sg6 
[7:0:0:0]diskATA  Micron_5200_MTFD U004  /dev/sdh   /dev/sg7
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints?

2023-08-02 Thread Marc


> 
> from Ceph perspective it's supported to upgrade from N to P, you can
> safely skip O. We have done that on several clusters without any
> issues. You just need to make sure that your upgrade to N was
> complete.

How do you verify if the upgrade was complete?


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RHEL / CephFS / Pacific / SELinux unavoidable "relabel inode" error?

2023-08-02 Thread Harry G Coin
Hi!  No matter what I try, using the latest cephfs on an all 
ceph-pacific setup, I've not been able to avoid this error message, 
always similar to this on RHEL family clients:


SELinux: inode=1099954719159 on dev=ceph was found to have an invalid 
context=system_u:object_r:unlabeled_t:s0.  This indicates you may need 
to relabel the inode or the filesystem in question.


What's the answer?


Thanks

Harry Coin

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ref v18.2.0 QE Validation status

2023-08-02 Thread Yuri Weinstein
https://github.com/ceph/ceph/pull/52710 merged

smoke:
https://tracker.ceph.com/issues/62227 marked resolved
https://tracker.ceph.com/issues/62228 in progress

Only the smoke and powercycle items are remaining from the test approval
standpoint.

Here is the quote from Neha's status to the clt summarizing the correct
outstanding issues:
"
1. bookworm distro build support
action item - Neha and Josh will have a discussion with Dan to figure out
what help he needs and if at all there is a workaround for the first reef
release

2. nfs ganesha
action item - figure out with Guillaume and Kaleb if a workaround of
pinning to an older release is possible

3. powercycle failures due to teuthology -
https://pulpito.ceph.com/yuriw-2023-07-29_14:04:17-powercycle-reef-release-distro-default-smithi/

action item - open a tracker and get Zack's thoughts on whether this issue
can be classified as infra-only and get powercycle approvals from Brad
"

On Wed, Aug 2, 2023 at 3:03 AM Radoslaw Zarzynski 
wrote:

> Final ACK for RADOS.
>
> On Tue, Aug 1, 2023 at 6:28 PM Laura Flores  wrote:
>
>> Rados failures are summarized here:
>> https://tracker.ceph.com/projects/rados/wiki/REEF#Reef-v1820
>>
>> All are known. Will let Radek give the final ack.
>>
>> On Tue, Aug 1, 2023 at 9:05 AM Nizamudeen A  wrote:
>>
>>> dashboard approved! failure is unrelated and tracked via
>>> https://tracker.ceph.com/issues/58946
>>>
>>> Regards,
>>> Nizam
>>>
>>> On Sun, Jul 30, 2023 at 9:16 PM Yuri Weinstein 
>>> wrote:
>>>
>>> > Details of this release are summarized here:
>>> >
>>> > https://tracker.ceph.com/issues/62231#note-1
>>> >
>>> > Seeking approvals/reviews for:
>>> >
>>> > smoke - Laura, Radek
>>> > rados - Neha, Radek, Travis, Ernesto, Adam King
>>> > rgw - Casey
>>> > fs - Venky
>>> > orch - Adam King
>>> > rbd - Ilya
>>> > krbd - Ilya
>>> > upgrade-clients:client-upgrade* - in progress
>>> > powercycle - Brad
>>> >
>>> > Please reply to this email with approval and/or trackers of known
>>> > issues/PRs to address them.
>>> >
>>> > bookworm distro support is an outstanding issue.
>>> >
>>> > TIA
>>> > YuriW
>>> > ___
>>> > ceph-users mailing list -- ceph-users@ceph.io
>>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>> >
>>> >
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
>>
>> --
>>
>> Laura Flores
>>
>> She/Her/Hers
>>
>> Software Engineer, Ceph Storage 
>>
>> Chicago, IL
>>
>> lflo...@ibm.com | lflo...@redhat.com 
>> M: +17087388804
>>
>>
>> ___
>> Dev mailing list -- d...@ceph.io
>> To unsubscribe send an email to dev-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Disk device path changed - cephadm faild to apply osd service

2023-08-02 Thread Anthony D'Atri
This.

You can even constrain placement by size or model number.

> On Aug 2, 2023, at 6:53 AM, Eugen Block  wrote:
> 
> But that could be done easily like this:
> 
> service_type: osd
> service_id: ssd-db
> service_name: osd.ssd-db
> placement:
>  hosts:
>  - storage01
>  - storage02
> ...
> spec:
>  block_db_size: 64G
>  data_devices:
>rotational: 1
>  db_devices:
>rotational: 0
>  filter_logic: AND
>  objectstore: bluestore
> 
> Anyway, I would expect that fixing the drivegroup config would fix your 
> issue, but I'm not sure either.
> 
> Zitat von Kilian Ries :
> 
>> Yes, I need specific device paths because all HDD / SSD are the same size / 
>> same vendor etc. I group multiple HDDs with an exclusive SSD for caching, 
>> for example:
>> 
>> 
>> spec:
>> 
>>  data_devices:
>> 
>>paths:
>> 
>>- /dev/sdh
>> 
>>- /dev/sdi
>> 
>>- /dev/sdj
>> 
>>- /dev/sdk
>> 
>>- /dev/sdl
>> 
>>  db_devices:
>> 
>>paths:
>> 
>>- /dev/sdf
>> 
>>  filter_logic: AND
>> 
>>  objectstore: bluestore
>> 
>> 
>> Von: Eugen Block 
>> Gesendet: Mittwoch, 2. August 2023 08:13:41
>> An: ceph-users@ceph.io
>> Betreff: [ceph-users] Re: Disk device path changed - cephadm faild to apply 
>> osd service
>> 
>> Do you really need device paths in your configuration? You could use
>> other criteria like disk sizes, vendors, rotational flag etc. If you
>> really want device paths you'll probably need to ensure they're
>> persistent across reboots via udev rules.
>> 
>> Zitat von Kilian Ries :
>> 
>>> Hi,
>>> 
>>> 
>>> it seems that after reboot / OS update my disk labels / device paths
>>> may have changed. Since then I get an error like this:
>>> 
>>> 
>>> 
>>> CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.osd-12-22_hdd-2
>>> 
>>> 
>>> 
>>> ###
>>> 
>>> 
>>> RuntimeError: cephadm exited with an error code: 1, stderr:Non-zero
>>> exit code 1 from /bin/docker run --rm --ipc=host
>>> --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
>>> --privileged --group-add=disk --init -e
>>> CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9025b6db2e8be9173186e068a79a0da5a54ada
>>>  -e NODE_NAME=ceph-osd07.intern -e CEPH_USE_RANDOM_NONCE=1 -e 
>>> CEPH_VOLUME_OSDSPEC_AFFINITY=osd-12-22_hdd-2 -e 
>>> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v 
>>> /var/run/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/run/ceph:z -v 
>>> /var/log/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/log/ceph:z -v 
>>> /var/lib/ceph/01578d80-6c97-46ba-9327-cb2b13980916/crash:/var/lib/ceph/crash:z
>>>  -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v 
>>> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v 
>>> /tmp/ceph-tmp2cvmr5lf:/etc/ceph/ceph.conf:z -v 
>>> /tmp/ceph-tmpb38cuw7q:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
>>> quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9
>>> 025b6db2e8be9173186e068a79a0da5a54ada lvm batch --no-auto /dev/sdm
>>> /dev/sdn /dev/sdo /dev/sdp /dev/sdq --db-devices /dev/sdg --yes
>>> --no-systemd
>>> 
>>> /bin/docker: stderr Traceback (most recent call last):
>>> 
>>> /bin/docker: stderr   File "/usr/sbin/ceph-volume", line 11, in 
>>> 
>>> /bin/docker: stderr load_entry_point('ceph-volume==1.0.0',
>>> 'console_scripts', 'ceph-volume')()
>>> 
>>> /bin/docker: stderr   File
>>> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in
>>> __init__
>>> 
>>> /bin/docker: stderr self.main(self.argv)
>>> 
>>> /bin/docker: stderr   File
>>> "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line
>>> 59, in newfunc
>>> 
>>> /bin/docker: stderr return f(*a, **kw)
>>> 
>>> /bin/docker: stderr   File
>>> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in
>>> main
>>> 
>>> /bin/docker: stderr terminal.dispatch(self.mapper, subcommand_args)
>>> 
>>> /bin/docker: stderr   File
>>> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line
>>> 194, in dispatch
>>> 
>>> /bin/docker: stderr instance.main()
>>> 
>>> /bin/docker: stderr   File
>>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py",
>>> line 46, in main
>>> 
>>> /bin/docker: stderr terminal.dispatch(self.mapper, self.argv)
>>> 
>>> /bin/docker: stderr   File
>>> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line
>>> 192, in dispatch
>>> 
>>> /bin/docker: stderr instance = mapper.get(arg)(argv[count:])
>>> 
>>> /bin/docker: stderr   File
>>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py",
>>> line 348, in __init__
>>> 
>>> /bin/docker: stderr self.args = parser.parse_args(argv)
>>> 
>>> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
>>> 1734, in parse_args
>>> 
>>> /bin/docker: stderr args, argv = self.parse_known_args(args, namespace)
>>> 
>>> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
>>> 1766, in parse_known_args
>>> 
>>> /bin/docker: stderr namespace, args =
>>> 

[ceph-users] Re: Disk device path changed - cephadm faild to apply osd service

2023-08-02 Thread Eugen Block

But that could be done easily like this:

service_type: osd
service_id: ssd-db
service_name: osd.ssd-db
placement:
  hosts:
  - storage01
  - storage02
...
spec:
  block_db_size: 64G
  data_devices:
rotational: 1
  db_devices:
rotational: 0
  filter_logic: AND
  objectstore: bluestore

Anyway, I would expect that fixing the drivegroup config would fix  
your issue, but I'm not sure either.
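
If you do keep explicit paths, one way to make them persistent is a udev rule
keyed on the drive serial (a sketch with made-up serial numbers; the simpler
alternative may be to reference the stable /dev/disk/by-path or
/dev/disk/by-id links in the spec, if cephadm accepts them):

# /etc/udev/rules.d/90-ceph-disks.rules
ACTION=="add|change", KERNEL=="sd?", ENV{ID_SERIAL_SHORT}=="S1234567", SYMLINK+="ceph/db-ssd-0"
ACTION=="add|change", KERNEL=="sd?", ENV{ID_SERIAL_SHORT}=="S7654321", SYMLINK+="ceph/data-hdd-0"

followed by 'udevadm control --reload && udevadm trigger'.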


Zitat von Kilian Ries :

Yes, I need specific device paths because all HDD / SSD are the same  
size / same vendor etc. I group multiple HDDs with an exclusive SSD  
for caching, for example:



spec:

  data_devices:

paths:

- /dev/sdh

- /dev/sdi

- /dev/sdj

- /dev/sdk

- /dev/sdl

  db_devices:

paths:

- /dev/sdf

  filter_logic: AND

  objectstore: bluestore


Von: Eugen Block 
Gesendet: Mittwoch, 2. August 2023 08:13:41
An: ceph-users@ceph.io
Betreff: [ceph-users] Re: Disk device path changed - cephadm faild  
to apply osd service


Do you really need device paths in your configuration? You could use
other criteria like disk sizes, vendors, rotational flag etc. If you
really want device paths you'll probably need to ensure they're
persistent across reboots via udev rules.

Zitat von Kilian Ries :


Hi,


it seems that after reboot / OS update my disk labels / device paths
may have changed. Since then I get an error like this:



CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.osd-12-22_hdd-2



###


RuntimeError: cephadm exited with an error code: 1, stderr:Non-zero
exit code 1 from /bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
--privileged --group-add=disk --init -e
CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9025b6db2e8be9173186e068a79a0da5a54ada -e NODE_NAME=ceph-osd07.intern -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=osd-12-22_hdd-2 -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/run/ceph:z -v /var/log/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/log/ceph:z -v /var/lib/ceph/01578d80-6c97-46ba-9327-cb2b13980916/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmp2cvmr5lf:/etc/ceph/ceph.conf:z -v  
/tmp/ceph-tmpb38cuw7q:/var/lib/ceph/bootstrap-osd/ceph.keyring:z

quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9
 025b6db2e8be9173186e068a79a0da5a54ada lvm batch --no-auto /dev/sdm
/dev/sdn /dev/sdo /dev/sdp /dev/sdq --db-devices /dev/sdg --yes
--no-systemd

/bin/docker: stderr Traceback (most recent call last):

/bin/docker: stderr   File "/usr/sbin/ceph-volume", line 11, in 

/bin/docker: stderr load_entry_point('ceph-volume==1.0.0',
'console_scripts', 'ceph-volume')()

/bin/docker: stderr   File
"/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in
__init__

/bin/docker: stderr self.main(self.argv)

/bin/docker: stderr   File
"/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line
59, in newfunc

/bin/docker: stderr return f(*a, **kw)

/bin/docker: stderr   File
"/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in
main

/bin/docker: stderr terminal.dispatch(self.mapper, subcommand_args)

/bin/docker: stderr   File
"/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line
194, in dispatch

/bin/docker: stderr instance.main()

/bin/docker: stderr   File
"/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py",
line 46, in main

/bin/docker: stderr terminal.dispatch(self.mapper, self.argv)

/bin/docker: stderr   File
"/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line
192, in dispatch

/bin/docker: stderr instance = mapper.get(arg)(argv[count:])

/bin/docker: stderr   File
"/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py",
line 348, in __init__

/bin/docker: stderr self.args = parser.parse_args(argv)

/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
1734, in parse_args

/bin/docker: stderr args, argv = self.parse_known_args(args, namespace)

/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
1766, in parse_known_args

/bin/docker: stderr namespace, args =
self._parse_known_args(args, namespace)

/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
1954, in _parse_known_args

/bin/docker: stderr positionals_end_index =
consume_positionals(start_index)

/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
1931, in consume_positionals

/bin/docker: stderr take_action(action, args)

/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
1824, in take_action

/bin/docker: stderr argument_values = self._get_values(action,
argument_strings)

/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
2279, in _get_values

/bin/docker: stderr   

[ceph-users] Re: Disk device path changed - cephadm faild to apply osd service

2023-08-02 Thread Kilian Ries
However, since it's already broken now - is this the correct way to fix it?


###

ceph orch ls --service_name=<service_name> --export > myservice.yaml

-> change my device path to the correct one

ceph orch apply -i myservice.yaml [--dry-run]
###


Thanks

Von: Eugen Block 
Gesendet: Mittwoch, 2. August 2023 08:13:41
An: ceph-users@ceph.io
Betreff: [ceph-users] Re: Disk device path changed - cephadm faild to apply osd 
service

Do you really need device paths in your configuration? You could use
other criteria like disk sizes, vendors, rotational flag etc. If you
really want device paths you'll probably need to ensure they're
persistent across reboots via udev rules.

Zitat von Kilian Ries :

> Hi,
>
>
> it seems that after reboot / OS update my disk labels / device paths
> may have changed. Since then I get an error like this:
>
>
>
> CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.osd-12-22_hdd-2
>
>
>
> ###
>
>
> RuntimeError: cephadm exited with an error code: 1, stderr:Non-zero
> exit code 1 from /bin/docker run --rm --ipc=host
> --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
> --privileged --group-add=disk --init -e
> CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9025b6db2e8be9173186e068a79a0da5a54ada
>  -e NODE_NAME=ceph-osd07.intern -e CEPH_USE_RANDOM_NONCE=1 -e 
> CEPH_VOLUME_OSDSPEC_AFFINITY=osd-12-22_hdd-2 -e 
> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v 
> /var/run/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/run/ceph:z -v 
> /var/log/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/log/ceph:z -v 
> /var/lib/ceph/01578d80-6c97-46ba-9327-cb2b13980916/crash:/var/lib/ceph/crash:z
>  -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v 
> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v 
> /tmp/ceph-tmp2cvmr5lf:/etc/ceph/ceph.conf:z -v 
> /tmp/ceph-tmpb38cuw7q:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
> quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9
>  025b6db2e8be9173186e068a79a0da5a54ada lvm batch --no-auto /dev/sdm
> /dev/sdn /dev/sdo /dev/sdp /dev/sdq --db-devices /dev/sdg --yes
> --no-systemd
>
> /bin/docker: stderr Traceback (most recent call last):
>
> /bin/docker: stderr   File "/usr/sbin/ceph-volume", line 11, in 
>
> /bin/docker: stderr load_entry_point('ceph-volume==1.0.0',
> 'console_scripts', 'ceph-volume')()
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in
> __init__
>
> /bin/docker: stderr self.main(self.argv)
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line
> 59, in newfunc
>
> /bin/docker: stderr return f(*a, **kw)
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in
> main
>
> /bin/docker: stderr terminal.dispatch(self.mapper, subcommand_args)
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line
> 194, in dispatch
>
> /bin/docker: stderr instance.main()
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py",
> line 46, in main
>
> /bin/docker: stderr terminal.dispatch(self.mapper, self.argv)
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line
> 192, in dispatch
>
> /bin/docker: stderr instance = mapper.get(arg)(argv[count:])
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py",
> line 348, in __init__
>
> /bin/docker: stderr self.args = parser.parse_args(argv)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 1734, in parse_args
>
> /bin/docker: stderr args, argv = self.parse_known_args(args, namespace)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 1766, in parse_known_args
>
> /bin/docker: stderr namespace, args =
> self._parse_known_args(args, namespace)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 1954, in _parse_known_args
>
> /bin/docker: stderr positionals_end_index =
> consume_positionals(start_index)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 1931, in consume_positionals
>
> /bin/docker: stderr take_action(action, args)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 1824, in take_action
>
> /bin/docker: stderr argument_values = self._get_values(action,
> argument_strings)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 2279, in _get_values
>
> /bin/docker: stderr value = [self._get_value(action, v) for v in
> arg_strings]
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 2279, in 
>
> /bin/docker: stderr value = [self._get_value(action, v) for v in
> arg_strings]
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 2294, in _get_value
>
> /bin/docker: stderr result = 

[ceph-users] Re: Disk device path changed - cephadm faild to apply osd service

2023-08-02 Thread Kilian Ries
Yes, I need specific device paths because all HDD / SSD are the same size / same 
vendor etc. I group multiple HDDs with an exclusive SSD for caching, for 
example:


spec:

  data_devices:

paths:

- /dev/sdh

- /dev/sdi

- /dev/sdj

- /dev/sdk

- /dev/sdl

  db_devices:

paths:

- /dev/sdf

  filter_logic: AND

  objectstore: bluestore


Von: Eugen Block 
Gesendet: Mittwoch, 2. August 2023 08:13:41
An: ceph-users@ceph.io
Betreff: [ceph-users] Re: Disk device path changed - cephadm faild to apply osd 
service

Do you really need device paths in your configuration? You could use
other criteria like disk sizes, vendors, rotational flag etc. If you
really want device paths you'll probably need to ensure they're
persistent across reboots via udev rules.

Zitat von Kilian Ries :

> Hi,
>
>
> it seems that after reboot / OS update my disk labels / device paths
> may have changed. Since then I get an error like this:
>
>
>
> CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.osd-12-22_hdd-2
>
>
>
> ###
>
>
> RuntimeError: cephadm exited with an error code: 1, stderr:Non-zero
> exit code 1 from /bin/docker run --rm --ipc=host
> --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
> --privileged --group-add=disk --init -e
> CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9025b6db2e8be9173186e068a79a0da5a54ada
>  -e NODE_NAME=ceph-osd07.intern -e CEPH_USE_RANDOM_NONCE=1 -e 
> CEPH_VOLUME_OSDSPEC_AFFINITY=osd-12-22_hdd-2 -e 
> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v 
> /var/run/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/run/ceph:z -v 
> /var/log/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/log/ceph:z -v 
> /var/lib/ceph/01578d80-6c97-46ba-9327-cb2b13980916/crash:/var/lib/ceph/crash:z
>  -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v 
> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v 
> /tmp/ceph-tmp2cvmr5lf:/etc/ceph/ceph.conf:z -v 
> /tmp/ceph-tmpb38cuw7q:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
> quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9
>  025b6db2e8be9173186e068a79a0da5a54ada lvm batch --no-auto /dev/sdm
> /dev/sdn /dev/sdo /dev/sdp /dev/sdq --db-devices /dev/sdg --yes
> --no-systemd
>
> /bin/docker: stderr Traceback (most recent call last):
>
> /bin/docker: stderr   File "/usr/sbin/ceph-volume", line 11, in 
>
> /bin/docker: stderr load_entry_point('ceph-volume==1.0.0',
> 'console_scripts', 'ceph-volume')()
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in
> __init__
>
> /bin/docker: stderr self.main(self.argv)
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line
> 59, in newfunc
>
> /bin/docker: stderr return f(*a, **kw)
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in
> main
>
> /bin/docker: stderr terminal.dispatch(self.mapper, subcommand_args)
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line
> 194, in dispatch
>
> /bin/docker: stderr instance.main()
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py",
> line 46, in main
>
> /bin/docker: stderr terminal.dispatch(self.mapper, self.argv)
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line
> 192, in dispatch
>
> /bin/docker: stderr instance = mapper.get(arg)(argv[count:])
>
> /bin/docker: stderr   File
> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py",
> line 348, in __init__
>
> /bin/docker: stderr self.args = parser.parse_args(argv)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 1734, in parse_args
>
> /bin/docker: stderr args, argv = self.parse_known_args(args, namespace)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 1766, in parse_known_args
>
> /bin/docker: stderr namespace, args =
> self._parse_known_args(args, namespace)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 1954, in _parse_known_args
>
> /bin/docker: stderr positionals_end_index =
> consume_positionals(start_index)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 1931, in consume_positionals
>
> /bin/docker: stderr take_action(action, args)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 1824, in take_action
>
> /bin/docker: stderr argument_values = self._get_values(action,
> argument_strings)
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 2279, in _get_values
>
> /bin/docker: stderr value = [self._get_value(action, v) for v in
> arg_strings]
>
> /bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line
> 2279, in 
>
> /bin/docker: stderr value = [self._get_value(action, v) for v in
> arg_strings]

[ceph-users] Re: ceph-volume lvm migrate error

2023-08-02 Thread Igor Fedotov

Hi Roland,

could you please share the content of the relevant OSD subfolder?

Also you might want to run:

ceph-bluestore-tool --path <osd path> --command bluefs-bdev-sizes

to make sure DB/WAL are effectively in use.


Thanks,

Igor

On 8/2/2023 12:04 PM, Roland Giesler wrote:
I need some help with this please.  The command below gives an error 
which is not helpful to me.


ceph-volume lvm migrate --osd-id 14 --osd-fsid 
4de2a617-4452-420d-a99b-9e0cd6b2a99b --from db wal --target 
NodeC-nvme1/NodeC-nvme-LV-RocksDB1

--> Source device list is empty
Unable to migrate to : NodeC-nvme1/NodeC-nvme-LV-RocksDB1

Alternatively I have tried to only specify --from db instead of 
including wal, but it makes no difference.


Here is the OSD in question.

# ls -la /dev/ceph-025b887e-4f06-468f-845c-0ddf9ad04990/
lrwxrwxrwx  1 root root    7 Dec 25  2022 
osd-block-4de2a617-4452-420d-a99b-9e0cd6b2a99b -> ../dm-4



What is happening here?  I want to move the DB/WAL to NVMe storage 
without trashing the data OSD and having to go through rebalancing for 
each drive I do this for.



thanks

Roland

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Not all Bucket Shards being used

2023-08-02 Thread Christian Kugler
> Thank you for the information, Christian. When you reshard the bucket id is 
> updated (with most recent versions of ceph, a generation number is 
> incremented). The first bucket id matches the bucket marker, but after the 
> first reshard they diverge.

This makes a lot of sense and explains why the large omap objects do
not go away. It is the old shards that are too big.

> The bucket id is in the names of the currently used bucket index shards. 
> You’re searching for the marker, which means you’re finding older bucket 
> index shards.
>
> Change your commands to these:
>
> # rados -p raum.rgw.buckets.index ls \
>|grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
>|sort -V
>
> # rados -p raum.rgw.buckets.index ls \
>|grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
>|sort -V \
>|xargs -IOMAP sh -c \
>'rados -p raum.rgw.buckets.index listomapkeys OMAP | wc -l'

I don't think the outputs are very interesting here. They are as expected:
- 131 lines of rados objects (omap)
- each omap contains about 70k keys (below the 100k limit).

> When you refer to the “second zone”, what do you mean? Is this cluster using 
> multisite? If and only if your answer is “no”, then it’s safe to remove old 
> bucket index shards. Depending on the version of ceph running when reshard 
> was run, they were either intentionally left behind (earlier behavior) or 
> removed automatically (later behavior).

Yes, this cluster uses multisite. It is one realm, one zonegroup with
two zones (bidirectional sync).
Ceph warns about resharding on the non-metadata zone. So I did not do
that and only resharded on the metadata zone.
The resharding was done using a radosgw-admin v16.2.6 on a ceph
cluster running v17.2.5.
Is there a way to get rid of the old (big) shards without breaking something?

Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-volume lvm migrate error

2023-08-02 Thread Roland Giesler
I need some help with this please.  The command below gives an error
which is not helpful to me.


ceph-volume lvm migrate --osd-id 14 --osd-fsid 
4de2a617-4452-420d-a99b-9e0cd6b2a99b --from db wal --target 
NodeC-nvme1/NodeC-nvme-LV-RocksDB1

--> Source device list is empty
Unable to migrate to : NodeC-nvme1/NodeC-nvme-LV-RocksDB1

Alternatively I have tried to only specify --from db instead of 
including wal, but it makes no difference.


Here is the OSD in question.

# ls -la /dev/ceph-025b887e-4f06-468f-845c-0ddf9ad04990/
lrwxrwxrwx  1 root root    7 Dec 25  2022 
osd-block-4de2a617-4452-420d-a99b-9e0cd6b2a99b -> ../dm-4



What is happening here?  I want to move the DB/WAL to NVMe storage 
without trashing the data OSD and having to go through rebalancing for 
each drive I do this for.



thanks

Roland

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 Large omap object found

2023-08-02 Thread Mark Johnson
Thanks Eugen, massive help.  Working on identifying and cleaning up old/empty 
buckets now.



On Wed, 2023-08-02 at 06:10 +, Eugen Block wrote:
Correct, only a deep-scrub will check that threshold. 'ceph config
set' is persistent, a daemon restart will preserve the new config value.
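
For the record, the deep scrub on a single PG can also be triggered manually, e.g. for the PG from your earlier output:

ceph pg deep-scrub 5.16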

Zitat von Mark Johnson <ma...@iovox.com>:

Never mind, I think I worked it out.  I consulted the Quincy
documentation which just said to do this:

ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 200

But when i did that, the health warning didn't clear.  I took a
guess that maybe I needed to trigger a deep scrub on that PG as it
probably only checks the value against the threshold at scrub time,
and when that was done, the health warning has now cleared.  Not
sure if that persists across restarts or not but I'll cross that
bridge if/when I come to it.



On Wed, 2023-08-02 at 05:31 +, Mark Johnson wrote:
Regarding changing this bvalue back to the previous default of
2,000,000, how would I go about doing that?  I tried following that
SUSE KB article which says to do this:

ceph tell 'osd.*' injectargs --
osd_deep_scrub_large_omap_object_key_threshold=200

But while that didn't fail as such, it didn't apply any changes.  Is
there a way to apply this on the fly without restarting the cluster?




On Tue, 2023-08-01 at 22:44 +, Mark Johnson wrote:
Thanks for that.  That's pretty much how I was reading it, but the
text you provided is a lot more explanatory than what I'd managed to
find and makes it a bit clearer.  Without going into too much detail,
yes, we do have a single user that is used to create a bucket
for each of multiple tenants on a daily basis.  So, we'd be
creating many buckets each day and all owned by the same account.
Therefore, it's quite possible that there could be 400,000 buckets
owned by the one user.  I don't know an easy way to get a figure - I
tried dumping "radosgw-admin bucket stats" output to a file but after about 4
hours it still hadn't returned anything so I gave up.

I have a feeling that we do have a rolling clean out of objects in
these buckets, so we might be only keeping 3 months of data for some
customers, 6 months for others, 12 months for others etc.  But, I
think one of our guys mentioned that the cleanup might not be getting
rid of buckets, only the files in them.  So, I may have to get our
dev guys to revisit this and see if we can clean up a crapload of
empty buckets.


On Tue, 2023-08-01 at 08:37 +, Eugen Block wrote:
Thanks. Just for reference I'm quoting the SUSE doc [1] you mentioned
because it explains what you already summarized:

User indices are not sharded, in other words we store all the keys
of names of buckets under one object. This can cause large objects
to be found. The large object is only accessed in the List All
Buckets S3/Swift API. Unlike bucket indices, the large object is not
exactly in the object IO path. Depending on the use case for so many
buckets, the warning isn't dangerous as the large object is only
used for the List All Buckets API.
The error shows a user has 500K buckets causing the omap issue.
Sharding does not occur at the user level. Bucket indexes are
sharded but buckets per user is not (and usually the default
max_bucket is 1000).

Does this mean that you actually have a user with around 400k
buckets?
If you can't delete unused buckets (you already ruled out creating
multiple users) there's probably no way around increasing the
threshold, I guess. I'm not the biggest RGW expert but we have a few
customers where the threshold was actually increased to the previous
default to get rid of the warning (if other actions were not
possible). So far we didn't get any reports causing any issue at all.
But I'd be curious if the devs or someone with more experience has a
better advice.
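
If you just want a bucket count for that user without waiting for a full
'bucket stats' run, listing only the bucket names should be much quicker,
something along these lines (user id is a placeholder, jq optional):

radosgw-admin bucket list --uid=<user-id> | jq 'length'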

[1] https://www.suse.com/support/kb/doc/?id=19698

Zitat von Mark Johnson <ma...@iovox.com>:

Here you go.  It doesn't format very well, so I'll summarize what I'm
seeing.

5.c has 78051 OMAP_BYTES and 398 OMAP_KEYS
5.16 has 80186950 OMAP_BYTES and 401505 OMAP_KEYS

The remaining 30 PGS have zero of both.  However, the BYTES for each
PG
is very much the same at around 890 for each.


# ceph pg ls-by-pool default.rgw.meta

PGOBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTESOMAP_BYTES*
OMAP_KEYS*  LOGSTATESINCE  VERSION
REPORTEDUP ACTING SCRUB_STAMP
DEEP_SCRUB_STAMP LAST_SCRUB_DURATION
SCRUB_SCHEDULING
5.0 26240 0  00  89098640
0  10076 active+clean10h  8093'54176
8093:5396520   [21,4,12]p21   [21,4,12]p21  2023-07-
31T21:13:20.554485+  2023-07-26T03:40:27.457946+
5  periodic scrub scheduled @ 2023-08-01T23:55:14.134653+
5.1 26065 0  0 

[ceph-users] Re: Disk device path changed - cephadm faild to apply osd service

2023-08-02 Thread Eugen Block
Do you really need device paths in your configuration? You could use  
other criteria like disk sizes, vendors, rotational flag etc. If you  
really want device paths you'll probably need to ensure they're  
persistent across reboots via udev rules.
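
A sketch of what such a spec could look like (service id, host and the
size filter are just placeholders, to be adapted and applied with
'ceph orch apply -i <spec.yml>'):

service_type: osd
service_id: osd-12-22_hdd-2
placement:
  hosts:
    - ceph-osd07.intern
spec:
  data_devices:
    rotational: 1
    size: '10TB:'
  db_devices:
    rotational: 0

That way cephadm selects devices by their attributes and the actual
/dev/sdX names no longer matter after a reboot.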


Zitat von Kilian Ries :


Hi,


it seems that after a reboot / OS update my disk labels / device paths
may have changed. Since then I get an error like this:




CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.osd-12-22_hdd-2



###


RuntimeError: cephadm exited with an error code: 1, stderr:Non-zero  
exit code 1 from /bin/docker run --rm --ipc=host  
--stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume  
--privileged --group-add=disk --init -e  
CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9025b6db2e8be9173186e068a79a0da5a54ada -e NODE_NAME=ceph-osd07.intern -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=osd-12-22_hdd-2 -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/run/ceph:z -v /var/log/ceph/01578d80-6c97-46ba-9327-cb2b13980916:/var/log/ceph:z -v /var/lib/ceph/01578d80-6c97-46ba-9327-cb2b13980916/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmp2cvmr5lf:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpb38cuw7q:/var/lib/ceph/bootstrap-osd/ceph.keyring:z  
quay.io/ceph/ceph@sha256:9e2fd45a080aea67d1935d7d9a9025b6db2e8be9173186e068a79a0da5a54ada lvm batch --no-auto /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq --db-devices /dev/sdg --yes --no-systemd


/bin/docker: stderr Traceback (most recent call last):
/bin/docker: stderr   File "/usr/sbin/ceph-volume", line 11, in <module>
/bin/docker: stderr load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
/bin/docker: stderr self.main(self.argv)
/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
/bin/docker: stderr return f(*a, **kw)
/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
/bin/docker: stderr terminal.dispatch(self.mapper, subcommand_args)
/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
/bin/docker: stderr instance.main()
/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
/bin/docker: stderr terminal.dispatch(self.mapper, self.argv)
/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 192, in dispatch
/bin/docker: stderr instance = mapper.get(arg)(argv[count:])
/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 348, in __init__
/bin/docker: stderr self.args = parser.parse_args(argv)
/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line 1734, in parse_args
/bin/docker: stderr args, argv = self.parse_known_args(args, namespace)
/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line 1766, in parse_known_args
/bin/docker: stderr namespace, args = self._parse_known_args(args, namespace)
/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line 1954, in _parse_known_args
/bin/docker: stderr positionals_end_index = consume_positionals(start_index)
/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line 1931, in consume_positionals
/bin/docker: stderr take_action(action, args)
/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line 1824, in take_action
/bin/docker: stderr argument_values = self._get_values(action, argument_strings)
/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line 2279, in _get_values
/bin/docker: stderr value = [self._get_value(action, v) for v in arg_strings]
/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line 2279, in <listcomp>
/bin/docker: stderr value = [self._get_value(action, v) for v in arg_strings]
/bin/docker: stderr   File "/usr/lib64/python3.6/argparse.py", line 2294, in _get_value
/bin/docker: stderr result = type_func(arg_string)
/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/util/arg_validators.py", line 116, in __call__
/bin/docker: stderr return self._format_device(self._is_valid_device())
/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/util/arg_validators.py", line 127, in _is_valid_device
/bin/docker: stderr super()._is_valid_device(raise_sys_exit=False)
/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/util/arg_validators.py", line 104, in _is_valid_device



[ceph-users] Re: 1 Large omap object found

2023-08-02 Thread Eugen Block
Correct, only a deep-scrub will check that threshold. 'ceph config  
set' is persistent, a daemon restart will preserve the new config value.


Zitat von Mark Johnson :


Never mind, I think I worked it out.  I consulted the Quincy
documentation which just said to do this:

ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 200

But when i did that, the health warning didn't clear.  I took a  
guess that maybe I needed to trigger a deep scrub on that PG as it  
probably only checks the value against the threshold at scrub time,  
and when that was done, the health warning has now cleared.  Not  
sure if that persists across restarts or not but I'll cross that  
bridge if/when I come to it.




On Wed, 2023-08-02 at 05:31 +, Mark Johnson wrote:

Regarding changing this value back to the previous default of
2,000,000, how would I go about doing that?  I tried following that
SUSE KB article which says to do this:

ceph tell 'osd.*' injectargs --
osd_deep_scrub_large_omap_object_key_threshold=200

But while that didn't fail as such, it didn't apply any changes.  Is
there a way to apply this on the fly without restarting the cluster?




On Tue, 2023-08-01 at 22:44 +, Mark Johnson wrote:
Thanks for that.  That's pretty much how I was reading it, but the
text you provided is a lot more explanatory than what I'd managed to
find and makes it a bit clearer.  Without going into too much detail,
yes, we do have a single user that is used to create a bucket
for each of multiple tenants on a daily basis.  So, we'd be
creating many buckets each day and all owned by the same account. 
Therefore, it's quite possible that there could be 400,000 buckets
owned by the one user.  I don't know an easy way to get a figure - I
tried dumping "radosgw-admin bucket stats" output to a file but after about 4
hours it still hadn't returned anything so I gave up.

I have a feeling that we do have a rolling clean out of objects in
these buckets, so we might be only keeping 3 months of data for some
customers, 6 months for others, 12 months for others etc.  But, I
think one of our guys mentioned that the cleanup might not be getting
rid of buckets, only the files in them.  So, I may have to get our
dev guys to revisit this and see if we can clean up a crapload of
empty buckets.


On Tue, 2023-08-01 at 08:37 +, Eugen Block wrote:
Thanks. Just for reference I'm quoting the SUSE doc [1] you mentioned
because it explains what you already summarized:

User indices are not sharded, in other words we store all the keys
of names of buckets under one object. This can cause large objects
to be found. The large object is only accessed in the List All
Buckets S3/Swift API. Unlike bucket indices, the large object is not
exactly in the object IO path. Depending on the use case for so many
buckets, the warning isn't dangerous as the large object is only
used for the List All Buckets API.
The error shows a user has 500K buckets causing the omap issue.
Sharding does not occur at the user level. Bucket indexes are
sharded but buckets per user is not (and usually the default
max_bucket is 1000).

Does this mean that you actually have a user with around 400k
buckets?
If you can't delete unused buckets (you already ruled out creating
multiple users) there's probably no way around increasing the
threshold, I guess. I'm not the biggest RGW expert but we have a few
customers where the threshold was actually increased to the previous
default to get rid of the warning (if other actions were not
possible). So far we didn't get any reports causing any issue at all.
But I'd be curious if the devs or someone with more experience has a
better advice.

[1] https://www.suse.com/support/kb/doc/?id=19698

Zitat von Mark Johnson <ma...@iovox.com>:

Here you go.  It doesn't format very well, so I'll summarize what I'm
seeing.

5.c has 78051 OMAP_BYTES and 398 OMAP_KEYS
5.16 has 80186950 OMAP_BYTES and 401505 OMAP_KEYS

The remaining 30 PGS have zero of both.  However, the BYTES for each
PG
is very much the same at around 890 for each.


# ceph pg ls-by-pool default.rgw.meta

PG    OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES    OMAP_BYTES*
OMAP_KEYS*  LOG    STATE    SINCE  VERSION
REPORTED    UP ACTING SCRUB_STAMP
DEEP_SCRUB_STAMP LAST_SCRUB_DURATION 
SCRUB_SCHEDULING
5.0 26240 0  0    0  8909864    0
0  10076 active+clean    10h  8093'54176
8093:5396520   [21,4,12]p21   [21,4,12]p21  2023-07-
31T21:13:20.554485+  2023-07-26T03:40:27.457946+
5  periodic scrub scheduled @ 2023-08-01T23:55:14.134653+
5.1 26065 0  0    0  8840849    0
0  10029 active+clean    10h  8093'56529
8093:4891333   [14,7,23]p14   [14,7,23]p14  2023-07-
31T20:37:34.920128+  2023-07-30T10:55:16.529046+
5  periodic scrub scheduled @ 

[ceph-users] Re: Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints?

2023-08-02 Thread Eugen Block

Hi,

from Ceph perspective it's supported to upgrade from N to P, you can  
safely skip O. We have done that on several clusters without any  
issues. You just need to make sure that your upgrade to N was  
complete. Just a few days ago someone tried to upgrade from O to Q  
with "require-osd-release nautilus" which broke the upgrade. We have  
plans to switch our OS as well (probably on some customer clusters as  
well) but since they all are already under cephadm management I don't  
expect major issues with that.
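
Before starting it's worth double-checking that flag on the Nautilus
cluster, and setting it if it still shows an older release, roughly:

ceph osd dump | grep require_osd_release
ceph osd require-osd-release nautilus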


Regards,
Eugen

Zitat von Bailey Allison :


Hi Götz,



We've done a similar process, starting at CentOS 7 / Nautilus and
upgrading to Rocky 8 / Ubuntu 20.04 on Octopus+.




What we do is start on CentOS 7 / Nautilus and upgrade to Octopus on
CentOS 7 (we've built python packages and have them on our repo to
satisfy some ceph-mgr dependencies and such with Octopus on CentOS 7).




From here we have a process to migrate the node from CentOS 7 to
Rocky 8 in place, preserving the OSDs and their data, but you could also just
reinstall the OS and reinstall the ceph packages/config files if need be.




Once on Rocky 8 and Octopus we then upgrade the ceph versions further.



The order of the upgrades would be like this:



CentOS 7 / Nautilus

CentOS 7 / Octopus

Rocky 8 / Octopus

Ubuntu 20 / Octopus



We aren't changing from CentOS straight to Ubuntu, but the process is
similar enough.




The one time we did switch to Ubuntu we followed the same process,
then once on the latest Ceph version we just reinstalled a node at a
time from Rocky to Ubuntu. You are probably fine to go straight from CentOS
7 to Ubuntu, but we figured it'd be more reliable to go from el8 rpms
to debs than from el7 rpms to debs.
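
For the per-node OS reinstall step itself, the rough sequence is
something like this (just a sketch, adapt to your own tooling):

ceph osd set noout
# reinstall the OS on the node, reinstall the ceph packages and restore
# /etc/ceph and /var/lib/ceph, then start the daemons again
ceph osd unset noout
ceph versions   # confirm all daemons report the expected release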




I would say make matching Ceph versions the priority if possible,
and the OS second. This is what has worked for us. Other people may have
had different experiences, however.




In your case, your option (b) is the closest to what we would do, so I
would say that should be the safest.




Overall, your options (a) and (b) are kind of a mix of what we do for our upgrades.



Regards,



Bailey



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io