[ceph-users] Re: radosgw-admin hangs

2022-08-29 Thread Magdy Tawfik
Hi Boris

Thank you, good to know I'm not alone in this.

It seems the MONs are fine after the migration:

[mm@cephadm-X~]$ sudo ceph mon stat
e29: 5 mons at {xx1.com=[v2:10.3.144.10:3300/0,v1:10.3.144.10:6789/0
],xx2=[v2:10.3.144.11:3300/0,v1:10.3.144.11:6789/0],xx3=[v2:
10.3.144.12:3300/0,v1:10.3.144.12:6789/0],xx4=[v2:
10.3.144.13:3300/0,v1:10.3.144.13:6789/0],xx5=[v2:
10.3.144.14:3300/0,v1:10.3.144.14:6789/0]}, election epoch 5676, leader 0
xx1, quorum 0,1,2,3,4 xx1,xx2,xx3,xx3,xx4
[mm@ephadm-X ~]$

Also, 'ceph -m IPADDRESS status' works against all monitor IPs with no issue.

Any ideas where else I should look?
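
A way to see where the command stalls is to raise client-side debug logging on
it (a sketch; the debug flags are standard Ceph client options):

radosgw-admin bucket list --debug-rgw=20 --debug-ms=1 2>&1 | tail -n 100

# also worth confirming the RGW pools are reachable and not stuck
ceph df
ceph osd pool ls detail | grep rgw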

I appreciate your help a lot.

On Wed, Aug 24, 2022 at 9:22 PM Boris  wrote:

> Hi Magdy,
> maybe this helps.
>
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/6J5KZ7ELC7EWUS6YMKOSJ3E3JRNTHKBQ/
>
> Cheers
>  Boris
>
> Am 24.08.2022 um 22:09 schrieb Magdy Tawfik :
>
> Hi All
>
> I have a cluster with 5 MON, 3 MGR, and 12 OSD + RGW nodes.
> It was working OK with no issues.
>
> I have moved 4 physical machines to VMs and redeployed the mgr/mon daemons.
> Since then, when trying to use the radosgw-admin tool, it hangs with no
> response at all until I kill it; nothing comes out with any options.
>
> I need your advice, please.
>
> Mag
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changing the cluster network range

2022-08-29 Thread Burkhard Linke

Hi,


Some years ago we changed our setup from an IPoIB cluster network to a 
single-network setup, which is a similar operation.



The OSDs use the cluster network for heartbeats and backfill operations; 
both use standard TCP connections. There is no "global view" of the networks 
involved; OSDs announce their public and private addresses (if present) via an 
update to the OSD map at OSD boot. OSDs expect to be able to create TCP 
connections to the announced IP addresses and ports. 
Mon and mgr instances do not use the cluster network at all.


If you want to change the networks (either public or private), you need 
to ensure that TCP connectivity between the old networks and the new networks 
is possible during the migration, e.g. via a route on some router. Since we had 
an isolated IPoIB network without any connection to a router, we used one of 
the Ceph hosts as the router. That worked fine for a migration in live 
production ;-)


Regarding the network size: I'm not sure whether the code requires an 
exact CIDR match for the interface. If in doubt, have a look at the 
source code.


As already mentioned in another answer, most setups do not require a 
separate cluster network. It is extra effort in setup, maintenance, and 
operation. Unless the network is your bottleneck, you might want to use this 
pending configuration change as an opportunity to switch to a single-network 
setup.
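
If you do drop the cluster network, the configuration side of it is small (a
sketch, assuming the networks are defined in the central config store rather
than in ceph.conf; OSDs only pick this up on restart):

ceph config get osd public_network
ceph config get osd cluster_network

# remove the separate cluster network definition
ceph config rm global cluster_network

# restart OSDs host by host so they re-announce only their public address,
# e.g. on a package-based deployment:
systemctl restart ceph-osd.target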



Regards,

Burkhard Linke


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephadm unable to upgrade/add RGW node

2022-08-29 Thread Reza Bakhshayeshi
Hi

I'm running the Pacific release with cephadm. After a failed upgrade from
16.2.7 to 17.2.2, 2 of 3 MGR daemons stopped working (a known upgrade bug) and
the orchestrator also didn't respond to requests to roll back the services, so
I had to remove the daemons and add the correct ones manually by running
this command:

ceph orch daemon add mgr --placement=

As mentioned in some bug reports, I also tried removing the admin label and
reapplying it.

Now the cluster status is healthy, but the orchestrator still doesn't
work properly when I try to add an RGW node. I also cannot upgrade to a
newer version:

ceph orch host add   rgw-swift
ceph orch apply rgw swift --realm= --zone=
--placement="label:rgw-swift" --port=

I can't see any error logs; it simply doesn't respond anymore.
I also tried these commands:

ceph orch pause/cancel/resume
ceph orch module enable/disable

Do you have any idea?
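
For reference, the usual places to check orchestrator state in a cephadm
cluster (a sketch of generic commands, not specific to this problem):

ceph health detail
ceph orch ls                      # specs vs. running daemon counts
ceph orch ps --refresh            # per-daemon state as cephadm sees it
ceph log last 100 info cephadm    # recent cephadm/orchestrator log entries
ceph mgr module ls                # confirm the cephadm module is enabled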

Best,
Reza
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Automanage block devices

2022-08-29 Thread Dominique Ramaekers
Hi,

I really like Ceph's behavior of auto-managing block devices, but I get ceph 
status warnings if I map an image to a /dev/rbd device.

Some log output:
Aug 29 11:57:34 hvs002 bash[465970]: Non-zero exit code 2 from /usr/bin/docker 
run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint 
/usr/sbin/ceph-volume --privileged --group-add=disk --init -e 
CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:43f6e905f3e34abe4adbc9042b9d6f6b625dee8fa8d93c2bae53fa9b61c3df1a
 -e NODE_NAME=hvs002 -e CEPH_USE_RANDOM_NONCE=1 -e 
CEPH_VOLUME_OSDSPEC_AFFINITY=all-available-devices -e 
CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v 
/var/run/ceph/dd4b0610-b4d2-11ec-bb58-d1b32ae31585:/var/run/ceph:z -v 
/var/log/ceph/dd4b0610-b4d2-11ec-bb58-d1b32ae31585:/var/log/ceph:z -v 
/var/lib/ceph/dd4b0610-b4d2-11ec-bb58-d1b32ae31585/crash:/var/lib/ceph/crash:z 
-v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v 
/run/lock/lvm:/run/lock/lvm -v /:/rootfs -v 
/tmp/ceph-tmpke1ihnc_:/etc/ceph/ceph.conf:z -v 
/tmp/ceph-tmpaqbxw8ga:/var/lib/ceph/bootstrap-osd/ceph.keyring:z 
quay.io/ceph/ceph@sha256:43f6e905f3e34abe4adbc9042b9d6f6b625dee8fa8d93c2bae
 53fa9b61c3df1a lvm batch --no-auto /dev/rbd0 --yes --no-systemd
Aug 29 11:57:34 hvs002 bash[465970]: /usr/bin/docker: stderr  stderr: lsblk: 
/dev/rbd0: not a block device

Aug 29 11:57:34 hvs002 bash[465970]: cluster 2022-08-29T09:57:33.973654+ 
mon.hvs001 (mon.0) 34133 : cluster [WRN] Health check failed: Failed to apply 1 
service(s): osd.all-available-devices (CEPHADM_APPLY_SPEC_FAIL)

If I map an image to an rbd device, the automanage feature wants to add it as an 
OSD. It fails (as it apparently isn't detected as a block device), so I guess my 
images are untouched, but I still worry because I can't find a lot of information 
about these warnings.

Do I risk a conflict between my operations on a mapped rbd image/device? Will 
Ceph at some point alter my image unintentionally?

Do I risk Ceph actually adding such an image as an OSD?

I can disable the managed mode of the OSD service, but then I lose the 
automatic functions of Ceph. Is there a way to tell Ceph to exclude /dev/rbd* 
devices from the autodetection/automanagement?
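
To make the last point concrete, the "disable the managed mode" I mean is
roughly this (a sketch; the service name matches the osd.all-available-devices
spec from the log above):

# show the OSD spec the orchestrator is currently applying
ceph orch ls osd --export

# stop cephadm from automatically consuming every new device it discovers
ceph orch apply osd --all-available-devices --unmanaged=true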

Greetings,

Dominique.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-dokan: Can not copy files from cephfs to windows

2022-08-29 Thread Lucian Petrut

Hi,

I couldn't reproduce the issue using those specific Ceph and Dokany builds.

Could you please check the ceph-dokan logs?

Thanks,

Lucian

On 11.08.2022 12:08, Spyros Trigazis wrote:

Hello ceph users,

I am trying to use ceph-dokan with a testing ceph cluster (versions below).

I can mount the volume on different machines and I can copy/create files
in the volume. However, when I try to copy from the mounted volume to
the Windows filesystem I get:

Invalid MS-DOS function

Has anyone faced the same issue?

Thanks,
Spyros

windows version: 2019 and 10
cluster version: 16.2.9-1
dokany: https://github.com/dokan-dev/dokany/releases/tag/v1.5.1.1000
client: https://cloudba.se/ceph-win-latest-pacific
C:\ProgramData\Ceph>ceph-dokan --version
ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific
(stable)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Automanage block devices

2022-08-29 Thread Dominique Ramaekers
Hi Etienne,

Maybe I didn't make myself clear...

When I map an rbd image from my cluster to a /dev/rbd, ceph wants to 
automatically add the /dev/rbd as an OSD. This is undesirable behavior. Trying 
to add a /dev/rbd mapped to an image in the same cluster??? Scary...

Luckily the automatic creation of the OSD fails.

Nevertheless, I would feel better if ceph just didn't try to add the /dev/rbd 
to the cluster.

Do I risk a conflict between my operations on a mapped rbd image/device?

Will Ceph at some point alter my image unintentionally?

Do I risk Ceph actually adding such an image as an OSD?

I can disable the managed mode of the OSD service, but then I lose the 
automatic functions of Ceph. Is there a way to tell Ceph to exclude /dev/rbd* 
devices from the autodetection/automanagement?

Greetings,

Dominique.

> -Original Message-
> From: Etienne Menguy 
> Sent: Monday, August 29, 2022 13:44
> To: Dominique Ramaekers 
> CC: ceph-users@ceph.io
> Subject: RE: Automanage block devices
> 
> Hey,
> 
> /usr/sbin/ceph-volume ... lvm batch --no-auto /dev/rbd0
> You want to add an OSD using rbd0?
> 
> To map a block device, just use rbd map (
> https://docs.ceph.com/en/quincy/man/8/rbdmap/ )
> 
> Étienne
> 
> > -Original Message-
> > From: Dominique Ramaekers 
> > Sent: Monday, August 29, 2022 12:32
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] Automanage block devices
> >
> > Hi,
> >
> > I really like the behavior of ceph to auto-manage block devices. But I
> > get ceph status warnings if I map an image to a /dev/rbd
> >
> > Some log output:
> > Aug 29 11:57:34 hvs002 bash[465970]: Non-zero exit code 2 from
> > /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host
> > -- entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
> > --init -e
> >
> CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:43f6e905f3e34abe4adbc90
> > 42b9d6f6b625dee8fa8d93c2bae53fa9b61c3df1a -e NODE_NAME=hvs002 -e
> > CEPH_USE_RANDOM_NONCE=1 -e
> CEPH_VOLUME_OSDSPEC_AFFINITY=all-
> > available-devices -e CEPH_VOLUME_SKIP_RESTORECON=yes -e
> > CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/dd4b0610-b4d2-11ec-bb58-
> > d1b32ae31585:/var/run/ceph:z -v /var/log/ceph/dd4b0610-b4d2-11ec-
> bb58-
> > d1b32ae31585:/var/log/ceph:z -v /var/lib/ceph/dd4b0610-b4d2-11ec-bb58-
> > d1b32ae31585/crash:/var/lib/ceph/crash:z -v /dev:/dev -v
> > /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
> > /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-
> > tmpke1ihnc_:/etc/ceph/ceph.conf:z -v /tmp/ceph-
> > tmpaqbxw8ga:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
> >
> quay.io/ceph/ceph@sha256:43f6e905f3e34abe4adbc9042b9d6f6b625dee8fa
> > 8d93c2bae
> >  53fa9b61c3df1a lvm batch --no-auto /dev/rbd0 --yes --no-systemd Aug
> > 29
> > 11:57:34 hvs002 bash[465970]: /usr/bin/docker: stderr  stderr: lsblk:
> > /dev/rbd0: not a block device
> >
> > Aug 29 11:57:34 hvs002 bash[465970]: cluster 2022-08-
> > 29T09:57:33.973654+ mon.hvs001 (mon.0) 34133 : cluster [WRN]
> > Health check failed: Failed to apply 1 service(s):
> > osd.all-available-devices
> > (CEPHADM_APPLY_SPEC_FAIL)
> >
> > If I map a image to a rdb, the automanage feature want to add it as an
> > osd. It fails (as it apparently isn't detected as a block device), so
> > I guess my images are untouched, but still I worry because I can't
> > find a lot of information about these warnings.
> >
> > Do I risk a conflict between my operations on a mapped rbd image/device?
> > Wil at some point ceph alter my image unintentionally?
> >
> > Do I risk ceph to add such an image as an osd?
> >
> > I can disable the managed feature of the osd-management, but then I
> > lose automatic functions of ceph. Is there a way to tell ceph to
> > exclude /dev/rdb* devices from the autodetect/automanage?
> >
> > Greetings,
> >
> > Dominique.
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs growing beyond full ratio

2022-08-29 Thread Wyll Ingersoll


I would think so, but it isn't happening nearly fast enough.

It's literally been over 10 days with 40 new drives across 2 new servers and 
they barely have any PGs yet. A few, but not nearly enough to help with the 
imbalance.

From: Jarett 
Sent: Sunday, August 28, 2022 8:19 PM
To: Wyll Ingersoll ; ceph-users@ceph.io 

Subject: RE: [ceph-users] OSDs growing beyond full ratio


Isn’t rebalancing onto the empty OSDs default behavior?



From: Wyll Ingersoll
Sent: Sunday, August 28, 2022 10:31 AM
To: ceph-users@ceph.io
Subject: [ceph-users] OSDs growing beyond full ratio



We have a pacific cluster that is overly filled and is having major trouble 
recovering.  We are desperate for help in improving recovery speed.  We have 
modified all of the various recovery throttling parameters.



The full_ratio is 0.95, but we have several OSDs that continue to grow and are 
approaching 100% utilization.  They are reweighted to almost 0, yet they 
continue to grow.

Why is this happening?  I thought the cluster would stop writing to an OSD 
once it was above the full ratio.





We have added additional capacity to the cluster, but the new OSDs are being 
filled very, very slowly.  The primary pool in the cluster is the RGW data pool, 
a 12+4 EC pool using "host" placement rules across 18 hosts. Two new hosts with 
20x10TB OSDs each were recently added, but they are only very slowly being 
filled up, and I don't see how to force recovery on that particular pool.  From 
what I understand, we cannot modify the EC parameters without destroying the 
pool, and we cannot offload that pool to any others because there is no other 
place to store the amount of data.





We have been running "ceph osd reweight-by-utilization"  periodically and it 
works for a while (a few hours) but then recovery and backfill IO numbers drop 
to negligible values.



The balancer module will not run because the current misplaced % is about 97%.



Would it be more effective to use osdmaptool and generate a bunch of upmap 
commands to manually move data around, or keep trying to get 
reweight-by-utilization to work?



Any suggestions would be appreciated, other than deleting data (which we cannot 
do at this point; the pools are not accessible) or adding more storage (we 
already did that, and for some reason it is not being utilized very heavily 
yet).
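
For context, the osdmaptool approach I'm asking about would look roughly like
this (a sketch; the pool name and limits are placeholders):

ceph osd getmap -o om
osdmaptool om --upmap upmap.sh --upmap-pool default.rgw.buckets.data \
    --upmap-max 100 --upmap-deviation 1

# review the generated 'ceph osd pg-upmap-items ...' commands before running
less upmap.sh
bash upmap.sh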









___

ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm unable to upgrade/add RGW node

2022-08-29 Thread Reza Bakhshayeshi
I found a misconfiguration in my ceph config dump:

mgr  advanced  mgr/cephadm/migration_current  5

and changing it to 3 solved the issue and the orchestrator is back to
working properly.
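
For the record, the change itself was just (assuming, as in my case, that the
key lives in the mon config store shown by 'ceph config dump'):

ceph config get mgr mgr/cephadm/migration_current
ceph config set mgr mgr/cephadm/migration_current 3

# then fail over the active mgr so cephadm reloads
# (older releases may need the active mgr name as an argument)
ceph mgr fail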

That has something to do with the previous failed upgrade to Quincy, which
updated the value automatically.
If I understand correctly, migration_current acts as a kind of safety mechanism
for cephadm during upgrades.
If you have more info, please let me know.

Regards,
Reza


On Mon, 29 Aug 2022 at 10:50, Reza Bakhshayeshi 
wrote:

> Hi
>
> I'm using the pacific version with cephadm. After a failed upgrade from
> 16.2.7 to 17.2.2, 2/3 MGR nodes stopped working (this is a known bug of
> upgrade) and the orchestrator also didn't respond to rollback services, so
> I had to remove the daemons and add the correct one manually by running
> this command:
>
> ceph orch daemon add mgr --placement=
>
> As it was mentioned in some bugs I tried removing admin label and
> reapplying them as well.
>
> Now the cluster status is healthy, but the orchestrator still
> doesn't work properly when I try to add an RGW node. I also cannot upgrade
> to a newer version:
>
> ceph orch host add   rgw-swift
> ceph orch apply rgw swift --realm= --zone=
> --placement="label:rgw-swift" --port=
>
> I can't see any error logs; it simply doesn't respond anymore.
> I also tried these commands:
>
> ceph orch pause/cancel/resume
> ceph orch module enable/disable
>
> Do you have any idea?
>
> Best,
> Reza
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs growing beyond full ratio

2022-08-29 Thread Wyll Ingersoll
Thank You!

I will see about trying these out, probably using your suggestion of several 
iterations with #1 and then #3.



From: Stefan Kooman 
Sent: Monday, August 29, 2022 1:38 AM
To: Wyll Ingersoll ; ceph-users@ceph.io 

Subject: Re: [ceph-users] OSDs growing beyond full ratio

On 8/28/22 17:30, Wyll Ingersoll wrote:
> We have a pacific cluster that is overly filled and is having major trouble 
> recovering.  We are desperate for help in improving recovery speed.  We have 
> modified all of the various recovery throttling parameters.
>
> The full_ratio is 0.95 but we have several osds that continue to grow and are 
> approaching 100% utilization.  They are reweighted to almost 0, but yet 
> continue to grow.
> Why is this happening?  I thought the cluster would stop writing to the osd 
> when it was at above the full ratio.
>
>
> We have added additional capacity to the cluster but the new OSDs are being 
> used very very slowly.  The primary pool in the cluster is the RGW data pool 
> which is a 12+4 EC pool using "host" placement rules across 18 hosts, 2 new 
> hosts with 20x10TB osds each were recently added but they are only very very 
> slowly being filled up.  I don't see how to force recovery on that particular 
> pool.   From what I understand, we cannot modify the EC parameters without 
> destroying the pool and we cannot offload that pool to any others because 
> there is no other place to store the amount of data.
>
>
> We have been running "ceph osd reweight-by-utilization"  periodically and it 
> works for a while (a few hours) but then recovery and backfill IO numbers 
> drop to negligible values.
>
> The balancer module will not run because the current misplaced % is about 97%.
>
> Would it be more effective to use osdmaptool and generate a bunch of upmap 
> commands to manually move data around, or keep trying to get 
> reweight-by-utilization to work?

I would use the script upmap-remapped.py [1] to get your cluster
healthy again, and after that pgremapper [2] to drain PGs from the full
OSDs. At a certain point (usage) you might want to let the Ceph balancer
do its thing. But from experience I can tell that Jonas Jelten's
ceph-balancer script is currently doing a way better job [3]. Search the
list for the usage of the scripts (or use a search engine). With
upmaps you have more control over where PGs should go. You might want to
skip step [2] and directly try ceph-balancer [3].

Gr. Stefan

[1]:
https://gitlab.cern.ch/ceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
[2]: https://github.com/digitalocean/pgremapper/
[3]: https://github.com/TheJJ/ceph-balancer
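
A minimal sketch of step [1] (assuming the script emits 'ceph osd
pg-upmap-items ...' commands on stdout, as the CERN tool does):

# dry run: review what it wants to apply
./upmap-remapped.py | less

# apply; the idea is to map misplaced PGs back to where they currently are,
# so most backfill stops and the cluster can settle
./upmap-remapped.py | sh -x

ceph -s    # watch the misplaced percentage drop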
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Automanage block devices

2022-08-29 Thread Dominique Ramaekers
Interesting, but weird...

I use Quincy
root@hvs001:/# ceph versions
{
"mon": {
"ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy 
(stable)": 3
},
"mgr": {
"ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy 
(stable)": 2
},
"osd": {
"ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy 
(stable)": 6
},
"mds": {
"ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy 
(stable)": 2
},
"overall": {
"ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy 
(stable)": 13
}
}

And the ceph-volume inventory does show rbd devices...

root@hvs001:/# rbd --pool libvirt-pool map --image PCVIRTdra
/dev/rbd0
root@hvs001:/# ceph-volume inventory

Device Path   Size rotates available Model name
/dev/rbd0 140.00 GBFalse   False
/dev/sda  894.25 GBFalse   False MZILS960HEHP/007
/dev/sdb  894.25 GBFalse   False MZILS960HEHP/007

I put a comment on the GitHub page...

> -Original Message-
> From: Etienne Menguy 
> Sent: Monday, August 29, 2022 14:34
> To: Dominique Ramaekers 
> CC: ceph-users@ceph.io
> Subject: RE: Automanage block devices
> 
> I now understand, thanks for the explanation.
> 
> Are you using the latest ceph-volume version? I see there was a change to ignore rbd
> devices in ceph-volume.
> 
> https://tracker.ceph.com/issues/53846
> https://github.com/ceph/ceph/pull/44604
> 
> Étienne
> 
> > -Original Message-
> > From: Dominique Ramaekers 
> > Sent: Monday, August 29, 2022 14:15
> > To: Etienne Menguy 
> > Cc: ceph-users@ceph.io
> > Subject: RE: Automanage block devices
> >
> > Hi Etienne,
> >
> > Maybe I didn't make myself clear...
> >
> > When I map an rbd-image from my cluster to a /dev/rbd, ceph wants to
> > automatically add the /dev/rbd as an OSD. This is undesirable behavior.
> > Trying to add a /dev/rbd mapped to an image in the same cluster???
> Scary...
> >
> > Luckily the automatic creation of the OSD fails.
> >
> > Nevertheless, I would feel better if ceph just doesn't try to add the
> > /dev/rbd to the cluster.
> >
> > Do I risk a conflict between my operations on a mapped rbd image/device?
> >
> > Will at some point ceph alter my image unintentionally?
> >
> > Do I risk ceph to actually add such an image as an osd?
> >
> > I can disable the managed feature of the osd-management, but then I
> > lose automatic functions of ceph. Is there a way to tell ceph to
> > exclude /dev/rbd* devices from the autodetect/automanage?
> >
> > Greetings,
> >
> > Dominique.
> >
> > > -Oorspronkelijk bericht-
> > > Van: Etienne Menguy 
> > > Verzonden: maandag 29 augustus 2022 13:44
> > > Aan: Dominique Ramaekers 
> > > CC: ceph-users@ceph.io
> > > Onderwerp: RE: Automanage block devices
> > >
> > > Hey,
> > >
> > > /usr/sbin/ceph-volume ... lvm batch --no-auto /dev/rbd0
> > > You want to add an OSD using rbd0?
> > >
> > > To map a block device, just use rbd map (
> > > https://docs.ceph.com/en/quincy/man/8/rbdmap/ )
> > >
> > > Étienne
> > >
> > > > -Original Message-
> > > > From: Dominique Ramaekers 
> > > > Sent: lundi 29 août 2022 12:32
> > > > To: ceph-users@ceph.io
> > > > Subject: [ceph-users] Automanage block devices
> > > >
> > > > Hi,
> > > >
> > > > I really like the behavior of ceph to auto-manage block devices.
> > > > But I get ceph status warnings if I map an image to a /dev/rbd
> > > >
> > > > Some log output:
> > > > Aug 29 11:57:34 hvs002 bash[465970]: Non-zero exit code 2 from
> > > > /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM
> > > > --net=host
> > > > -- entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
> > > > --init -e
> > > >
> > >
> >
> CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:43f6e905f3e34abe4adbc90
> > > > 42b9d6f6b625dee8fa8d93c2bae53fa9b61c3df1a -e
> NODE_NAME=hvs002 -
> > e
> > > > CEPH_USE_RANDOM_NONCE=1 -e
> > > CEPH_VOLUME_OSDSPEC_AFFINITY=all-
> > > > available-devices -e CEPH_VOLUME_SKIP_RESTORECON=yes -e
> > > > CEPH_VOLUME_DEBUG=1

[ceph-users] Re: Automanage block devices

2022-08-29 Thread Robert Sander

On 29.08.22 at 14:14, Dominique Ramaekers wrote:


Nevertheless, I would feel better if ceph just didn't try to add the /dev/rbd 
to the cluster.


It looks like your drivegroup specification is too generic.

Can you post the YAML for that here?

You should be as specific as possible with the specification, i.e. 
include vendor or model information or sizes and HDD or SSD type.


This way RBDs would not be included when the orchestrator searches for 
new devices.
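
For example, a spec restricted by model would look roughly like this (a
sketch based on the MZILS960HEHP devices shown in your ceph-volume inventory
elsewhere in this thread; adjust service_id and filters to your hardware):

service_type: osd
service_id: local_ssd
placement:
  host_pattern: '*'
spec:
  data_devices:
    model: MZILS960HEHP   # only these SSDs match; mapped /dev/rbd* devices never will

Applied with 'ceph orch apply -i osd_spec.yaml'.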


BTW: It is quite unusual to map RBDs on OSD nodes. Do you run a 
hyperconverged setup?


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bug in crush algorithm? 1 PG with the same OSD twice.

2022-08-29 Thread Dan van der Ster
Hi Frank,

CRUSH can only find 5 OSDs, given your current tree, rule, and
reweights. This is why there is a NONE in the UP set for shard 6.
But in ACTING we see that it is refusing to remove shard 6 from osd.1
-- that is the only copy of that shard, so in this case it's helping
you rather than deleting the shard altogether.
ACTING == what the OSDs are serving now.
UP == where CRUSH wants to place the shards.

I suspect that this is a case of CRUSH tunables + your reweights
putting CRUSH in a corner case of not finding 6 OSDs for that
particular PG.
If you set the reweights all back to 1, it probably finds 6 OSDs?
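
For example, a quick way to test that (using the OSD ids from your tree; this
only touches the override reweights, not the crush weights):

for id in 0 1 2 3 4 5 6 8; do ceph osd reweight $id 1.0; done
ceph pg map 4.1c               # check whether UP now has 6 distinct OSDs
ceph osd crush show-tunables   # and compare the tunables in effect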

Cheers, Dan


On Mon, Aug 29, 2022 at 4:44 PM Frank Schilder  wrote:
>
> Hi all,
>
> I'm investigating a problem with a degraded PG on an octopus 15.2.16 test 
> cluster. It has 3 hosts x 3 OSDs and a 4+2 EC pool with failure domain OSD. 
> After simulating a disk failure by removing an OSD and letting the cluster 
> recover (all under load), I end up with a PG with the same OSD allocated 
> twice:
>
> PG 4.1c, UP: [6,1,4,5,3,NONE] ACTING: [6,1,4,5,3,1]
>
> OSD 1 is allocated twice. How is this even possible?
>
> Here the OSD tree:
>
> ID  CLASS  WEIGHT   TYPE NAME  STATUS REWEIGHT  PRI-AFF
> -1 2.44798  root default
> -7 0.81599  host tceph-01
>  0hdd  0.27199  osd.0 up   0.87999  1.0
>  3hdd  0.27199  osd.3 up   0.98000  1.0
>  6hdd  0.27199  osd.6 up   0.92999  1.0
> -3 0.81599  host tceph-02
>  2hdd  0.27199  osd.2 up   0.95999  1.0
>  4hdd  0.27199  osd.4 up   0.8  1.0
>  8hdd  0.27199  osd.8 up   0.8  1.0
> -5 0.81599  host tceph-03
>  1hdd  0.27199  osd.1 up   0.8  1.0
>  5hdd  0.27199  osd.5 up   1.0  1.0
>  7hdd  0.27199  osd.7  destroyed 0  1.0
>
> I already tried to change some tunables, thinking about 
> https://docs.ceph.com/en/octopus/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon,
> but giving up too soon is obviously not the problem. It is accepting a wrong 
> mapping.
>
> Is there a way out of this? Clearly this is asking for trouble, if not data 
> loss, and should not happen at all.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Downside of many rgw bucket shards?

2022-08-29 Thread Boris Behrens
Hi there,

I have some buckets that would require >100 shards and I would like to ask
if there are any downsides to having this many shards on a bucket?

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Downside of many rgw bucket shards?

2022-08-29 Thread J. Eric Ivancich
Generally it’s a good thing. There’s less contention for bucket index updates 
when, for example, lots of writes are happening together. Dynamic resharding 
will take things up to 1999 shards on its own with the default config.

Given that we use hashing of object names to determine which shard they go to, 
the most complicated operation is bucket listing, which has to retrieve entries 
from each shard, order them, and return them to the client. And it has to do 
this in batches of about 1000 at a time.

It looks like you’re expecting on the order of 10,000,000 objects in these 
buckets, so I imagine you’re not going to be listing them with any regularity.
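
If you'd rather pick the shard count yourself instead of waiting for dynamic
resharding, the usual sequence is (a sketch; the bucket name is a placeholder):

radosgw-admin bucket stats --bucket=mybucket      # current object count (and shard count on recent releases)
radosgw-admin reshard add --bucket=mybucket --num-shards=101
radosgw-admin reshard process                     # or let the RGW reshard thread do it
radosgw-admin reshard status --bucket=mybucket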

Eric
(he/him)

> On Aug 29, 2022, at 12:06 PM, Boris Behrens  wrote:
> 
> Hi there,
> 
> I have some buckets that would require >100 shards and I would like to ask
> if there are any downsides to having this many shards on a bucket?
> 
> Cheers
> Boris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Downside of many rgw bucket shards?

2022-08-29 Thread Anthony D'Atri
Do I recall that the number of shards is ideally odd, or even prime?  
Performance might be increased by indexless buckets if the application can 
handle that.

> On Aug 29, 2022, at 10:06 AM, J. Eric Ivancich  wrote:
> 
> Generally it’s a good thing. There’s less contention for bucket index 
> updates when, for example, lots of writes are happening together. Dynamic 
> resharding will take things up to 1999 shards on its own with the default 
> config.
> 
> Given that we use hashing of objet names to determine which shard they go to, 
> the most complicated operation is bucket listing, which has to retrieve 
> entries from each shard, order them, and return them to the client. And it 
> has to do this in batches of about 1000 at a time.
> 
> It looks like you’re expecting on the order of 10,000,000 objects in these 
> buckets, so I imagine you’re not going to be listing them with any regularity.
> 
> Eric
> (he/him)
> 
>> On Aug 29, 2022, at 12:06 PM, Boris Behrens  wrote:
>> 
>> Hi there,
>> 
>> I have some buckets that would require >100 shards and I would like to ask
>> if there are any downsides to having this many shards on a bucket?
>> 
>> Cheers
>> Boris
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Downside of many rgw bucket shards?

2022-08-29 Thread Matt Benjamin
We choose prime numbers for shard counts, yes.
Indexless buckets do increase insert/delete performance, but by definition
an indexless bucket cannot be listed.

Matt

On Mon, Aug 29, 2022 at 1:46 PM Anthony D'Atri 
wrote:

> Do I recall that the number of shards is ideally odd, or even prime?
> Performance might be increased by indexless buckets if the application can
> handle that.
>
> > On Aug 29, 2022, at 10:06 AM, J. Eric Ivancich 
> wrote:
> >
> > Generally it’s a good thing. There’s less contention for bucket index
> updates when, for example, lots of writes are happening together. Dynamic
> resharding will take things up to 1999 shards on its own with the default
> config.
> >
> > Given that we use hashing of objet names to determine which shard they
> go to, the most complicated operation is bucket listing, which has to
> retrieve entries from each shard, order them, and return them to the
> client. And it has to do this in batches of about 1000 at a time.
> >
> > It looks like you’re expecting on the order of 10,000,000 objects in
> these buckets, so I imagine you’re not going to be listing them with any
> regularity.
> >
> > Eric
> > (he/him)
> >
> >> On Aug 29, 2022, at 12:06 PM, Boris Behrens  wrote:
> >>
> >> Hi there,
> >>
> >> I have some buckets that would require >100 shards and I would like to
> ask
> >> if there are any downsides to have these many shards on a bucket?
> >>
> >> Cheers
> >> Boris
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs growing beyond full ratio

2022-08-29 Thread Wyll Ingersoll


Can anyone explain why OSDs (Ceph Pacific, BlueStore OSDs) continue to grow 
well after they have exceeded the "full" level (95%), and is there any way to 
stop this?

"The full_ratio is 0.95 but we have several osds that continue to grow and are 
approaching 100% utilization.  They are reweighted to almost 0, but yet 
continue to grow.
Why is this happening?  I thought the cluster would stop writing to the osd 
when it was at above the full ratio."
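
For reference, the ratios actually in effect and the per-OSD fill levels can be
double-checked with, e.g.:

ceph osd dump | grep ratio     # full_ratio / backfillfull_ratio / nearfull_ratio
ceph osd df tree               # per-OSD utilization, reweight and PG counts
ceph health detail | grep -i full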

thanks...


From: Wyll Ingersoll 
Sent: Monday, August 29, 2022 9:24 AM
To: Jarett ; ceph-users@ceph.io 
Subject: [ceph-users] Re: OSDs growing beyond full ratio


I would think so, but it isn't happening nearly fast enough.

It's literally been over 10 days with 40 new drives across 2 new servers and 
they barely have any PGs yet. A few, but not nearly enough to help with the 
imbalance.

From: Jarett 
Sent: Sunday, August 28, 2022 8:19 PM
To: Wyll Ingersoll ; ceph-users@ceph.io 

Subject: RE: [ceph-users] OSDs growing beyond full ratio


Isn’t rebalancing onto the empty OSDs default behavior?



From: Wyll Ingersoll
Sent: Sunday, August 28, 2022 10:31 AM
To: ceph-users@ceph.io
Subject: [ceph-users] OSDs growing beyond full ratio



We have a pacific cluster that is overly filled and is having major trouble 
recovering.  We are desperate for help in improving recovery speed.  We have 
modified all of the various recovery throttling parameters.



The full_ratio is 0.95, but we have several OSDs that continue to grow and are 
approaching 100% utilization.  They are reweighted to almost 0, yet they 
continue to grow.

Why is this happening?  I thought the cluster would stop writing to an OSD 
once it was above the full ratio.





We have added additional capacity to the cluster but the new OSDs are being 
used very very slowly.  The primary pool in the cluster is the RGW data pool 
which is a 12+4 EC pool using "host" placement rules across 18 hosts, 2 new 
hosts with 20x10TB osds each were recently added but they are only very very 
slowly being filled up.  I don't see how to force recovery on that particular 
pool.   From what I understand, we cannot modify the EC parameters without 
destroying the pool and we cannot offload that pool to any others because there 
is no other place to store the amount of data.





We have been running "ceph osd reweight-by-utilization"  periodically and it 
works for a while (a few hours) but then recovery and backfill IO numbers drop 
to negligible values.



The balancer module will not run because the current misplaced % is about 97%.



Would it be more effective to use osdmaptool and generate a bunch of upmap 
commands to manually move data around, or keep trying to get 
reweight-by-utilization to work?



Any suggestions (other than deleting data which we cannot do at this point, the 
pools are not accessible) or adding more storage (we already did and it is not 
being utilized very heavily yet for some reason).









___

ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs growing beyond full ratio

2022-08-29 Thread Dave Schulz

Hi Wyll,

Any chance you're using CephFS and have some really large files in the 
CephFS filesystem?  Erasure coding? I recently encountered a similar 
problem, and as soon as the end user deleted the really large files our 
problem became much more manageable.


I had issues reweighting OSDs too, so in the end I changed the crush 
weights and had to chase them around every couple of days, reweighting 
the OSDs above 70% utilization to zero and then setting them back to 12 when 
they were mostly empty (12TB spinning rust buckets).  Note that I'm really not 
recommending this course of action; it's just the only option that seemed 
to have any effect.
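
(For reference, the crush-weight juggling I mean was along these lines, per
over-full OSD; osd.42 is just an example id:)

ceph osd crush reweight osd.42 0     # push its data elsewhere
# ... wait for it to drain ...
ceph osd crush reweight osd.42 12    # restore the weight (12 for a 12TB drive)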


-Dave

On 2022-08-29 3:00 p.m., Wyll Ingersoll wrote:



Can anyone explain why OSDs (ceph pacific, bluestore osds) continue to grow well after 
they have exceeded the "full" level (95%) and is there any way to stop this?

"The full_ratio is 0.95 but we have several osds that continue to grow and are 
approaching 100% utilization.  They are reweighted to almost 0, but yet continue to 
grow.
Why is this happening?  I thought the cluster would stop writing to the osd when it 
was at above the full ratio."

thanks...


From: Wyll Ingersoll 
Sent: Monday, August 29, 2022 9:24 AM
To: Jarett ; ceph-users@ceph.io 
Subject: [ceph-users] Re: OSDs growing beyond full ratio


I would think so, but it isn't happening nearly fast enough.

It's literally been over 10 days with 40 new drives across 2 new servers and 
they barely have any PGs yet. A few, but not nearly enough to help with the 
imbalance.

From: Jarett 
Sent: Sunday, August 28, 2022 8:19 PM
To: Wyll Ingersoll ; ceph-users@ceph.io 

Subject: RE: [ceph-users] OSDs growing beyond full ratio


Isn’t rebalancing onto the empty OSDs default behavior?



From: Wyll Ingersoll
Sent: Sunday, August 28, 2022 10:31 AM
To: ceph-users@ceph.io
Subject: [ceph-users] OSDs growing beyond full ratio



We have a pacific cluster that is overly filled and is having major trouble 
recovering.  We are desperate for help in improving recovery speed.  We have 
modified all of the various recovery throttling parameters.



The full_ratio is 0.95 but we have several osds that continue to grow and are 
approaching 100% utilization.  They are reweighted to almost 0, but yet 
continue to grow.

Why is this happening?  I thought the cluster would stop writing to the osd 
when it was at above the full ratio.





We have added additional capacity to the cluster but the new OSDs are being used very 
very slowly.  The primary pool in the cluster is the RGW data pool which is a 12+4 EC 
pool using "host" placement rules across 18 hosts, 2 new hosts with 20x10TB 
osds each were recently added but they are only very very slowly being filled up.  I 
don't see how to force recovery on that particular pool.   From what I understand, we 
cannot modify the EC parameters without destroying the pool and we cannot offload that 
pool to any others because there is no other place to store the amount of data.





We have been running "ceph osd reweight-by-utilization"  periodically and it 
works for a while (a few hours) but then recovery and backfill IO numbers drop to 
negligible values.



The balancer module will not run because the current misplaced % is about 97%.



Would it be more effective to use the osmaptool and generate a bunch of upmap 
commands to manually move data around or keep trying to get 
reweight-by-utlilization to work?



Any suggestions (other than deleting data which we cannot do at this point, the 
pools are not accessible) or adding more storage (we already did and it is not 
being utilized very heavily yet for some reason).









___

ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd-mirror stops replaying journal on primary cluster

2022-08-29 Thread Josef Johansson
Hi,

There's nothing special going on in the cluster when it stops replaying. It
seems there is a journal entry that the local replayer doesn't handle, so it
just stops. Since it's the local replayer that stops, there are no logs
in rbd-mirror. The odd part is that rbd-mirror handles this totally
fine and is the one syncing correctly.

What's worse is that this is reported as HEALTHY in the status
information, even though a restarted VM will stall until
the replay is complete. The replay function inside the rbd client seems to
handle the journal fine, but only at VM start. I will try
to open a ticket on tracker.ceph.com as soon as my account is
approved.

I have tried to see what component is responsible for local replay but
I have not been successful yet.
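
In case it helps, the journal can also be checked for structural problems from
any client, and the librbd client doing the local replay can be made to log
(a sketch; pool/image are placeholders, and the debug options go into the
[client] section of the ceph.conf used by the hypervisor):

rbd journal inspect --pool <pool> --image <image>

# in ceph.conf on the client/hypervisor side:
#   debug rbd = 20
#   debug journaler = 20
#   log file = /var/log/ceph/client.$pid.log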

Thanks for answering :)

On Mon, Aug 22, 2022 at 11:05 AM Eugen Block  wrote:
>
> Hi,
>
> IIRC the rbd mirror journals will grow if the sync stops working,
> which seems to be the case here. Does the primary cluster experience
> any high load when the replay stops? How is the connection between the
> two sites and is the link saturated? Does the rbd-mirror log reveal
> anything useful (maybe also in debug mode)?
>
> Regards,
> Eugen
>
> Zitat von Josef Johansson :
>
> > Hi,
> >
> > I'm running ceph octopus 15.2.16 and I'm trying out two-way mirroring.
> >
> > Everything seems to be running fine, except that sometimes the replay
> > stops on the primary cluster.
> >
> > This means that VMs will not start properly until all journal
> > entries are replayed, but also that the journal grows over time.
> >
> > I am trying to find out why this occurs, and where to look for more
> > information.
> >
> > I am currently using rbd --pool  --image  journal
> > status to see if the clients are in sync or not.
> >
> > Example output when things went sideways
> >
> > minimum_set: 0
> > active_set: 2
> > registered clients:
> > [id=, commit_position=[positions=[[object_number=0, tag_tid=1,
> > entry_tid=4592], [object_number=3, tag_tid=1, entry_tid=4591],
> > [object_number=2, tag_tid=1, entry_tid=4590], [object_number=1,
> > tag_tid=1, entry_tid=4589]]], state=connected]
> > [id=bdde9b90-df26-4e3d-84b3-66605dc45608,
> > commit_position=[positions=[[object_number=5, tag_tid=1,
> > entry_tid=19913], [object_number=4, tag_tid=1, entry_tid=19912],
> > [object_number=7, tag_tid=1, entry_tid=19911], [object_number=6,
> > tag_tid=1, entry_tid=19910]]], state=disconnected]
> >
> > Right now I'm trying to catch it red handed in the primary osd logs.
> > But I'm not even sure if that's the process that is replaying the
> > journal..
> >
> > Regards
> > Josef
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io