[ceph-users] Re: Grafana service fails to start due to bad directory name after Quincy upgrade

2023-06-23 Thread Nizamudeen A
Hi,

You can upgrade the grafana version individually by setting the config_opt
for grafana container image like:
ceph config set mgr mgr/cephadm/container_image_grafana
quay.io/ceph/ceph-grafana:8.3.5

and then redeploy the grafana container again either via dashboard or
cephadm.
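
For completeness, the same redeploy can be done from the CLI; a minimal
sketch, assuming the service shows up under its default cephadm name
"grafana" in ceph orch ls:

ceph orch redeploy grafana
ceph orch ps | grep grafana   # verify the daemon comes back from the new image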

Regards,
Nizam



On Fri, Jun 23, 2023 at 12:05 AM Adiga, Anantha wrote:

> Hi Eugen,
>
> Thank you so much for the details.  Here is the update (comments in-line
> >>):
>
> Regards,
> Anantha
> -Original Message-
> From: Eugen Block 
> Sent: Monday, June 19, 2023 5:27 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: Grafana service fails to start due to bad
> directory name after Quincy upgrade
>
> Hi,
>
> so grafana is starting successfully now? What did you change?
> >> I stopped and removed the Grafana image and started it from the "Ceph
> Dashboard" service. The version is still 6.7.4. I also had to change the
> following.
> I do not have a way to make this permanent; if the service is redeployed,
> I will lose the changes.
> I did not save the file that cephadm generated, which was one reason why
> the Grafana service would not start. I had to replace it with the one below
> to resolve this issue.
> [users]
>   default_theme = light
> [auth.anonymous]
>   enabled = true
>   org_name = 'Main Org.'
>   org_role = 'Viewer'
> [server]
>   domain = 'bootstrap.storage.lab'
>   protocol = https
>   cert_file = /etc/grafana/certs/cert_file
>   cert_key = /etc/grafana/certs/cert_key
>   http_port = 3000
>   http_addr =
> [snapshots]
>   external_enabled = false
> [security]
>   disable_initial_admin_creation = false
>   cookie_secure = true
>   cookie_samesite = none
>   allow_embedding = true
>   admin_password = paswd-value
>   admin_user = user-name
>
> Also, this was the other change:
> # This file is generated by cephadm.
> apiVersion: 1   <-- This was the line added to
> /var/lib/ceph/d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e/grafana.fl31ca104ja0201/etc/grafana/provisioning/datasources/ceph-dashboard.yml
> >>
> Regarding the container images, yes there are defaults in cephadm which
> can be overridden with ceph config. Can you share this output?
>
> ceph config dump | grep container_image
> >>
> Here it is
> root@fl31ca104ja0201:/# ceph config dump | grep container_image
> global    basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
> mgr       advanced  mgr/cephadm/container_image_alertmanager   docker.io/prom/alertmanager:v0.16.2   *
> mgr       advanced  mgr/cephadm/container_image_base           quay.io/ceph/daemon
> mgr       advanced  mgr/cephadm/container_image_grafana        docker.io/grafana/grafana:6.7.4       *
> mgr       advanced  mgr/cephadm/container_image_node_exporter  docker.io/prom/node-exporter:v0.17.0  *
> mgr       advanced  mgr/cephadm/container_image_prometheus     docker.io/prom/prometheus:v2.7.2      *
> client.rgw.default.default.fl31ca104ja0201.ninovs  basic  container_image  quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
> client.rgw.default.default.fl31ca104ja0202.yhjkmb  basic  container_image  quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
> client.rgw.default.default.fl31ca104ja0203.fqnriq  basic  container_image  quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
> >>
> I tend to always use a specific image as described here [2]. I also
> haven't deployed grafana via dashboard yet so I can't really comment on
> that as well as on the warnings you report.
>
>
> >> OK. The need for that is that in Quincy, when you enable Loki and Promtail,
> the Ceph dashboard pulls in a Grafana dashboard to view the daemon logs. I
> will let you know once that issue is resolved.
>
> Regards,
> Eugen
>
> [2]
>
> https://docs.ceph.com/en/latest/cephadm/services/monitoring/#using-custom-images
> >> Thank you, I am following the document now.
>
> Zitat von "Adiga, Anantha" :
>
> > Hi Eugen,
> >
> > Thank you for your response, here is the update.
> >
> > The upgrade to Quincy was done following the cephadm orch upgrade
> > procedure: ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.6
> >
> > The upgrade completed without errors. After the upgrade, upon creating
> > the Grafana service from the Ceph dashboard, it deployed Grafana 6.7.4.
> > The version is hardcoded in the code; should it not be 8.3.5, as listed
> > below in the Quincy documentation? See below.
> >
> > [Grafana service started from Cephdashboard]
> >
> > Qu

[ceph-users] Re: changing crush map on the fly?

2023-06-23 Thread Nino Kotur
You are correct, but that will involve massive data movement.

You can change the failure domain osd/host/rack/datacenter/etc...
You can change the replica_count=2,3,4,5,6
You *CAN'T* change the EC value, e.g. 4+2, to something else
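
For the failure-domain part specifically, a rough sketch of the
decompile/edit/recompile cycle (pool name is a placeholder; adjust to your
setup, and expect the massive data movement mentioned above):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# in the rule used by the pool, change
#   step chooseleaf indep 0 type osd
# to
#   step chooseleaf indep 0 type host
# (replicated rules use "step chooseleaf firstn 0 type ..." instead)
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new

# changing the replica count of a replicated pool is a single command:
ceph osd pool set <pool> size 3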



Kind regards,
Nino


On Fri, Jun 23, 2023 at 12:40 AM Angelo Höngens  wrote:

> Hey,
>
> Just to confirm my understanding: If I set up a 3-osd cluster really
> fast with an EC42 pool, and I set the crush map to osd failover
> domain, the data will be distributed among the osd's, and of course
> there won't be protection against host failure. And yes, I know that's
> a bad idea, but I need the extra storage really fast, and it's a
> backup of other data. So availability is important, but not critical.
>
> If I then add 5 more hosts a week later, I can just edit the crush map
> and change the failover domain from osd to host, put the crush map
> back in, and ceph should automatically distribute all the pg's over
> the osd's again to be fully host-fault tolerant, right?
>
> Am I understanding this correctly?
>
> Angelo.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph iSCSI GW is too slow when compared with Raw RBD performance

2023-06-23 Thread Maged Mokhtar



On 23/06/2023 04:18, Work Ceph wrote:

Hello guys,

We have a Ceph cluster that runs just fine with Ceph Octopus; we use RBD
for some workloads, RadosGW (via S3) for others, and iSCSI for some Windows
clients.

We started noticing some unexpected performance issues with iSCSI. I mean,
an SSD pool is reaching 100 MB/s of write speed for an image, when it can
reach up to 600+ MB/s of write speed for the same image when mounted and
consumed directly via RBD.

Is that performance degradation expected? We would expect some degradation,
but not as much as this one.


Can't say on ceph-iscsi since we use a kernel-based rbd backstore, but
generally you should change the Windows iSCSI initiator registry setting


MaxTransferLength from 256 KB -> 4 MB
(a reboot is required)

This will have a large impact on large-block write performance, as the
default in Windows is to chop such writes into 256 KB blocks, which is
too small for distributed systems where latency is higher than in traditional
SANs. It will also improve performance for smaller sequential writes that are
buffered, such as a regular Windows file copy, as the Windows page cache
will buffer those up to 1 MB in size.
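
For reference, on the initiator side this is a registry value under the
SCSI controller class key; a sketch, with the caveat that the instance
number (0001 below) is machine-specific and you need to pick the one that
belongs to the Microsoft iSCSI initiator:

reg add "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e97b-e325-11ce-bfc1-08002be10318}\0001\Parameters" /v MaxTransferLength /t REG_DWORD /d 4194304 /f

4194304 bytes = 4 MB; reboot afterwards as noted above.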




Also, we have a question regarding the use of Intel Turbo boost. Should we
disable it? Is it possible that the root cause of the slowness in the iSCSI
GW is caused by the use of Intel Turbo boost feature, which reduces the
clock of some cores?


I would not recommend that; it is best to run at the highest
sustained/steady-state performance specced for the CPU. Set the governor
to performance and disable C-states:


cpupower idle-set -D 0
cpupower frequency-set -g performance

/Maged



Any feedback is much appreciated.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Grafana service fails to start due to bad directory name after Quincy upgrade

2023-06-23 Thread Adiga, Anantha
Hi Nizam,

Thanks much for the detail.


Regards,
Anantha



From: Nizamudeen A 
Sent: Friday, June 23, 2023 12:25 AM
To: Adiga, Anantha 
Cc: Eugen Block ; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Grafana service fails to start due to bad 
directory name after Quincy upgrade

Hi,

You can upgrade the grafana version individually by setting the config_opt for 
grafana container image like:
ceph config set mgr mgr/cephadm/container_image_grafana  
quay.io/ceph/ceph-grafana:8.3.5

and then redeploy the grafana container again either via dashboard or cephadm.

Regards,
Nizam



On Fri, Jun 23, 2023 at 12:05 AM Adiga, Anantha <anantha.ad...@intel.com> wrote:
Hi Eugen,

Thank you so much for the details.  Here is the update (comments in-line >>):

Regards,
Anantha
-Original Message-
From: Eugen Block <ebl...@nde.ag>
Sent: Monday, June 19, 2023 5:27 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Grafana service fails to start due to bad directory 
name after Quincy upgrade

Hi,

so grafana is starting successfully now? What did you change?
>> I stopped and removed the Grafana image and started it from the "Ceph
>> Dashboard" service. The version is still 6.7.4. I also had to change the
>> following.
I do not have a way to make this permanent; if the service is redeployed, I
will lose the changes.
I did not save the file that cephadm generated, which was one reason why the
Grafana service would not start. I had to replace it with the one below to
resolve this issue.
[users]
  default_theme = light
[auth.anonymous]
  enabled = true
  org_name = 'Main Org.'
  org_role = 'Viewer'
[server]
  domain = 'bootstrap.storage.lab'
  protocol = https
  cert_file = /etc/grafana/certs/cert_file
  cert_key = /etc/grafana/certs/cert_key
  http_port = 3000
  http_addr =
[snapshots]
  external_enabled = false
[security]
  disable_initial_admin_creation = false
  cookie_secure = true
  cookie_samesite = none
  allow_embedding = true
  admin_password = paswd-value
  admin_user = user-name

Also, this was the other change:
# This file is generated by cephadm.
apiVersion: 1   <-- This was the line added to
/var/lib/ceph/d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e/grafana.fl31ca104ja0201/etc/grafana/provisioning/datasources/ceph-dashboard.yml
>>
Regarding the container images, yes there are defaults in cephadm which can be 
overridden with ceph config. Can you share this output?

ceph config dump | grep container_image
>>
Here it is
root@fl31ca104ja0201:/# ceph config dump | grep container_image
global    basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
mgr       advanced  mgr/cephadm/container_image_alertmanager   docker.io/prom/alertmanager:v0.16.2   *
mgr       advanced  mgr/cephadm/container_image_base           quay.io/ceph/daemon
mgr       advanced  mgr/cephadm/container_image_grafana        docker.io/grafana/grafana:6.7.4       *
mgr       advanced  mgr/cephadm/container_image_node_exporter  docker.io/prom/node-exporter:v0.17.0  *
mgr       advanced  mgr/cephadm/container_image_prometheus     docker.io/prom/prometheus:v2.7.2      *
client.rgw.default.default.fl31ca104ja0201.ninovs  basic  container_image  quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
client.rgw.default.default.fl31ca104ja0202.yhjkmb  basic  container_image  quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
client.rgw.default.default.fl31ca104ja0203.fqnriq  basic  container_image  quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
>>
I tend to always use a

[ceph-users] cephfs - unable to create new subvolume

2023-06-23 Thread karon karon
Hello,

I have recently been using CephFS version 17.2.6.
I have a pool named "data" and an fs "kube".
It was working fine until a few days ago; now I can no longer create a new
subvolume, it gives me the following error:

Error EINVAL: invalid value specified for ceph.dir.subvolume
>

here is the command used:

ceph fs subvolume create kube newcsivol --pool_layout data
>

From what I understand, it seems that it creates the subvolume but
immediately puts it in the trash!? Here is the log:

2023-06-23T08:30:53.307+ 7f2b929d2700  0 log_channel(audit) log [DBG] :
> from='client.86289 -' entity='client.admin' cmd=[{"prefix": "fs subvolume
> create", "vol_name": "kube", "sub_name": "newcsivol", "group_name": "csi",
> "pool_layout": "data", "target": ["mon-mgr", ""]}]: dispatch
> 2023-06-23T08:30:53.307+ 7f2b8a1d1700  0 [volumes INFO volumes.module]
> Starting _cmd_fs_subvolume_create(group_name:csi, pool_layout:data,
> prefix:fs subvolume create, sub_name:newcsivol, target:['mon-mgr', ''],
> vol_name:kube) < ""
> 2023-06-23T08:30:53.327+ 7f2b8a1d1700  0 [volumes INFO
> volumes.fs.operations.versions.subvolume_v2] cleaning up subvolume with
> path: newcsivol
> 2023-06-23T08:30:53.331+ 7f2b8a1d1700  0 [volumes INFO
> volumes.fs.operations.versions.subvolume_base] subvolume path
> 'b'/volumes/csi/newcsivol'' moved to trashcan
> 2023-06-23T08:30:53.331+ 7f2b8a1d1700  0 [volumes INFO
> volumes.fs.async_job] queuing job for volume 'kube'
> 2023-06-23T08:30:53.335+ 7f2b8a1d1700  0 [volumes INFO volumes.module]
> Finishing _cmd_fs_subvolume_create(group_name:csi, pool_layout:data,
> prefix:fs subvolume create, sub_name:newcsivol, target:['mon-mgr', ''],
> vol_name:kube) < ""
> 2023-06-23T08:30:53.335+ 7f2b8a1d1700 -1 mgr.server reply reply (22)
> Invalid argument invalid value specified for ceph.dir.subvolume
> 2023-06-23T08:30:53.339+ 7f2b461bf700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.339+ 7f2b461bf700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.339+ 7f2b461bf700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.339+ 7f2b461bf700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.339+ 7f2b461bf700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.363+ 7f2b461bf700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.363+ 7f2b461bf700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.363+ 7f2b461bf700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.363+ 7f2b461bf700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.363+ 7f2b461bf700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.383+ 7f2b479c2700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.383+ 7f2b479c2700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.383+ 7f2b479c2700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.383+ 7f2b479c2700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.383+ 7f2b479c2700 -1 client.0 error registering
> admin socket command: (17) File exists
> 2023-06-23T08:30:53.507+ 7f2b3ff33700  0 [prometheus INFO
> cherrypy.access.139824530773776] 192.168.240.231 - - [23/Jun/2023:08:30:53]
> "GET /metrics HTTP/1.1" 200 194558 "" "Prometheus/2.33.4"
> 2023-06-23T08:30:54.219+ 7f2b3ddaf700  0 [dashboard INFO request] [
> 172.29.2.142:33040] [GET] [200] [0.003s] [admin] [22.0B]
> /api/prometheus/notifications
> 2023-06-23T08:30:54.223+ 7f2b929d2700  0 log_channel(audit) log [DBG]
> : from='mon.0 -' entity='mon.' cmd=[{"prefix": "balancer status", "format":
> "json"}]: dispatch
> 2023-06-23T08:30:54.227+ 7f2b3a5a8700  0 [dashboard INFO request] [
> 172.29.2.142:49348] [GET] [200] [0.019s] [admin] [22.0B] /api/prometheus
> 2023-06-23T08:30:54.227+ 7f2b929d2700  0 log_channel(audit) log [DBG]
> : from='mon.0 -' entity='mon.' cmd=[{"prefix": "balancer status", "format":
> "json"}]: dispatch
> 2023-06-23T08:30:54.231+ 7f2b3d5ae700  0 [dashboard INFO request] [
> 172.29.2.142:39414] [GET] [200] [0.022s] [admin] [9.3K]
> /api/prometheus/rules
> 2023-06-23T08:30:54.275+ 7f2ba39d4700  0 log_channel(cluster) log
> [DBG] : pgmap v2116480: 145 pgs: 145 active+clean; 2.8 GiB data, 12 GiB
> used, 1.5 TiB / 1.5 TiB avail; 5.5 KiB/s wr, 0 op/s
>

my fs info :

# ceph fs ls
> name: kube, metadata pool: metadata, data pools: [data ]
>

thanks for your help

best regards
Karim
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: radosgw hang under pressure

2023-06-23 Thread Rok Jaklič
We are experiencing something similar (slow GET responses) when sending 1k
delete requests, for example, in Ceph v16.2.13.

Rok

On Mon, Jun 12, 2023 at 7:16 PM grin  wrote:

> Hello,
>
> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
> (stable)
>
> There is a single (test) radosgw serving plenty of test traffic. When
> under heavy req/s ("heavy" in a low sense, about 1k rq/s) it pretty
> reliably hangs: low traffic threads seem to work (like handling occasional
> PUTs) but GETs are completely nonresponsive, all attention seems to be
> spent on futexes.
>
> The effect is extremely similar to
>
> https://ceph-users.ceph.narkive.com/I4uFVzH9/radosgw-civetweb-hangs-once-around-850-established-connections
> (subject: Radosgw (civetweb) hangs once around)
> except this is quincy so it's beast instead of civetweb. The effect is the
> same as described there, except the cluster is way smaller (about 20-40
> OSDs).
>
> I observed that when I start radosgw -f with debug 20/20 it almost never
> hangs, so my guess is some ugly race condition. However I am a bit clueless
> how to actually debug it since debugging makes it go away. Debug 1
> (default) with -d seems to hang after a while but it's not that simple to
> induce, I'm still testing under 4/4.
>
> Also I do not see much to configure about beast.
>
> As to answer the question in the original (2016) thread:
> - Debian stable
> - no visible limits issue
> - no obvious memory leak observed
> - no other visible resource shortage
> - strace says everyone's waiting on futexes, about 600-800 threads, apart
> from the one serving occasional PUTs
> - tcp port doesn't respond.
>
> IRC didn't react. ;-)
>
> Thanks,
> Peter
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph iSCSI GW is too slow when compared with Raw RBD performance

2023-06-23 Thread Work Ceph
Awesome, thanks for the info!

By any chance, do you happen to know what configurations you needed to
adjust to make Veeam perform a bit better?

On Fri, Jun 23, 2023 at 10:42 AM Anthony D'Atri  wrote:

> Yes, with someone I did some consulting for.  Veeam seems to be one of the
> prevalent uses for ceph-iscsi, though I'd try to use the native RBD client
> instead if possible.
>
> Veeam appears by default to store really tiny blocks, so there's a lot of
> protocol overhead.  I understand that Veeam can be configured to use "large
> blocks" that can make a distinct difference.
>
>
>
> On Jun 23, 2023, at 09:33, Work Ceph 
> wrote:
>
> Great question!
>
> Yes, one of the slowness was detected in a Veeam setup. Have you
> experienced that before?
>
> On Fri, Jun 23, 2023 at 10:32 AM Anthony D'Atri 
> wrote:
>
>> Are you using Veeam by chance?
>>
>> > On Jun 22, 2023, at 21:18, Work Ceph 
>> wrote:
>> >
>> > Hello guys,
>> >
>> > We have a Ceph cluster that runs just fine with Ceph Octopus; we use RBD
>> > for some workloads, RadosGW (via S3) for others, and iSCSI for some
>> Windows
>> > clients.
>> >
>> > We started noticing some unexpected performance issues with iSCSI. I
>> mean,
>> > an SSD pool is reaching 100MB of write speed for an image, when it can
>> > reach up to 600MB+ of write speed for the same image when mounted and
>> > consumed directly via RBD.
>> >
>> > Is that performance degradation expected? We would expect some
>> degradation,
>> > but not as much as this one.
>> >
>> > Also, we have a question regarding the use of Intel Turbo boost. Should
>> we
>> > disable it? Is it possible that the root cause of the slowness in the
>> iSCSI
>> > GW is caused by the use of Intel Turbo boost feature, which reduces the
>> > clock of some cores?
>> >
>> > Any feedback is much appreciated.
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph iSCSI GW not working with VMware VMFS and Windows Clustered Storage Volumes (CSV)

2023-06-23 Thread Work Ceph
Thanks for the help so far guys!

Has anybody used (and made it work) the default ceph-iscsi implementation with
VMware and/or the Windows CSV storage system, with a single target/portal in
iSCSI?

On Wed, Jun 21, 2023 at 6:02 AM Maged Mokhtar  wrote:

>
> On 20/06/2023 01:16, Work Ceph wrote:
> > I see, thanks for the feedback guys!
> >
> > It is interesting that Ceph Manager does not allow us to export iSCSI
> > blocks without selecting 2 or more iSCSI portals. Therefore, we will
> > always use at least two, and as a consequence that feature is not
> > going to be supported. Can I export an RBD image via iSCSI gateway
> > using only one portal via GwCli?
> >
> > @Maged Mokhtar, I am not sure I follow. Do you guys have an iSCSI
> > implementation that we can use to somehow replace the default iSCSI
> > server in the default Ceph iSCSI Gateway? I didn't quite understand
> > what the petasan project is, and if it is an OpenSource solution that
> > we can somehow just pick/select/use one of its modules (e.g. just the
> > iSCSI implementation) that you guys have.
> >
>
> For sure PetaSAN is open source; you should see this from the home page :)
> we use Consul
> https://www.consul.io/use-cases/multi-platform-service-mesh
> to scale-out the service/protocol layers above Ceph in a scale-out
> active/active fashion.
> Most of our target use cases are non linux, such as VMWare and Windows,
> we provide easy to use deployment and management.
>
> For iSCSI, we use the kernel/LIO rbd backstore originally developed by SUSE
> Enterprise Storage. We have made some changes to send persistent
> reservations using Ceph watch/notify, and we also added changes to
> coordinate pre-snapshot quiescing/flushing across the different gateways. We
> ported the rbd backstore to the 5.14 kernel.
>
> You should be able to use the iSCSI gateway by itself on existing
> non-PetaSAN clusters, but it is not a setup we support. You would use the
> LIO targetcli to script the setup. There are some things to take care of,
> such as setting the disk serial wwn to be the same across the different
> gateways serving the same image, and setting up the multiple tpgs (target
> portal groups) for an image but only enabling the tpgs for the local node.
> This setup will be using multipath (MPIO) to provide HA. Again, it is not
> a setup we support; you could try it yourself in a test environment. You
> can also set up a test PetaSAN cluster and examine the LIO configuration
> using targetcli. You can send me an email if you need any clarifications.
>
> Cheers /Maged
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] users caps change unexpected

2023-06-23 Thread Alessandro Italiano
Hi

We have a brand new Ceph instance deployed by the ceph puppet module.
We are experiencing a funny issue: user caps change unexpectedly.

The logs do not report any message about the user caps, even with
auth/debug_auth: 5/5.

Who/what can change the caps?

thanks in advance

Ale
root@cephmon1:~# ceph version
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
root@cephmon1:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 22.04.1 LTS
Release:22.04
Codename:   jammy
root@cephmon1:~# 


root@cephmon1:~# ceph auth del client.cinder-backup
updated
root@cephmon1:~# 
root@cephmon1:~# 
root@cephmon1:~# 
root@cephmon1:~# date
Fri Jun 23 07:52:40 AM CEST 2023
root@cephmon1:~# 
root@cephmon1:~# ceph auth add client.cinder-backup
added key for client.cinder-backup
root@cephmon1:~# ceph auth caps client.cinder-backup mon 'allow r' osd 'allow 
class-read object_prefix rbd_children, allow rwx pool=backup' mgr 'profile rbd 
pool=backup'
updated caps for client.cinder-backup
root@cephmon1:~# 
root@cephmon1:~# 
root@cephmon1:~# 
root@cephmon1:~# date
Fri Jun 23 07:53:18 AM CEST 2023
root@cephmon1:~# ceph auth list
client.cinder-backup
key: AQBEM5VkhIfJHBAA6WP9P3HHCTSySdTqZv4Ypg==
caps: [mgr] profile rbd pool=backup
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
pool=backup

root@cephmon1:~# ceph auth list
client.cinder-backup
key: AQAM0nJi8OYAFBAA+p+T2QWtKaq92Z/hFMgF4w==
caps: [mgr] profile rbd pool=backups
caps: [mon] profile rbd
caps: [osd] profile rbd pool=backups
root@cephmon1:~# date
Fri Jun 23 07:56:42 AM CEST 2023
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-dashboard python warning with new pyo3 0.17 lib (debian12)

2023-06-23 Thread DERUMIER, Alexandre
Hi,

On Debian 12, ceph-dashboard is throwing a warning:

"Module 'dashboard' has failed dependency: PyO3 modules may only be
initialized once per interpreter process"


This seems to be related to the PyO3 0.17 change:

https://github.com/PyO3/pyo3/blob/7bdc504252a2f972ba3490c44249b202a4ce6180/guide/src/migration.md#each-pymodule-can-now-only-be-initialized-once-per-process

"
Each #[pymodule] can now only be initialized once per process
To make PyO3 modules sound in the presence of Python sub-interpreters,
for now it has been necessary to explicitly disable the ability to
initialize a #[pymodule] more than once in the same process. Attempting
to do this will now raise an ImportError.
"



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: users caps change unexpected

2023-06-23 Thread Eugen Block

Hi,
without knowing the details, I just assume that it's "translated":
the syntax you set is the older way of setting rbd caps; for a couple of
years now it's been sufficient to use "profile rbd". Do you notice
client access issues (which I would not expect) or are you just
curious about the automatic change? There's probably a pull request
somewhere which I can't search for right now. :-)
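
For reference, the profile-based equivalent of the caps in the second "ceph
auth list" output would be set like this (just a sketch, using the pool name
from that output):

ceph auth caps client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' mgr 'profile rbd pool=backups'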


Regards
Eugen

Zitat von Alessandro Italiano :


Hi

We have a brand new Ceph instance deployed by the ceph puppet module.
We are experiencing a funny issue: user caps change unexpectedly.

The logs do not report any message about the user caps, even with
auth/debug_auth: 5/5.


Who/what can change the caps?

thanks in advance

Ale
root@cephmon1:~# ceph version
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757)  
quincy (stable)

root@cephmon1:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 22.04.1 LTS
Release:22.04
Codename:   jammy
root@cephmon1:~#


root@cephmon1:~# ceph auth del client.cinder-backup
updated
root@cephmon1:~#
root@cephmon1:~#
root@cephmon1:~#
root@cephmon1:~# date
Fri Jun 23 07:52:40 AM CEST 2023
root@cephmon1:~#
root@cephmon1:~# ceph auth add client.cinder-backup
added key for client.cinder-backup
root@cephmon1:~# ceph auth caps client.cinder-backup mon 'allow r'  
osd 'allow class-read object_prefix rbd_children, allow rwx  
pool=backup' mgr 'profile rbd pool=backup'

updated caps for client.cinder-backup
root@cephmon1:~#
root@cephmon1:~#
root@cephmon1:~#
root@cephmon1:~# date
Fri Jun 23 07:53:18 AM CEST 2023
root@cephmon1:~# ceph auth list
client.cinder-backup
key: AQBEM5VkhIfJHBAA6WP9P3HHCTSySdTqZv4Ypg==
caps: [mgr] profile rbd pool=backup
caps: [mon] allow r
	caps: [osd] allow class-read object_prefix rbd_children, allow rwx  
pool=backup


root@cephmon1:~# ceph auth list
client.cinder-backup
key: AQAM0nJi8OYAFBAA+p+T2QWtKaq92Z/hFMgF4w==
caps: [mgr] profile rbd pool=backups
caps: [mon] profile rbd
caps: [osd] profile rbd pool=backups
root@cephmon1:~# date
Fri Jun 23 07:56:42 AM CEST 2023
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs cannot join cluster anymore

2023-06-23 Thread Malte Stroem

Hello Eugen,

thanks.

We found the cause.

Somehow all

/var/lib/ceph/fsid/osd.XX/config

files on every host were still filled with expired information about the 
mons.


So refreshing the files helped to bring the osds up again. Damn.

All other configs for the mons, mds', rgws and so on were up to date.

I do not know why the OSD config files did not get refreshed; however, I
guess something went wrong while draining the nodes we removed from the cluster.
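
In case someone hits the same thing, this is roughly what "refreshing" looked
like (the loop and the generate-minimal-conf step are shorthand rather than an
exact transcript; adjust paths to your cluster):

fsid=$(ceph fsid)
ceph config generate-minimal-conf > /tmp/minimal.ceph.conf
for d in /var/lib/ceph/${fsid}/osd.*; do cp /tmp/minimal.ceph.conf "${d}/config"; done
# then restart the OSD daemons, e.g. systemctl restart ceph-${fsid}@osd.<id>.service

Alternatively, "ceph orch daemon redeploy osd.<id>" should rewrite the
daemon's config directory as well.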


Best regards,
Malte

Am 21.06.23 um 22:11 schrieb Eugen Block:
I still can’t really grasp what might have happened here. But could you 
please clarify which of the down OSDs (or Hosts) are supposed to be down 
and which you’re trying to bring back online? Obviously osd.40 is one of 
your attempts. But what about the hosts cephx01 and cephx08? Are those 
the ones refusing to start their OSDs? And the remaining up OSDs you 
haven’t touched yet, correct?
And regarding debug logs, you should set it with ceph config set because 
the local ceph.conf won’t have an effect. It could help to have the 
startup debug logs from one of the OSDs.


Zitat von Malte Stroem :


Hello Eugen,

Recovery and rebalancing were finished; however, now all PGs show missing
OSDs.


Everything looks like the PGs are missing OSDs although it finished 
correctly.


As if we shut down the servers immediately.

But we removed the nodes the way it is described in the documentation.

We just added new disks and they join the cluster immediately.

So the old OSDs removed from the cluster are available; I restored
OSD.40, but it does not want to join the cluster.


Following are the outputs of the mentioned commands:

ceph -s

  cluster:
    id: X
    health: HEALTH_WARN
    1 failed cephadm daemon(s)
    1 filesystem is degraded
    1 MDSs report slow metadata IOs
    19 osds down
    4 hosts (50 osds) down
    Reduced data availability: 1220 pgs inactive
    Degraded data redundancy: 132 pgs undersized

  services:
    mon: 3 daemons, quorum cephx02,cephx04,cephx06 (age 4m)
    mgr: cephx02.xx(active, since 92s), standbys: cephx04.yy, cephx06.zz
    mds: 2/2 daemons up, 2 standby

    osd: 130 osds: 78 up (since 13m), 97 in (since 35m); 171 remapped pgs
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/2 healthy, 1 recovering
    pools:   12 pools, 1345 pgs
    objects: 11.02k objects, 1.9 GiB
    usage:   145 TiB used, 669 TiB / 814 TiB avail
    pgs: 86.617% pgs unknown
 4.089% pgs not active
 39053/33069 objects misplaced (118.095%)
 1165 unknown
 77   active+undersized+remapped
 55   undersized+remapped+peered
 38   active+clean+remapped
 10   active+clean

ceph osd tree

ID   CLASS  WEIGHT      TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-21         4.36646     root ssds
-61         0.87329         host cephx01-ssd
186  ssd    0.87329             osd.186         down   1.0      1.0
-76         0.87329         host cephx02-ssd
263  ssd    0.87329             osd.263           up   1.0      1.0
-85         0.87329         host cephx04-ssd
237  ssd    0.87329             osd.237           up   1.0      1.0
-88         0.87329         host cephx06-ssd
236  ssd    0.87329             osd.236           up   1.0      1.0
-94         0.87329         host cephx08-ssd
262  ssd    0.87329             osd.262         down   1.0      1.0
 -1         1347.07397  root default
-62         261.93823       host cephx01
139  hdd    10.91409            osd.139         down   0        1.0
140  hdd    10.91409            osd.140         down   0        1.0
142  hdd    10.91409            osd.142         down   0        1.0
144  hdd    10.91409            osd.144         down   0        1.0
146  hdd    10.91409            osd.146         down   0        1.0
148  hdd    10.91409            osd.148         down   0        1.0
150  hdd    10.91409            osd.150         down   0        1.0
152  hdd    10.91409            osd.152         down   0        1.0
154  hdd    10.91409            osd.154         down   1.0      1.0
156  hdd    10.91409            osd.156         down   1.0      1.0
158  hdd    10.91409            osd.158         down   1.0      1.0
160  hdd    10.91409            osd.160         down   1.0      1.0
162  hdd    10.91409            osd.162         down   1.0      1.0
164  hdd    10.91409            osd.164         down   1.0      1.0
166  hdd    10.91409            osd.166         down   1.0      1.0
168  hdd    10.91409            osd.168         down   1.0      1.0
170  hdd    10.91409            osd.170         down   1.0      1.0
172  hdd    10.91409            osd.172         down   1.0      1.0
174  hdd    1