[ceph-users] Re: MAX AVAIL capacity mismatch || mimic(13.2)

2021-12-14 Thread Janne Johansson
On Wed, 15 Dec 2021 at 07:45, Md. Hejbul Tawhid MUNNA wrote:
> Hi,
> We are observing MAX-Available capacity is not reflecting the full size of
> the cluster.

Max avail depends on several factors. One is that the OSD with the
least free space is the one used for calculating it, because it could
happen that, by pure randomness, all writes to a pool end up on that
one single OSD, so the "promise" is shown for the worst case. Normally
this will not happen, but it could.
Secondly, max avail is per pool, and based on the replication factor
of that single pool. A size=2 pool will show more avail than a size=5
pool, because a write to the size=5 pool consumes 5x the written data
while the size=2 pool only consumes twice the amount.
Given the total "197 TB avail" and a guess that replication size is set
to 3, the max avail should end up close to 197/3, but since OSD 18 has
some 12% more data on it than OSD 9, the math probably computes the
per-pool MAX AVAIL as if all OSDs were like OSD 18, while TOTAL AVAIL
will of course count the free space on OSD 9 too.
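As a rough sketch of that worst-case arithmetic (my own simplified model, not the exact code Ceph runs): with roughly equal CRUSH weights, the fullest OSD's free space caps the usable share of every OSD, and the result is divided by the pool's replication factor.

```python
# Simplified model of per-pool MAX AVAIL (an illustration, not Ceph's exact
# algorithm): with equal CRUSH weights, the fullest OSD's free space limits
# the usable share of every OSD, so extra free space on emptier OSDs is
# effectively unusable for the worst-case "promise".
def max_avail_tib(fullest_osd_free_tib, num_osds, pool_size):
    return fullest_osd_free_tib * num_osds / pool_size

# Figures from the poster's `ceph osd df`: the fullest OSD (osd.18) has
# about 3.2 TiB free, there are 40 HDD OSDs, and replication is assumed 3.
print(round(max_avail_tib(3.2, 40, 3), 1))  # 42.7 -- in the ballpark of the reported 39 TiB
```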

> # ceph df
> GLOBAL:
> SIZE    AVAIL   RAW USED %RAW USED
> 272 TiB 197 TiB   75 TiB     27.68
> POOLS:
> NAME    ID USED    %USED MAX AVAIL OBJECTS
> images  16 243 GiB  0.60    39 TiB   31955
> volumes 17  22 TiB 36.34    39 TiB 5951619


37   hdd 5.57100  1.0 5.6 TiB 2.1 TiB 3.5 TiB 37.92 1.37 796
38   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.8 TiB 31.03 1.12 841
39   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.51 1.21 830

It's somewhat interesting to see that every OSD is at variance 1.x,
meaning all of them report as holding more than the average amount of
data. Was this not the complete picture of your OSDs?
If there are more OSDs (SSDs, NVMes), then of course they will still
count toward TOTAL AVAIL in "ceph df", even if some pools can't use
them due to CRUSH rules.
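A small numeric illustration of that last point (the split across device classes below is invented for the example, not taken from the poster's cluster):

```python
# Hypothetical free space per device class, in TiB, summing to the
# cluster's reported 197 TiB TOTAL AVAIL.
free_by_class = {"hdd": 149, "ssd": 40, "nvme": 8}

# "ceph df" TOTAL AVAIL counts free space on every OSD...
total_avail = sum(free_by_class.values())
print(total_avail)  # 197

# ...but a size=3 pool whose CRUSH rule selects only hdd OSDs can at
# most use the hdd share, divided by its replication factor.
hdd_pool_max_avail = free_by_class["hdd"] / 3
print(round(hdd_pool_max_avail, 1))  # 49.7
```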

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mon pacific doesn't enter to quorum of nautilus cluster

2021-12-14 Thread Chris Dunlop

On Wed, Dec 15, 2021 at 02:05:05PM +1000, Michael Uleysky wrote:

I try to upgrade three-node nautilus cluster to pacific. I am updating ceph
on one node and restarting daemons. OSD ok, but monitor cannot enter quorum.


Sounds like the same thing as:

Pacific mon won't join Octopus mons
https://tracker.ceph.com/issues/52488

Unfortunately there's no resolution.

For a bit more background, see also the thread starting:

New pacific mon won't join with octopus mons
https://www.spinics.net/lists/ceph-devel/msg52181.html
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MAX AVAIL capacity mismatch || mimic(13.2)

2021-12-14 Thread Md. Hejbul Tawhid MUNNA
Hi,

We are observing MAX-Available capacity is not reflecting the full size of
the cluster.

We are running mimic version.

Initially we installed 3 OSD hosts with 8 x 5.5TB drives each. At that
time max_available was 39TB. After two years we installed two more
servers with the same spec (8 x 5.5TB each), so the total capacity
should be around 75TB. But it is still showing 39TB.


# ceph df
GLOBAL:
    SIZE    AVAIL   RAW USED %RAW USED
    272 TiB 197 TiB   75 TiB     27.68
POOLS:
    NAME                       ID USED    %USED MAX AVAIL OBJECTS
    images                     16 243 GiB  0.60    39 TiB   31955
    volumes                    17  22 TiB 36.34    39 TiB 5951619
    vms                        18  82 MiB     0    39 TiB    1950
    gnocchi                    32 1.7 GiB     0    39 TiB  184973
    .rgw.root                  35 1.1 KiB     0    39 TiB       4
    default.rgw.control        36     0 B     0    39 TiB       8
    default.rgw.meta           37  37 KiB     0    39 TiB     189
    default.rgw.log            38     0 B     0    39 TiB     207
    default.rgw.buckets.index  39     0 B     0    39 TiB      66
    default.rgw.buckets.data   40 930 GiB  2.25    39 TiB  322068
    default.rgw.buckets.non-ec 49     0 B     0    39 TiB       4

# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 0   hdd 5.57100  1.0 5.6 TiB 2.1 TiB 3.5 TiB 37.45 1.35 871
 1   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.99 1.23 840
 2   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.8 TiB 31.30 1.13 831
 3   hdd 5.57100  1.0 5.6 TiB 2.1 TiB 3.4 TiB 38.51 1.39 888
 4   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.7 TiB 32.97 1.19 866
 5   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.6 TiB 35.85 1.30 837
 6   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.8 TiB 31.89 1.15 858
 7   hdd 5.57100  1.0 5.6 TiB 1.6 TiB 3.9 TiB 29.42 1.06 851
 8   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.31 1.20 799
 9   hdd 5.57100  1.0 5.6 TiB 1.6 TiB 4.0 TiB 28.53 1.03 793
10   hdd 5.57100  1.0 5.6 TiB 1.6 TiB 3.9 TiB 29.29 1.06 839
11   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.6 TiB 35.92 1.30 860
12   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.37 1.21 904
13   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.8 TiB 32.28 1.17 807
14   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.92 1.23 845
15   hdd 5.57100  1.0 5.6 TiB 2.1 TiB 3.5 TiB 37.33 1.35 836
16   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.9 TiB 30.89 1.12 881
17   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.8 TiB 32.42 1.17 876
18   hdd 5.57100  1.0 5.6 TiB 2.3 TiB 3.2 TiB 41.98 1.52 860
19   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.9 TiB 29.70 1.07 828
20   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 34.39 1.24 854
21   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.38 1.21 845
22   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.7 TiB 32.97 1.19 797
23   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.5 TiB 36.47 1.32 839
24   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.5 TiB 36.71 1.33 829
25   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.9 TiB 30.63 1.11 878
26   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.5 TiB 36.40 1.32 867
27   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.8 TiB 30.90 1.12 842
28   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.8 TiB 31.88 1.15 821
29   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.7 TiB 33.20 1.20 871
30   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.6 TiB 35.71 1.29 813
31   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.9 TiB 30.37 1.10 812
32   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 34.39 1.24 836
33   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.8 TiB 31.34 1.13 884
34   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.5 TiB 36.40 1.32 829
35   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.6 TiB 34.54 1.25 900
36   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.51 1.21 838
37   hdd 5.57100  1.0 5.6 TiB 2.1 TiB 3.5 TiB 37.92 1.37 796
38   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.8 TiB 31.03 1.12 841
39   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.51 1.21 830

Regards,
Munna
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is 100pg/osd still the rule of thumb?

2021-12-14 Thread Linh Vu
Pretty sure this rule of thumb was created during the days of 4TB and 6TB
spinning disks. Newer spinning disks and SSD / NVMe are faster so they can
have more PGs. Obviously a 16TB spinning disk isn't 4 times faster than a
4TB one, so it's not a linear increase, but I think going closer to 200
should be fine for the bigger disks.
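For reference, the usual way a per-OSD target is turned into a pool's pg_num (a sketch of the classic rule-of-thumb formula, not an official calculator) is: total PGs ≈ OSDs × target-PGs-per-OSD ÷ replication, rounded to a power of two.

```python
import math

# Rule-of-thumb pg_num sizing (a sketch of the classic formula): aim for a
# target PG count per OSD, divide the cluster-wide total by the replication
# factor, and round to the nearest power of two.
def suggested_pg_num(num_osds, target_pgs_per_osd, pool_size):
    raw = num_osds * target_pgs_per_osd / pool_size
    return 2 ** max(1, round(math.log2(raw)))

# Istvan's 144-OSD cluster at the classic 100 PGs/OSD target, size=3 ...
print(suggested_pg_num(144, 100, 3))   # 4096
# ... and at the ~200 PGs/OSD discussed above for bigger/faster disks.
print(suggested_pg_num(144, 200, 3))   # 8192
```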

On Wed, Dec 15, 2021 at 5:21 PM Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com> wrote:

> Hi,
>
> Just curious is this still the best practice?
> I just gave it a try in my 144-OSD cluster with 150-160 PGs/OSD and the
> speed went up, I guess because I have a lot of small objects. But for me
> stability is still the most important, so it would be good to know the answer.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mon pacific doesn't enter to quorum of nautilus cluster

2021-12-14 Thread Linh Vu
May not be directly related to your error, but they slap a DO NOT UPGRADE
FROM AN OLDER VERSION label on the Pacific release notes for a reason...

https://docs.ceph.com/en/latest/releases/pacific/

It means please don't upgrade right now.

On Wed, Dec 15, 2021 at 3:07 PM Michael Uleysky  wrote:

> I try to upgrade three-node nautilus cluster to pacific. I am updating ceph
> on one node and restarting daemons. OSD ok, but monitor cannot enter
> quorum.
> With debug_mon 20/20 I see repeating blocks in the logs of problem monitor
> like
>
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> bootstrap
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> sync_reset_requester
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> unregister_cluster_logger - not registered
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> cancel_probe_timeout 0x557603d82420
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> monmap e4: 3 mons at {debian1=[v2:
> 172.16.21.101:3300/0,v1:172.16.21.101:6789/0],debian2=[v2:
> 172.16.21.102:3300/0,v1:172.16.21.102:6789/0],debian3=[v2:
> 172.16.21.103:3300/0,v1:172.16.21.103:6789/0]}
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> _reset
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing).auth
> v0
> _set_mon_num_rank num 0 rank 0
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> cancel_probe_timeout (none scheduled)
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> timecheck_finish
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 15 mon.debian2@1(probing) e4
> health_tick_stop
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 15 mon.debian2@1(probing) e4
> health_interval_stop
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> scrub_event_cancel
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> scrub_reset
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> cancel_probe_timeout (none scheduled)
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> reset_probe_timeout 0x557603d82420 after 2 seconds
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> probing other monitors
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 20 mon.debian2@1(probing) e4
> _ms_dispatch existing session 0x557603d60b40 for mon.2
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 20 mon.debian2@1(probing) e4
>  entity_name  global_id 0 (none) caps allow *
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 20 is_capable service=mon
> command= read addr v2:172.16.21.103:3300/0 on cap allow *
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 20  allow so far , doing grant
> allow *
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 20  allow all
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> handle_probe mon_probe(reply 8deaaacb-c581-4c10-b58c-0ab261aa2865 name
> debian3 quorum 0,2 leader 0 paxos( fc 52724559 lc 52725302 ) mon_release
> octopus) v7
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> handle_probe_reply mon.2 v2:172.16.21.103:3300/0 mon_probe(reply
> 8deaaacb-c581-4c10-b58c-0ab261aa2865 name debian3 quorum 0,2 leader 0
> paxos( fc 52724559 lc 52725302 ) mon_release octopus) v7
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
>  monmap is e4: 3 mons at {debian1=[v2:
> 172.16.21.101:3300/0,v1:172.16.21.101:6789/0],debian2=[v2:
> 172.16.21.102:3300/0,v1:172.16.21.102:6789/0],debian3=[v2:
> 172.16.21.103:3300/0,v1:172.16.21.103:6789/0]}
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> got
> newer/committed monmap epoch 4, mine was 4
> 2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
> bootstrap
>
> On the nautilus monitor I see
>
> 2021-12-15T13:57:03.866+1000 7f109cf23700 20 mon.debian1@0(leader) e4
> _ms_dispatch existing session 0x55feee4f9b00 for mon.1
>
> 2021-12-15T13:57:03.866+1000 7f109cf23700 20 mon.debian1@0(leader) e4
>  entity_name  global_id 0 (none) caps allow *
>
> 2021-12-15T13:57:03.866+1000 7f109cf23700 20 is_capable service=mon
> command= read addr v2:172.16.21.102:3300/0 on cap allow *
> 2021-12-15T13:57:03.866+1000 7f109cf23700 20  allow so far , doing grant
> allow *
> 2021-12-15T13:57:03.866+1000 7f109cf23700 20  allow all
> 2021-12-15T13:57:03.866+1000 7f109cf23700 10 mon.debian1@0(leader) e4
> handle_probe mon_probe(probe 8deaaacb-c581-4c10-b58c-0ab261aa2865 name
> debian2 new mon_release unknown) v8
> 2021-12-15T13:57:03.866+1000 7f109cf23700 10 mon.debian1@0(leader) e4
> handle_probe_probe mon.1 v2:172.16.21.102:3300/0mon_probe(probe
> 8deaaacb-c581-4c10-b58c-0ab261aa2865 name debian2 new mon_release unknown)
> v8 features 4540138292840890367
> 2021-12-15T13:57:03.866+1000 7f109cf23700 20 mon.debian1@0(leader) e4
> _ms_dispatch existing session 0x55feee4f9b00 for mon.1
> 

[ceph-users] ceph-mon pacific doesn't enter to quorum of nautilus cluster

2021-12-14 Thread Michael Uleysky
I try to upgrade three-node nautilus cluster to pacific. I am updating ceph
on one node and restarting daemons. OSD ok, but monitor cannot enter quorum.
With debug_mon 20/20 I see repeating blocks in the logs of problem monitor
like

2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
bootstrap
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
sync_reset_requester
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
unregister_cluster_logger - not registered
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
cancel_probe_timeout 0x557603d82420
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
monmap e4: 3 mons at {debian1=[v2:
172.16.21.101:3300/0,v1:172.16.21.101:6789/0],debian2=[v2:
172.16.21.102:3300/0,v1:172.16.21.102:6789/0],debian3=[v2:
172.16.21.103:3300/0,v1:172.16.21.103:6789/0]}
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
_reset
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing).auth v0
_set_mon_num_rank num 0 rank 0
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
cancel_probe_timeout (none scheduled)
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
timecheck_finish
2021-12-15T13:34:57.075+1000 7f6e1b417700 15 mon.debian2@1(probing) e4
health_tick_stop
2021-12-15T13:34:57.075+1000 7f6e1b417700 15 mon.debian2@1(probing) e4
health_interval_stop
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
scrub_event_cancel
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
scrub_reset
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
cancel_probe_timeout (none scheduled)
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
reset_probe_timeout 0x557603d82420 after 2 seconds
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
probing other monitors
2021-12-15T13:34:57.075+1000 7f6e1b417700 20 mon.debian2@1(probing) e4
_ms_dispatch existing session 0x557603d60b40 for mon.2
2021-12-15T13:34:57.075+1000 7f6e1b417700 20 mon.debian2@1(probing) e4
 entity_name  global_id 0 (none) caps allow *
2021-12-15T13:34:57.075+1000 7f6e1b417700 20 is_capable service=mon
command= read addr v2:172.16.21.103:3300/0 on cap allow *
2021-12-15T13:34:57.075+1000 7f6e1b417700 20  allow so far , doing grant
allow *
2021-12-15T13:34:57.075+1000 7f6e1b417700 20  allow all
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
handle_probe mon_probe(reply 8deaaacb-c581-4c10-b58c-0ab261aa2865 name
debian3 quorum 0,2 leader 0 paxos( fc 52724559 lc 52725302 ) mon_release
octopus) v7
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
handle_probe_reply mon.2 v2:172.16.21.103:3300/0 mon_probe(reply
8deaaacb-c581-4c10-b58c-0ab261aa2865 name debian3 quorum 0,2 leader 0
paxos( fc 52724559 lc 52725302 ) mon_release octopus) v7
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
 monmap is e4: 3 mons at {debian1=[v2:
172.16.21.101:3300/0,v1:172.16.21.101:6789/0],debian2=[v2:
172.16.21.102:3300/0,v1:172.16.21.102:6789/0],debian3=[v2:
172.16.21.103:3300/0,v1:172.16.21.103:6789/0]}
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4  got
newer/committed monmap epoch 4, mine was 4
2021-12-15T13:34:57.075+1000 7f6e1b417700 10 mon.debian2@1(probing) e4
bootstrap

On the nautilus monitor I see

2021-12-15T13:57:03.866+1000 7f109cf23700 20 mon.debian1@0(leader) e4
_ms_dispatch existing session 0x55feee4f9b00 for mon.1

2021-12-15T13:57:03.866+1000 7f109cf23700 20 mon.debian1@0(leader) e4
 entity_name  global_id 0 (none) caps allow *

2021-12-15T13:57:03.866+1000 7f109cf23700 20 is_capable service=mon
command= read addr v2:172.16.21.102:3300/0 on cap allow *
2021-12-15T13:57:03.866+1000 7f109cf23700 20  allow so far , doing grant
allow *
2021-12-15T13:57:03.866+1000 7f109cf23700 20  allow all
2021-12-15T13:57:03.866+1000 7f109cf23700 10 mon.debian1@0(leader) e4
handle_probe mon_probe(probe 8deaaacb-c581-4c10-b58c-0ab261aa2865 name
debian2 new mon_release unknown) v8
2021-12-15T13:57:03.866+1000 7f109cf23700 10 mon.debian1@0(leader) e4
handle_probe_probe mon.1 v2:172.16.21.102:3300/0mon_probe(probe
8deaaacb-c581-4c10-b58c-0ab261aa2865 name debian2 new mon_release unknown)
v8 features 4540138292840890367
2021-12-15T13:57:03.866+1000 7f109cf23700 20 mon.debian1@0(leader) e4
_ms_dispatch existing session 0x55feee4f9b00 for mon.1
2021-12-15T13:57:03.866+1000 7f109cf23700 20 mon.debian1@0(leader) e4
 entity_name  global_id 0 (none) caps allow *
2021-12-15T13:57:03.866+1000 7f109cf23700 20 is_capable service=mon
command= read addr v2:172.16.21.102:3300/0 on cap allow *
2021-12-15T13:57:03.866+1000 7f109cf23700 20  allow so far , doing grant
allow *
2021-12-15T13:57:03.866+1000 7f109cf23700 20  allow all
2021-12-15T13:57:03.866+1000 7f109cf23700 10 mon.debian1@0(leader) e4
handle_probe mon_probe(probe 

[ceph-users] Re: Experience reducing size 3 to 2 on production cluster?

2021-12-14 Thread Marco Pizzolo
Hi Joachim,

Understood on the risks.  Aside from the alt. cluster, we have 3 other
copies of the data outside of Ceph, so I feel pretty confident that it's a
question of time to repopulate and not data loss.

That said, I would be interested in your experience on what I'm trying to
do if you've attempted something similar previously.

Thanks,
Marco

On Sat, Dec 11, 2021 at 6:59 AM Joachim Kraftmayer (Clyso GmbH) <
joachim.kraftma...@clyso.com> wrote:

> Hi Marco,
>
> to quote an old colleague, this is one of the ways to break a Ceph
> cluster with its data.
>
> Perhaps the risks are not immediately visible in normal operation, but
> in the event of a failure, the potential loss of data must be accepted.
>
> Regards, Joachim
>
>
> ___
>
> Clyso GmbH - ceph foundation member
>
> Am 10.12.21 um 18:04 schrieb Marco Pizzolo:
> > Hello,
> >
> > As part of a migration process where we will be swinging Ceph hosts from
> > one cluster to another we need to reduce the size from 3 to 2 in order to
> > shrink the footprint sufficiently to allow safe removal of an OSD/Mon
> node.
> >
> > The cluster has about 500M objects as per dashboard, and is about 1.5PB
> in
> > size comprised solely of small files served through CephFS to Samba.
> >
> > Has anyone encountered a similar situation?  What (if any) problems did
> you
> > face?
> >
> > Ceph 14.2.22 bare metal deployment on Centos.
> >
> > Thanks in advance.
> >
> > Marco
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Experience reducing size 3 to 2 on production cluster?

2021-12-14 Thread Marco Pizzolo
Hi Martin,

Agreed on the min_size of 2.  I have no intention of worrying about uptime
in the event of a host failure.  Once size of 2 is effectuated (and I'm
unsure how long it will take), we intend to evacuate all OSDs on one of the
4 hosts, in order to migrate that host to the new cluster, where its OSDs
will then be added in.  Once added and balanced, we will complete the
copies (<3 days) and then migrate one more host, allowing us to bring size
to 3.  Once balanced, we will collapse the last 2 nodes into the new
cluster.  I am hoping that, inclusive of rebalancing, the whole project
will only take 3 weeks, but time will tell.

Has anyone asked Ceph to reduce hundreds of millions if not billions of
files from size 3 to size 2, and if so, were you successful?  I know it
*should* be able to do this, but sometimes theory and practice don't
perfectly overlap.
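For scale, the capacity arithmetic behind the swing (a sketch; the thread doesn't say whether the ~1.5 PB figure is raw or logical, so this assumes it is raw usage at size=3):

```python
# Raw capacity consumed before and after dropping replication from 3 to 2.
# Assumption (not stated in the thread): the ~1.5 PB figure is raw usage
# at size=3, i.e. each object is stored three times.
raw_at_size3 = 1.5                   # PB of raw capacity consumed (assumed)
logical = raw_at_size3 / 3           # PB of actual data
raw_at_size2 = logical * 2           # raw usage once size=2 takes effect
freed = raw_at_size3 - raw_at_size2
print(logical, raw_at_size2, freed)  # 0.5 1.0 0.5 -- a third of raw usage freed
```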

Thanks,
Marco

On Sat, Dec 11, 2021 at 4:37 AM Martin Verges 
wrote:

> Hello,
>
> avoid size 2 whenever you can. As long as you know that you might lose
> data, it can be an acceptable risk while migrating the cluster. We had that
> in the past multiple time and it is a valid use case in our opinion.
> However make sure to monitor the state and recover as fast as possible.
> Leave min_size on 2 as well and accept the potential downtime!
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695  | Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>
>
> On Fri, 10 Dec 2021 at 18:05, Marco Pizzolo 
> wrote:
>
>> Hello,
>>
>> As part of a migration process where we will be swinging Ceph hosts from
>> one cluster to another we need to reduce the size from 3 to 2 in order to
>> shrink the footprint sufficiently to allow safe removal of an OSD/Mon
>> node.
>>
>> The cluster has about 500M objects as per dashboard, and is about 1.5PB in
>> size comprised solely of small files served through CephFS to Samba.
>>
>> Has anyone encountered a similar situation?  What (if any) problems did
>> you
>> face?
>>
>> Ceph 14.2.22 bare metal deployment on Centos.
>>
>> Thanks in advance.
>>
>> Marco
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Single ceph client usage with multiple ceph cluster

2021-12-14 Thread Anthony D'Atri
At the risk of pedantry, I’d like to make a distinction, because this has 
tripped people up in the past.

Cluster names and config file names are two different things.  It’s easy to 
conflate them, which has caused some people a lot of technical debt and grief, 
especially with `rbd-mirror`.  

Custom cluster names, i.e. any cluster that isn’t named “ceph”, have been 
deprecated for a while, with upstream clearly indicating that code / support 
would be incrementally factored out.  They were originally intended for running 
more than one logical cluster on a given set of hardware, which it turns out 
very few people ever did.  They were also, IMHO, never 100% completely 
implemented.  Deploying a new cluster today with a custom name would be a bad 
idea.

My read of the OP’s post was multiple clusters, with no indication of their 
names.  Ideally all will be named “ceph”.  Reasons for multiple clusters 
include infrastructure generational shifts, geographical diversity, 
minimization of blast radius, and an operational desire to only grow a given 
cluster to a certain size.

So I fully expect `--cluster` to be removed from commands over time, and would 
not advocate implementing any infrastructure using it, either directly or via 
`CEPH_ARGS`.

`-c` and `-k` to point to conf and key files would be better choices.

Note that conf file naming is (mostly) arbitrary, and is not tied to the actual 
*cluster* name.  One might have for example:

/etc/ceph-cluster1/ceph.conf
/etc/ceph-cluster2.conf
/etc/ceph/cluster-1.conf
/etc/ceph/cluster-2.conf
/var/lib/tool/ethelmerman.conf

All would work, though for historical reasons I like to avoid having more than 
one .conf file under /etc/ceph.
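A sketch of what that looks like day to day (the cluster names and key paths below are invented; the helper only echoes the command line it would run, so it is a dry run; remove the `echo` to execute for real):

```shell
# Hypothetical wrapper that targets a cluster via -c/-k instead of the
# deprecated --cluster flag. It echoes the command instead of running it
# (a dry run); remove "echo" to actually execute.
ceph_on() {
  cluster="$1"; shift
  echo ceph -c "/etc/ceph/${cluster}.conf" \
       -k "/etc/ceph/${cluster}.client.admin.keyring" "$@"
}

ceph_on cluster-1 -s
ceph_on cluster-2 osd df
```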

> 
> Hello Mosharaf,
> 
> yes, that's no problem. On all of my clusters I did not have a ceph.conf
> in in the /etc/ceph folders on my nodes at all.
> 
> I have a <cluster1>.conf, <cluster2>.conf, 
> <cluster3>.conf ...
> configuration file in the /etc/ceph folder. One config file for each cluster.
> The same for the different key files, e.g. <cluster1>.mon.keyring, 
> <cluster2>.mon.keyring
> Or the admin key <cluster1>.client.admin.keyring, 
> <cluster2>.mon.keyring
> 
> But keep in mind, if you go this way you have to provide the
> name of the cluster (points to the configuration file you are refering to)
> for nearly every ceph command together with the cluster keyword
> e.g.: ceph --cluster <clustername> health detail
> or: rbd --cluster <clustername> ls <poolname>
> 
> 
> Regards
> Markus Baier
> -- 
> Markus Baier
> Systemadministrator
> Fachgebiet Self-Organizing Systems
> TU Darmstadt, Germany
> S3|19 1.7
> Rundeturmstrasse 12
> 64283 Darmstadt
> 
> Phone: +49 6151 16-57242
> Fax: +49 6151 16-57241
> E-Mail: markus.ba...@bcs.tu-darmstadt.de
> 
> Am 09.12.21 um 03:35 schrieb Mosharaf Hossain:
>> Hello Markus
>> Thank you for your direction.
>> I would like to let you know that the way you show it is quite meaningful 
>> but I am afraid how the ceph system would identify the configuration file as 
>> by default it uses ceph.conf in /etc/ceph folder. Can we define the config 
>> file as we want?
>> 
>> It will be helpful to give or show us guidelines to add the ceph client to 
>> multiple clusters.
>> 
>> 
>> 
>> 
>> Regards
>> Mosharaf Hossain
>> Deputy Manager, Product Development
>> IT Division
>> Bangladesh Export Import Company Ltd.
>> 
>> Level-8, SAM Tower, Plot #4, Road #22, Gulshan-1, Dhaka-1212,Bangladesh
>> 
>> Tel: +880 9609 000 999, +880 2 5881 5559, Ext: 14191, Fax: +880 2 9895757
>> 
>> Cell: +8801787680828, Email: mosharaf.hoss...@bol-online.com 
>> , Web: www.bol-online.com
>> 
>> 
>> 
>> 
>> On Tue, Nov 2, 2021 at 7:01 PM Markus Baier 
>>  wrote:
>> 
>>Hello,
>> 
>>yes you can use a single server to operate multiple clusters.
>>I have a configuration running, with two independent ceph clusters
>>running on the same node (of course multiple nodes for the two
>>clusters)
>> 
>>The trick is to work with multiple ceph.conf files, I use two
>>seperate ceph.conf files under /etc/ceph/
>>One is called <cluster1>.conf and the other <cluster2>.conf
>> 
>>Every cluster uses its own separate network interfaces, so I use four
>>10GbE interfaces for the two clusters, but you can also use VLANs
>>together with a 100GbE interface, or a 100GbE NIC that can provide
>>virtual ports for the separation of the networks and the distribution
>>of the network load.
>> 
>>Every cluster also uses a separate keyring, so e.g. for the first
>>cluster you have a keyring named <cluster1>.mon.keyring
>>and for the second one <cluster2>.mon.keyring
>>inside of the /etc/ceph folder.
>> 
>>To administrate the whole thing, ceph provides
>>the --cluster parameter for the command line programs.
>>So ceph --cluster <cluster1> -s
>>will show the output for cluster one and
>>ceph --cluster <cluster2> -s
>>for cluster two
>> 
>> 
>>

[ceph-users] Announcing go-ceph v0.13.0

2021-12-14 Thread John Mulligan
I'm happy to announce another release of the go-ceph API library. This is a 
regular release following our every-two-months release cadence.

https://github.com/ceph/go-ceph/releases/tag/v0.13.0

Changes include additions to the rbd and rados packages. More details are 
available at the link above.

The library includes bindings that aim to play a similar role to the "pybind" 
python bindings in the ceph tree but for the Go language. The library also 
includes additional APIs that can be used to administer cephfs, rbd, and rgw 
subsystems.
There are already a few consumers of this library in the wild, including the 
ceph-csi project.
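For anyone pulling the new release in with Go modules, the pin looks like this (module path from the release link above; the surrounding module name and Go version are placeholders):

```
module example.com/myapp

go 1.16

require github.com/ceph/go-ceph v0.13.0
```

Note that building against go-ceph requires cgo plus the Ceph development headers (librados, librbd, libcephfs) on the build host.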


-- 
John Mulligan

phlogistonj...@asynchrono.us
jmulli...@redhat.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph RESTful APIs and managing Cephx users

2021-12-14 Thread Ernesto Puerta
Hi Michał,

You're totally right there. That endpoint is for managing Ceph Dashboard
users. The Cephx auth/user management is not yet implemented in the
Dashboard. It's planned for Quincy, though.

Kind Regards,
Ernesto


On Tue, Dec 14, 2021 at 3:11 PM Michał Nasiadka  wrote:

> Hello,
>
> I’ve been investigating using Ceph RESTful API in Pacific to create Cephx
> users (along with a keyring) but it seems the functionality is not there.
> The documentation shows /api/user calls - but those seem to be related to
> Ceph Dashboard users?
>
> Is there a plan to add that functionality?
>
> Docs: https://docs.ceph.com/en/pacific/mgr/ceph_api/#user <
> https://docs.ceph.com/en/pacific/mgr/ceph_api/#user>
>
> Best regards,
>
> Michal
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Manager carries wrong information until killing it

2021-12-14 Thread 涂振南

Hello, we have a recurring, funky problem with managers on Nautilus (and
probably also earlier versions): the manager displays incorrect information.
This is a recurring pattern and it also breaks the prometheus graphs, as the
I/O is described insanely incorrectly: "recovery: 43 TiB/s, 3.62k keys/s,
11.40M objects/s" - which basically changes the scale of any related graph to
unusable. The latest example from today shows slow ops for an OSD that has
been down for 17h:

[09:50:31] black2.place6:~# ceph -s
  cluster:
    id:     1ccd84f6-e362-4c50-9ffe-59436745e445
    health: HEALTH_WARN
            18 slow ops, oldest one blocked for 975 sec, osd.53 has slow ops
  services:
    mon: 5 daemons, quorum server9,server2,server8,server6,server4 (age 2w)
    mgr: server2(active, since 2w), standbys: server8, server4, server9, server6, ciara3
    osd: 108 osds: 107 up (since 17h), 107 in (since 17h)
  data:
    pools:   4 pools, 2624 pgs
    objects: 42.52M objects, 162 TiB
    usage:   486 TiB used, 298 TiB / 784 TiB avail
    pgs:     2616 active+clean
             8    active+clean+scrubbing+deep
  io:
    client: 522 MiB/s rd, 22 MiB/s wr, 8.18k op/s rd, 689 op/s wr

Killing the manager on server2 changes the status to another temporary
incorrect status, because the rebalance finished hours ago, paired with the
incorrect rebalance speed that we see from time to time:

[09:51:59] black2.place6:~# ceph -s
  cluster:
    id:     1ccd84f6-e362-4c50-9ffe-59436745e445
    health: HEALTH_OK
  services:
    mon: 5 daemons, quorum server9,server2,server8,server6,server4 (age 2w)
    mgr: server8(active, since 11s), standbys: server4, server9, server6, ciara3
    osd: 108 osds: 107 up (since 17h), 107 in (since 17h)
  data:
    pools:   4 pools, 2624 pgs
    objects: 42.52M objects, 162 TiB
    usage:   486 TiB used, 298 TiB / 784 TiB avail
    pgs:     2616 active+clean
             8    active+clean+scrubbing+deep
  io:
    client:   214 TiB/s rd, 54 TiB/s wr, 4.86G op/s rd, 1.06G op/s wr
    recovery: 43 TiB/s, 3.62k keys/s, 11.40M objects/s
  progress:
    Rebalancing after osd.53 marked out
      [..]

Then a bit later, the status on the newly started manager is correct:

[09:52:18] black2.place6:~# ceph -s
  cluster:
    id:     1ccd84f6-e362-4c50-9ffe-59436745e445
    health: HEALTH_OK
  services:
    mon: 5 daemons, quorum server9,server2,server8,server6,server4 (age 2w)
    mgr: server8(active, since 47s), standbys: server4, server9, server6, server2, ciara3
    osd: 108 osds: 107 up (since 17h), 107 in (since 17h)
  data:
    pools:   4 pools, 2624 pgs
    objects: 42.52M objects, 162 TiB
    usage:   486 TiB used, 298 TiB / 784 TiB avail
    pgs:     2616 active+clean
             8    active+clean+scrubbing+deep
  io:
    client: 422 MiB/s rd, 39 MiB/s wr, 7.91k op/s rd, 752 op/s wr

Question: is this a known bug, is anyone else seeing it, or are we doing
something wrong?

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Shall i set bluestore_fsck_quick_fix_on_mount now after upgrading to 16.2.7 ?

2021-12-14 Thread Christoph Adomeit
Hi,

I remember there was a bug in 16.2.6, for clusters upgraded from older 
versions, where one had to set bluestore_fsck_quick_fix_on_mount to false.

Now I have upgraded from 16.2.6 to 16.2.7

Should I now set bluestore_fsck_quick_fix_on_mount to true?

And if yes, what would be the command to activate the option? Is restarting 
the OSDs afterwards enough?


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Request for community feedback: Telemetry Performance Channel

2021-12-14 Thread Laura Flores
Hi Gregory,

It was intentional that I sent this email to the ceph-users list. The
telemetry module is designed as a relationship between developers and
users, where developers decide on the metrics to collect, and users
decide whether or not to opt in. Since the performance channel will be
a new addition to Quincy, it is important that we get feedback from
users while we are still in the development phase, since it will
always be up to the users to decide whether or not they want to share
these metrics with us.

To address your question about heap stats vs. heap dump, you are
correct in your assumption that I meant to say stats:

2. tcmalloc_heap_stats: A dump of tcmalloc heap profiles on a
per-osd basis. These metrics would be derived from the `ceph tell
osd.* heap stats` command.

We are interested in collecting these metrics to detect scenarios
where the osd has freed memory, but the kernel has not reclaimed it
from tcmalloc.
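As a rough illustration of the kind of signal involved, here is a sketch that pulls the page-heap freelist size out of tcmalloc stats text. The sample below is a hypothetical, abbreviated `heap stats` output; the real format has more lines and may differ in detail:

```python
import re

# Hypothetical, abbreviated tcmalloc stats text, in the general shape
# printed by `ceph tell osd.N heap stats`
SAMPLE = """\
osd.0 tcmalloc heap stats:------------------------------------------------
MALLOC:      290405696 (  276.9 MiB) Bytes in use by application
MALLOC: +     33554432 (   32.0 MiB) Bytes in page heap freelist
MALLOC: +     49971200 (   47.7 MiB) Bytes released to OS (aka unmapped)
"""

def page_heap_freelist_bytes(stats_text):
    """Bytes the allocator holds as freed memory not yet returned to the OS."""
    m = re.search(r"(\d+) \(.*?\) Bytes in page heap freelist", stats_text)
    return int(m.group(1)) if m else 0

print(page_heap_freelist_bytes(SAMPLE))  # → 33554432
```

A persistently large freelist relative to application bytes is the kind of freed-but-unreclaimed pattern described above.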


On Mon, Dec 13, 2021 at 6:57 PM Gregory Farnum  wrote:
>
> [ Moved to dev@ceph as this is a technical thing, not user feedback or
> concerns ]
>
> On Mon, Dec 13, 2021 at 1:54 PM Laura Flores  wrote:
> >
> > Dear Ceph users,
> >
> > I'm writing to inform the community about a new performance channel
> > that will be added to the telemetry module in the upcoming Quincy
> > release. Like all other channels, this channel is also on an opt-in
> > basis, but we’d like to know if there are any concerns regarding this
> > new collection and whether users would feel comfortable sharing
> > performance related data with us. Please review the details below and
> > respond to this email with any thoughts or questions you might have.
> >
> > We’ll also discuss this topic live at our next "Ceph User + Dev
> > Monthly Meetup", which will be held this week on December 16th. Feel
> > free to join this meeting and provide direct feedback to developers.
> >
> > U+D meeting details:
> > https://calendar.google.com/calendar/u/0/embed?src=9ts9c7lt7u1vic2ijvvqqlf...@group.calendar.google.com
> >
> > ---
> >
> > The telemetry module has been around since Luminous v12.2.13.
> > Operating on a strict opt-in basis, the telemetry module sends
> > anonymous data about Ceph users’ clusters securely back to the Ceph
> > developers. This data, which is displayed on public dashboards [1],
> > helps developers understand how Ceph is used and what problems users
> > may be experiencing. The telemetry module is divided into several
> > channels, each of which collects a different set of information.
> > Existing channels include "basic", "crash", "device", and "ident". All
> > existing channels, as well as future channels, offer users the choice
> > to opt-in. See the latest documentation for more details:
> > https://docs.ceph.com/en/latest/mgr/telemetry/
> >
> > For the upcoming Quincy release, we have designed a new performance
> > channel ("perf") that collects various performance metrics across a
> > cluster. As developers, we would like to use data from the perf
> > channel to:
> >
> > 1. gain a better understanding of how clusters are used
> > 2. discover changes in cluster usage over time
> > 3. identify performance-related bottlenecks
> > 4. model benchmarks used in upstream testing on workload patterns
> > seen in the field
> > 5. suggest better Ceph configuration values based on use case
> >
> > In addition, and most importantly, we have designed the perf channel
> > with users in mind. Our goal is to provide users with better access to
> > detailed performance information about their clusters that they can
> > find all in one place. With this performance data, we aim to provide
> > users the ability to:
> >
> > 1. monitor their own cluster’s performance by daemon type
> > 2. access detailed information about their cluster's overall health
> > 3. identify patterns in their cluster’s workload
> > 4. troubleshoot performance issues in their cluster, e.g. issues
> > with latency, throttling, or memory management
> >
> > In the process of designing the perf channel, we also saw a need for
> > users to be able to view the data they are sending when telemetry is
> > on, as well as the data that is available to send when telemetry is
> > off. With this new design, a user can look at which collections they
> > are reporting when telemetry is on with the command `ceph telemetry
> > show`. If telemetry is off, meaning the user has not opted in to
> > sending data, they can preview a sample report with `ceph telemetry
> > preview`. This same flow can be followed by channel, if preferred:
> > `ceph telemetry show ` or `ceph telemetry preview
> > `.
> >
> > In the case of the perf channel, a user who is opted into telemetry
> > (telemetry is on) may view a report of the perf collection with `ceph
> > telemetry show perf`. A user who is not opted in can preview one with
> > `ceph telemetry preview perf`.
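To make the show/preview flow above concrete, here is a small sketch of inspecting which channels a telemetry report contains. The JSON shape used here is a simplified assumption for illustration, not the exact Quincy report format:

```python
import json

def report_channels(report):
    """Return the sorted channel names present in a telemetry report dict."""
    return sorted(report.get("channels", []))

# Normally the report text would come from running, e.g.,
# `ceph telemetry preview` and capturing its JSON output;
# here we use an inline sample instead.
sample = json.loads('{"channels": ["basic", "crash", "device", "ident", "perf"]}')
print(report_channels(sample))  # → ['basic', 'crash', 'device', 'ident', 'perf']
```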

[ceph-users] Re: Ceph container image repos

2021-12-14 Thread Gregory Farnum
I generated a quick doc PR so this doesn't trip over other users:
https://github.com/ceph/ceph/pull/44310. Thanks all!
-Greg

On Mon, Dec 13, 2021 at 10:59 AM John Petrini  wrote:
>
> "As of August 2021, new container images are pushed to quay.io
> registry only. Docker hub won't receive new content for that specific
> image but current images remain available."
>
> https://hub.docker.com/r/ceph/ceph
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph RESTful APIs and managing Cephx users

2021-12-14 Thread Michał Nasiadka
Hello,

I’ve been investigating using the Ceph RESTful API in Pacific to create Cephx 
users (along with a keyring), but it seems the functionality is not there.
The documentation shows /api/user calls - but those seem to be related to Ceph 
Dashboard users?

Is there a plan to add that functionality?

Docs: https://docs.ceph.com/en/pacific/mgr/ceph_api/#user 
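For reference, the equivalent operation via the CLI (outside the REST API) is `ceph auth get-or-create`. A sketch with a hypothetical client name and capabilities:

```shell
# Create (or fetch, if it already exists) a Cephx user and its keyring;
# client.myapp, the pool name, and the caps are placeholder examples
ceph auth get-or-create client.myapp \
    mon 'allow r' \
    osd 'allow rw pool=mypool' \
    -o /etc/ceph/ceph.client.myapp.keyring
```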


Best regards,

Michal
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Support for alternative RHEL derivatives

2021-12-14 Thread Manuel Lausch
We switched some months ago from CentOS 7 and 8 to Oracle Linux 8.
They promise to be 100% compatible with RHEL.

I hope the provided Ceph packages will work with this in the future. I
wouldn't be happy with the container solution.


Manuel

On Mon, 13 Dec 2021 15:01:06 +
Benoit Knecht  wrote:

> Hi,
> 
> As we're getting closer to CentOS 8 EOL, I'm sure plenty of Ceph users are
> looking to migrate from CentOS 8 to CentOS Stream 8 or one of the new RHEL
> derivatives, e.g. Rocky and Alma.
> 
> The question of upstream support has already been raised in the past, but at
> the time Rocky and Alma were pretty much clones of CentOS. However now they're
> about to diverge in subtle ways, so I'm wondering if
> 
> 1. The upstream Ceph project plans on building and QA testing against Rocky
>and/or Alma;
> 2. Specific packages will be provided on https://download.ceph.com/ for Rocky
>and/or Alma, or if the packages built for CentOS Stream are expected to be
>compatible (which goes back to the QA testing question above);
> 3. Any Ceph developers have been in touch with Rocky and/or Alma Storage SIGs;
> 4. Any Ceph users have already or are planning to migrate to Rocky or Alma.
> 
> If anyone else is interested in this topic, maybe it can be added to the 
> agenda
> for the next users+dev meetup.
> 
> Cheers,
> 
> --
> Ben
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to clean up data in OSDS

2021-12-14 Thread Nagaraj Akkina
Hello Team,

After testing our cluster we removed and recreated all Ceph pools, which
actually cleaned up all users and buckets, but we can still see data on the
disks.
Is there an easy way to clean up all OSDs without actually removing and
reconfiguring them?
What would be the best way to solve this problem? Currently we are
experiencing RGW daemon crashes, as RADOS still tries to look into the old
buckets.

Any help is much appreciated.

Regards,
Akkina
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Single ceph client usage with multiple ceph cluster

2021-12-14 Thread Markus Baier

Hello Mosharaf,

yes, that's no problem. On all of my clusters I do not have a ceph.conf
in the /etc/ceph folders on my nodes at all.

I have a <cluster1>.conf, <cluster2>.conf, <cluster3>.conf ...
configuration file in the /etc/ceph folder. One config file for each
cluster.
The same for the different key files, e.g.
<cluster1>.mon.keyring and <cluster2>.mon.keyring,
or the admin keys <cluster1>.client.admin.keyring and
<cluster2>.client.admin.keyring.


But keep in mind, if you go this way you have to provide the
name of the cluster (which points to the configuration file you are
referring to) for nearly every ceph command, together with the
--cluster keyword,
e.g.: ceph --cluster <clustername> health detail
or: rbd --cluster <clustername> ls

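A minimal sketch of the layout described above, using the hypothetical cluster names cluster1 and cluster2:

```shell
# Per-cluster configuration and keyring files under /etc/ceph/
# (cluster1/cluster2 are placeholder names):
#   /etc/ceph/cluster1.conf
#   /etc/ceph/cluster1.client.admin.keyring
#   /etc/ceph/cluster2.conf
#   /etc/ceph/cluster2.client.admin.keyring

# Every command then needs the --cluster flag, which selects the
# matching /etc/ceph/<name>.conf and keyring by file name:
ceph --cluster cluster1 health detail
rbd  --cluster cluster2 ls
```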

Regards
Markus Baier
--
Markus Baier
Systemadministrator
Fachgebiet Self-Organizing Systems
TU Darmstadt, Germany
S3|19 1.7
Rundeturmstrasse 12
64283 Darmstadt

Phone: +49 6151 16-57242
Fax: +49 6151 16-57241
E-Mail: markus.ba...@bcs.tu-darmstadt.de

Am 09.12.21 um 03:35 schrieb Mosharaf Hossain:

Hello Markus
Thank you for your direction.
I would like to let you know that the way you show it is quite 
meaningful, but I am not sure how the Ceph system would identify the 
configuration file, since by default it uses ceph.conf in the /etc/ceph 
folder. Can we name the config file as we want?


It will be helpful to give or show us guidelines to add the ceph 
client to multiple clusters.





Regards
Mosharaf Hossain
Deputy Manager, Product Development
IT Division
Bangladesh Export Import Company Ltd.

Level-8, SAM Tower, Plot #4, Road #22, Gulshan-1, Dhaka-1212,Bangladesh

Tel: +880 9609 000 999, +880 2 5881 5559, Ext: 14191, Fax: +880 2 9895757

Cell: +8801787680828, Email: mosharaf.hoss...@bol-online.com, Web: www.bol-online.com





On Tue, Nov 2, 2021 at 7:01 PM Markus Baier 
 wrote:


Hello,

yes you can use a single server to operate multiple clusters.
I have a configuration running, with two independent ceph clusters
running on the same node (of course multiple nodes for the two
clusters)

The trick is to work with multiple ceph.conf files; I use two
separate ceph.conf files under /etc/ceph/.
One is called <cluster1>.conf and the other <cluster2>.conf

Every cluster uses its own separate network interfaces, so I use
four 10GbE interfaces for the two clusters, but you can also use
VLANs together with a 100GbE interface, or a 100GbE NIC that can
provide virtual ports for the separation of the networks and the
distribution of the network load.

Every cluster also uses a separate keyring, so e.g. for the first
cluster you have a keyring named <cluster1>.mon.keyring
and for the second one <cluster2>.mon.keyring
inside of the /etc/ceph folder.

To administrate the whole thing, ceph provides
the --cluster parameter for the command line programs.
So ceph --cluster <cluster1> -s
will show the output for cluster one and
ceph --cluster <cluster2> -s
for cluster two.


Regards
Markus Baier

Am 02.11.21 um 13:30 schrieb Mosharaf Hossain:
> Hi Users
> We have two ceph clusters in our lab. We are experimenting to
use a single
> server as a client for two ceph clusters. Can we use the same
client server
> to store keyring for different clusters in ceph.conf file.
>
>
>
>
> Regards
> Mosharaf Hossain
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io