Re: [ceph-users] Re-weight Entire Cluster?

2017-05-29 Thread Udo Lembke
Hi Mike,

On 30.05.2017 01:49, Mike Cave wrote:
>
> Greetings All,
>
>  
>
> I recently started working with our ceph cluster here and have been
> reading about weighting.
>
>  
>
> It appears the current best practice is to weight each OSD according
> to its size (3.64 for 4TB drive, 7.45 for 8TB drive, etc).
>
>  
>
> As it turns out, it was not configured this way at all; all of the
> OSDs are weighted at 1.
>
>  
>
> So my questions are:
>
>  
>
> Can we re-weight the entire cluster to 3.64 and then re-weight the 8TB
> drives afterwards at a slow rate which won’t impact performance?
>
> If we do an entire re-weight will we have any issues?
>
I would set osd_max_backfills and osd_recovery_max_active to 1 (with
injectargs) before starting the reweight to minimize the impact on running
clients.
After setting all of them to 3.64 you can raise the weight of the 8TB drives
one by one.
Depending on your cluster/OSDs, it may be a good idea to lower the primary
affinity of the 8TB drives during the reweight; otherwise you will get more
reads from the (slower) 8TB drives.
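
A rough sketch of those commands (the OSD id and the 0.5 value are just
examples; on releases of that era, primary affinity may also need to be
enabled with "mon osd allow primary affinity = true" on the monitors):

  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
  # lower the read preference of one of the (slower) 8TB OSDs during the reweight
  ceph osd primary-affinity osd.39 0.5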


> Would it be better to just reweight the 8TB drives to 2 gradually?
>
I would go for 3.64 - then you have the right settings if you initialize
further OSDs with ceph-deploy.
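
For example (a sketch - the OSD ids are illustrative, and each step should be
allowed to finish backfilling before the next one):

  ceph osd crush reweight osd.0 3.64
  ceph osd crush reweight osd.1 3.64
  # ... and so on; afterwards raise the 8TB drives one by one
  ceph osd crush reweight osd.12 7.45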

Udo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re-weight Entire Cluster?

2017-05-29 Thread Mike Cave
Greetings All,

I recently started working with our ceph cluster here and have been reading 
about weighting.

It appears the current best practice is to weight each OSD according to its 
size (3.64 for 4TB drive, 7.45 for 8TB drive, etc).
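
(For reference, that figure is just the drive's capacity expressed in TiB:
4 TB = 4 x 10^12 bytes / 2^40 ≈ 3.64, while 8 x 10^12 / 2^40 works out to
about 7.28; the exact value depends on the drive's raw byte count.)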

As it turns out, it was not configured this way at all; all of the OSDs are 
weighted at 1.

So my questions are:

Can we re-weight the entire cluster to 3.64 and then re-weight the 8TB drives 
afterwards at a slow rate which won’t impact performance?
If we do an entire re-weight will we have any issues?
Would it be better to just reweight the 8TB drives to 2 gradually?

Any and all suggestions are welcome.

Cheers,
Mike Cave
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multi-Tenancy: Network Isolation

2017-05-29 Thread Deepak Naidu
Thanks much, Vlad and Dave, for the suggestions - appreciate it!

--
Deepak

On May 29, 2017, at 1:04 AM, Дробышевский, Владимир 
> wrote:

Hi, Deepak!

  The easiest way I can imagine is to use multiple VLANs, put all ceph host
ports into every VLAN and use a wider subnet. For example, you can set
192.168.0.0/16 as the public ceph network, use 192.168.0.1-254 IPs for the
ceph hosts, 192.168.1.1-254/16 IPs for the first tenant, 192.168.2.1-254/16
for the second, and so on. You'll have to make sure that no ceph host has any
routing facilities running; you then get a number of isolated L2 networks with
a common part. Actually it's not a good approach and leads to many errors
(your tenants must carefully use the provided IPs and not cross into other IP
spaces despite the /16 netmask).


  Another option is - like David said - an L3 routed network. In this case you
will probably face network bandwidth problems: all your traffic will go
through one interface. But if your switches have L3 functionality you can
route packets there. And again, the problem would be bandwidth: switches
usually don't have a lot of routing power, and routed bandwidth leaves a lot
to be desired.


  And the craziest one :-). It is just a theory; I have never tried this in
production or even in a lab.

  As with the previous options, you go with multiple per-tenant VLANs and ceph
host ports in all of these VLANs.

  You need to choose a different network for the public interfaces, for ex.
10.0.0.0/24. Then set up a loopback interface on each ceph host and attach a
single unique IP to it, like 10.0.0.1/32, 10.0.0.2/32 and so on. Enable IP
forwarding and start a RIP routing daemon on each ceph host. Set up and
configure ceph, using the attached IP as the MON IP.

  Create a ceph VLAN with all ceph hosts and set a common IP subnet (for ex.
172.16.0.0/24); attach an IP from this network to every ceph host. Check that
you can reach any of the public (loopback) IPs from any ceph host.

  Now create multiple per-tenant VLANs and put the ceph host ports into every
one. Set isolated subnets for your tenants' networks, for example
192.168.0.0/23; use 192.168.0.x IPs as additional addresses for the ceph hosts
and 192.168.1.x as the tenant network. Start a RIP routing daemon on every
tenant host. Check that you can reach every ceph public IP (10.0.0.x/32).

  I would also configure the RIP daemon to advertise only the 10.0.0.x/32
network on each ceph host and set the RIP daemon to passive mode on client
hosts. It's better to configure a firewall on the ceph hosts as well to
prevent cross-subnet communication.

  In theory it should work, but I can't say much about how stable it would be.

Best regards,
Vladimir

2017-05-26 20:36 GMT+05:00 Deepak Naidu 
>:
Hi Vlad,

Thanks for chiming in.

>>It's not clear what you want to achieve from the ceph point of view?
Multi-tenancy. We will have multiple tenants from different isolated
subnets/networks accessing a single ceph cluster which can support multiple
tenants. The only problem I see with ceph in a physical environment is that I
cannot isolate the public networks, e.g. mon and mds, for multiple
subnets/networks/tenants.

>>For example, for the network isolation you can use managed switches, set 
>>different VLANs and put ceph hosts to the every VLAN.
Yes, we have managed switches with VLANs. And if I add, for example, 2x public
interfaces on Net1 (subnet 192.168.1.0/24) and Net2 (subnet 192.168.2.0/24),
what does the ceph.conf look like? What does my mon and MDS server config look
like? That's the challenge/question.
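
(For reference, one possible shape - assuming the ceph release in use accepts
a comma-separated list of subnets for "public network", which should be
verified against the documentation - might look like the following; addresses
and the mon name are illustrative:

  [global]
      public network = 192.168.1.0/24, 192.168.2.0/24
  [mon.a]
      host = ceph-mon1
      mon addr = 192.168.1.10:6789
)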

>>But it's a shoot in the dark as I don't know what exactly you need. For 
>>example, what services (block storage, object storage, API etc) you want to 
>>offer to your tenants and so on

CephFS and Object. I am familiar with how to get the ceph storage part "tenant
friendly"; it's just the network part I need to isolate.

--
Deepak

> On May 26, 2017, at 12:03 AM, Дробышевский, Владимир 
> > wrote:
>
>   It's not clear what you want to achieve from the ceph point of view? For 
> example, for the network isolation you can use managed switches, set 
> different VLANs and put ceph hosts to the every VLAN. But it's a shoot in the 
> dark as I don't know what exactly you need. For example, what services (block 
> storage, object storage, API etc) you want to offer to your tenants and so on
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all 

Re: [ceph-users] strange remap on host failure

2017-05-29 Thread Laszlo Budai

Dear all,

How should ceph react to a host failure when 12 out of a total of 72 OSDs are
out?
Is it normal that, when remapping the PGs, it does not follow the rule set in
the crush map? (According to the rule, the OSDs should be selected from
different chassis.)

In the attached file you can find the crush map and the results of:
ceph health detail
ceph osd dump
ceph osd tree
ceph -s

I can send the pg dump in a separate mail on request. Its compressed size
exceeds the limit accepted by this mailing list.
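
(If it helps with diagnosing this, the rule can also be tested offline against
the compiled crush map with crushtool - a sketch, file names are illustrative:

  ceph osd getcrushmap -o crush.bin
  crushtool -i crush.bin --test --rule 0 --num-rep 3 --show-bad-mappings
)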

Thank you for any help/directions.

Kind regards,
Laszlo

On 29.05.2017 14:58, Laszlo Budai wrote:


Hello all,

We have a ceph cluster with 72 OSDs distributed over 6 hosts in 3 chassis. In
our crush map we are distributing the PGs across chassis (complete crush map
below):

# rules
rule replicated_ruleset {
 ruleset 0
 type replicated
 min_size 1
 max_size 10
 step take default
 step chooseleaf firstn 0 type chassis
 step emit
}

We had a host failure, and I can see that ceph is using 2 OSDs from the same 
chassis for a lot of the remapped PGs. Even worse, I can see that there are 
cases when a PG is using two OSDs from the same host like here:

3.5f6   37  0   4   37  0   149446656   3040  3040
active+remapped 2017-05-26 11:29:23.122820  61820'222074  61820:158025
[52,39] 52  [52,39,3]   52  61488'198356  2017-05-23 23:51:56.210597
61488'198356  2017-05-23 23:51:56.210597

I have this in the log:
2017-05-26 11:26:53.244424 osd.52 10.12.193.69:6801/7044 1510 : cluster [INF] 
3.5f6 restarting backfill on osd.39 from (0'0,0'0] MAX to 61488'203000


What can be wrong?


Our crush map looks like this:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

device 69 osd.69
device 70 osd.70
device 71 osd.71

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host tv-c1-al01 {
 id -7   # do not change unnecessarily
 # weight 21.840
 alg straw
 hash 0  # rjenkins1
 item osd.5 weight 1.820
 item osd.11 weight 1.820
 item osd.17 weight 1.820
 item osd.23 weight 1.820
 item osd.29 weight 1.820
 item osd.35 weight 1.820
 item osd.41 weight 1.820
 item osd.47 weight 1.820
 item osd.53 weight 1.820
 item osd.59 weight 1.820
 item osd.65 weight 1.820
 item osd.71 weight 1.820
}
host tv-c1-al02 {
 id -3   # do not change unnecessarily
 # weight 21.840
 alg straw
 hash 0  # rjenkins1
 item osd.1 weight 1.820
 item osd.7 weight 1.820
 item osd.13 weight 1.820
 item osd.19 weight 1.820
 item osd.25 weight 1.820
 item osd.31 weight 1.820
 item osd.37 weight 1.820
 item osd.43 weight 1.820
 item osd.49 weight 1.820
 item osd.55 weight 1.820
 item osd.61 weight 1.820
 item osd.67 weight 1.820
}
chassis tv-c1 {
 id -8   # do not change unnecessarily
 # weight 43.680
 alg straw
 hash 0  # rjenkins1
 item tv-c1-al01 weight 21.840
 item tv-c1-al02 weight 21.840
}
host tv-c2-al01 {
 id -5   # do not change unnecessarily
 # weight 21.840
 alg straw
 hash 0  # rjenkins1
 item osd.3 weight 1.820
 item osd.9 weight 1.820
 item osd.15 weight 1.820
 item osd.21 weight 1.820
 item osd.27 weight 1.820
 item osd.33 weight 1.820
 item osd.39 weight 1.820
 item osd.45 weight 1.820
 item osd.51 weight 1.820
 item osd.57 weight 1.820
 item osd.63 weight 1.820
 item osd.70 weight 1.820
}
host tv-c2-al02 {
 id -2   # do not change unnecessarily
 # weight 21.840
 alg straw
 hash 0  # rjenkins1
 item osd.0 weight 1.820
 item osd.6 weight 1.820
 item osd.12 weight 1.820
 item osd.18 weight 1.820
 item osd.24 weight 1.820
 item osd.30 weight 1.820
 item osd.36 weight 1.820
 item osd.42 weight 1.820
 item osd.48 weight 1.820
 item osd.54 weight 1.820
 item osd.60 weight 1.820
 item osd.66 weight 1.820
}
chassis tv-c2 {
 id -9   # do not change unnecessarily
 # weight 43.680
 alg straw
 hash 0  # rjenkins1
 item tv-c2-al01 weight 21.840
 item tv-c2-al02 weight 21.840
}
host tv-c1-al03 {
 id -6   # do not change unnecessarily
 # weight 

Re: [ceph-users] Network redundancy...

2017-05-29 Thread Timofey Titovets
2017-05-29 11:37 GMT+03:00 Marco Gaiarin :
>
> I've set up a little Ceph cluster (3 hosts, 12 OSDs), all connected to a
> single switch, using 2x 1 Gbit/s LACP links.
>
> Supposing I have two identical switches, is there some way to set up a
> ''redundant'' configuration?
> For example, something similar to 'iSCSI multipath'?
>
>
> I'm reading switch manuals and ceph documentation, but with no luck.
>
>
> Thanks.

Just use balance-alb; this will do the trick without stacked switches.
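
A minimal sketch of such a bond with Debian-style ifupdown (interface names
and the address are illustrative):

  auto bond0
  iface bond0 inet static
      address 192.168.0.10
      netmask 255.255.255.0
      bond-slaves eth0 eth1
      bond-mode balance-alb
      bond-miimon 100

Each slave NIC can then be cabled to a different, non-stacked switch.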

-- 
Have a nice day,
Timofey.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy...

2017-05-29 Thread Ashley Merrick
Can the switches you are using stack?

If so you could spread the LACP across the two switches.

Sent from my iPhone

> On 29 May 2017, at 4:38 PM, Marco Gaiarin  wrote:
> 
> 
> I've set up a little Ceph cluster (3 hosts, 12 OSDs), all connected to a
> single switch, using 2x 1 Gbit/s LACP links.
> 
> Supposing I have two identical switches, is there some way to set up a
> ''redundant'' configuration?
> For example, something similar to 'iSCSI multipath'?
> 
> 
> I'm reading switch manuals and ceph documentation, but with no luck.
> 
> 
> Thanks.
> 
> -- 
> dott. Marco GaiarinGNUPG Key ID: 240A3D66
>  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
>  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
>  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797
> 
>Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
>  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
>(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bug in OSD Maps

2017-05-29 Thread Vincent Godin
We had a similar problem a few months ago when migrating from Hammer to
Jewel. We hit some old bugs (which had been declared closed in Hammer!).
We had some OSDs refusing to start because of a missing pg map, like
yours, and some others which were completely busy and started declaring
valid OSDs lost, so the cluster was flapping. These OSDs were located on
some hosts only, and those hosts had run Giant and then Hammer before
Jewel. Hosts which had never run Giant were OK. So in effect, we carried
some old bugs from Giant to Jewel! The solution was to isolate, one by
one, the hosts which had at some point run Giant and to recreate them
with a fresh Jewel install. That solved the problem (but it took a lot
of time). I hope this will help you ...
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Network redundancy...

2017-05-29 Thread Marco Gaiarin

I've set up a little Ceph cluster (3 hosts, 12 OSDs), all connected to a
single switch, using 2x 1 Gbit/s LACP links.

Supposing I have two identical switches, is there some way to set up a
''redundant'' configuration?
For example, something similar to 'iSCSI multipath'?


I'm reading switch manuals and ceph documentation, but with no luck.


Thanks.

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multi-Tenancy: Network Isolation

2017-05-29 Thread Дробышевский , Владимир
Hi, Deepak!

  The easiest way I can imagine is to use multiple VLANs, put all ceph host
ports into every VLAN and use a wider subnet. For example, you can set
192.168.0.0/16 as the public ceph network, use 192.168.0.1-254 IPs for the
ceph hosts, 192.168.1.1-254/16 IPs for the first tenant, 192.168.2.1-254/16
for the second, and so on. You'll have to make sure that no ceph host has any
routing facilities running; you then get a number of isolated L2 networks with
a common part. Actually it's not a good approach and leads to many errors
(your tenants must carefully use the provided IPs and not cross into other IP
spaces despite the /16 netmask).
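
A minimal ceph.conf sketch of that layout (addresses and the mon name are
illustrative):

  [global]
      public network = 192.168.0.0/16
  [mon.a]
      host = ceph-node1
      mon addr = 192.168.0.1:6789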


  Another option is - like David said - an L3 routed network. In this case you
will probably face network bandwidth problems: all your traffic will go
through one interface. But if your switches have L3 functionality you can
route packets there. And again, the problem would be bandwidth: switches
usually don't have a lot of routing power, and routed bandwidth leaves a lot
to be desired.


  And the craziest one :-). It is just a theory; I have never tried this in
production or even in a lab.

  As with the previous options, you go with multiple per-tenant VLANs and ceph
host ports in all of these VLANs.

  You need to choose a different network for the public interfaces, for ex.
10.0.0.0/24. Then set up a loopback interface on each ceph host and attach a
single unique IP to it, like 10.0.0.1/32, 10.0.0.2/32 and so on. Enable IP
forwarding and start a RIP routing daemon on each ceph host. Set up and
configure ceph, using the attached IP as the MON IP.
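
A sketch of the per-host part (IPs and the mon name are illustrative):

  ip addr add 10.0.0.1/32 dev lo
  sysctl -w net.ipv4.ip_forward=1
  # then in ceph.conf, use the loopback IP as the monitor address, e.g.
  #   [mon.a]
  #       mon addr = 10.0.0.1:6789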

  Create a ceph VLAN with all ceph hosts and set a common IP subnet (for ex.
172.16.0.0/24); attach an IP from this network to every ceph host. Check that
you can reach any of the public (loopback) IPs from any ceph host.

  Now create multiple per-tenant VLANs and put the ceph host ports into every
one. Set isolated subnets for your tenants' networks, for example
192.168.0.0/23; use 192.168.0.x IPs as additional addresses for the ceph hosts
and 192.168.1.x as the tenant network. Start a RIP routing daemon on every
tenant host. Check that you can reach every ceph public IP (10.0.0.x/32).

  I would also configure the RIP daemon to advertise only the 10.0.0.x/32
network on each ceph host and set the RIP daemon to passive mode on client
hosts. It's better to configure a firewall on the ceph hosts as well to
prevent cross-subnet communication.

  In theory it should work, but I can't say much about how stable it would be.

Best regards,
Vladimir

2017-05-26 20:36 GMT+05:00 Deepak Naidu :

> Hi Vlad,
>
> Thanks for chiming in.
>
> >>It's not clear what you want to achieve from the ceph point of view?
> Multi-tenancy. We will have multiple tenants from different isolated
> subnets/networks accessing a single ceph cluster which can support multiple
> tenants. The only problem I see with ceph in a physical environment is that
> I cannot isolate the public networks, e.g. mon and mds, for multiple
> subnets/networks/tenants.
>
> >>For example, for the network isolation you can use managed switches, set
> different VLANs and put ceph hosts to the every VLAN.
> Yes, we have managed switches with VLANs. And if I add, for example, 2x
> public interfaces on Net1 (subnet 192.168.1.0/24) and Net2 (subnet
> 192.168.2.0/24), what does the ceph.conf look like? What does my mon and MDS
> server config look like? That's the challenge/question.
>
> >>But it's a shoot in the dark as I don't know what exactly you need. For
> example, what services (block storage, object storage, API etc) you want to
> offer to your tenants and so on
>
> CephFS and Object. I am familiar with how to get the ceph storage part
> "tenant friendly"; it's just the network part I need to isolate.
>
> --
> Deepak
>
> > On May 26, 2017, at 12:03 AM, Дробышевский, Владимир 
> wrote:
> >
> >   It's not clear what you want to achieve from the ceph point of view?
> For example, for the network isolation you can use managed switches, set
> different VLANs and put ceph hosts to the every VLAN. But it's a shoot in
> the dark as I don't know what exactly you need. For example, what services
> (block storage, object storage, API etc) you want to offer to your tenants
> and so on
> 
> ---
> This email message is for the sole use of the intended recipient(s) and
> may contain
> confidential information.  Any unauthorized review, use, disclosure or
> distribution
> is prohibited.  If you are not the intended recipient, please contact the
> sender by
> reply email and destroy all copies of the original message.
> 
> ---
>



-- 

Best regards,
Дробышевский Владимир
Компания "АйТи Город"
+7 343 192

IT consulting
Turnkey project delivery
IT services outsourcing
IT infrastructure outsourcing
___
ceph-users mailing list