Re: [ceph-users] Separate disk sets for high IO?

2019-12-16 Thread Paul Mezzanini
We use custom device classes to split data NVMe from metadata NVMe drives.  If
a device already has a class set, it does not get overwritten at startup.

Once you set the class, it works just like it says on the tin: put this pool on
these classes, that other pool on the other class, and so on.
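
The class juggling itself is only a couple of commands per OSD; roughly like
this (osd.42 and the class names are just examples):

$ ceph osd crush rm-device-class osd.42              # clear the auto-detected class first
$ ceph osd crush set-device-class nvme-meta osd.42   # tag it with the custom class
$ ceph osd tree | grep nvme-meta                     # sanity check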


--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu

Sent from my phone. Please excuse any brevity or typos.




From: ceph-users  on behalf of 
dhils...@performair.com 
Sent: Monday, December 16, 2019 6:51:46 PM
To: ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] Separate disk sets for high IO?

Philip;

Ah, ok.  I suspect that isn't documented because the developers don't want 
average users doing it.

It's also possible that it won't work as expected, as there is discussion on 
the web of device classes being changed at startup of the OSD daemon.

That said...

"ceph osd crush class create " is the command to create a custom device 
class, at least in Nautilus 14.2.4.

Theoretically, a custom device class can then be used the same as the built-in
device classes.
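
For what it's worth, I believe a class also gets created implicitly the first
time you assign it to an OSD, and you can see what already exists with:

$ ceph osd crush class ls            # list the device classes the cluster knows about
$ ceph osd crush tree --show-shadow  # show the per-class shadow hierarchy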

Caveat: I'm a user, not a developer of Ceph.

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Philip 
Brown
Sent: Monday, December 16, 2019 4:42 PM
To: ceph-users
Subject: Re: [ceph-users] Separate disk sets for high IO?

Yes, I saw that, thanks.

Unfortunately, that doesn't show use of "custom classes" as someone hinted at.



- Original Message -
From: dhils...@performair.com
To: "ceph-users" 
Cc: "Philip Brown" 
Sent: Monday, December 16, 2019 3:38:49 PM
Subject: RE: Separate disk sets for high IO?

Philip;

There isn't any documentation that shows specifically how to do that, though
the below comes close.

Here's the documentation, for Nautilus, on CRUSH operations:
https://docs.ceph.com/docs/nautilus/rados/operations/crush-map/

About a third of the way down the page is a discussion of "Device Classes."  In
that section it talks about creating CRUSH rules that target certain device
classes (hdd, ssd, and nvme, by default).

Once you have a rule, you can configure a pool to use the rule.
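
As a rough sketch (the rule, pool, and class names here are only examples):

$ ceph osd crush rule create-replicated fast-rule default host ssd   # rule limited to the 'ssd' class
$ ceph osd pool set mypool crush_rule fast-rule                      # point an existing pool at the rule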

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Philip 
Brown
Sent: Monday, December 16, 2019 3:43 PM
To: Nathan Fish
Cc: ceph-users
Subject: Re: [ceph-users] Separate disk sets for high IO?

Sounds very useful.

Any online example documentation for this?
I haven't found any so far.


- Original Message -
From: "Nathan Fish" 
To: "Marc Roos" 
Cc: "ceph-users" , "Philip Brown" 
Sent: Monday, December 16, 2019 2:07:44 PM
Subject: Re: [ceph-users] Separate disk sets for high IO?

Indeed, you can set device classes to pretty much arbitrary strings and
reference them in your CRUSH rules. By default, 'hdd', 'ssd', and I think
'nvme' are autodetected - though my Optanes showed up as 'ssd'.

On Mon, Dec 16, 2019 at 4:58 PM Marc Roos  wrote:
>
>
>
> You can classify OSDs, e.g. as ssd, and you can assign this class to a
> pool you create. This way you can have RBDs running on only SSDs. I
> think there is also a class for nvme, and you can create custom classes.
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to set osd_crush_initial_weight 0 without restart any service

2019-10-04 Thread Paul Mezzanini
That would accomplish what you are looking for, yes.

Keep in mind that norebalance won't stop NEW data from landing there.
It will only keep existing data from migrating in.  This shouldn't pose too
much of an issue for most use cases.
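
Something along these lines is the sequence being discussed (the OSD id is
just a placeholder):

$ ceph osd set norebalance
# add the OSDs (e.g. run the ceph-ansible playbook)
$ ceph osd tree                        # confirm the new OSDs show up
$ ceph osd crush reweight osd.123 0    # zero each new OSD if it came in weighted
$ ceph osd unset norebalance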

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu




From: Satish Patel 
Sent: Tuesday, October 1, 2019 2:45 PM
To: Paul Mezzanini
Cc: ceph-users
Subject: Re: [ceph-users] how to set osd_crush_initial_weight 0 without restart 
any service

You are saying to set "ceph osd set norebalance" before running the
ceph-ansible playbook to add the OSD.

Once the OSD is visible in "ceph osd tree", I should reweight it to 0
and then run "ceph osd unset norebalance"?

On Tue, Oct 1, 2019 at 2:41 PM Paul Mezzanini  wrote:
>
> You could also:
> ceph osd set norebalance
>
>
> --
> Paul Mezzanini
> Sr Systems Administrator / Engineer, Research Computing
> Information & Technology Services
> Finance & Administration
> Rochester Institute of Technology
> o:(585) 475-3245 | pfm...@rit.edu
>
> 
>
> 
> From: ceph-users  on behalf of Satish 
> Patel 
> Sent: Tuesday, October 1, 2019 2:34 PM
> To: ceph-users
> Subject: [ceph-users] how to set osd_crush_initial_weight 0 without restart 
> any service
>
> Folks,
>
> Method: 1
>
> In my lab I am playing with Ceph and trying to understand how to add a
> new OSD without starting rebalancing.
>
> I want to add this option on the fly so I don't need to restart any
> services or anything.
>
> $ ceph tell mon.* injectargs '--osd_crush_initial_weight 0'
>
> $ ceph daemon /var/run/ceph/ceph-mon.*.asok config show | grep
> osd_crush_initial_weight
> "osd_crush_initial_weight": "0.00",
>
> All looks good. Now I am adding an OSD with ceph-ansible and, you know
> what, it looks like it doesn't honor that option and adds the OSD with the
> default weight (in my case I have a 1.9 TB SSD, so the weight is 1.7).
>
> Can someone confirm that injectargs works with osd_crush_initial_weight?
>
>
> Method: 2
>
> Now I have added that option to the ceph-ansible playbook like the following:
>
> ceph_conf_overrides:
>   osd:
>     osd_crush_initial_weight: 0
>
> and I ran the playbook and it did its magic and added the OSD with weight
> zero (0), but I noticed it restarted all OSD daemons on that node. I am
> worried: is it safe to restart OSD daemons on a production Ceph cluster?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to set osd_crush_initial_weight 0 without restart any service

2019-10-01 Thread Paul Mezzanini
You could also:
ceph osd set norebalance
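
Depending on the release, the centralized config store might also do it
without any restarts or ceph.conf edits; worth testing before relying on it:

$ ceph config set osd osd_crush_initial_weight 0     # stored centrally in the mon config db
$ ceph config dump | grep osd_crush_initial_weight   # confirm it is set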


--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu




From: ceph-users  on behalf of Satish Patel 

Sent: Tuesday, October 1, 2019 2:34 PM
To: ceph-users
Subject: [ceph-users] how to set osd_crush_initial_weight 0 without restart any 
service

Folks,

Method: 1

In my lab I am playing with Ceph and trying to understand how to add a
new OSD without starting rebalancing.

I want to add this option on the fly so I don't need to restart any
services or anything.

$ ceph tell mon.* injectargs '--osd_crush_initial_weight 0'

$ ceph daemon /var/run/ceph/ceph-mon.*.asok config show | grep
osd_crush_initial_weight
"osd_crush_initial_weight": "0.00",

All looks good. Now I am adding an OSD with ceph-ansible and, you know
what, it looks like it doesn't honor that option and adds the OSD with the
default weight (in my case I have a 1.9 TB SSD, so the weight is 1.7).

Can someone confirm that injectargs works with osd_crush_initial_weight?


Method: 2

Now I have added that option to the ceph-ansible playbook like the following:

ceph_conf_overrides:
  osd:
    osd_crush_initial_weight: 0

and I ran the playbook and it did its magic and added the OSD with weight
zero (0), but I noticed it restarted all OSD daemons on that node. I am
worried: is it safe to restart OSD daemons on a production Ceph cluster?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to add 100 new OSDs...

2019-07-28 Thread Paul Mezzanini
I'll throw my $.02 in from when I was growing our cluster.

My method ended up being to script the LVM creation so the LVM names reflect
OSD/journal serial numbers for easy location later, "ceph-volume prepare" the
whole node to get it ready for insertion, followed by "ceph-volume activate".  I
typically see more of an impact on performance from peering than from
rebalancing.

If I'm doing a whole node, I make sure the node's weight is set to 0 and slowly
walk it up in chunks.  If it's anything less, I just let it fly as-is.
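
The walk-up itself is nothing fancy, roughly like this (OSD ids, step sizes,
and the target weight are placeholders for whatever your drives call for):

for w in 0.2 0.4 0.8 1.2 1.7; do
    for id in 100 101 102 103; do          # the new OSDs on the node
        ceph osd crush reweight osd.$id $w
    done
    # watch "ceph -s" and let backfill settle before the next bump
    read -p "weight is now $w, press enter to continue"
done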

My workloads didn't seem to mind the increased latency during a huge rebalance,
but another admin has some latency-sensitive VMs hosted, and by moving the
weight up slowly I could easily wait for things to settle if he saw the numbers
get too high.  It's a simple knob twist that keeps another admin happy during
storage changes, so I do it.


--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu




From: ceph-users  on behalf of Anthony 
D'Atri 
Sent: Sunday, July 28, 2019 4:09 AM
To: ceph-users
Subject: Re: [ceph-users] How to add 100 new OSDs...

Paul Emmerich wrote:

> +1 on adding them all at the same time.
>
> All these methods that gradually increase the weight aren't really
> necessary in newer releases of Ceph.

Because the default backfill/recovery values are lower than they were in, say, 
Dumpling?
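
For reference, those throttles are easy to inspect and to pin conservatively
before a big topology change, e.g.:

$ ceph config set osd osd_max_backfills 1            # conservative while things settle
$ ceph config set osd osd_recovery_max_active 1
$ ceph config dump | grep -e backfills -e recovery_max   # confirm what is pinned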

Doubling (or more) the size of a cluster in one swoop still means a lot of
peering and a lot of recovery I/O; I’ve seen a cluster’s data rate go to or
near 0 for a brief but nonzero length of time.  If something goes wrong with
the network (cough cough subtle jumbo frame lossage cough), or if one has
fat-fingered something along the way, etc., going in increments means that a ^C
lets the cluster stabilize before very long.  Then you get to troubleshoot with
HEALTH_OK instead of HEALTH_WARN or HEALTH_ERR.

Having experienced a cluster be DoS’d for hours when its size was tripled in
one go, I’m once bitten, twice shy.  Yes, that was Dumpling, but even with SSDs
on Jewel and Luminous I’ve seen significant client performance impact from
en-masse topology changes.

— aad

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] disk usage reported incorrectly

2019-07-17 Thread Paul Mezzanini
Oh my.  That's going to hurt with 788 OSDs.   Time for some creative shell 
scripts and stepping through the nodes.  I'll report back.
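
The loop will probably end up looking something like this; a rough sketch with
the default paths, and I'm assuming the BlueStore-level repair here, so
substitute whichever op Igor actually means:

# sketch: repair every OSD on a node, one at a time, daemon stopped
for id in $(ls /var/lib/ceph/osd | sed 's/ceph-//'); do
    systemctl stop ceph-osd@$id
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$id   # assumed repair command
    systemctl start ceph-osd@$id
    sleep 60    # give the OSD time to rejoin before the next one
done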

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu




From: Igor Fedotov 
Sent: Wednesday, July 17, 2019 11:33 AM
To: Paul Mezzanini; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] disk usage reported incorrectly

Forgot to provide a workaround...

If that's the case, then you need to repair each OSD with the corresponding
command in ceph-objectstore-tool...

Thanks,

Igor.


On 7/17/2019 6:29 PM, Paul Mezzanini wrote:
> Sometime after our upgrade to Nautilus our disk usage statistics went off the 
> rails wrong.  I can't tell you exactly when it broke but I know that after 
> the initial upgrade it worked at least for a bit.
>
> Correct numbers should be something similar to: (These are copy/pasted from 
> the autoscale-status report)
>
> POOL             SIZE
> cephfs_metadata  327.1G
> cold-ec          98.36T
> ceph-bulk-3r     142.6T
> cephfs_data      31890G
> ceph-hot-2r      5276G
> kgcoe-cinder     103.2T
> rbd              3098
>
>
> Instead, we now show:
>
> POOL             SIZE
> cephfs_metadata  362.9G  (correct)
> cold-ec          607.2G  (wrong)
> ceph-bulk-3r     5186G   (wrong)
> cephfs_data      1654G   (wrong)
> ceph-hot-2r      5884G   (correct I think)
> kgcoe-cinder     5761G   (wrong)
> rbd              128.0k
>
>
> `ceph fs status` reports similar numbers.  cold-ec, ceph-hot-2r and
> cephfs_data are all cephfs data pools, and cephfs_metadata is, unsurprisingly,
> cephfs metadata.  The remaining pools are all used for rbd.
>
>
> Interestingly, the `ceph df` output for raw storage feels correct for each
> drive class while the pool usage is wrong:
>
> RAW STORAGE:
>     CLASS      SIZE     AVAIL    USED     RAW USED  %RAW USED
>     hdd        6.3 PiB  5.2 PiB  1.1 PiB  1.1 PiB   17.08
>     nvme       175 TiB  161 TiB  14 TiB   14 TiB    7.82
>     nvme-meta  14 TiB   11 TiB   2.2 TiB  2.5 TiB   18.45
>     TOTAL      6.5 PiB  5.4 PiB  1.1 PiB  1.1 PiB   16.84
>
> POOLS:
>     POOL             ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
>     kgcoe-cinder     24  1.9 TiB   29.49M  5.6 TiB  0.32     582 TiB
>     ceph-bulk-3r     32  1.7 TiB   88.28M  5.1 TiB  0.29     582 TiB
>     cephfs_data      35  518 GiB  135.68M  1.6 TiB  0.09     582 TiB
>     cephfs_metadata  36  363 GiB    5.63M  363 GiB  3.35     3.4 TiB
>     rbd              37    931 B        5  128 KiB  0        582 TiB
>     ceph-hot-2r      50  5.7 TiB   18.63M  5.7 TiB  3.72      74 TiB
>     cold-ec          51  417 GiB  105.23M  607 GiB  0.02     2.1 PiB
>
>
> Everything is on "ceph version 14.2.1 
> (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)" and kernel 
> 5.0.21 or 5.0.9.  I'm actually doing the patching now to pull the ceph 
> cluster up to 5.0.21, same as the clients.  I'm not really sure where to dig 
> into this one.  Everything is working fine except disk usage reporting.  This 
> also completely blows up the autoscaler.
>
> I feel like the question is obvious but I'll state it anyway.  How do I get 
> this issue resolved?
>
> Thanks
> -paul
>
> --
> Paul Mezzanini
> Sr Systems Administrator / Engineer, Research Computing
> Information & Technology Services
> Finance & Administration
> Rochester Institute of Technology
> o:(585) 475-3245 | pfm...@rit.edu
>

[ceph-users] disk usage reported incorrectly

2019-07-17 Thread Paul Mezzanini
Sometime after our upgrade to Nautilus our disk usage statistics went off the 
rails wrong.  I can't tell you exactly when it broke but I know that after the 
initial upgrade it worked at least for a bit.  

Correct numbers should be something similar to: (These are copy/pasted from the 
autoscale-status report)

POOL             SIZE
cephfs_metadata  327.1G
cold-ec          98.36T
ceph-bulk-3r     142.6T
cephfs_data      31890G
ceph-hot-2r      5276G
kgcoe-cinder     103.2T
rbd              3098


Instead, we now show:

POOL             SIZE
cephfs_metadata  362.9G  (correct)
cold-ec          607.2G  (wrong)
ceph-bulk-3r     5186G   (wrong)
cephfs_data      1654G   (wrong)
ceph-hot-2r      5884G   (correct I think)
kgcoe-cinder     5761G   (wrong)
rbd              128.0k


`ceph fs status` reports similar numbers.  cold-ec, ceph-hot-2r and cephfs_data
are all cephfs data pools, and cephfs_metadata is, unsurprisingly, cephfs
metadata.  The remaining pools are all used for rbd.


Interestingly, the `ceph df` output for raw storage feels correct for each
drive class while the pool usage is wrong:

RAW STORAGE:
    CLASS      SIZE     AVAIL    USED     RAW USED  %RAW USED
    hdd        6.3 PiB  5.2 PiB  1.1 PiB  1.1 PiB   17.08
    nvme       175 TiB  161 TiB  14 TiB   14 TiB    7.82
    nvme-meta  14 TiB   11 TiB   2.2 TiB  2.5 TiB   18.45
    TOTAL      6.5 PiB  5.4 PiB  1.1 PiB  1.1 PiB   16.84

POOLS:
    POOL             ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
    kgcoe-cinder     24  1.9 TiB   29.49M  5.6 TiB  0.32     582 TiB
    ceph-bulk-3r     32  1.7 TiB   88.28M  5.1 TiB  0.29     582 TiB
    cephfs_data      35  518 GiB  135.68M  1.6 TiB  0.09     582 TiB
    cephfs_metadata  36  363 GiB    5.63M  363 GiB  3.35     3.4 TiB
    rbd              37    931 B        5  128 KiB  0        582 TiB
    ceph-hot-2r      50  5.7 TiB   18.63M  5.7 TiB  3.72      74 TiB
    cold-ec          51  417 GiB  105.23M  607 GiB  0.02     2.1 PiB


Everything is on "ceph version 14.2.1 
(d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)" and kernel 5.0.21 
or 5.0.9.  I'm actually doing the patching now to pull the ceph cluster up to 
5.0.21, same as the clients.  I'm not really sure where to dig into this one.  
Everything is working fine except disk usage reporting.  This also completely 
blows up the autoscaler.  

I feel like the question is obvious but I'll state it anyway.  How do I get 
this issue resolved? 

Thanks
-paul

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com