Re: [ceph-users] Error bluestore doesn't support lvm

2018-07-20 Thread Satish Patel
After googling and digging I found this bug. Why hasn't it been pushed to all branches?

https://github.com/ceph/ceph-ansible/commit/d3b427e16990f9ebcde7575aae367fd7dfe36a8d#diff-34d2eea5f7de9a9e89c1e66b15b4cd0a
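
For reference, a quick way to check whether a local ceph-ansible checkout already contains that fix (the path is an assumption; the short hash is taken from the URL above):

   cd /usr/share/ceph-ansible          # or wherever ceph-ansible is checked out
   git tag --contains d3b427e          # release tags that already include the fix
   git branch -r --contains d3b427e    # remote branches that include it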

On Fri, Jul 20, 2018 at 11:26 PM, Satish Patel  wrote:
> My Ceph version is
>
> [root@ceph-osd-02 ~]# ceph -v
> ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous 
> (stable)
>
> On Fri, Jul 20, 2018 at 11:24 PM, Satish Patel  wrote:
>> I am using openstack-ansible with ceph-ansible to deploy my Ceph
>> cluster, and here is my config in the yml file:
>>
>> ---
>> osd_objectstore: bluestore
>> osd_scenario: lvm
>> lvm_volumes:
>>   - data: /dev/sdb
>>   - data: /dev/sdc
>>   - data: /dev/sdd
>>   - data: /dev/sde
>>
>>
>> This is the error I am getting:
>>
>> TASK [ceph-osd : check if osd_scenario lvm is supported by the
>> selected ceph version]
>> ***
>> Friday 20 July 2018  23:15:26 -0400 (0:00:00.034)   0:02:00.577 
>> ***
>>  [WARNING]: when statements should not include jinja2 templating
>> delimiters such as {{ }} or {% %}. Found: ceph_release_num.{{
>> ceph_release }} < ceph_release_num.luminous
>>
>>
>> TASK [ceph-osd : verify osd_objectstore is 'filestore' when using the
>> lvm osd_scenario]
>> *
>> Friday 20 July 2018  23:15:27 -0400 (0:00:00.047)   0:02:00.624 
>> ***
>> fatal: [osd2]: FAILED! => {"changed": false, "failed": true, "msg":
>> "the lvm osd_scenario currently only works for filestore, not
>> bluestore"}
>> fatal: [osd1]: FAILED! => {"changed": false, "failed": true, "msg":
>> "the lvm osd_scenario currently only works for filestore, not
>> bluestore"}
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error bluestore doesn't support lvm

2018-07-20 Thread Satish Patel
My Ceph version is

[root@ceph-osd-02 ~]# ceph -v
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)

On Fri, Jul 20, 2018 at 11:24 PM, Satish Patel  wrote:
> I am using openstack-ansible with ceph-ansible to deploy my Ceph
> cluster, and here is my config in the yml file:
>
> ---
> osd_objectstore: bluestore
> osd_scenario: lvm
> lvm_volumes:
>   - data: /dev/sdb
>   - data: /dev/sdc
>   - data: /dev/sdd
>   - data: /dev/sde
>
>
> This is the error I am getting:
>
> TASK [ceph-osd : check if osd_scenario lvm is supported by the
> selected ceph version]
> ***
> Friday 20 July 2018  23:15:26 -0400 (0:00:00.034)   0:02:00.577 
> ***
>  [WARNING]: when statements should not include jinja2 templating
> delimiters such as {{ }} or {% %}. Found: ceph_release_num.{{
> ceph_release }} < ceph_release_num.luminous
>
>
> TASK [ceph-osd : verify osd_objectstore is 'filestore' when using the
> lvm osd_scenario]
> *
> Friday 20 July 2018  23:15:27 -0400 (0:00:00.047)   0:02:00.624 
> ***
> fatal: [osd2]: FAILED! => {"changed": false, "failed": true, "msg":
> "the lvm osd_scenario currently only works for filestore, not
> bluestore"}
> fatal: [osd1]: FAILED! => {"changed": false, "failed": true, "msg":
> "the lvm osd_scenario currently only works for filestore, not
> bluestore"}
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Error bluestore doesn't support lvm

2018-07-20 Thread Satish Patel
I am using openstack-ansible with ceph-ansible to deploy my Ceph
cluster, and here is my config in the yml file:

---
osd_objectstore: bluestore
osd_scenario: lvm
lvm_volumes:
  - data: /dev/sdb
  - data: /dev/sdc
  - data: /dev/sdd
  - data: /dev/sde


This is the error I am getting:

TASK [ceph-osd : check if osd_scenario lvm is supported by the
selected ceph version]
***
Friday 20 July 2018  23:15:26 -0400 (0:00:00.034)   0:02:00.577 ***
 [WARNING]: when statements should not include jinja2 templating
delimiters such as {{ }} or {% %}. Found: ceph_release_num.{{
ceph_release }} < ceph_release_num.luminous


TASK [ceph-osd : verify osd_objectstore is 'filestore' when using the
lvm osd_scenario]
*
Friday 20 July 2018  23:15:27 -0400 (0:00:00.047)   0:02:00.624 ***
fatal: [osd2]: FAILED! => {"changed": false, "failed": true, "msg":
"the lvm osd_scenario currently only works for filestore, not
bluestore"}
fatal: [osd1]: FAILED! => {"changed": false, "failed": true, "msg":
"the lvm osd_scenario currently only works for filestore, not
bluestore"}
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 12.2.7 - Available space decreasing when adding disks

2018-07-20 Thread Glen Baars
Hello Ceph Users,

We added more SSD storage to our Ceph cluster last night: 4 x 1 TB drives. The
available space went from 1.6 TB to 0.6 TB (in `ceph df` for the SSD pool).

I would assume that the weights need to be changed, but I didn't think I would
need to? Should I change them from 0.9 to 0.75 and hope they rebalance
correctly? (A sketch of the reweight commands follows the listings below.)

#ceph osd tree | grep -v hdd
ID  CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF
-1   534.60309 root default
-19    62.90637 host NAS-AUBUN-RK2-CEPH06
115   ssd   0.43660 osd.115   up  1.0 1.0
116   ssd   0.43660 osd.116   up  1.0 1.0
117   ssd   0.43660 osd.117   up  1.0 1.0
118   ssd   0.43660 osd.118   up  1.0 1.0
-22   105.51169 host NAS-AUBUN-RK2-CEPH07
138   ssd   0.90970 osd.138   up  1.0 1.0 Added
139   ssd   0.90970 osd.139   up  1.0 1.0 Added
-25   105.51169 host NAS-AUBUN-RK2-CEPH08
140   ssd   0.90970 osd.140   up  1.0 1.0 Added
141   ssd   0.90970 osd.141   up  1.0 1.0 Added
 -3    56.32617 host NAS-AUBUN-RK3-CEPH01
60   ssd   0.43660 osd.60up  1.0 1.0
61   ssd   0.43660 osd.61up  1.0 1.0
62   ssd   0.43660 osd.62up  1.0 1.0
63   ssd   0.43660 osd.63up  1.0 1.0
 -5    56.32617 host NAS-AUBUN-RK3-CEPH02
64   ssd   0.43660 osd.64up  1.0 1.0
65   ssd   0.43660 osd.65up  1.0 1.0
66   ssd   0.43660 osd.66up  1.0 1.0
67   ssd   0.43660 osd.67up  1.0 1.0
 -7    56.32617 host NAS-AUBUN-RK3-CEPH03
68   ssd   0.43660 osd.68up  1.0 1.0
69   ssd   0.43660 osd.69up  1.0 1.0
70   ssd   0.43660 osd.70up  1.0 1.0
71   ssd   0.43660 osd.71up  1.0 1.0
-13    45.84741 host NAS-AUBUN-RK3-CEPH04
72   ssd   0.54579 osd.72up  1.0 1.0
73   ssd   0.54579 osd.73up  1.0 1.0
76   ssd   0.54579 osd.76up  1.0 1.0
77   ssd   0.54579 osd.77up  1.0 1.0
-16    45.84741 host NAS-AUBUN-RK3-CEPH05
74   ssd   0.54579 osd.74up  1.0 1.0
75   ssd   0.54579 osd.75up  1.0 1.0
78   ssd   0.54579 osd.78up  1.0 1.0
79   ssd   0.54579 osd.79up  1.0 1.0

# ceph osd df | grep -v hdd
ID  CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  PGS
115   ssd 0.43660  1.0  447G  250G  196G 56.00 1.72 103
116   ssd 0.43660  1.0  447G  191G  255G 42.89 1.32  84
117   ssd 0.43660  1.0  447G  213G  233G 47.79 1.47  92
118   ssd 0.43660  1.0  447G  208G  238G 46.61 1.43  85
138   ssd 0.90970  1.0  931G  820G  111G 88.08 2.71 216 Added
139   ssd 0.90970  1.0  931G  771G  159G 82.85 2.55 207 Added
140   ssd 0.90970  1.0  931G  709G  222G 76.12 2.34 197 Added
141   ssd 0.90970  1.0  931G  664G  267G 71.31 2.19 184 Added
60   ssd 0.43660  1.0  447G  275G  171G 61.62 1.89 100
61   ssd 0.43660  1.0  447G  237G  209G 53.04 1.63  90
62   ssd 0.43660  1.0  447G  275G  171G 61.58 1.89  95
63   ssd 0.43660  1.0  447G  260G  187G 58.15 1.79  97
64   ssd 0.43660  1.0  447G  232G  214G 52.08 1.60  83
65   ssd 0.43660  1.0  447G  207G  239G 46.36 1.42  75
66   ssd 0.43660  1.0  447G  217G  230G 48.54 1.49  84
67   ssd 0.43660  1.0  447G  252G  195G 56.36 1.73  92
68   ssd 0.43660  1.0  447G  248G  198G 55.56 1.71  94
69   ssd 0.43660  1.0  447G  229G  217G 51.25 1.57  84
70   ssd 0.43660  1.0  447G  259G  187G 58.01 1.78  87
71   ssd 0.43660  1.0  447G  267G  179G 59.83 1.84  97
72   ssd 0.54579  1.0  558G  217G  341G 38.96 1.20 100
73   ssd 0.54579  1.0  558G  283G  275G 50.75 1.56 121
76   ssd 0.54579  1.0  558G  286G  272G 51.33 1.58 129
77   ssd 0.54579  1.0  558G  246G  312G 44.07 1.35 104
74   ssd 0.54579  1.0  558G  273G  285G 48.91 1.50 122
75   ssd 0.54579  1.0  558G  281G  276G 50.45 1.55 114
78   ssd 0.54579  1.0  558G  289G  269G 51.80 1.59 133
79   ssd 0.54579  1.0  558G  276G  282G 49.39 1.52 119
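
For reference, a sketch of the reweight commands mentioned above (the 0.75 value is only the one floated earlier; whether lowering the CRUSH weight is the right fix depends on why the new OSDs filled up, so treat this as an illustration, not a recommendation):

   ceph osd crush reweight osd.138 0.75
   ceph osd crush reweight osd.139 0.75
   ceph osd crush reweight osd.140 0.75
   ceph osd crush reweight osd.141 0.75

Run them one at a time and let the cluster settle between changes.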
Kind regards,
Glen Baars
BackOnline Manager


Re: [ceph-users] JBOD question

2018-07-20 Thread Oliver Freyermuth
Hi Satish,

that really completely depends on your controller. 

For what it's worth: We have AVAGO MegaRAID controllers (9361 series). 
They can be switched to a "JBOD personality". After doing so and reinitializing 
(power-cycling),
the cards change PCI-ID and run a different firmware, optimized for JBOD mode 
(with different caching etc.). Also, the block devices are ordered differently. 

In that mode, new disks will be exported as JBOD by default, but you can still 
do RAID1 and RAID0. 
I think RAID5 and RAID6 are disabled, though. 

We are using those to have a RAID 1 for our OS and export the rest as JBOD for 
CephFS. 

So there surely are controllers which can do JBOD in addition to RAID (without a
special controller mode / "personality"),
controllers which can be switched but where simple RAID levels are still possible,
and I'm also sure there are controllers out there which can be switched to a JBOD
mode in which they can't do any RAID anymore.

If that's the case, just go with software RAID for the OS, or install your 
servers with a good deployment tool so you can just reinstall them
if the OS breaks (we also do that for some Ceph servers with simpler RAID 
controllers). With a good deployment tool,
reinstalling takes 1 click and waiting 40 minutes - but of course, the server 
will still be down until a broken OS HDD is replaced physically. 
But Ceph has redundancy for that :-). 
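
As a minimal sketch (device names are assumptions, not from this thread), a software RAID1 for the OS with mdadm would look roughly like:

   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
   mkfs.xfs /dev/md0
   cat /proc/mdstat     # confirm both members are active and syncing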

Cheers,
Oliver


Am 20.07.2018 um 23:52 schrieb Satish Patel:
> Thanks Brian,
> 
> That makes sense, because I was reading the documentation and found you can
> either choose RAID or JBOD
> 
> On Fri, Jul 20, 2018 at 5:33 PM, Brian :  wrote:
>> Hi Satish
>>
>> You should be able to choose different modes of operation for each
>> port / disk. Most dell servers will let you do RAID and JBOD in
>> parallel.
>>
>> If you can't do that and can only either turn RAID on or off then you
>> can use SW RAID for your OS
>>
>>
>> On Fri, Jul 20, 2018 at 9:01 PM, Satish Patel  wrote:
>>> Folks,
>>>
>>> I have never used JBOD mode before and now I am planning to, so I have a stupid
>>> question: if I switch the RAID controller to JBOD mode, how will my OS disk get
>>> mirrored?
>>>
>>> Do I need to use software RAID for the OS disk when I use JBOD mode?
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 



smime.p7s
Description: S/MIME Cryptographic Signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] JBOD question

2018-07-20 Thread Satish Patel
Thanks Brian,

That makes sense, because I was reading the documentation and found you can
either choose RAID or JBOD

On Fri, Jul 20, 2018 at 5:33 PM, Brian :  wrote:
> Hi Satish
>
> You should be able to choose different modes of operation for each
> port / disk. Most dell servers will let you do RAID and JBOD in
> parallel.
>
> If you can't do that and can only either turn RAID on or off then you
> can use SW RAID for your OS
>
>
> On Fri, Jul 20, 2018 at 9:01 PM, Satish Patel  wrote:
>> Folks,
>>
>> I have never used JBOD mode before and now I am planning to, so I have a stupid
>> question: if I switch the RAID controller to JBOD mode, how will my OS disk get
>> mirrored?
>>
>> Do I need to use software RAID for the OS disk when I use JBOD mode?
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] JBOD question

2018-07-20 Thread Brian :
Hi Satish

You should be able to choose different modes of operation for each
port / disk. Most dell servers will let you do RAID and JBOD in
parallel.

If you can't do that and can only either turn RAID on or off then you
can use SW RAID for your OS


On Fri, Jul 20, 2018 at 9:01 PM, Satish Patel  wrote:
> Folks,
>
> I have never used JBOD mode before and now I am planning to, so I have a stupid
> question: if I switch the RAID controller to JBOD mode, how will my OS disk get
> mirrored?
>
> Do I need to use software RAID for the OS disk when I use JBOD mode?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mon fail to start for disk issue

2018-07-20 Thread Satish Patel
I am getting this error. Why is it complaining about the disk even though we have
enough space?

2018-07-20 16:04:58.313331 7f0c047f8ec0  0 set uid:gid to 167:167 (ceph:ceph)
2018-07-20 16:04:58.313350 7f0c047f8ec0  0 ceph version 12.2.7
(3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable), process
ceph-mon, pid 1958
2018-07-20 16:04:58.313406 7f0c047f8ec0 -1 error: monitor data
filesystem reached concerning levels of available storage space
(available: 3% 321 MB)
you may adjust 'mon data avail crit' to a lower value to make this go
away (default: 5%)


This is my disk usage

[root@ostack-infra-01-ceph-mon-container-692bea95 log]# df -h
Filesystem Size  Used Avail Use% Mounted on
/dev/mapper/rootvg01-lv04  432G   30G  403G   7% /
none   492K 0  492K   0% /dev
/dev/mapper/rootvg01-lv01  9.8G  9.5G  320M  97% /var/log
cgroup_root 10M 0   10M   0% /sys/fs/cgroup
tmpfs   16G 0   16G   0% /dev/shm
tmpfs   16G  8.1M   16G   1% /run
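
For what it's worth (a sketch, not from the thread): the 321 MB the monitor reports matches the nearly-full /var/log filesystem above (320M available), which suggests the mon's data directory sits on that filesystem, so freeing space there is probably the real fix. The warning threshold itself can be lowered, e.g. in ceph.conf:

   [mon]
   mon data avail crit = 3

and then restart the mon, or for already-running mons:

   ceph tell mon.* injectargs '--mon_data_avail_crit 3'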
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] JBOD question

2018-07-20 Thread Satish Patel
Folks,

I have never used JBOD mode before and now I am planning to, so I have a stupid
question: if I switch the RAID controller to JBOD mode, how will my OS disk get
mirrored?

Do I need to use software RAID for the OS disk when I use JBOD mode?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-07-20 Thread Vasu Kulkarni
On Fri, Jul 20, 2018 at 7:29 AM, Thode Jocelyn  wrote:
> Hi,
>
>
>
> I noticed that in commit
> https://github.com/ceph/ceph-deploy/commit/b1c27b85d524f2553af2487a98023b60efe421f3,
> the ability to specify a cluster name was removed. Is there a reason for
> this removal ?
>
>
>
> Because right now there is no possibility to create a Ceph cluster with a
> different name with ceph-deploy, which is a big problem when having two
> clusters replicating with rbd-mirror, as we need different names.
>
>
>
> And even when following the doc here:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/block_device_guide/block_device_mirroring#rbd-mirroring-clusters-with-the-same-name
>
>
>
> This is not sufficient as once we change the CLUSTER variable in the
> sysconfig file, mon,osd, mds etc. all use it and fail to start on a reboot
> as they then try to load data from a path in /var/lib/ceph containing the
> cluster name.

Is your rbd-mirror client also colocated with mon/osd? This needs to be
changed only on the client side where you are doing the mirroring; the rest of
the nodes are not affected.
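
A minimal sketch of that layout (file names and the peer name are assumptions, not from the thread): only the rbd-mirror host needs to see the remote cluster under a second name, via an extra conf/keyring pair in /etc/ceph, while mons/OSDs keep the default cluster name:

   /etc/ceph/ceph.conf                          # local cluster, unchanged
   /etc/ceph/remote.conf                        # copy of the peer cluster's ceph.conf
   /etc/ceph/remote.client.rbd-mirror.keyring   # key for the peer cluster
   rbd mirror pool peer add rbd client.rbd-mirror@remote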


>
>
>
> Is there a solution to this problem ?
>
>
>
> Best Regards
>
> Jocelyn Thode
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [Ceph-deploy] Cluster Name

2018-07-20 Thread Thode Jocelyn
Hi,

I noticed that in commit 
https://github.com/ceph/ceph-deploy/commit/b1c27b85d524f2553af2487a98023b60efe421f3,
 the ability to specify a cluster name was removed. Is there a reason for this 
removal ?

Because right now there is no possibility to create a Ceph cluster with a
different name with ceph-deploy, which is a big problem when having two clusters
replicating with rbd-mirror, as we need different names.

And even when following the doc here: 
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/block_device_guide/block_device_mirroring#rbd-mirroring-clusters-with-the-same-name

This is not sufficient, as once we change the CLUSTER variable in the sysconfig
file, mon, osd, mds etc. all use it and fail to start on a reboot, as they then
try to load data from a path in /var/lib/ceph containing the cluster name.

Is there a solution to this problem ?

Best Regards
Jocelyn Thode
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Default erasure code profile and sustaining loss of one host containing 4 OSDs

2018-07-20 Thread Ziggy Maes
Hello Caspar

That makes a great deal of sense, thank you for elaborating. Am I correct to 
assume that if we were to use a k=2, m=2 profile, it would be identical to a 
replicated pool (since there would be an equal amount of data and parity 
chunks)? Furthermore, how should the proper erasure profile be determined then? 
Are we to strive for as high a data chunk value (k) as possible and a low
parity/coding value (m)?

Kind regards
Ziggy Maes
DevOps Engineer
CELL +32 478 644 354
SKYPE Ziggy.Maes
[http://www.be-mobile.com/mail/bemobile_email.png]
www.be-mobile.com


From: Caspar Smit 
Date: Friday, 20 July 2018 at 14:15
To: Ziggy Maes 
Cc: "ceph-users@lists.ceph.com" 
Subject: Re: [ceph-users] Default erasure code profile and sustaining loss of 
one host containing 4 OSDs

Ziggy,

For EC pools: min_size = k+1

So in your case (m=1) -> min_size is 3, which is the same as the number of
shards. So if ANY shard goes down, I/O is frozen.

If you choose m=2, min_size will still be 3 but you now have 4 shards (k+m = 4),
so you can lose a shard and still retain availability.

Of course a failure domain of 'host' is required to do this, but since you have
6 hosts that would be ok.
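
For reference, a sketch of such a profile and pool (names and PG counts are placeholders, not from the thread):

   ceph osd erasure-code-profile set k2m2 k=2 m=2 crush-failure-domain=host
   ceph osd pool create ecpool 64 64 erasure k2m2
   ceph osd pool get ecpool min_size     # should report 3 (k+1)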

Met vriendelijke groet,

Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend

t: (+31) 299 410 414
e: caspars...@supernas.eu
w: www.supernas.eu

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [RBD]Replace block device cluster

2018-07-20 Thread Nino Bosteels
In response to my own questions, I read that you shouldn't separate your 
journal / rocksDB from the disks where your data resides, with bluestore. And 
the general rule of one core per OSD seems to be unnecessary, since in the 
current clusters we've got 4 cores with 5 disks and CPU usage never goes over 
20-30%.


New questions are if I should separate the admin / monitor nodes from the data 
storage nodes (separate HDD, or separate machine?). And if I could use a 
separate machine with an SSD for caching? We can't add SSD's to these dedicated 
machines. So perhaps then the network will be the bottleneck and no remarkable 
speed-boost will be noticed.


Back to the interwebz for research 



From: ceph-users  on behalf of Nino Bosteels 

Sent: 19 July 2018 16:01
To: ceph-users@lists.ceph.com
Subject: [ceph-users] [RBD]Replace block device cluster


We’re looking to replace our existing RBD cluster, which makes and stores our 
backups. Atm we’ve got one machine running backuppc, where the RBD is mounted 
and 8 ceph nodes.



The idea is to gain in speed and/or pay less (or pay equally for moar speed).



We're debating whether to get SSDs in the mix. Have I understood correctly that
they're useful for setting up a cache pool and/or for separating the journal?
Can I use a different server for this?





Old specs (8 machines):

CPU:  Intel Xeon D1520 4c/8t 2.2 GHz/2.6 GHz

RAM:32 GB DDR4 ECC 2133 MHz

Disks:5x 6 TB SAS2

Public network card:  1 x 1  Gbps



40 disks, total of 1159.92 euro



Consideration for new specs:

3 machines:

CPU:  Intel  Xeon E5-2620v3 - 6c/12t - 2.4GHz /3.2GHz

RAM:64GB DDR4 ECC 1866 MHz

Disks:12x 4 TB SAS2

Public network card:  1 x 1  Gbps



36 disks for a total of 990 euro



10 machines:

CPU:  Intel  Xeon D-1521 - 4c/8t - 2,4GHz /2,7GHz

RAM:16GB DDR4 ECC 2133MHz

Disks:4x 6TB

Public network card:  1 x 1  Gbps



40 disks for a total of 940 euro

Perhaps in combination with SSD, this last option?!



Any advice is greatly appreciated.



How do you make your decisions / comparisons? 1 disk per OSD I guess, but  
then, how many cores per disk or stuff like that?



Thanks in advance.



Nino Bosteels
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Default erasure code profile and sustaining loss of one host containing 4 OSDs

2018-07-20 Thread Caspar Smit
Ziggy,

For EC pools: min_size = k+1

So in your case (m=1) -> min_size is 3, which is the same as the number of
shards. So if ANY shard goes down, I/O is frozen.

If you choose m=2, min_size will still be 3 but you now have 4 shards (k+m =
4), so you can lose a shard and still retain availability.

Of course a failure domain of 'host' is required to do this, but since you
have 6 hosts that would be ok.

Met vriendelijke groet,

Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend

t: (+31) 299 410 414
e: caspars...@supernas.eu
w: www.supernas.eu

2018-07-20 14:02 GMT+02:00 Ziggy Maes :

> Caspar,
>
>
>
> Thank you for your reply. I’m in all honesty still not clear on what value
> to use for min_size. From what I understand, it should be set to the sum
> of k+m for erasure coded pools, as it is set by default.
>
>
>
> Additionally, could you elaborate why m=2 would be able to sustain a node
> failure? As stated, we have 6 hosts containing 4 OSDs (so 24) total. What
> would m=2 achieve that m=1 would not?
>
>
>
> Kind regards
>
>
> *Ziggy Maes *DevOps Engineer
> CELL +32 478 644 354
> SKYPE Ziggy.Maes
>
> [image: http://www.be-mobile.com/mail/bemobile_email.png]
> 
>
> *www.be-mobile.com *
>
>
>
>
>
> *From: *Caspar Smit 
> *Date: *Friday, 20 July 2018 at 13:36
> *To: *Ziggy Maes 
> *Cc: *"ceph-users@lists.ceph.com" 
> *Subject: *Re: [ceph-users] Default erasure code profile and sustaining
> loss of one host containing 4 OSDs
>
>
>
> Ziggy,
>
>
>
> The default min_size for your pool is 3 so losing ANY single OSD (not even
> host) will result in reduced data availability:
>
> https://patchwork.kernel.org/patch/8546771/
>
> Use m=2 to be able to handle a node failure.
>
>
>
>
> Met vriendelijke groet,
>
> Caspar Smit
> Systemengineer
> SuperNAS
> Dorsvlegelstraat 13
> 
> 1445 PA Purmerend
>
> t: (+31) 299 410 414
> e: caspars...@supernas.eu
> w: www.supernas.eu
>
>
>
> 2018-07-20 13:11 GMT+02:00 Ziggy Maes :
>
> Hello
>
>
>
> I am currently trying to find out if Ceph can sustain the loss of a full
> host (containing 4 OSDs) in a default erasure coded pool (k=2, m=1). We
> currently have a production EC pool with the default erasure profile,
> but would like to make sure the data on this pool remains accessible even
> after one of our hosts fail. Since we have a very small cluster (6 hosts, 4
> OSDs per host), I created a custom CRUSH rule to make sure the 3 chunks are
> spread over 3 hosts, screenshot here: https://gyazo.com/
> 1a3ddd6895df0d5e0e425774d2bcb257 .
>
>
>
> Unfortunately, taking one node offline results  in reduced data
> availability and incomplete PGs, as shown here: https://gyazo.com/
> db56d5a52c9de2fd71bf9ae8eb03dbbc .
>
>
>
> My question summed up: is it possible to sustain the loss of a host
> containing 4 OSDs using a k=2, m=1 erasure profile using a CRUSH map that
> spreads data over at least 3 hosts? If so, what am I doing wrong? I realize
> the documentation states that m equals the amount of OSDs that can be lost,
> but assuming a balanced CRUSH map is used I fail to see how this is
> required.
>
>
>
> Many thanks in advance.
>
>
>
> Kind regards
>
>
> *Ziggy Maes *DevOps Engineer
>
> [image: http://www.be-mobile.com/mail/bemobile_email.png]
> 
>
> *www.be-mobile.com *
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Default erasure code profile and sustaining loss of one host containing 4 OSDs

2018-07-20 Thread Ziggy Maes
Caspar,

Thank you for your reply. I’m in all honesty still not clear on what value to 
use for min_size. From what I understand, it should be set to the sum of k+m
for erasure coded pools, as it is set by default.

Additionally, could you elaborate why m=2 would be able to sustain a node 
failure? As stated, we have 6 hosts containing 4 OSDs (so 24) total. What would 
m=2 achieve that m=1 would not?

Kind regards
Ziggy Maes
DevOps Engineer
CELL +32 478 644 354
SKYPE Ziggy.Maes
[http://www.be-mobile.com/mail/bemobile_email.png]
www.be-mobile.com


From: Caspar Smit 
Date: Friday, 20 July 2018 at 13:36
To: Ziggy Maes 
Cc: "ceph-users@lists.ceph.com" 
Subject: Re: [ceph-users] Default erasure code profile and sustaining loss of 
one host containing 4 OSDs

Ziggy,

The default min_size for your pool is 3 so losing ANY single OSD (not even 
host) will result in reduced data availability:

https://patchwork.kernel.org/patch/8546771/

Use m=2 to be able to handle a node failure.


Met vriendelijke groet,

Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend

t: (+31) 299 410 414
e: caspars...@supernas.eu
w: www.supernas.eu

2018-07-20 13:11 GMT+02:00 Ziggy Maes :
Hello

I am currently trying to find out if Ceph can sustain the loss of a full host 
(containing 4 OSDs) in a default erasure coded pool (k=2, m=1). We have 
currently have a production EC pool with the default erasure profile, but would 
like to make sure the data on this pool remains accessible even after one of 
our hosts fail. Since we have a very small cluster (6 hosts, 4 OSDs per host), 
I created a custom CRUSH rule to make sure the 3 chunks are spread over 3 
hosts, screenshot here: https://gyazo.com/1a3ddd6895df0d5e0e425774d2bcb257 .

Unfortunately, taking one node offline results  in reduced data availability 
and incomplete PGs, as shown here: 
https://gyazo.com/db56d5a52c9de2fd71bf9ae8eb03dbbc .

My question summed up: is it possible to sustain the loss of a host containing 
4 OSDs using a k=2, m=1 erasure profile using a CRUSH map that spreads data 
over at least 3 hosts? If so, what am I doing wrong? I realize the 
documentation states that m equals the amount of OSDs that can be lost, but 
assuming a balanced CRUSH map is used I fail to see how this is required.

Many thanks in advance.

Kind regards
Ziggy Maes
DevOps Engineer
[http://www.be-mobile.com/mail/bemobile_email.png]
www.be-mobile.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool size (capacity)

2018-07-20 Thread Sébastien VIGNERON
Correct, sorry, I had just read the first question and answered too quickly.

As far as I know, the available space is "shared" (the space is a combination of
OSD drives and the crushmap) between pools using the same device class, but you
can define a quota for each pool if needed:
    ceph osd pool set-quota <poolname> max_objects|max_bytes <val>   # set object or byte limit on pool
    ceph osd pool get-quota <poolname>                               # obtain object or byte limits for pool
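
For example (pool name and size are placeholders):

    ceph osd pool set-quota pool1 max_bytes 5497558138880   # cap pool1 at 5 TiB
    ceph osd pool set-quota pool1 max_bytes 0                # remove the quota again
    ceph osd pool get-quota pool1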

You can use "ceph df detail" to see you pools usage including quota. As the 
space is "shared", you can't determine a max size for just one pool (except if 
you have only one pool).

# ceph df detail
GLOBAL:
    SIZE  AVAIL  RAW USED  %RAW USED  OBJECTS
    144T   134T    10254G       6.93    1789k
POOLS:
    NAME                 ID  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS  DIRTY  READ   WRITE   RAW USED
    pool1                 9  N/A            N/A          7131G   14.70     41369G  1826183  1783k  3847k  14959k    21394G
    pool2                10  N/A            N/A          24735M   0.06     41369G     6236   6236  1559k    226k    74205M
    pool3                11  N/A            N/A          30188k      0     41369G     2929         1259k   4862k    90564k
    pool4                12  N/A            N/A          0           0     41369G        0      0      0       0         0
    pool5                13  N/A            N/A          0           0     41369G        0      0      0       0         0
    pool6                14  N/A            N/A          0           0     41369G        0      0      0       0         0
    pool7                15  N/A            N/A          0           0     41369G        0      0      0       0         0
    pool8                16  N/A            N/A          0           0     41369G        0      0      0       0         0
    pool9                17  N/A            N/A          0           0     41369G        0      0      0       0         0
    pool10               18  N/A            N/A          0           0     41369G        0      0      0       0         0
    .rgw.root            19  N/A            N/A          2134        0     41369G        6      6    231       6      6402
    default.rgw.control  20  N/A            N/A          0           0     41369G        8      8      0       0         0
    default.rgw.meta     21  N/A            N/A          363         0     41369G        2      2     12       3      1089
    default.rgw.log      22  N/A            N/A          0           0     41369G      207    207  8949k   5962k         0

You should look at used and max sizes for images, not pools.
# rbd disk-usage your_pool/your_image
NAME     PROVISIONED     USED
image-1       51200M  102400k

You can see the total provisioned and used sizes for a whole pool using:
# rbd disk-usage -p your_pool --format json | jq .
{
  "images": [
{
  "name": "image-1",
  "provisioned_size": 53687091200,
  "used_size": 104857600
}
  ],
  "total_provisioned_size": 53687091200,
  "total_used_size": 104857600
}

A reminder: most ceph commands can output in json format ( --format=json  or -f 
json), useful with the jq tool.

> Le 20 juil. 2018 à 12:26, si...@turka.nl a écrit :
> 
> Hi Sebastien,
> 
> Your command(s) returns the replication size and not the size in terms of
> bytes.
> 
> I want to see the size of a pool in terms of bytes.
> The MAX AVAIL in "ceph df" is:
> [empty space of an OSD disk with the least empty space] multiplied by
> [amount of OSD]
> 
> That is not what I am looking for.
> 
> Thanks.
> Sinan
> 
>> # for a specific pool:
>> 
>> ceph osd pool get your_pool_name size
>> 
>> 
>>> Le 20 juil. 2018 à 10:32, Sébastien VIGNERON
>>> mailto:sebastien.vigne...@criann.fr>> a 
>>> écrit :
>>> 
>>> #for all pools:
>>> ceph osd pool ls detail
>>> 
>>> 
 Le 20 juil. 2018 à 09:02, si...@turka.nl  a écrit 
 :
 
 Hi,
 
 How can I see the size of a pool? When I create a new empty pool I can
 see
 the capacity of the pool using 'ceph df', but as I start putting data
 in
 the pool the capacity is decreasing.
 
 So the capacity in 'ceph df' is returning the space left on the pool
 and
 not the 'capacity size'.
 
 Thanks!
 
 Sinan
 
 ___
 ceph-users mailing list
 

Re: [ceph-users] Default erasure code profile and sustaining loss of one host containing 4 OSDs

2018-07-20 Thread Caspar Smit
Ziggy,

The default min_size for your pool is 3 so losing ANY single OSD (not even
host) will result in reduced data availability:

https://patchwork.kernel.org/patch/8546771/

Use m=2 to be able to handle a node failure.


Met vriendelijke groet,

Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend

t: (+31) 299 410 414
e: caspars...@supernas.eu
w: www.supernas.eu

2018-07-20 13:11 GMT+02:00 Ziggy Maes :

> Hello
>
>
>
> I am currently trying to find out if Ceph can sustain the loss of a full
> host (containing 4 OSDs) in a default erasure coded pool (k=2, m=1). We
> currently have a production EC pool with the default erasure profile,
> but would like to make sure the data on this pool remains accessible even
> after one of our hosts fail. Since we have a very small cluster (6 hosts, 4
> OSDs per host), I created a custom CRUSH rule to make sure the 3 chunks are
> spread over 3 hosts, screenshot here: https://gyazo.com/
> 1a3ddd6895df0d5e0e425774d2bcb257 .
>
>
>
> Unfortunately, taking one node offline results  in reduced data
> availability and incomplete PGs, as shown here: https://gyazo.com/
> db56d5a52c9de2fd71bf9ae8eb03dbbc .
>
>
>
> My question summed up: is it possible to sustain the loss of a host
> containing 4 OSDs using a k=2, m=1 erasure profile using a CRUSH map that
> spreads data over at least 3 hosts? If so, what am I doing wrong? I realize
> the documentation states that m equals the amount of OSDs that can be lost,
> but assuming a balanced CRUSH map is used I fail to see how this is
> required.
>
>
>
> Many thanks in advance.
>
>
>
> Kind regards
>
>
> *Ziggy Maes *DevOps Engineer
>
> [image: http://www.be-mobile.com/mail/bemobile_email.png]
> 
>
> *www.be-mobile.com *
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Default erasure code profile and sustaining loss of one host containing 4 OSDs

2018-07-20 Thread Ziggy Maes
Hello

I am currently trying to find out if Ceph can sustain the loss of a full host 
(containing 4 OSDs) in a default erasure coded pool (k=2, m=1). We have 
currently have a production EC pool with the default erasure profile, but would 
like to make sure the data on this pool remains accessible even after one of 
our hosts fail. Since we have a very small cluster (6 hosts, 4 OSDs per host), 
I created a custom CRUSH rule to make sure the 3 chunks are spread over 3 
hosts, screenshot here: https://gyazo.com/1a3ddd6895df0d5e0e425774d2bcb257 .

Unfortunately, taking one node offline results  in reduced data availability 
and incomplete PGs, as shown here: 
https://gyazo.com/db56d5a52c9de2fd71bf9ae8eb03dbbc .

My question summed up: is it possible to sustain the loss of a host containing 
4 OSDs using a k=2, m=1 erasure profile using a CRUSH map that spreads data 
over at least 3 hosts? If so, what am I doing wrong? I realize the 
documentation states that m equals the amount of OSDs that can be lost, but 
assuming a balanced CRUSH map is used I fail to see how this is required.

Many thanks in advance.

Kind regards
Ziggy Maes
DevOps Engineer
[http://www.be-mobile.com/mail/bemobile_email.png]
www.be-mobile.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] design question - NVME + NLSAS, SSD or SSD + NLSAS

2018-07-20 Thread Satish Patel
I'm no expert, and let's see what other folks suggest, but I would say go with
Intel if you only care about performance.

Sent from my iPhone

> On Jul 19, 2018, at 12:54 PM, Steven Vacaroaia  wrote:
> 
> Hi,
> I would appreciate any advice ( with arguments , if possible) regarding the 
> best design approach considering below facts
> 
> - budget is set to XX amount 
> - goal is to get as much performance / capacity as possible using XX 
> - 4 to 6 servers,  DELL R620/R630 with 8 disk slots, 64 G RAM and  8 cores
> - CEPH usage - mainly provide block device through SCST to VMWare 
> 
> I  consider the following options ( if there is one that I miss, please let 
> me know)
> 
> 1.
>NVME + NLSAS, OS running on internal SD cards 
>   8 to 1 NVME/rotational drives ratio
> 
> 2.
>   2 x write intensive SSD RAID1 + 6 x NLSAS, OS running on a partition of 
> the RAID1 SSD
>   6 to 1 SSD / rotational drive ratio
> 
> 3. 
>  2 x write intesive SSD + 6 x NLSAS, OS on SD card
>   3 to 1 SSD / rotationl ratio
> 
> 4. "regular" SSD  , OS on SD card  
> 
> For the drives, I have the following
> 
> 'regular" SSD
> Intel DC S4500 ( 900$)  or  Micron 5100 PRO ( 1100$) or SM863a (1300$)
> NLSAS
>Segate   ST2000NX0403
> 
> Still debating the model for SSD write intensive
>Models that I am considering 
>   HGST Ultrastar SS300 HUSMM3240ASS200 ( 1200$)
> SDLTMDKW- 200G-5CA1 ( 1600$)
> 
> Still debating on NVME between Intel and Samsung
> 
> Many thanks for any/all advice 
> 
> Steven
>   
>  
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Converting to BlueStore, and external journal devices

2018-07-20 Thread Marc Roos
 
I had similar question a while ago, maybe these you want to read.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg46768.html

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg46799.html
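
For reference, a sketch of the command under discussion with a separate DB device (device paths are assumptions, not from the thread):

   ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

A separate WAL device can be given with --block.wal in the same way.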





-Original Message-
From: Satish Patel [mailto:satish@gmail.com] 
Sent: vrijdag 20 juli 2018 13:00
To: Eugen Block
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Converting to BlueStore, and external journal 
devices

What is the use of LVM in Bluestore? I have seen people using LVM but
don't know why.

Sent from my iPhone

> On Jul 19, 2018, at 10:00 AM, Eugen Block  wrote:
> 
> Hi,
> 
> if you have SSDs for RocksDB, you should provide that in the command 
(--block.db $DEV), otherwise Ceph will use the one provided disk for all 
data and RocksDB/WAL.
> Before you create that OSD you probably should check out the help page 
for that command, maybe there are more options you should be aware of, 
e.g. a separate WAL on NVMe.
> 
> Regards,
> Eugen
> 
> 
> Zitat von Robert Stanford :
> 
>> I am following the steps here:
>> http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/
>> 
>> The final step is:
>> 
>> ceph-volume lvm create --bluestore --data $DEVICE --osd-id $ID
>> 
>> 
>> I notice this command doesn't specify a device to use as the journal. 
 
>> Is it implied that BlueStore will use the same (OSD) device for the 
function?
>> I don't think that's what I want (I have spinning disks for data, and 

>> SSDs for journals).  Is there any reason *not* to specify which 
>> device to use for a journal, when creating an OSD with BlueStore 
capability?
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Converting to BlueStore, and external journal devices

2018-07-20 Thread Satish Patel
What is the use of LVM in Bluestore? I have seen people using LVM but don't
know why.

Sent from my iPhone

> On Jul 19, 2018, at 10:00 AM, Eugen Block  wrote:
> 
> Hi,
> 
> if you have SSDs for RocksDB, you should provide that in the command 
> (--block.db $DEV), otherwise Ceph will use the one provided disk for all data 
> and RocksDB/WAL.
> Before you create that OSD you probably should check out the help page for 
> that command, maybe there are more options you should be aware of, e.g. a 
> separate WAL on NVMe.
> 
> Regards,
> Eugen
> 
> 
> Zitat von Robert Stanford :
> 
>> I am following the steps here:
>> http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/
>> 
>> The final step is:
>> 
>> ceph-volume lvm create --bluestore --data $DEVICE --osd-id $ID
>> 
>> 
>> I notice this command doesn't specify a device to use as the journal.  Is
>> it implied that BlueStore will use the same (OSD) device for the function?
>> I don't think that's what I want (I have spinning disks for data, and SSDs
>> for journals).  Is there any reason *not* to specify which device to use
>> for a journal, when creating an OSD with BlueStore capability?
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool size (capacity)

2018-07-20 Thread Marc Roos
 

That is the used column, no?


[@c01 ~]# ceph df
GLOBAL:
    SIZE   AVAIL  RAW USED  %RAW USED
       G       G         G      60.78
POOLS:
    NAME          ID  USED   %USED  MAX AVAIL  OBJECTS
    iscsi-images  16     37      0      2668G        4
    rbd           17   873G  24.65      2668G   257490
    fs_meta       19   203M      0      2668G  2482591




-Original Message-
From: si...@turka.nl [mailto:si...@turka.nl] 
Sent: vrijdag 20 juli 2018 12:27
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Pool size (capacity)

Hi Sebastien,

Your command(s) returns the replication size and not the size in terms 
of bytes.

I want to see the size of a pool in terms of bytes.
The MAX AVAIL in "ceph df" is:
[empty space of an OSD disk with the least empty space] multiplied by 
[amount of OSD]

That is not what I am looking for.

Thanks.
Sinan

> # for a specific pool:
>
> ceph osd pool get your_pool_name size
>
>
>> Le 20 juil. 2018 à 10:32, Sébastien VIGNERON 
>>  a écrit :
>>
>> #for all pools:
>> ceph osd pool ls detail
>>
>>
>>> Le 20 juil. 2018 à 09:02, si...@turka.nl a écrit :
>>>
>>> Hi,
>>>
>>> How can I see the size of a pool? When I create a new empty pool I 
>>> can see the capacity of the pool using 'ceph df', but as I start 
>>> putting data in the pool the capacity is decreasing.
>>>
>>> So the capacity in 'ceph df' is returning the space left on the pool 

>>> and not the 'capacity size'.
>>>
>>> Thanks!
>>>
>>> Sinan
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool size (capacity)

2018-07-20 Thread sinan
Hi Sebastien,

Your command(s) returns the replication size and not the size in terms of
bytes.

I want to see the size of a pool in terms of bytes.
The MAX AVAIL in "ceph df" is:
[empty space of an OSD disk with the least empty space] multiplied by
[amount of OSD]

That is not what I am looking for.

Thanks.
Sinan

> # for a specific pool:
>
> ceph osd pool get your_pool_name size
>
>
>> Le 20 juil. 2018 à 10:32, Sébastien VIGNERON
>>  a écrit :
>>
>> #for all pools:
>> ceph osd pool ls detail
>>
>>
>>> Le 20 juil. 2018 à 09:02, si...@turka.nl a écrit :
>>>
>>> Hi,
>>>
>>> How can I see the size of a pool? When I create a new empty pool I can
>>> see
>>> the capacity of the pool using 'ceph df', but as I start putting data
>>> in
>>> the pool the capacity is decreasing.
>>>
>>> So the capacity in 'ceph df' is returning the space left on the pool
>>> and
>>> not the 'capacity size'.
>>>
>>> Thanks!
>>>
>>> Sinan
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Be careful with orphans find (was Re: Lost TB for Object storage)

2018-07-20 Thread CUZA Frédéric
Hi Matthew,
Thanks for the advice but we are no longer using orphans find since the problem 
does not seem to be solved with it.

Regards,

-Original Message-
From: Matthew Vernon
Sent: 20 July 2018 11:03
To: CUZA Frédéric ; ceph-users@lists.ceph.com
Subject: Be careful with orphans find (was Re: [ceph-users] Lost TB for Object
storage)

Hi,

On 19/07/18 17:19, CUZA Frédéric wrote:

> After that we tried to remove the orphans :
> 
> radosgw-admin orphans find --pool=default.rgw.buckets.data
> --job-id=ophans_clean
> 
> radosgw-admin orphans finish --job-id=ophans_clean
> 
> It finds some orphans: 85, but the finish command seems not to work,
> so we decided to manually delete those orphans by piping the output of
> find into a log file.

I would advise caution with using the "orphans find" code in radosgw-admin. On 
the advice of our vendor, we ran this and automatically removed the resulting 
objects. Unfortunately, a small proportion of the objects found and removed 
thus were not in fact orphans - meaning we ended up with some damaged S3 
objects; they appeared in bucket listings, but you'd get 404 if you tried to 
download them.

We have asked our vendor to make the wider community aware of the issue, but 
they have not (yet) done so.

Regards,

Matthew


--
 The Wellcome Sanger Institute is operated by Genome Research  Limited, a 
charity registered in England with number 1021457 and a  company registered in 
England with number 2742969, whose registered  office is 215 Euston Road, 
London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Be careful with orphans find (was Re: Lost TB for Object storage)

2018-07-20 Thread Matthew Vernon
Hi,

On 19/07/18 17:19, CUZA Frédéric wrote:

> After that we tried to remove the orphans :
> 
> radosgw-admin orphans find --pool=default.rgw.buckets.data
> --job-id=ophans_clean
> 
> radosgw-admin orphans finish --job-id=ophans_clean
> 
> It finds some orphans: 85, but the finish command seems not to work, so
> we decided to manually delete those orphans by piping the output of find
> into a log file.

I would advise caution with using the "orphans find" code in
radosgw-admin. On the advice of our vendor, we ran this and
automatically removed the resulting objects. Unfortunately, a small
proportion of the objects found and removed thus were not in fact
orphans - meaning we ended up with some damaged S3 objects; they
appeared in bucket listings, but you'd get 404 if you tried to download
them.

We have asked our vendor to make the wider community aware of the issue,
but they have not (yet) done so.

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool size (capacity)

2018-07-20 Thread Eugen Block

Hi,


ceph osd pool get your_pool_name size
ceph osd pool ls detail


these are commands to get the size of a pool regarding the  
replication, not the available storage.



So the capacity in 'ceph df' is returning the space left on the pool and
not the 'capacity size'.


I'm not aware of a limitation in pool capacity except for your OSD  
sizes, CRUSH device classes and such. The more OSDs you have the more  
available space you'll have. An example:


POOLS:
NAMEID USED   %USED MAX AVAIL OBJECTS
pool-1  1   2694G 52.67 2420G  646186
pool-1-cache33 15130M  1.97  733G3734
cephfs-data 35   945G 28.08 2420G 3148723
cephfs-metadata 36   232M  0.03  733G  432347
test1   45309 0 2420G   4
test2   48309 0 2420G   4

You see the exact same "MAX AVAIL" sizes for all HDD pools (pool-1,  
cephfs-data, test1, test2) of 2420G. The cache tier pool-1-cache has  
the same space available as the cephfs-metadata as they both reside on  
the same SSD-OSDs.


Hope this clears it up a little.

Regards,
Eugen


Zitat von Sébastien VIGNERON :


# for a specific pool:

ceph osd pool get your_pool_name size


Le 20 juil. 2018 à 10:32, Sébastien VIGNERON  
 a écrit :


#for all pools:
ceph osd pool ls detail



Le 20 juil. 2018 à 09:02, si...@turka.nl a écrit :

Hi,

How can I see the size of a pool? When I create a new empty pool I can see
the capacity of the pool using 'ceph df', but as I start putting data in
the pool the capacity is decreasing.

So the capacity in 'ceph df' is returning the space left on the pool and
not the 'capacity size'.

Thanks!

Sinan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.6 CRC errors

2018-07-20 Thread Stefan Schneebeli
In the meantime I upgraded the cluster to 12.2.7 and added the
osd distrust data digest = true setting in ceph.conf, because it's a mixed
cluster.


But I still see a constantly growing number of inconsistent PG's and 
Scrub errors.
If I check the running ceph config with ceph --admin-daemon 
/var/run/ceph/ceph-osd.0.asok config show I can't find this setting.


Did I do something wrong? Is it safe to go ahead with the migration to
bluestore?
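
(A quick way to query a single option through the admin socket, assuming the option is spelled osd_distrust_data_digest:

   ceph daemon osd.0 config get osd_distrust_data_digest

If it isn't found, the running binary may predate the option or the spelling in ceph.conf may not match.)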


Stefan




We are in the process of building the 12.2.7 release now that will fix
this.  (If you don't want to wait you can also install the autobuilt
packages from shaman.ceph.com... official packages are only a few hours
away from being ready though).

I would pause data migration for the time being (norebalance).  Once the new
version is installed it will stop creating the crc mismatches and it
will prevent them from triggering an incorrect EIO on read.  However,
scrub doesn't repair them yet.  They will tend to go away on their own
as normal IO touches the affected objects.  In 12.2.8 scrub will repair
the CRCs.

In the meantime, while waiting for the fix, you can set
osd_skip_data_digest = false to avoid generating more errors.  But note
that once you upgrade you need to turn that back on (or
osd_distrust_data_digest) to apply the fix/workaround.

You'll want to read the 12.2.7 release notes carefully (PR at
https://github.com/ceph/ceph/pull/23057).

The bug doesn't corrupt data; only the whole-object checksums.  
However,
some reads (when the entire object is read) will see the bad checksum 
and

return EIO.  This could break applications at a higher layer (although
hopefully they will just abort and exit cleanly; it is hard to tell 
given

the breadth of workloads).

I hope that helps, and I'm very sorry this regression crept in!
sage


On Mon, 16 Jul 2018, Stefan Schneebeli wrote:


hello guys,

unfortunately I missed the warning on friday and upgraded my cluster 
on

saturday to 12.2.6.
The cluster is in a migration state from filestore to bluestore (10/2) 
and I

get constantly inconsistent PG's only on the two bluestore OSD's.
If I run a rados list-inconsistent-obj 2.17 --format=json-pretty for 
example I

see at the end this mismatches:

"shards": [
{
"osd": 0,
"primary": true,
"errors": [],
"size": 4194304,
"omap_digest": "0x"
},
{
"osd": 1,
"primary": false,
"errors": [
"data_digest_mismatch_info"
],
"size": 4194304,
"omap_digest": "0x",
"data_digest": "0x21b21973"

Is this the issue you talking about ?
I can repair this PG's wth ceph pg repair and it reports the error is 
fixed.

But is it really fixed?
Do I have to be afraid to have now corrupted data?
Would it be an option to noout this bluestore OSD's and stop them?
When do you expect the new 12.2.7 Release? Will it fix all the errors?

Thank you in advance for your answers!

Stefan





-- Originalnachricht --
Von: "Sage Weil" 
An: "Glen Baars" 
Cc: "ceph-users@lists.ceph.com" 
Gesendet: 14.07.2018 19:15:57
Betreff: Re: [ceph-users] 12.2.6 CRC errors

> On Sat, 14 Jul 2018, Glen Baars wrote:
> > Hello Ceph users!
> >
> > Note to users, don't install new servers on Friday the 13th!
> >
> > We added a new ceph node on Friday and it has received the latest 
12.2.6
> > update. I started to see CRC errors and investigated hardware 
issues. I
> > have since found that it is caused by the 12.2.6 release. About 
80TB

> > copied onto this server.
> >
> > I have set noout,noscrub,nodeepscrub and repaired the affected PGs 
(

> > ceph pg repair ) . This has cleared the errors.
> >
> > * no idea if this is a good way to fix the issue. From the bug
> > report this issue is in the deepscrub and therefore I suppose 
stopping

> > it will limit the issues. ***
> >
> > Can anyone tell me what to do? Downgrade seems that it won't fix 
the
> > issue. Maybe remove this node and rebuild with 12.2.5 and resync 
data?

> > Wait a few days for 12.2.7?
>
> I would sit tight for now.  I'm working on the right fix and hope to
> having something to test shortly, and possibly a release by 
tomorrow.

>
> There is a remaining danger is that for the objects with bad 
full-object
> digests, that a read of the entire object will throw an EIO.  It's 
up
> to you whether you want to try to quiesce workloads to avoid that 
(to

> prevent corruption at higher layers) or avoid a service
> degradation/outage.  :(  Unfortunately I don't have super precise 
guidance

> as far as how likely that is.
>
> Are you using bluestore only, or is it a mix of bluestore and 
filestore?

>
> sage
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> 

Re: [ceph-users] Pool size (capacity)

2018-07-20 Thread Sébastien VIGNERON
# for a specific pool:

ceph osd pool get your_pool_name size


> Le 20 juil. 2018 à 10:32, Sébastien VIGNERON  a 
> écrit :
> 
> #for all pools:
> ceph osd pool ls detail
> 
> 
>> Le 20 juil. 2018 à 09:02, si...@turka.nl a écrit :
>> 
>> Hi,
>> 
>> How can I see the size of a pool? When I create a new empty pool I can see
>> the capacity of the pool using 'ceph df', but as I start putting data in
>> the pool the capacity is decreasing.
>> 
>> So the capacity in 'ceph df' is returning the space left on the pool and
>> not the 'capacity size'.
>> 
>> Thanks!
>> 
>> Sinan
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool size (capacity)

2018-07-20 Thread Sébastien VIGNERON
#for all pools:
ceph osd pool ls detail


> Le 20 juil. 2018 à 09:02, si...@turka.nl a écrit :
> 
> Hi,
> 
> How can I see the size of a pool? When I create a new empty pool I can see
> the capacity of the pool using 'ceph df', but as I start putting data in
> the pool the capacity is decreasing.
> 
> So the capacity in 'ceph df' is returning the space left on the pool and
> not the 'capacity size'.
> 
> Thanks!
> 
> Sinan
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.6 upgrade

2018-07-20 Thread Glen Baars
Thanks, we are fully bluestore and therefore just set osd skip data digest = 
true

Kind regards,
Glen Baars

-Original Message-
From: Dan van der Ster 
Sent: Friday, 20 July 2018 4:08 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] 12.2.6 upgrade

That's right. But please read the notes carefully to understand if you need to 
set
   osd skip data digest = true
or
   osd distrust data digest = true

.. dan

On Fri, Jul 20, 2018 at 10:02 AM Glen Baars  wrote:
>
> I saw that on the release notes.
>
> Does that mean that the active+clean+inconsistent PGs will be OK?
>
> Is the data still getting replicated even if inconsistent?
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Friday, 20 July 2018 3:57 PM
> To: Glen Baars 
> Cc: ceph-users 
> Subject: Re: [ceph-users] 12.2.6 upgrade
>
> CRC errors are expected in 12.2.7 if you ran 12.2.6 with bluestore.
> See
> https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12
> -2-6
>
> On Fri, Jul 20, 2018 at 8:30 AM Glen Baars  
> wrote:
> >
> > Hello Ceph Users,
> >
> >
> >
> > We have upgraded all nodes to 12.2.7 now. We have 90PGs ( ~2000 scrub 
> > errors ) to fix from the time when we ran 12.2.6. It doesn’t seem to be 
> > affecting production at this time.
> >
> >
> >
> > Below is the log of a PG repair. What is the best way to correct these 
> > errors? Is there any further information required?
> >
> >
> >
> > rados list-inconsistent-obj 1.275 --format=json-pretty
> >
> > {
> >
> > "epoch": 38481,
> >
> > "inconsistents": []
> >
> > }
> >
> >
> >
> > Is it odd that it doesn’t list any inconsistents?
> >
> >
> >
> > Ceph.log entries for this PG.
> >
> > 2018-07-20 12:13:28.381903 osd.124 osd.124 10.4.35.36:6810/1865422
> > 81 : cluster [ERR] 1.275 shard 100: soid
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head
> > data_digest 0x1a131dab != data_digest 0x92f2c4c8 from auth oi
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'3148
> > 36 client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304
> > uv 314836 dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:13:28.381907 osd.124 osd.124 10.4.35.36:6810/1865422
> > 82 : cluster [ERR] 1.275 shard 124: soid
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head
> > data_digest 0x1a131dab != data_digest 0x92f2c4c8 from auth oi
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'3148
> > 36 client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304
> > uv 314836 dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:13:28.381909 osd.124 osd.124 10.4.35.36:6810/1865422
> > 83 : cluster [ERR] 1.275 soid
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head: failed to
> > pick suitable auth object
> >
> > 2018-07-20 12:15:15.310579 osd.124 osd.124 10.4.35.36:6810/1865422
> > 84 : cluster [ERR] 1.275 shard 100: soid
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head
> > data_digest 0xdf907335 != data_digest 0x38400b00 from auth oi
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'3306
> > 51 client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304
> > uv 307138 dd 38400b00 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:15:15.310582 osd.124 osd.124 10.4.35.36:6810/1865422
> > 85 : cluster [ERR] 1.275 shard 124: soid
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head
> > data_digest 0xdf907335 != data_digest 0x38400b00 from auth oi
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'3306
> > 51 client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304
> > uv 307138 dd 38400b00 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:15:15.310584 osd.124 osd.124 10.4.35.36:6810/1865422
> > 86 : cluster [ERR] 1.275 soid
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head: failed to
> > pick suitable auth object
> >
> > 2018-07-20 12:16:07.518970 osd.124 osd.124 10.4.35.36:6810/1865422
> > 87 : cluster [ERR] 1.275 shard 100: soid
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head
> > data_digest 0x6555a7c9 != data_digest 0xbad822f from auth oi
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'3148
> > 79 client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304
> > uv 314879 dd bad822f od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:16:07.518975 osd.124 osd.124 10.4.35.36:6810/1865422
> > 88 : cluster [ERR] 1.275 shard 124: soid
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head
> > data_digest 0x6555a7c9 != data_digest 0xbad822f from auth oi
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'3148
> > 79 client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304
> > uv 314879 dd bad822f od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:16:07.518977 osd.124 osd.124 10.4.35.36:6810/1865422
> > 89 : 

Re: [ceph-users] 12.2.6 upgrade

2018-07-20 Thread Dan van der Ster
That's right. But please read the notes carefully to understand if you
need to set
   osd skip data digest = true
or
   osd distrust data digest = true
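
(If you are not sure whether every OSD is BlueStore, something like the
following should tell you; just a sketch, with <id> as a placeholder:

ceph osd metadata <id> | grep osd_objectstore
# or, across the whole cluster:
ceph osd metadata | grep -c '"osd_objectstore": "filestore"'
)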

.. dan

On Fri, Jul 20, 2018 at 10:02 AM Glen Baars  wrote:
>
> I saw that on the release notes.
>
> Does that mean that the active+clean+inconsistent PGs will be OK?
>
> Is the data still getting replicated even if inconsistent?
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Friday, 20 July 2018 3:57 PM
> To: Glen Baars 
> Cc: ceph-users 
> Subject: Re: [ceph-users] 12.2.6 upgrade
>
> CRC errors are expected in 12.2.7 if you ran 12.2.6 with bluestore. See
> https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6
>
> On Fri, Jul 20, 2018 at 8:30 AM Glen Baars  
> wrote:
> >
> > Hello Ceph Users,
> >
> >
> >
> > We have upgraded all nodes to 12.2.7 now. We have 90PGs ( ~2000 scrub 
> > errors ) to fix from the time when we ran 12.2.6. It doesn’t seem to be 
> > affecting production at this time.
> >
> >
> >
> > Below is the log of a PG repair. What is the best way to correct these 
> > errors? Is there any further information required?
> >
> >
> >
> > rados list-inconsistent-obj 1.275 --format=json-pretty
> >
> > {
> >
> > "epoch": 38481,
> >
> > "inconsistents": []
> >
> > }
> >
> >
> >
> > Is it odd that it doesn’t list any inconsistents?
> >
> >
> >
> > Ceph.log entries for this PG.
> >
> > 2018-07-20 12:13:28.381903 osd.124 osd.124 10.4.35.36:6810/1865422 81 : 
> > cluster [ERR] 1.275 shard 100: soid 
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 
> > 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 
> > client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 
> > dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:13:28.381907 osd.124 osd.124 10.4.35.36:6810/1865422 82 : 
> > cluster [ERR] 1.275 shard 124: soid 
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 
> > 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 
> > client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 
> > dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:13:28.381909 osd.124 osd.124 10.4.35.36:6810/1865422 83 : 
> > cluster [ERR] 1.275 soid 
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head: failed to pick 
> > suitable auth object
> >
> > 2018-07-20 12:15:15.310579 osd.124 osd.124 10.4.35.36:6810/1865422 84 : 
> > cluster [ERR] 1.275 shard 100: soid 
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 
> > 0xdf907335 != data_digest 0x38400b00 from auth oi 
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 
> > client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 
> > dd 38400b00 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:15:15.310582 osd.124 osd.124 10.4.35.36:6810/1865422 85 : 
> > cluster [ERR] 1.275 shard 124: soid 
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 
> > 0xdf907335 != data_digest 0x38400b00 from auth oi 
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 
> > client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 
> > dd 38400b00 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:15:15.310584 osd.124 osd.124 10.4.35.36:6810/1865422 86 : 
> > cluster [ERR] 1.275 soid 
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head: failed to pick 
> > suitable auth object
> >
> > 2018-07-20 12:16:07.518970 osd.124 osd.124 10.4.35.36:6810/1865422 87 : 
> > cluster [ERR] 1.275 shard 100: soid 
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 
> > 0x6555a7c9 != data_digest 0xbad822f from auth oi 
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 
> > client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 
> > dd bad822f od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:16:07.518975 osd.124 osd.124 10.4.35.36:6810/1865422 88 : 
> > cluster [ERR] 1.275 shard 124: soid 
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 
> > 0x6555a7c9 != data_digest 0xbad822f from auth oi 
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 
> > client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 
> > dd bad822f od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:16:07.518977 osd.124 osd.124 10.4.35.36:6810/1865422 89 : 
> > cluster [ERR] 1.275 soid 
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head: failed to pick 
> > suitable auth object
> >
> > 2018-07-20 12:16:29.476778 osd.124 osd.124 10.4.35.36:6810/1865422 90 : 
> > cluster [ERR] 1.275 shard 

Re: [ceph-users] 12.2.6 upgrade

2018-07-20 Thread Glen Baars
I saw that on the release notes.

Does that mean that the active+clean+inconsistent PGs will be OK?

Is the data still getting replicated even if inconsistent?

Kind regards,
Glen Baars

-Original Message-
From: Dan van der Ster 
Sent: Friday, 20 July 2018 3:57 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] 12.2.6 upgrade

CRC errors are expected in 12.2.7 if you ran 12.2.6 with bluestore. See
https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6

On Fri, Jul 20, 2018 at 8:30 AM Glen Baars  wrote:
>
> Hello Ceph Users,
>
>
>
> We have upgraded all nodes to 12.2.7 now. We have 90PGs ( ~2000 scrub errors 
> ) to fix from the time when we ran 12.2.6. It doesn’t seem to be affecting 
> production at this time.
>
>
>
> Below is the log of a PG repair. What is the best way to correct these 
> errors? Is there any further information required?
>
>
>
> rados list-inconsistent-obj 1.275 --format=json-pretty
>
> {
>
> "epoch": 38481,
>
> "inconsistents": []
>
> }
>
>
>
> Is it odd that it doesn’t list any inconsistents?
>
>
>
> Ceph.log entries for this PG.
>
> 2018-07-20 12:13:28.381903 osd.124 osd.124 10.4.35.36:6810/1865422 81 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 
> 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 
> client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 
> dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:13:28.381907 osd.124 osd.124 10.4.35.36:6810/1865422 82 : 
> cluster [ERR] 1.275 shard 124: soid 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 
> 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 
> client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 
> dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:13:28.381909 osd.124 osd.124 10.4.35.36:6810/1865422 83 : 
> cluster [ERR] 1.275 soid 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head: failed to pick 
> suitable auth object
>
> 2018-07-20 12:15:15.310579 osd.124 osd.124 10.4.35.36:6810/1865422 84 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 
> 0xdf907335 != data_digest 0x38400b00 from auth oi 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 
> client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 
> 38400b00 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:15:15.310582 osd.124 osd.124 10.4.35.36:6810/1865422 85 : 
> cluster [ERR] 1.275 shard 124: soid 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 
> 0xdf907335 != data_digest 0x38400b00 from auth oi 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 
> client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 
> 38400b00 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:15:15.310584 osd.124 osd.124 10.4.35.36:6810/1865422 86 : 
> cluster [ERR] 1.275 soid 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head: failed to pick 
> suitable auth object
>
> 2018-07-20 12:16:07.518970 osd.124 osd.124 10.4.35.36:6810/1865422 87 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 
> 0x6555a7c9 != data_digest 0xbad822f from auth oi 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 
> client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 
> dd bad822f od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:16:07.518975 osd.124 osd.124 10.4.35.36:6810/1865422 88 : 
> cluster [ERR] 1.275 shard 124: soid 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 
> 0x6555a7c9 != data_digest 0xbad822f from auth oi 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 
> client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 
> dd bad822f od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:16:07.518977 osd.124 osd.124 10.4.35.36:6810/1865422 89 : 
> cluster [ERR] 1.275 soid 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head: failed to pick 
> suitable auth object
>
> 2018-07-20 12:16:29.476778 osd.124 osd.124 10.4.35.36:6810/1865422 90 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head data_digest 
> 0xa394e845 != data_digest 0xd8aa931c from auth oi 
> 1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head(33683'302224 
> client.1079025.0:22963765 dirty|data_digest|omap_digest s 4194304 uv 302224 
> dd d8aa931c od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:16:29.476783 osd.124 osd.124 10.4.35.36:6810/1865422 91 : 
> cluster [ERR] 1.275 

Re: [ceph-users] 12.2.6 upgrade

2018-07-20 Thread Dan van der Ster
CRC errors are expected in 12.2.7 if you ran 12.2.6 with bluestore. See
https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6

On Fri, Jul 20, 2018 at 8:30 AM Glen Baars  wrote:
>
> Hello Ceph Users,
>
>
>
> We have upgraded all nodes to 12.2.7 now. We have 90PGs ( ~2000 scrub errors 
> ) to fix from the time when we ran 12.2.6. It doesn’t seem to be affecting 
> production at this time.
>
>
>
> Below is the log of a PG repair. What is the best way to correct these 
> errors? Is there any further information required?
>
>
>
> rados list-inconsistent-obj 1.275 --format=json-pretty
>
> {
>
> "epoch": 38481,
>
> "inconsistents": []
>
> }
>
>
>
> Is it odd that it doesn’t list any inconsistents?
>
>
>
> Ceph.log entries for this PG.
>
> 2018-07-20 12:13:28.381903 osd.124 osd.124 10.4.35.36:6810/1865422 81 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 
> 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 
> client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 
> dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:13:28.381907 osd.124 osd.124 10.4.35.36:6810/1865422 82 : 
> cluster [ERR] 1.275 shard 124: soid 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 
> 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 
> client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 
> dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:13:28.381909 osd.124 osd.124 10.4.35.36:6810/1865422 83 : 
> cluster [ERR] 1.275 soid 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head: failed to pick 
> suitable auth object
>
> 2018-07-20 12:15:15.310579 osd.124 osd.124 10.4.35.36:6810/1865422 84 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 
> 0xdf907335 != data_digest 0x38400b00 from auth oi 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 
> client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 
> 38400b00 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:15:15.310582 osd.124 osd.124 10.4.35.36:6810/1865422 85 : 
> cluster [ERR] 1.275 shard 124: soid 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 
> 0xdf907335 != data_digest 0x38400b00 from auth oi 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 
> client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 
> 38400b00 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:15:15.310584 osd.124 osd.124 10.4.35.36:6810/1865422 86 : 
> cluster [ERR] 1.275 soid 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head: failed to pick 
> suitable auth object
>
> 2018-07-20 12:16:07.518970 osd.124 osd.124 10.4.35.36:6810/1865422 87 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 
> 0x6555a7c9 != data_digest 0xbad822f from auth oi 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 
> client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 
> dd bad822f od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:16:07.518975 osd.124 osd.124 10.4.35.36:6810/1865422 88 : 
> cluster [ERR] 1.275 shard 124: soid 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 
> 0x6555a7c9 != data_digest 0xbad822f from auth oi 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 
> client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 
> dd bad822f od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:16:07.518977 osd.124 osd.124 10.4.35.36:6810/1865422 89 : 
> cluster [ERR] 1.275 soid 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head: failed to pick 
> suitable auth object
>
> 2018-07-20 12:16:29.476778 osd.124 osd.124 10.4.35.36:6810/1865422 90 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head data_digest 
> 0xa394e845 != data_digest 0xd8aa931c from auth oi 
> 1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head(33683'302224 
> client.1079025.0:22963765 dirty|data_digest|omap_digest s 4194304 uv 302224 
> dd d8aa931c od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:16:29.476783 osd.124 osd.124 10.4.35.36:6810/1865422 91 : 
> cluster [ERR] 1.275 shard 124: soid 
> 1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head data_digest 
> 0xa394e845 != data_digest 0xd8aa931c from auth oi 
> 1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head(33683'302224 
> client.1079025.0:22963765 dirty|data_digest|omap_digest s 4194304 uv 302224 
> dd d8aa931c od  alloc_hint [4194304 4194304 

Re: [ceph-users] active+clean+inconsistent PGs after upgrade to 12.2.7

2018-07-20 Thread Dan van der Ster
On Thu, Jul 19, 2018 at 11:51 AM Robert Sander
 wrote:
>
> On 19.07.2018 11:15, Ronny Aasen wrote:
>
> > Did you upgrade from 12.2.5 or 12.2.6 ?
>
> Yes.
>
> > sounds like you hit the reason for the 12.2.7 release
> >
> > read : https://ceph.com/releases/12-2-7-luminous-released/
> >
> > there should come features in 12.2.8 that can deal with the "objects are
> > in sync but checksums are wrong" scenario.
>
> I already read that before the upgrade but did not consider to be
> affected by the bug.
>
> The pools with the inconsistent PGs only have RBDs stored and not CephFS
> nor RGW data.
>
> I have restarted the OSDs with "osd skip data digest = true" as a "ceph
> tell" is not able to inject this argument into the running processes.
>
> Let's see if this works out.

If you upgraded from 12.2.6 and have bluestore osds, then you would be
affected by the *first* of the two issues described in the release
notes, regardless of rgw/rbd/cephfs use-cases.

This paragraph applies:

```
If your cluster includes BlueStore OSDs and was affected, deep scrubs
will generate errors about mismatched CRCs for affected objects.
Currently the repair operation does not know how to correct them
(since all replicas do not match the expected checksum it does not
know how to proceed). These warnings are harmless in the sense that
IO is not affected and the replicas are all still in sync. The number
of affected objects is likely to drop (possibly to zero) on their own
over time as those objects are modified. We expect to include a scrub
improvement in v12.2.8 to clean up any remaining objects.
```
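
(A quick way to confirm the option actually took effect after restarting an
OSD, as a sketch via the admin socket on the OSD host, with osd.0 as an
example id:

ceph daemon osd.0 config get osd_skip_data_digest
)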
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Pool size (capacity)

2018-07-20 Thread sinan
Hi,

How can I see the total size of a pool? When I create a new, empty pool I can
see its capacity using 'ceph df', but as I start putting data into the pool
that number decreases.

So the capacity shown by 'ceph df' is apparently the space left in the pool,
not the total pool size.

Thanks!

Sinan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PGs go to down state when OSD fails

2018-07-20 Thread shrey chauhan
Hi,

I am trying to understand what happens when an OSD fails.

A few days back I wanted to see what happens when an OSD goes down, so I went
to the node and stopped one of the OSD services. The OSD was marked down, PGs
started recovering, and after some time everything looked fine: recovery
completed and the OSD ended up down and out. I concluded that I don't really
have to worry about data loss when an OSD goes down.
But recently an OSD went down on its own and the PGs were not able to
recover; they went to the down state and everything was stuck, so I had to
run this command

ceph osd lost osd_number

which is not really safe, and I might lose data here.
I am not able to understand why this did not happen when I stopped the
service the first time, yet it did happen now. With replication factor 2
(RF2) all OSD data is replicated to other OSDs, so ideally the PGs should
have come back to a normal state on their own.

Can someone please explain what I am missing here?
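
For reference, this is roughly how I am poking at the stuck PGs; a sketch
with <pgid> and <pool> as placeholders:

# which PGs are stuck, and in which state
ceph pg dump_stuck unclean

# details for one affected PG, including which OSDs it is waiting for
ceph pg <pgid> query

# pool size / min_size; with size 2 and min_size 2 a single OSD failure
# already blocks I/O on the affected PGs
ceph osd pool get <pool> size
ceph osd pool get <pool> min_size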

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 12.2.6 upgrade

2018-07-20 Thread Glen Baars
Hello Ceph Users,

We have upgraded all nodes to 12.2.7 now. We have 90 PGs (~2000 scrub errors)
to fix from the time when we ran 12.2.6. It doesn't seem to be affecting
production at this time.

Below is the log of a PG repair. What is the best way to correct these errors? 
Is there any further information required?

rados list-inconsistent-obj 1.275 --format=json-pretty
{
"epoch": 38481,
"inconsistents": []
}

Is it odd that it doesn't list any inconsistents?
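
(As far as I understand, list-inconsistent-obj only reports what the most
recent scrub recorded, so the plan is to deep-scrub again and re-query. A
sketch, using PG 1.275 as the example:

ceph pg deep-scrub 1.275
rados list-inconsistent-obj 1.275 --format=json-pretty
)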

Ceph.log entries for this PG.
2018-07-20 12:13:28.381903 osd.124 osd.124 10.4.35.36:6810/1865422 81 : cluster 
[ERR] 1.275 shard 100: soid 
1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 
0x1a131dab != data_digest 0x92f2c4c8 from auth oi 
1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 
client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 dd 
92f2c4c8 od  alloc_hint [4194304 4194304 0])
2018-07-20 12:13:28.381907 osd.124 osd.124 10.4.35.36:6810/1865422 82 : cluster 
[ERR] 1.275 shard 124: soid 
1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 
0x1a131dab != data_digest 0x92f2c4c8 from auth oi 
1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 
client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 dd 
92f2c4c8 od  alloc_hint [4194304 4194304 0])
2018-07-20 12:13:28.381909 osd.124 osd.124 10.4.35.36:6810/1865422 83 : cluster 
[ERR] 1.275 soid 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head: 
failed to pick suitable auth object
2018-07-20 12:15:15.310579 osd.124 osd.124 10.4.35.36:6810/1865422 84 : cluster 
[ERR] 1.275 shard 100: soid 
1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 
0xdf907335 != data_digest 0x38400b00 from auth oi 
1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 
client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 
38400b00 od  alloc_hint [4194304 4194304 0])
2018-07-20 12:15:15.310582 osd.124 osd.124 10.4.35.36:6810/1865422 85 : cluster 
[ERR] 1.275 shard 124: soid 
1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 
0xdf907335 != data_digest 0x38400b00 from auth oi 
1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 
client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 
38400b00 od  alloc_hint [4194304 4194304 0])
2018-07-20 12:15:15.310584 osd.124 osd.124 10.4.35.36:6810/1865422 86 : cluster 
[ERR] 1.275 soid 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head: 
failed to pick suitable auth object
2018-07-20 12:16:07.518970 osd.124 osd.124 10.4.35.36:6810/1865422 87 : cluster 
[ERR] 1.275 shard 100: soid 
1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 
0x6555a7c9 != data_digest 0xbad822f from auth oi 
1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 
client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 dd 
bad822f od  alloc_hint [4194304 4194304 0])
2018-07-20 12:16:07.518975 osd.124 osd.124 10.4.35.36:6810/1865422 88 : cluster 
[ERR] 1.275 shard 124: soid 
1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 
0x6555a7c9 != data_digest 0xbad822f from auth oi 
1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 
client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 dd 
bad822f od  alloc_hint [4194304 4194304 0])
2018-07-20 12:16:07.518977 osd.124 osd.124 10.4.35.36:6810/1865422 89 : cluster 
[ERR] 1.275 soid 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head: 
failed to pick suitable auth object
2018-07-20 12:16:29.476778 osd.124 osd.124 10.4.35.36:6810/1865422 90 : cluster 
[ERR] 1.275 shard 100: soid 
1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head data_digest 
0xa394e845 != data_digest 0xd8aa931c from auth oi 
1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head(33683'302224 
client.1079025.0:22963765 dirty|data_digest|omap_digest s 4194304 uv 302224 dd 
d8aa931c od  alloc_hint [4194304 4194304 0])
2018-07-20 12:16:29.476783 osd.124 osd.124 10.4.35.36:6810/1865422 91 : cluster 
[ERR] 1.275 shard 124: soid 
1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head data_digest 
0xa394e845 != data_digest 0xd8aa931c from auth oi 
1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head(33683'302224 
client.1079025.0:22963765 dirty|data_digest|omap_digest s 4194304 uv 302224 dd 
d8aa931c od  alloc_hint [4194304 4194304 0])
2018-07-20 12:16:29.476787 osd.124 osd.124 10.4.35.36:6810/1865422 92 : cluster 
[ERR] 1.275 soid 1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head: 
failed to pick suitable auth object
2018-07-20 12:19:59.498922 osd.124 osd.124 10.4.35.36:6810/1865422 93 : cluster 
[ERR] 1.275 shard 100: soid 
1:ae4de127:::rbd_data.37c2374b0dc51.0002f6a6:head data_digest 
0x2008cb1b != data_digest 0x218b7cb4 from