[ceph-users] When will Ceph 0.72.3?

2014-10-29 Thread Irek Fasikhov
Dear developers.

We very much want IO priorities ;)
Slow requests appear during the execution of snap rollback.

Thanks
-- 
Best regards, Fasikhov Irek Nurgayazovich
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Vickie CH
Hi all,
  I tried to use two OSDs to create a cluster. After the deploy finished, I
found the health status is "88 active+degraded" "104 active+remapped".
When I used 2 OSDs to create a cluster before, the result was OK. I'm confused about why this
situation happened. Do I need to adjust the CRUSH map to fix this problem?


--ceph.conf-
[global]
fsid = c404ded6-4086-4f0b-b479-89bc018af954
mon_initial_members = storage0
mon_host = 192.168.1.10
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 128
osd_journal_size = 2048
osd_pool_default_pgp_num = 128
osd_mkfs_type = xfs
-

---ceph -s---
cluster c404ded6-4086-4f0b-b479-89bc018af954
 health HEALTH_WARN 88 pgs degraded; 192 pgs stuck unclean
 monmap e1: 1 mons at {storage0=192.168.10.10:6789/0}, election epoch
2, quorum 0 storage0
 osdmap e20: 2 osds: 2 up, 2 in
  pgmap v45: 192 pgs, 3 pools, 0 bytes data, 0 objects
79752 kB used, 1858 GB / 1858 GB avail
  88 active+degraded
 104 active+remapped



Best wishes,
Mika
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Irek Fasikhov
Hi.

Because the default number of replicas is 3, the data requires three
different hosts.

2014-10-29 10:56 GMT+03:00 Vickie CH :

> Hi all,
>   Try to use two OSDs to create a cluster. After the deply finished, I
> found the health status is "88 active+degraded" "104 active+remapped".
> Before use 2 osds to create cluster the result is ok. I'm confuse why this
> situation happened. Do I need to set crush map to fix this problem?
>
>
> --ceph.conf-
> [global]
> fsid = c404ded6-4086-4f0b-b479-89bc018af954
> mon_initial_members = storage0
> mon_host = 192.168.1.10
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 128
> osd_journal_size = 2048
> osd_pool_default_pgp_num = 128
> osd_mkfs_type = xfs
> -
>
> ---ceph -s---
> cluster c404ded6-4086-4f0b-b479-89bc018af954
>  health HEALTH_WARN 88 pgs degraded; 192 pgs stuck unclean
>  monmap e1: 1 mons at {storage0=192.168.10.10:6789/0}, election epoch
> 2, quorum 0 storage0
>  osdmap e20: 2 osds: 2 up, 2 in
>   pgmap v45: 192 pgs, 3 pools, 0 bytes data, 0 objects
> 79752 kB used, 1858 GB / 1858 GB avail
>   88 active+degraded
>  104 active+remapped
> 
>
>
> Best wishes,
> Mika
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Best regards, Fasikhov Irek Nurgayazovich
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Mark Kirkwood
It looks to me like this has been considered (the default pool size is 
mapped to 2). However, just to check - this *does* mean that you need two (real 
or virtual) hosts - if the two OSDs are on the same host then a CRUSH map 
adjustment (host -> osd) will be required.
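
(A minimal sketch of the deploy-time alternative, not from this thread: for a
deliberately single-host test cluster you can instead tell CRUSH to choose
leaves of type osd rather than host before deploying. The option below is the
standard osd_crush_chooseleaf_type setting - 0 selects osd, 1 (the default)
selects host.)

# ceph.conf, [global] section - single-host test setups only
osd crush chooseleaf type = 0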


Regards

Mark


On 29/10/14 21:29, Irek Fasikhov wrote:

Hi.

Because the disc requires three different hosts, the default number of
replications 3.

2014-10-29 10:56 GMT+03:00 Vickie CH mailto:mika.leaf...@gmail.com>>:




osd_pool_default_size = 2
osd_pool_default_min_size = 1



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Irek Fasikhov
Hi.
This parameter is not applied to the pools that already exist by default.
Run ceph osd dump | grep pool and check the size= value.
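
(A short sketch of that check and of the follow-up fix, in case the default
pools really did come up with size 3 - the pool names data/metadata/rbd are
just the old defaults and may differ:)

$ ceph osd dump | grep 'replicated size'
$ ceph osd pool set data size 2       # repeat for metadata and rbd if needed
$ ceph osd pool set data min_size 1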


2014-10-29 11:40 GMT+03:00 Vickie CH :

> Dear Irek:
>
> Thanks for your reply.
> Even with "osd_pool_default_size = 2" already set, does the cluster still need 3
> different hosts?
> Can this default number be changed by the user and written into ceph.conf
> before deploy?
>
>
> Best wishes,
> Mika
>
> 2014-10-29 16:29 GMT+08:00 Irek Fasikhov :
>
>> Hi.
>>
>> Because the disc requires three different hosts, the default number of
>> replications 3.
>>
>> 2014-10-29 10:56 GMT+03:00 Vickie CH :
>>
>>> Hi all,
>>>   Try to use two OSDs to create a cluster. After the deply finished,
>>> I found the health status is "88 active+degraded" "104 active+remapped".
>>> Before use 2 osds to create cluster the result is ok. I'm confuse why this
>>> situation happened. Do I need to set crush map to fix this problem?
>>>
>>>
>>> --ceph.conf-
>>> [global]
>>> fsid = c404ded6-4086-4f0b-b479-89bc018af954
>>> mon_initial_members = storage0
>>> mon_host = 192.168.1.10
>>> auth_cluster_required = cephx
>>> auth_service_required = cephx
>>> auth_client_required = cephx
>>> filestore_xattr_use_omap = true
>>> osd_pool_default_size = 2
>>> osd_pool_default_min_size = 1
>>> osd_pool_default_pg_num = 128
>>> osd_journal_size = 2048
>>> osd_pool_default_pgp_num = 128
>>> osd_mkfs_type = xfs
>>> -
>>>
>>> ---ceph -s---
>>> cluster c404ded6-4086-4f0b-b479-89bc018af954
>>>  health HEALTH_WARN 88 pgs degraded; 192 pgs stuck unclean
>>>  monmap e1: 1 mons at {storage0=192.168.10.10:6789/0}, election
>>> epoch 2, quorum 0 storage0
>>>  osdmap e20: 2 osds: 2 up, 2 in
>>>   pgmap v45: 192 pgs, 3 pools, 0 bytes data, 0 objects
>>> 79752 kB used, 1858 GB / 1858 GB avail
>>>   88 active+degraded
>>>  104 active+remapped
>>> 
>>>
>>>
>>> Best wishes,
>>> Mika
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>>
>> --
>> С уважением, Фасихов Ирек Нургаязович
>> Моб.: +79229045757
>>
>
>


-- 
Best regards, Fasikhov Irek Nurgayazovich
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph MeetUp Berlin: Performance

2014-10-29 Thread Robert Sander
Hi,

the next Ceph MeetUp in Berlin is scheduled for November 24.

Lars Marowsky-Brée of SuSE will talk about Ceph performance.

Please RSVP at http://www.meetup.com/Ceph-Berlin/events/215147892/

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Mark Kirkwood

That is not my experience:

$ ceph -v
ceph version 0.86-579-g06a73c3 (06a73c39169f2f332dec760f56d3ec20455b1646)

$ cat /etc/ceph/ceph.conf
[global]
...
osd pool default size = 2

$ ceph osd dump|grep size
pool 2 'hot' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 128 pgp_num 128 last_change 47 flags 
hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes 
20 hit_set bloom{false_positive_probability: 0.05, target_size: 
0, seed: 0} 3600s x1 stripe_width 0
pool 10 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 102 owner 
18446744073709551615 flags hashpspool stripe_width 0
pool 11 '.rgw.control' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 104 owner 
18446744073709551615 flags hashpspool stripe_width 0
pool 12 '.rgw' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 106 owner 18446744073709551615 
flags hashpspool stripe_width 0
pool 13 '.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 107 owner 
18446744073709551615 flags hashpspool stripe_width 0
pool 14 '.users.uid' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 108 owner 
18446744073709551615 flags hashpspool stripe_width 0
pool 15 '.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 
0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 110 owner 
18446744073709551615 flags hashpspool stripe_width 0
pool 16 '.rgw.buckets' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 112 owner 
18446744073709551615 flags hashpspool stripe_width 0
pool 17 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 1024 pgp_num 1024 last_change 186 flags hashpspool 
stripe_width 0







On 29/10/14 21:46, Irek Fasikhov wrote:

Hi.
This parameter does not apply to pools by default.
ceph osd dump | grep pool. see size=?


2014-10-29 11:40 GMT+03:00 Vickie CH mailto:mika.leaf...@gmail.com>>:

Der Irek:

Thanks for your reply.
Even already set "osd_pool_default_size = 2" the cluster still need
3 different hosts right?
Is this default number can be changed by user and write into
ceph.conf before deploy?


Best wishes,
Mika

2014-10-29 16:29 GMT+08:00 Irek Fasikhov mailto:malm...@gmail.com>>:

Hi.

Because the disc requires three different hosts, the default
number of replications 3.

2014-10-29 10:56 GMT+03:00 Vickie CH mailto:mika.leaf...@gmail.com>>:

Hi all,
   Try to use two OSDs to create a cluster. After the
deply finished, I found the health status is "88
active+degraded" "104 active+remapped". Before use 2 osds to
create cluster the result is ok. I'm confuse why this
situation happened. Do I need to set crush map to fix this
problem?


--ceph.conf-
[global]
fsid = c404ded6-4086-4f0b-b479-89bc018af954
mon_initial_members = storage0
mon_host = 192.168.1.10
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 128
osd_journal_size = 2048
osd_pool_default_pgp_num = 128
osd_mkfs_type = xfs
-

---ceph -s---
cluster c404ded6-4086-4f0b-b479-89bc018af954
  health HEALTH_WARN 88 pgs degraded; 192 pgs stuck unclean
  monmap e1: 1 mons at {storage0=192.168.10.10:6789/0
}, election epoch 2, quorum 0
storage0
  osdmap e20: 2 osds: 2 up, 2 in
   pgmap v45: 192 pgs, 3 pools, 0 bytes data, 0 objects
 79752 kB used, 1858 GB / 1858 GB avail
   88 active+degraded
  104 active+remapped



Best wishes,
Mika

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
С уважением, Фасихов Ирек Нургаязович
Моб.: +79229045757





--
С уважением, Фасихов Ирек Нургаязович
Моб.: +79229045757


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Irek Fasikhov
Mark,
I meant that for the already existing pools this parameter is not used.
I'm sure the pools data, metadata, rbd (they are created by default) have
size = 3.

2014-10-29 11:56 GMT+03:00 Mark Kirkwood :

> That is not my experience:
>
> $ ceph -v
> ceph version 0.86-579-g06a73c3 (06a73c39169f2f332dec760f56d3ec20455b1646)
>
> $ cat /etc/ceph/ceph.conf
> [global]
> ...
> osd pool default size = 2
>
> $ ceph osd dump|grep size
> pool 2 'hot' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 128 pgp_num 128 last_change 47 flags
> hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes
> 20 hit_set bloom{false_positive_probability: 0.05, target_size:
> 0, seed: 0} 3600s x1 stripe_width 0
> pool 10 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 102 owner
> 18446744073709551615 flags hashpspool stripe_width 0
> pool 11 '.rgw.control' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 104 owner
> 18446744073709551615 flags hashpspool stripe_width 0
> pool 12 '.rgw' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 106 owner 18446744073709551615
> flags hashpspool stripe_width 0
> pool 13 '.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 107 owner 18446744073709551615
> flags hashpspool stripe_width 0
> pool 14 '.users.uid' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 108 owner
> 18446744073709551615 flags hashpspool stripe_width 0
> pool 15 '.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 110 owner
> 18446744073709551615 flags hashpspool stripe_width 0
> pool 16 '.rgw.buckets' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 112 owner
> 18446744073709551615 flags hashpspool stripe_width 0
> pool 17 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 1024 pgp_num 1024 last_change 186 flags hashpspool
> stripe_width 0
>
>
>
>
>
>
> On 29/10/14 21:46, Irek Fasikhov wrote:
>
>> Hi.
>> This parameter does not apply to pools by default.
>> ceph osd dump | grep pool. see size=?
>>
>>
>> 2014-10-29 11:40 GMT+03:00 Vickie CH > >:
>>
>> Der Irek:
>>
>> Thanks for your reply.
>> Even already set "osd_pool_default_size = 2" the cluster still need
>> 3 different hosts right?
>> Is this default number can be changed by user and write into
>> ceph.conf before deploy?
>>
>>
>> Best wishes,
>> Mika
>>
>> 2014-10-29 16:29 GMT+08:00 Irek Fasikhov > >:
>>
>> Hi.
>>
>> Because the disc requires three different hosts, the default
>> number of replications 3.
>>
>> 2014-10-29 10:56 GMT+03:00 Vickie CH > >:
>>
>>
>> Hi all,
>>Try to use two OSDs to create a cluster. After the
>> deply finished, I found the health status is "88
>> active+degraded" "104 active+remapped". Before use 2 osds to
>> create cluster the result is ok. I'm confuse why this
>> situation happened. Do I need to set crush map to fix this
>> problem?
>>
>>
>> --ceph.conf-
>> [global]
>> fsid = c404ded6-4086-4f0b-b479-89bc018af954
>> mon_initial_members = storage0
>> mon_host = 192.168.1.10
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>> osd_pool_default_size = 2
>> osd_pool_default_min_size = 1
>> osd_pool_default_pg_num = 128
>> osd_journal_size = 2048
>> osd_pool_default_pgp_num = 128
>> osd_mkfs_type = xfs
>> -
>>
>> ---ceph -s---
>> cluster c404ded6-4086-4f0b-b479-89bc018af954
>>   health HEALTH_WARN 88 pgs degraded; 192 pgs stuck
>> unclean
>>   monmap e1: 1 mons at {storage0=192.168.10.10:6789/0
>> }, election epoch 2, quorum 0
>> storage0
>>   osdmap e20: 2 osds: 2 up, 2 in
>>pgmap v45: 192 pgs, 3 pools, 0 bytes data, 0 objects
>>  79752 kB used, 1858 GB / 1858 GB avail
>>88 active+degraded
>>   104 active+remapped
>> 
>>
>>
>> Best wishes,
>

Re: [ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Vickie CH
Dear all,
Thanks for the reply.
Pool replicated size is 2, because the replicated size parameter was already
written into ceph.conf before deploy.
Because I am not familiar with the CRUSH map, I will follow Mark's information and do
a test that changes the CRUSH map to see the result.

---ceph.conf--
[global]
fsid = c404ded6-4086-4f0b-b479-89bc018af954
mon_initial_members = storage0
mon_host = 192.168.1.10
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 128
osd_journal_size = 2048
osd_pool_default_pgp_num = 128
osd_mkfs_type = xfs
---

--ceph osd dump result -
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 14 flags hashpspool
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 15 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 16 flags hashpspool stripe_width 0
max_osd 2
--

Best wishes,
Mika


2014-10-29 16:56 GMT+08:00 Mark Kirkwood :

> That is not my experience:
>
> $ ceph -v
> ceph version 0.86-579-g06a73c3 (06a73c39169f2f332dec760f56d3ec20455b1646)
>
> $ cat /etc/ceph/ceph.conf
> [global]
> ...
> osd pool default size = 2
>
> $ ceph osd dump|grep size
> pool 2 'hot' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 128 pgp_num 128 last_change 47 flags
> hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes
> 20 hit_set bloom{false_positive_probability: 0.05, target_size:
> 0, seed: 0} 3600s x1 stripe_width 0
> pool 10 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 102 owner
> 18446744073709551615 flags hashpspool stripe_width 0
> pool 11 '.rgw.control' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 104 owner
> 18446744073709551615 flags hashpspool stripe_width 0
> pool 12 '.rgw' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 106 owner 18446744073709551615
> flags hashpspool stripe_width 0
> pool 13 '.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 107 owner 18446744073709551615
> flags hashpspool stripe_width 0
> pool 14 '.users.uid' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 108 owner
> 18446744073709551615 flags hashpspool stripe_width 0
> pool 15 '.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 110 owner
> 18446744073709551615 flags hashpspool stripe_width 0
> pool 16 '.rgw.buckets' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 112 owner
> 18446744073709551615 flags hashpspool stripe_width 0
> pool 17 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 1024 pgp_num 1024 last_change 186 flags hashpspool
> stripe_width 0
>
>
>
>
>
>
> On 29/10/14 21:46, Irek Fasikhov wrote:
>
>> Hi.
>> This parameter does not apply to pools by default.
>> ceph osd dump | grep pool. see size=?
>>
>>
>> 2014-10-29 11:40 GMT+03:00 Vickie CH > >:
>>
>> Der Irek:
>>
>> Thanks for your reply.
>> Even already set "osd_pool_default_size = 2" the cluster still need
>> 3 different hosts right?
>> Is this default number can be changed by user and write into
>> ceph.conf before deploy?
>>
>>
>> Best wishes,
>> Mika
>>
>> 2014-10-29 16:29 GMT+08:00 Irek Fasikhov > >:
>>
>> Hi.
>>
>> Because the disc requires three different hosts, the default
>> number of replications 3.
>>
>> 2014-10-29 10:56 GMT+03:00 Vickie CH > >:
>>
>>
>> Hi all,
>>Try to use two OSDs to create a cluster. After the
>> deply finished, I found the health status is "88
>> active+degraded" "104 active+remapped". Before use 2 osds to
>> create cluster the result is ok. I'm confuse why this
>> situation happened. Do I need to set crush map to fix this
>> problem?
>>
>>
>> --ceph.conf-
>> [global]
>> fsid = c404ded6-4086-4f0b-b479-89bc018af954
>> mon_initial_members = storage0
>> mon_host = 192.168.1.10
>> auth_cluster_

Re: [ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Irek Fasikhov
ceph osd tree please :)

2014-10-29 12:03 GMT+03:00 Vickie CH :

> Dear all,
> Thanks for the reply.
> Pool replicated size is 2. Because the replicated size parameter already
> write into ceph.conf before deploy.
> Because not familiar crush map.  I will according Mark's information to do
> a test that change the crush map to see the result.
>
> ---ceph.conf--
> [global]
> fsid = c404ded6-4086-4f0b-b479-
> 89bc018af954
> mon_initial_members = storage0
> mon_host = 192.168.1.10
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
>
> *osd_pool_default_size = 2osd_pool_default_min_size = 1*
> osd_pool_default_pg_num = 128
> osd_journal_size = 2048
> osd_pool_default_pgp_num = 128
> osd_mkfs_type = xfs
> ---
>
> --ceph osd dump result -
> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 14 flags hashpspool
> crash_replay_interval 45 stripe_width 0
> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 15 flags hashpspool stripe_width 0
> pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 16 flags hashpspool stripe_width 0
> max_osd 2
>
> --
>
> Best wishes,
> Mika
>
> Best wishes,
> Mika
>
> 2014-10-29 16:56 GMT+08:00 Mark Kirkwood :
>
>> That is not my experience:
>>
>> $ ceph -v
>> ceph version 0.86-579-g06a73c3 (06a73c39169f2f332dec760f56d3ec20455b1646)
>>
>> $ cat /etc/ceph/ceph.conf
>> [global]
>> ...
>> osd pool default size = 2
>>
>> $ ceph osd dump|grep size
>> pool 2 'hot' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 128 pgp_num 128 last_change 47 flags
>> hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes
>> 20 hit_set bloom{false_positive_probability: 0.05, target_size:
>> 0, seed: 0} 3600s x1 stripe_width 0
>> pool 10 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 102 owner
>> 18446744073709551615 flags hashpspool stripe_width 0
>> pool 11 '.rgw.control' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 104 owner
>> 18446744073709551615 flags hashpspool stripe_width 0
>> pool 12 '.rgw' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 106 owner 18446744073709551615
>> flags hashpspool stripe_width 0
>> pool 13 '.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 107 owner
>> 18446744073709551615 flags hashpspool stripe_width 0
>> pool 14 '.users.uid' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 108 owner
>> 18446744073709551615 flags hashpspool stripe_width 0
>> pool 15 '.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 110 owner
>> 18446744073709551615 flags hashpspool stripe_width 0
>> pool 16 '.rgw.buckets' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 112 owner
>> 18446744073709551615 flags hashpspool stripe_width 0
>> pool 17 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 last_change 186 flags hashpspool
>> stripe_width 0
>>
>>
>>
>>
>>
>>
>> On 29/10/14 21:46, Irek Fasikhov wrote:
>>
>>> Hi.
>>> This parameter does not apply to pools by default.
>>> ceph osd dump | grep pool. see size=?
>>>
>>>
>>> 2014-10-29 11:40 GMT+03:00 Vickie CH >> >:
>>>
>>> Der Irek:
>>>
>>> Thanks for your reply.
>>> Even already set "osd_pool_default_size = 2" the cluster still need
>>> 3 different hosts right?
>>> Is this default number can be changed by user and write into
>>> ceph.conf before deploy?
>>>
>>>
>>> Best wishes,
>>> Mika
>>>
>>> 2014-10-29 16:29 GMT+08:00 Irek Fasikhov >> >:
>>>
>>> Hi.
>>>
>>> Because the disc requires three different hosts, the default
>>> number of replications 3.
>>>
>>> 2014-10-29 10:56 GMT+03:00 Vickie CH >> >:
>>>
>>>
>>> Hi all,
>>>Try to use two OSDs to create a cluster. After the
>>> deply finished, I found the health status is "88
>>> active+degraded" "104 active+remapped". Before use 2 osds to
>>> create cluster the result is ok. I'm confuse why this
>>> situation happened. Do I need to set crush map to fix this
>>> problem?
>>>
>>>
>>> --c

[ceph-users] Fwd: Error zapping the disk

2014-10-29 Thread Sakhi Hadebe

Hi Support, 

Can someone please help me with the error below so I can proceed with my 
cluster installation? It has been a week now and I do not know how to carry on. 




Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN) Competency Area, 
Meraka, CSIR

Tel:   +27 12 841 2308 
Fax:   +27 12 841 4223 
Cell:  +27 71 331 9622 
Email: shad...@csir.co.za


>>> Sakhi Hadebe 10/22/2014 1:56 PM >>>

Hi,  


 I am building a three-node cluster on Debian 7.7. I have a problem zapping 
the disk of the very first node. 


ERROR: 
[ceph1][WARNIN] Error: Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 on /dev/sda3 have been written, but 
we have been unable to inform the kernel of the change, probably because 
it/they are in use.  As a result, the old partition(s) will remain in use.  You 
should reboot now before making further changes. 
[ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 1 
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: partprobe 
/dev/sda3 



Please help. 

Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN)Competency Area, 
Meraka, CSIR

Tel:   +27 12 841 2308 
Fax:   +27 12 841 4223 
Cell:  +27 71 331 9622 
Email: shad...@csir.co.za




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Vickie CH
Hi:
---------------------------ceph osd tree---------------------------
# id    weight  type name       up/down reweight
-1      1.82    root default
-2      1.82            host storage1
0       0.91                    osd.0   up      1
1       0.91                    osd.1   up      1

Best wishes,
Mika

2014-10-29 17:05 GMT+08:00 Irek Fasikhov :

> ceph osd tree please :)
>
> 2014-10-29 12:03 GMT+03:00 Vickie CH :
>
>> Dear all,
>> Thanks for the reply.
>> Pool replicated size is 2. Because the replicated size parameter already
>> write into ceph.conf before deploy.
>> Because not familiar crush map.  I will according Mark's information to
>> do a test that change the crush map to see the result.
>>
>> ---ceph.conf--
>> [global]
>> fsid = c404ded6-4086-4f0b-b479-
>> 89bc018af954
>> mon_initial_members = storage0
>> mon_host = 192.168.1.10
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>>
>> *osd_pool_default_size = 2osd_pool_default_min_size = 1*
>> osd_pool_default_pg_num = 128
>> osd_journal_size = 2048
>> osd_pool_default_pgp_num = 128
>> osd_mkfs_type = xfs
>> ---
>>
>> --ceph osd dump result -
>> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 14 flags hashpspool
>> crash_replay_interval 45 stripe_width 0
>> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 64 pgp_num 64 last_change 15 flags hashpspool
>> stripe_width 0
>> pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 16 flags hashpspool stripe_width 0
>> max_osd 2
>>
>> --
>>
>> Best wishes,
>> Mika
>>
>> Best wishes,
>> Mika
>>
>> 2014-10-29 16:56 GMT+08:00 Mark Kirkwood :
>>
>>> That is not my experience:
>>>
>>> $ ceph -v
>>> ceph version 0.86-579-g06a73c3 (06a73c39169f2f332dec760f56d3ec
>>> 20455b1646)
>>>
>>> $ cat /etc/ceph/ceph.conf
>>> [global]
>>> ...
>>> osd pool default size = 2
>>>
>>> $ ceph osd dump|grep size
>>> pool 2 'hot' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>>> rjenkins pg_num 128 pgp_num 128 last_change 47 flags
>>> hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes
>>> 20 hit_set bloom{false_positive_probability: 0.05, target_size:
>>> 0, seed: 0} 3600s x1 stripe_width 0
>>> pool 10 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0
>>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 102 owner
>>> 18446744073709551615 flags hashpspool stripe_width 0
>>> pool 11 '.rgw.control' replicated size 2 min_size 1 crush_ruleset 0
>>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 104 owner
>>> 18446744073709551615 flags hashpspool stripe_width 0
>>> pool 12 '.rgw' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>>> rjenkins pg_num 8 pgp_num 8 last_change 106 owner 18446744073709551615
>>> flags hashpspool stripe_width 0
>>> pool 13 '.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0
>>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 107 owner
>>> 18446744073709551615 flags hashpspool stripe_width 0
>>> pool 14 '.users.uid' replicated size 2 min_size 1 crush_ruleset 0
>>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 108 owner
>>> 18446744073709551615 flags hashpspool stripe_width 0
>>> pool 15 '.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset
>>> 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 110 owner
>>> 18446744073709551615 flags hashpspool stripe_width 0
>>> pool 16 '.rgw.buckets' replicated size 2 min_size 1 crush_ruleset 0
>>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 112 owner
>>> 18446744073709551615 flags hashpspool stripe_width 0
>>> pool 17 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>>> rjenkins pg_num 1024 pgp_num 1024 last_change 186 flags hashpspool
>>> stripe_width 0
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 29/10/14 21:46, Irek Fasikhov wrote:
>>>
 Hi.
 This parameter does not apply to pools by default.
 ceph osd dump | grep pool. see size=?


 2014-10-29 11:40 GMT+03:00 Vickie CH >>> >:

 Der Irek:

 Thanks for your reply.
 Even already set "osd_pool_default_size = 2" the cluster still need
 3 different hosts right?
 Is this default number can be changed by user and write into
 ceph.conf before deploy?


 Best wishes,
 Mika

 2014-10-29 16:29 GMT+08:00 Irek Fasikhov >>> >:

 Hi.

 Because the disc requires three different hosts, the default
 number of replications 3.

 2014-10-29 10:56 GMT+03:00 Vickie CH 

Re: [ceph-users] Fwd: Error zapping the disk

2014-10-29 Thread Vickie CH
Hi Sakhi:
I ran into this problem before. The host OS was Ubuntu 14.04 (3.13.0-24-generic).
In the end I used fdisk /dev/sdX to delete all partitions and rebooted. Maybe you
can try that.
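
(A rough sketch of that workaround - the device name /dev/sdX below is a
placeholder and must be double-checked before wiping anything:)

# wipe the stale partition table, reboot so the kernel re-reads it, then retry
$ sudo sgdisk --zap-all /dev/sdX
$ sudo reboot
# after the reboot:
$ ceph-deploy disk zap ceph1:/dev/sdX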

Best wishes,
Mika

2014-10-29 17:13 GMT+08:00 Sakhi Hadebe :

>  Hi Support,
>
>  Can someone please help me with the below error so I can proceed with my
> cluster installation. It has taken a week now not knowing how to carry on.
>
>
>
>
> Regards,
> Sakhi Hadebe
> Engineer: South African National Research Network (SANReN)Competency Area,
> Meraka, CSIR
>
> Tel:   +27 12 841 2308
> Fax:   +27 12 841 4223
> Cell:  +27 71 331 9622
> Email: shad...@csir.co.za
>
>
> >>> Sakhi Hadebe 10/22/2014 1:56 PM >>>
>
> Hi,
>
>
>   I am building a three node cluster on debian7.7. I have a problem in
> zapping the disk of the very first node.
>
>
>  ERROR:
>
> [ceph1][WARNIN] Error: Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
> 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
> 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 on /dev/sda3 have
> been written, but we have been unable to inform the kernel of the change,
> probably because it/they are in use.  As a result, the old partition(s)
> will remain in use.  You should reboot now before making further changes.
>
> [ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 1
>
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: partprobe
> /dev/sda3
>
>
>
>  Please help.
>
>
> Regards,
> Sakhi Hadebe
> Engineer: South African National Research Network (SANReN)Competency Area,
> Meraka, CSIR
>
> Tel:   +27 12 841 2308
> Fax:   +27 12 841 4223
> Cell:  +27 71 331 9622
> Email: shad...@csir.co.za
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-29 Thread Andrey Korolyov
Hi Haomai, all.

Today, after an unexpected power failure, one of the kv stores (placed on ext4
with default mount options) refused to work. I think it may be
interesting to revive it because this is almost the first time among
hundreds of power failures (and their simulations) that the data store got
broken.

Strace:
http://xdel.ru/downloads/osd1.strace.gz

Debug output with 20-everything level:
http://xdel.ru/downloads/osd1.out
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Error zapping the disk

2014-10-29 Thread Sakhi Hadebe
Hi Mika,
 
I am not clear: should I delete all the partitions from all hosts or
just from the concerned host?
 
What do you mean by /dev/sdX, as my partitions are /dev/sda{1,2,3}?
 
 
 

Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN) Competency
Area, Meraka, CSIR
 
Tel:   +27 12 841 2308
Fax:   +27 12 841 4223
Cell:  +27 71 331 9622
Email: shad...@csir.co.za

>>> Vickie CH  29/10/2014 11:24 >>>
Hi Sakhi:
I got this problem before. Host OS is Ubuntu 14.04 3.13.0-24-generic.
In the end I use fdisk /dev/sdX delete all partition and reboot. Maybe
you can try. 

Best wishes,
Mika

2014-10-29 17:13 GMT+08:00 Sakhi Hadebe :



Hi Support, 

Can someone please help me with the below error so I can proceed with
my cluster installation. It has taken a week now not knowing how to
carry on. 




Regards, 
Sakhi Hadebe 
Engineer: South African National Research Network (SANReN)Competency
Area, Meraka, CSIR 

Tel: +27 12 841 2308 
Fax: +27 12 841 4223 
Cell: +27 71 331 9622 
Email: shad...@csir.co.za 


>>> Sakhi Hadebe 10/22/2014 1:56 PM >>>

Hi, 


I am building a three node cluster on debian7.7. I have a problem in
zapping the disk of the very first node. 


ERROR: 
[ceph1][WARNIN] Error: Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 on
/dev/sda3 have been written, but we have been unable to inform the
kernel of the change, probably because it/they are in use. As a result,
the old partition(s) will remain in use. You should reboot now before
making further changes. 
[ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 1

[ceph_deploy][ERROR ] RuntimeError: Failed to execute command:
partprobe /dev/sda3 



Please help. 

Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN)Competency
Area, Meraka, CSIR

Tel: +27 12 841 2308 
Fax: +27 12 841 4223 
Cell: +27 71 331 9622 
Email: shad...@csir.co.za





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] use ZFS for OSDs

2014-10-29 Thread Kenneth Waegeman

Hi,

We are looking to use ZFS for our OSD backend, but I have some questions.

My main question is: does Ceph already support the writeparallel mode  
for ZFS? (as described here:  
http://www.sebastien-han.fr/blog/2013/12/02/ceph-performance-interesting-things-going-on/)
I've found this, but I suppose it is outdated:  
https://wiki.ceph.com/Planning/Blueprints/Emperor/osd%3A_ceph_on_zfs


Should Ceph be built with ZFS support? I found a --with-zfslib option  
somewhere, but can someone verify this, or better, provide instructions for  
it? :-)


What parameters should be tuned to use this?
I found these :
filestore zfs_snap = 1
journal_aio = 0
journal_dio = 0

Are there other things we need for it?
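
(For reference, a hedged ceph.conf sketch that just collects the settings
listed above - the option names are as given in the referenced blog post, so
please verify them against the Ceph build in use:)

[osd]
filestore zfs_snap = 1    # let the filestore use ZFS snapshots
journal aio = 0           # ZFS on Linux lacks O_DIRECT/AIO support
journal dio = 0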

Many thanks!!
Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Mark Kirkwood
Righty, both OSDs are on the same host, so you will need to amend the 
default CRUSH rule. It will look something like:


rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host   <=== ah! host!
step emit
}

So you will need to change host to osd.

See http://ceph.com/docs/master/rados/operations/crush-map/ for a 
discussion of what/how on this front!
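
(A minimal sketch of that edit cycle, assuming default file names:)

$ ceph osd getcrushmap -o crush.bin          # export the compiled map
$ crushtool -d crush.bin -o crush.txt        # decompile to text
# edit crush.txt: "step chooseleaf firstn 0 type host" -> "... type osd"
$ crushtool -c crush.txt -o crush.new.bin    # recompile
$ ceph osd setcrushmap -i crush.new.bin      # inject the new map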


Regards

Mark

On 29/10/14 22:19, Vickie CH wrote:

Hi:
-ceph osd
tree---
# idweight  type name   up/down reweight
-1  1.82root default
-2  1.82host storage1
0   0.91osd.0   up  1
1   0.91osd.1   up  1



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-29 Thread Haomai Wang
Thanks, Andrey.

Is the attached OSD.1 log only these lines? I really can't find
the detailed info in it.

Maybe you need to increase debug_osd to 20/20?

On Wed, Oct 29, 2014 at 5:25 PM, Andrey Korolyov  wrote:
> Hi Haomai, all.
>
> Today after unexpected power failure one of kv stores (placed on ext4
> with default mount options) refused to work. I think that it may be
> interesting to revive it because it is almost first time among
> hundreds of power failures (and their simulations) when data store got
> broken.
>
> Strace:
> http://xdel.ru/downloads/osd1.strace.gz
>
> Debug output with 20-everything level:
> http://xdel.ru/downloads/osd1.out



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Use 2 osds to create cluster but health check display "active+degraded"

2014-10-29 Thread Vickie CH
Hi all,
Thanks to you all.
As Mark's information indicated, this problem was related to the CRUSH map.
After creating the 2 OSDs on 2 different hosts, the health check is OK.
I appreciate the information again~

Best wishes,
Mika

2014-10-29 17:19 GMT+08:00 Vickie CH :

> Hi:
> -ceph osd
> tree---
> # idweight  type name   up/down reweight
> -1  1.82root default
> -2  1.82host storage1
> 0   0.91osd.0   up  1
> 1   0.91osd.1   up  1
>
> Best wishes,
> Mika
>
> 2014-10-29 17:05 GMT+08:00 Irek Fasikhov :
>
>> ceph osd tree please :)
>>
>> 2014-10-29 12:03 GMT+03:00 Vickie CH :
>>
>>> Dear all,
>>> Thanks for the reply.
>>> Pool replicated size is 2. Because the replicated size parameter already
>>> write into ceph.conf before deploy.
>>> Because not familiar crush map.  I will according Mark's information to
>>> do a test that change the crush map to see the result.
>>>
>>> ---ceph.conf--
>>> [global]
>>> fsid = c404ded6-4086-4f0b-b479-
>>> 89bc018af954
>>> mon_initial_members = storage0
>>> mon_host = 192.168.1.10
>>> auth_cluster_required = cephx
>>> auth_service_required = cephx
>>> auth_client_required = cephx
>>> filestore_xattr_use_omap = true
>>>
>>> *osd_pool_default_size = 2osd_pool_default_min_size = 1*
>>> osd_pool_default_pg_num = 128
>>> osd_journal_size = 2048
>>> osd_pool_default_pgp_num = 128
>>> osd_mkfs_type = xfs
>>> ---
>>>
>>> --ceph osd dump result -
>>> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 last_change 14 flags hashpspool
>>> crash_replay_interval 45 stripe_width 0
>>> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0
>>> object_hash rjenkins pg_num 64 pgp_num 64 last_change 15 flags hashpspool
>>> stripe_width 0
>>> pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 last_change 16 flags hashpspool stripe_width 0
>>> max_osd 2
>>>
>>> --
>>>
>>> Best wishes,
>>> Mika
>>>
>>> Best wishes,
>>> Mika
>>>
>>> 2014-10-29 16:56 GMT+08:00 Mark Kirkwood 
>>> :
>>>
 That is not my experience:

 $ ceph -v
 ceph version 0.86-579-g06a73c3 (06a73c39169f2f332dec760f56d3ec
 20455b1646)

 $ cat /etc/ceph/ceph.conf
 [global]
 ...
 osd pool default size = 2

 $ ceph osd dump|grep size
 pool 2 'hot' replicated size 2 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 128 pgp_num 128 last_change 47 flags
 hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes
 20 hit_set bloom{false_positive_probability: 0.05,
 target_size: 0, seed: 0} 3600s x1 stripe_width 0
 pool 10 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0
 object_hash rjenkins pg_num 8 pgp_num 8 last_change 102 owner
 18446744073709551615 flags hashpspool stripe_width 0
 pool 11 '.rgw.control' replicated size 2 min_size 1 crush_ruleset 0
 object_hash rjenkins pg_num 8 pgp_num 8 last_change 104 owner
 18446744073709551615 flags hashpspool stripe_width 0
 pool 12 '.rgw' replicated size 2 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 106 owner 18446744073709551615
 flags hashpspool stripe_width 0
 pool 13 '.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0
 object_hash rjenkins pg_num 8 pgp_num 8 last_change 107 owner
 18446744073709551615 flags hashpspool stripe_width 0
 pool 14 '.users.uid' replicated size 2 min_size 1 crush_ruleset 0
 object_hash rjenkins pg_num 8 pgp_num 8 last_change 108 owner
 18446744073709551615 flags hashpspool stripe_width 0
 pool 15 '.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset
 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 110 owner
 18446744073709551615 flags hashpspool stripe_width 0
 pool 16 '.rgw.buckets' replicated size 2 min_size 1 crush_ruleset 0
 object_hash rjenkins pg_num 8 pgp_num 8 last_change 112 owner
 18446744073709551615 flags hashpspool stripe_width 0
 pool 17 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 1024 pgp_num 1024 last_change 186 flags hashpspool
 stripe_width 0






 On 29/10/14 21:46, Irek Fasikhov wrote:

> Hi.
> This parameter does not apply to pools by default.
> ceph osd dump | grep pool. see size=?
>
>
> 2014-10-29 11:40 GMT+03:00 Vickie CH  >:
>
> Der Irek:
>
> Thanks for your reply.
> Even already set "osd_pool_default_size = 2" the cluster still need
> 3 different hosts right?
> Is this default number can be changed b

Re: [ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-29 Thread Andrey Korolyov
On Wed, Oct 29, 2014 at 1:11 PM, Haomai Wang  wrote:
> Thanks for Andrey,
>
> The attachment OSD.1's log is only these lines? I really can't find
> the detail infos from it?
>
> Maybe you need to improve debug_osd to 20/20?
>
> On Wed, Oct 29, 2014 at 5:25 PM, Andrey Korolyov  wrote:
>> Hi Haomai, all.
>>
>> Today after unexpected power failure one of kv stores (placed on ext4
>> with default mount options) refused to work. I think that it may be
>> interesting to revive it because it is almost first time among
>> hundreds of power failures (and their simulations) when data store got
>> broken.
>>
>> Strace:
>> http://xdel.ru/downloads/osd1.strace.gz
>>
>> Debug output with 20-everything level:
>> http://xdel.ru/downloads/osd1.out
>
>
>
> --
> Best Regards,
>
> Wheat


Unfortunately that's all I've got. I updated osd1.out to show the actual
CLI args and the entire output - it ends abruptly without a last newline and
without any valuable output.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-29 Thread Haomai Wang
Thanks!

You mean osd.1 exited abruptly without a Ceph callback trace?
Does anyone have ideas about this log? @sage @gregory


On Wed, Oct 29, 2014 at 6:19 PM, Andrey Korolyov  wrote:
> On Wed, Oct 29, 2014 at 1:11 PM, Haomai Wang  wrote:
>> Thanks for Andrey,
>>
>> The attachment OSD.1's log is only these lines? I really can't find
>> the detail infos from it?
>>
>> Maybe you need to improve debug_osd to 20/20?
>>
>> On Wed, Oct 29, 2014 at 5:25 PM, Andrey Korolyov  wrote:
>>> Hi Haomai, all.
>>>
>>> Today after unexpected power failure one of kv stores (placed on ext4
>>> with default mount options) refused to work. I think that it may be
>>> interesting to revive it because it is almost first time among
>>> hundreds of power failures (and their simulations) when data store got
>>> broken.
>>>
>>> Strace:
>>> http://xdel.ru/downloads/osd1.strace.gz
>>>
>>> Debug output with 20-everything level:
>>> http://xdel.ru/downloads/osd1.out
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
> Unfortunately that`s all I`ve got. Updated osd1.out to show an actual
> cli args and entire output - it ends abruptly without last newline and
> without any valuable output.



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RHEL6.6 upgrade (selinux-policy-targeted) triggers slow requests

2014-10-29 Thread Dan Van Der Ster
Hi RHEL/CentOS users,
This is just a heads up that we observe slow requests during the RHEL6.6 
upgrade. The upgrade includes selinux-policy-targeted, which runs this during 
the update:

   /sbin/restorecon -i -f - -R -p -e /sys -e /proc -e /dev -e /mnt -e /var/tmp 
-e /home -e /tmp -e /dev
   
restorecon is scanning every single file on the OSDs, e.g. from strace:

...
lstat("rbd\\udata.1b9d8d42be29bd3.0003e430__head_052DF076__4", 
{st_mode=S_IFREG|0644, st_size=4194304, ...}) = 0
lstat("rbd\\udata.1c2064583a15ea.000a8553__head_4B4DF076__4", 
{st_mode=S_IFREG|0644, st_size=4194304, ...}) = 0
lstat("rbd\\udata.1c20d893e777ea0.0007ee23__head_2FDDF076__4", 
{st_mode=S_IFREG|0644, st_size=4194304, ...}) = 0
lstat("rbd\\udata.1e02d691ddaefb.437c__head_1FADF076__4", 
{st_mode=S_IFREG|0644, st_size=4194304, ...}) = 0
...

and it is using a default (be/4) io priority:

65567 be/4 root  768.61 K/s0.00 B/s  0.00 %  0.00 % restorecon -i -f - 
-R -p -e /sys -e /proc -e /dev -e /mnt -e /var/tmp -e /home -e /tmp -e /dev

I’m going to submit a ticket about this in case our RedHat friends want to 
follow up.
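
(As a stop-gap, the running restorecon scan can be dropped to the idle I/O
class by hand - a sketch, assuming a single restorecon process on the host:)

$ sudo ionice -c 3 -p $(pgrep -x restorecon)   # -c 3 = idle class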

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-29 Thread Andrey Korolyov
On Wed, Oct 29, 2014 at 1:28 PM, Haomai Wang  wrote:
> Thanks!
>
> You mean osd.1 exited abrptly without ceph callback trace?
> Anyone has some ideas about this log? @sage @gregory
>
>
> On Wed, Oct 29, 2014 at 6:19 PM, Andrey Korolyov  wrote:
>> On Wed, Oct 29, 2014 at 1:11 PM, Haomai Wang  wrote:
>>> Thanks for Andrey,
>>>
>>> The attachment OSD.1's log is only these lines? I really can't find
>>> the detail infos from it?
>>>
>>> Maybe you need to improve debug_osd to 20/20?
>>>
>>> On Wed, Oct 29, 2014 at 5:25 PM, Andrey Korolyov  wrote:
 Hi Haomai, all.

 Today after unexpected power failure one of kv stores (placed on ext4
 with default mount options) refused to work. I think that it may be
 interesting to revive it because it is almost first time among
 hundreds of power failures (and their simulations) when data store got
 broken.

 Strace:
 http://xdel.ru/downloads/osd1.strace.gz

 Debug output with 20-everything level:
 http://xdel.ru/downloads/osd1.out
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>
>>
>> Unfortunately that`s all I`ve got. Updated osd1.out to show an actual
>> cli args and entire output - it ends abruptly without last newline and
>> without any valuable output.
>
>
>
> --
> Best Regards,
>
> Wheat


With a log file specified, it adds just the following line at the very end:

2014-10-29 13:29:57.437776 7ffa562c9840 -1  ** ERROR: osd init failed:
(22) Invalid argument

The stdout printing seems a bit broken and does not print this at all
(and the store output part is definitely not detailed enough to draw
any conclusions, or even to file a bug). CCing Sage/Greg.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-29 Thread Haomai Wang
Maybe you can run it directly with debug_osd=20/20 and get the ending logs:
ceph-osd -i 1 -c /etc/ceph/ceph.conf -f
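
(Roughly like this - the extra debug overrides and the log path are only
illustrative, and the keyvaluestore subsystem name should be checked against
the firefly build:)

$ ceph-osd -i 1 -c /etc/ceph/ceph.conf -f \
    --debug-osd 20 --debug-keyvaluestore 20 \
    --log-file /var/log/ceph/ceph-osd.1.debug.log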

On Wed, Oct 29, 2014 at 6:34 PM, Andrey Korolyov  wrote:
> On Wed, Oct 29, 2014 at 1:28 PM, Haomai Wang  wrote:
>> Thanks!
>>
>> You mean osd.1 exited abrptly without ceph callback trace?
>> Anyone has some ideas about this log? @sage @gregory
>>
>>
>> On Wed, Oct 29, 2014 at 6:19 PM, Andrey Korolyov  wrote:
>>> On Wed, Oct 29, 2014 at 1:11 PM, Haomai Wang  wrote:
 Thanks for Andrey,

 The attachment OSD.1's log is only these lines? I really can't find
 the detail infos from it?

 Maybe you need to improve debug_osd to 20/20?

 On Wed, Oct 29, 2014 at 5:25 PM, Andrey Korolyov  wrote:
> Hi Haomai, all.
>
> Today after unexpected power failure one of kv stores (placed on ext4
> with default mount options) refused to work. I think that it may be
> interesting to revive it because it is almost first time among
> hundreds of power failures (and their simulations) when data store got
> broken.
>
> Strace:
> http://xdel.ru/downloads/osd1.strace.gz
>
> Debug output with 20-everything level:
> http://xdel.ru/downloads/osd1.out



 --
 Best Regards,

 Wheat
>>>
>>>
>>> Unfortunately that`s all I`ve got. Updated osd1.out to show an actual
>>> cli args and entire output - it ends abruptly without last newline and
>>> without any valuable output.
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
> With log-file specified, it adds just following line at very end:
>
> 2014-10-29 13:29:57.437776 7ffa562c9840 -1  ** ERROR: osd init failed:
> (22) Invalid argument
>
> the stdout printing seems a bit broken and do not print this at all
> (and store output part is definitely is not detailed enough to make
> any conclusions, and even file a bug). CCing Sage/Greg.



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-29 Thread Andrey Korolyov
On Wed, Oct 29, 2014 at 1:37 PM, Haomai Wang  wrote:
> maybe you can run it directly with debug_osd=20/20 and get ending logs
> ceph-osd -i 1 -c /etc/ceph/ceph.conf -f
>
> On Wed, Oct 29, 2014 at 6:34 PM, Andrey Korolyov  wrote:
>> On Wed, Oct 29, 2014 at 1:28 PM, Haomai Wang  wrote:
>>> Thanks!
>>>
>>> You mean osd.1 exited abrptly without ceph callback trace?
>>> Anyone has some ideas about this log? @sage @gregory
>>>
>>>
>>> On Wed, Oct 29, 2014 at 6:19 PM, Andrey Korolyov  wrote:
 On Wed, Oct 29, 2014 at 1:11 PM, Haomai Wang  wrote:
> Thanks for Andrey,
>
> The attachment OSD.1's log is only these lines? I really can't find
> the detail infos from it?
>
> Maybe you need to improve debug_osd to 20/20?
>
> On Wed, Oct 29, 2014 at 5:25 PM, Andrey Korolyov  wrote:
>> Hi Haomai, all.
>>
>> Today after unexpected power failure one of kv stores (placed on ext4
>> with default mount options) refused to work. I think that it may be
>> interesting to revive it because it is almost first time among
>> hundreds of power failures (and their simulations) when data store got
>> broken.
>>
>> Strace:
>> http://xdel.ru/downloads/osd1.strace.gz
>>
>> Debug output with 20-everything level:
>> http://xdel.ru/downloads/osd1.out
>
>
>
> --
> Best Regards,
>
> Wheat


 Unfortunately that`s all I`ve got. Updated osd1.out to show an actual
 cli args and entire output - it ends abruptly without last newline and
 without any valuable output.
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>
>>
>> With log-file specified, it adds just following line at very end:
>>
>> 2014-10-29 13:29:57.437776 7ffa562c9840 -1  ** ERROR: osd init failed:
>> (22) Invalid argument
>>
>> the stdout printing seems a bit broken and do not print this at all
>> (and store output part is definitely is not detailed enough to make
>> any conclusions, and even file a bug). CCing Sage/Greg.
>
>
>
> --
> Best Regards,
>
> Wheat

-f does not print the last line to stderr either. OK, it looks like a
very minor separate bug; I remember it appearing long before, and since
the bug remains it probably does not bother anyone - the stderr
output is less usual for debugging purposes.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Object Storage Statistics

2014-10-29 Thread M Ranga Swami Reddy
>There are two different statistics that are collected, one is the
>'usage' information that collects data about actual operations that
>clients do in a period of time. This information can be accessed
>through the admin api. The other one is the user stats info that is
>part of the user quota system, which at the moment is not hooked into
>a REST interface.

How do I get the user stats info?

Thanks
Swami
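
(On the command line the two kinds of statistics can be pulled roughly like
this - the uid and dates are placeholders:)

# per-operation usage log (the data behind the admin REST API)
$ radosgw-admin usage show --uid=someuser --start-date=2014-10-01 --end-date=2014-10-29
# quota-style user stats; --sync-stats refreshes them first
$ radosgw-admin user stats --uid=someuser --sync-stats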

On Sat, Oct 25, 2014 at 1:27 AM, Yehuda Sadeh  wrote:
> On Fri, Oct 24, 2014 at 8:17 AM, Dane Elwell  wrote:
>> Hi list,
>>
>> We're using the object storage in production and billing people based
>> on their usage, much like S3. We're also trying to produce things like
>> hourly bandwidth graphs for our clients.
>>
>> We're having some issues with the API not returning the correct
>> statistics. I can see that there is a --sync-stats option for the
>> command line radosgw-admin, but there doesn't appear to be anything
>> similar for the admin REST API. Is there an equivalent feature for the
>> API that hasn't been documented by chance?
>>
>
> There are two different statistics that are collected, one is the
> 'usage' information that collects data about actual operations that
> clients do in a period of time. This information can be accessed
> through the admin api. The other one is the user stats info that is
> part of the user quota system, which at the moment is not hooked into
> a REST interface.
>
> Yehuda
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD process exhausting server memory

2014-10-29 Thread Lukáš Kubín
Hello,
I've found my ceph v0.80.3 cluster in a state with 5 of 34 OSDs down
overnight after months of running without change. From the Linux logs I
found out the OSD processes were killed because they consumed all available
memory.

Those 5 failed OSDs were from different hosts of my 4-node cluster (see
below). Two hosts act as SSD cache tier in some of my pools. The other two
hosts are the default rotational drives storage.

After checking that Linux was not out of memory I attempted to restart
those failed OSDs. Most of those OSD daemons exhausted all memory in seconds
and got killed by Linux again:

Oct 28 22:16:34 q07 kernel: Out of memory: Kill process 24207 (ceph-osd)
score 867 or sacrifice child
Oct 28 22:16:34 q07 kernel: Killed process 24207, UID 0, (ceph-osd)
total-vm:59974412kB, anon-rss:59076880kB, file-rss:512kB


On the host I've found lots of similar "slow request" messages preceding
the crash:

2014-10-28 22:11:20.885527 7f25f84d1700  0 log [WRN] : slow request
31.117125 seconds old, received at 2014-10-28 22:10:49.768291:
osd_sub_op(client.168752.0:2197931 14.2c7
888596c7/rbd_data.293272f8695e4.006f/head//14 [] v 1551'377417
snapset=0=[]:[] snapc=0=[]) v10 currently no flag points reached
2014-10-28 22:11:21.885668 7f25f84d1700  0 log [WRN] : 67 slow requests, 1
included below; oldest blocked for > 9879.304770 secs


Apparently I can't get the cluster fixed by restarting the OSDs all over
again. Is there any other option then?

Thank you.

Lukas Kubin



[root@q04 ~]# ceph -s
cluster ec433b4a-9dc0-4d08-bde4-f1657b1fdb99
 health HEALTH_ERR 9 pgs backfill; 1 pgs backfilling; 521 pgs degraded;
425 pgs incomplete; 13 pgs inconsistent; 20 pgs recovering; 50 pgs
recovery_wait; 151 pgs stale; 425 pgs stuck inactive; 151 pgs stuck stale;
1164 pgs stuck unclean; 12070270 requests are blocked > 32 sec; recovery
887322/35206223 objects degraded (2.520%); 119/17131232 unfound (0.001%);
13 scrub errors
 monmap e2: 3 mons at {q03=
10.255.253.33:6789/0,q04=10.255.253.34:6789/0,q05=10.255.253.35:6789/0},
election epoch 90, quorum 0,1,2 q03,q04,q05
 osdmap e2194: 34 osds: 31 up, 31 in
  pgmap v7429812: 5632 pgs, 7 pools, 1446 GB data, 16729 kobjects
2915 GB used, 12449 GB / 15365 GB avail
887322/35206223 objects degraded (2.520%); 119/17131232 unfound
(0.001%)
  38 active+recovery_wait+remapped
4455 active+clean
  65 stale+incomplete
   3 active+recovering+remapped
 359 incomplete
  12 active+recovery_wait
 139 active+remapped
  86 stale+active+degraded
  16 active+recovering
   1 active+remapped+backfilling
  13 active+clean+inconsistent
   9 active+remapped+wait_backfill
 434 active+degraded
   1 remapped+incomplete
   1 active+recovering+degraded+remapped
  client io 0 B/s rd, 469 kB/s wr, 48 op/s

[root@q04 ~]# ceph osd tree
# id    weight  type name       up/down reweight
-5      3.24    root ssd
-6      1.62            host q06
16      0.18                    osd.16  up      1
17      0.18                    osd.17  up      1
18      0.18                    osd.18  up      1
19      0.18                    osd.19  up      1
20      0.18                    osd.20  up      1
21      0.18                    osd.21  up      1
22      0.18                    osd.22  up      1
23      0.18                    osd.23  up      1
24      0.18                    osd.24  up      1
-7      1.62            host q07
25      0.18                    osd.25  up      1
26      0.18                    osd.26  up      1
27      0.18                    osd.27  up      1
28      0.18                    osd.28  up      1
29      0.18                    osd.29  up      1
30      0.18                    osd.30  up      1
31      0.18                    osd.31  up      1
32      0.18                    osd.32  up      1
33      0.18                    osd.33  up      1
-1      14.56   root default
-4      14.56   root sata
-2      7.28            host q08
0       0.91                    osd.0   up      1
1       0.91                    osd.1   up      1
2       0.91                    osd.2   up      1
3       0.91                    osd.3   up      1
11      0.91                    osd.11  up      1
12      0.91                    osd.12  up      1
13      0.91                    osd.13  down    0
14      0.91                    osd.14  up      1
-3      7.28            host q09
4       0.91                    osd.4   up      1
5       0.91                    osd.5   up      1
6       0.91                    osd.6   up      1
7       0.91                    osd.7   up      1
8       0.91

[ceph-users] fail to add another rgw

2014-10-29 Thread yuelongguang
hi, clewis:
my environment:
One Ceph cluster, 3 nodes; each node has one monitor and one OSD. There is
one RGW (rgw1) on one of them (osd1). Before I deployed the second RGW (rgw2),
the first RGW worked well.
After I deployed the second RGW, it cannot start.
The number of radosgw processes keeps increasing.
The configuration files of rgw1 and rgw2 are almost the same, except for the
servername and the host option of the client.radosgw.gateway section in ceph.conf.
Default region, default zone.
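
For reference, one way to lay this out is a separate client section per
gateway, so the log, keyring and fastcgi socket paths do not collide. This is
only a sketch; all names and paths below are examples, and each client name
needs a matching cephx key:

    [client.radosgw.rgw1]
        host = osd1
        keyring = /etc/ceph/ceph.client.radosgw.rgw1.keyring
        rgw socket path = /var/run/ceph/ceph.radosgw.rgw1.fastcgi.sock
        log file = /var/log/ceph/radosgw.rgw1.log

    [client.radosgw.rgw2]
        host = osd2
        keyring = /etc/ceph/ceph.client.radosgw.rgw2.keyring
        rgw socket path = /var/run/ceph/ceph.radosgw.rgw2.fastcgi.sock
        log file = /var/log/ceph/radosgw.rgw2.log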
 
another test:
I shut down rgw1, then tried to start rgw2. Of course rgw2 cannot start, just
like before.
Then I tried to restart rgw1; it fails too. The errors are almost the same as
for rgw2.
I am trying to deploy multiple RGWs on one Ceph cluster in the default zone
and default region.
 
Thanks.
---log---
[root@cephosd2-monb ceph]# /usr/bin/radosgw -d -c /etc/ceph/ceph.conf 
--debug-rgw=10 -n client.radosgw.gateway
2014-10-29 21:59:10.763921 7f32d24cf820  0 ceph version 0.80.7 
(6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 11887
2014-10-29 21:59:10.767922 7f32d24cf820 -1 asok(0xaa3110) 
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
bind the UNIX domain socket to 
'/var/run/ceph/ceph-client.radosgw.gateway.asok': (17) File exists
2014-10-29 21:59:10.776185 7f32c17fb700  2 
RGWDataChangesLog::ChangesRenewThread: start
2014-10-29 21:59:10.777282 7f32d24cf820 10 cache get: 
name=.rgw.root+default.region : miss
2014-10-29 21:59:10.780609 7f32d24cf820 10 cache put: 
name=.rgw.root+default.region
2014-10-29 21:59:10.780617 7f32d24cf820 10 adding .rgw.root+default.region to 
cache LRU end
2014-10-29 21:59:10.780634 7f32d24cf820 10 cache get: 
name=.rgw.root+default.region : type miss (requested=1, cached=6)
2014-10-29 21:59:10.780658 7f32d24cf820 10 cache get: 
name=.rgw.root+default.region : hit
2014-10-29 21:59:10.781820 7f32d24cf820 10 cache put: 
name=.rgw.root+default.region
2014-10-29 21:59:10.781825 7f32d24cf820 10 moving .rgw.root+default.region to 
cache LRU end
2014-10-29 21:59:10.781883 7f32d24cf820 10 cache get: 
name=.rgw.root+region_info.default : miss
2014-10-29 21:59:10.783149 7f32d24cf820 10 cache put: 
name=.rgw.root+region_info.default
2014-10-29 21:59:10.783156 7f32d24cf820 10 adding .rgw.root+region_info.default 
to cache LRU end
2014-10-29 21:59:10.783168 7f32d24cf820 10 cache get: 
name=.rgw.root+region_info.default : type miss (requested=1, cached=6)
2014-10-29 21:59:10.783187 7f32d24cf820 10 cache get: 
name=.rgw.root+region_info.default : hit
2014-10-29 21:59:10.784622 7f32d24cf820 10 cache put: 
name=.rgw.root+region_info.default
2014-10-29 21:59:10.784627 7f32d24cf820 10 moving .rgw.root+region_info.default 
to cache LRU end
2014-10-29 21:59:10.784671 7f32d24cf820 10 cache get: 
name=.rgw.root+zone_info.default : miss
2014-10-29 21:59:10.788050 7f32d24cf820 10 cache put: 
name=.rgw.root+zone_info.default
2014-10-29 21:59:10.788071 7f32d24cf820 10 adding .rgw.root+zone_info.default 
to cache LRU end
2014-10-29 21:59:10.788091 7f32d24cf820 10 cache get: 
name=.rgw.root+zone_info.default : type miss (requested=1, cached=6)
2014-10-29 21:59:10.788125 7f32d24cf820 10 cache get: 
name=.rgw.root+zone_info.default : hit
2014-10-29 21:59:10.789630 7f32d24cf820 10 cache put: 
name=.rgw.root+zone_info.default
2014-10-29 21:59:10.789645 7f32d24cf820 10 moving .rgw.root+zone_info.default 
to cache LRU end
2014-10-29 21:59:10.789695 7f32d24cf820  2 zone default is master
2014-10-29 21:59:10.789742 7f32d24cf820 10 cache get: name=.rgw.root+region_map 
: miss
2014-10-29 21:59:10.791929 7f32d24cf820 10 cache put: name=.rgw.root+region_map
2014-10-29 21:59:10.791958 7f32d24cf820 10 adding .rgw.root+region_map to cache 
LRU end
2014-10-29 21:59:10.898679 7f32c0af7700  2 garbage collection: start
2014-10-29 21:59:10.899114 7f32a35fe700  0 ERROR: can't get key: ret=-2
2014-10-29 21:59:10.899663 7f32a35fe700  0 ERROR: sync_all_users() returned 
ret=-2
2014-10-29 21:59:10.900019 7f32d24cf820  0 framework: fastcgi
2014-10-29 21:59:10.900046 7f32d24cf820  0 starting handler: fastcgi
2014-10-29 21:59:10.909479 7f32a20fb700 10 allocated request req=0x7f329400b7c0
2014-10-29 21:59:10.926163 7f32c0af7700  0 RGWGC::process() failed to acquire 
lock on gc.89
2014-10-29 21:59:10.927823 7f32c0af7700  0 RGWGC::process() failed to acquire 
lock on gc.90
2014-10-29 21:59:10.958487 7f32c0af7700  0 RGWGC::process() failed to acquire 
lock on gc.93
2014-10-29 21:59:11.002497 7f32c0af7700  0 RGWGC::process() failed to acquire 
lock on gc.97
2014-10-29 21:59:11.032245 7f32c0af7700  0 RGWGC::process() failed to acquire 
lock on gc.0___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Crash with rados cppool and snapshots

2014-10-29 Thread Daniel Schneller
Hi!

We are exploring options to regularly preserve (i.e. backup) the
contents of the pools backing our rados gateways. For that we create
nightly snapshots of all the relevant pools when there is no activity
on the system to get consistent states.

In order to restore the whole pools back to a specific snapshot state,
we tried to use the rados cppool command (see below) to copy a snapshot
state into a new pool. Unfortunately this causes a segfault. Are we
doing anything wrong?

This command:

rados cppool --snap snap-1 deleteme.lp deleteme.lp2 2> segfault.txt

Produces this output:

*** Caught signal (Segmentation fault) **
 in thread 7f8f49a927c0
 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: rados() [0x43eedf]
 2: (()+0x10340) [0x7f8f48738340]
 3: (librados::IoCtxImpl::snap_lookup(char const*, unsigned long*)+0x17) [0x7f8f48aff127]
 4: (main()+0x1385) [0x411e75]
 5: (__libc_start_main()+0xf5) [0x7f8f4795fec5]
 6: rados() [0x41c6f7]
2014-10-29 12:03:22.761653 7f8f49a927c0 -1 *** Caught signal (Segmentation fault) **
 in thread 7f8f49a927c0

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: rados() [0x43eedf]
 2: (()+0x10340) [0x7f8f48738340]
 3: (librados::IoCtxImpl::snap_lookup(char const*, unsigned long*)+0x17) [0x7f8f48aff127]
 4: (main()+0x1385) [0x411e75]
 5: (__libc_start_main()+0xf5) [0x7f8f4795fec5]
 6: rados() [0x41c6f7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
 interpret this.

Full segfault file and the objdump output for the rados command can be
found here:

- https://public.centerdevice.de/53bddb80-423e-4213-ac62-59fe8dbb9bea
- https://public.centerdevice.de/50b81566-41fb-439a-b58b-e1e32d75f32a

We updated to the 0.80.7 release (saw the issue with 0.80.5 before and
had hoped that the long list of bugfixes in the release notes would
include a fix for this) but are still seeing it. Rados gateways, OSDs,
MONs etc. have all been restarted after the update. Package versions 
as follows:

daniel.schneller@node01 [~] $  
➜  dpkg -l | grep ceph
ii  ceph0.80.7-1trusty 
ii  ceph-common 0.80.7-1trusty 
ii  ceph-fs-common  0.80.7-1trusty 
ii  ceph-fuse   0.80.7-1trusty 
ii  ceph-mds0.80.7-1trusty 
ii  libcephfs1  0.80.7-1trusty 
ii  python-ceph 0.80.7-1trusty 

daniel.schneller@node01 [~] $  
➜  uname -a
Linux node01 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:06:16 
   UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Copying without the snapshot works. Should this work at least in 
theory?

Thanks! 

Daniel

-- 
Daniel Schneller
Mobile Development Lead
 
CenterDevice GmbH  | Merscheider Straße 1
   | 42699 Solingen
tel: +49 1754155711| Deutschland
daniel.schnel...@centerdevice.com  | www.centerdevice.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-29 Thread Jasper Siero
Hello Greg,

I added the debug options which you mentioned and started the process again:

[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
/var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
--reset-journal 0
old journal was 9483323613~134233517
new journal start will be 9621733376 (4176246 bytes past old end)
writing journal head
writing EResetJournal entry
done
[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf 
--cluster ceph --undump-journal 0 journaldumptgho-mon001 
undump journaldumptgho-mon001
start 9483323613 len 134213311
writing header 200.
 writing 9483323613~1048576
 writing 9484372189~1048576
 writing 9485420765~1048576
 writing 9486469341~1048576
 writing 9487517917~1048576
 writing 9488566493~1048576
 writing 9489615069~1048576
 writing 9490663645~1048576
 writing 9491712221~1048576
 writing 9492760797~1048576
 writing 9493809373~1048576
 writing 9494857949~1048576
 writing 9495906525~1048576
 writing 9496955101~1048576
 writing 9498003677~1048576
 writing 9499052253~1048576
 writing 9500100829~1048576
 writing 9501149405~1048576
 writing 9502197981~1048576
 writing 9503246557~1048576
 writing 9504295133~1048576
 writing 9505343709~1048576
 writing 9506392285~1048576
 writing 9507440861~1048576
 writing 9508489437~1048576
 writing 9509538013~1048576
 writing 9510586589~1048576
 writing 9511635165~1048576
 writing 9512683741~1048576
 writing 9513732317~1048576
 writing 9514780893~1048576
 writing 9515829469~1048576
 writing 9516878045~1048576
 writing 9517926621~1048576
 writing 9518975197~1048576
 writing 9520023773~1048576
 writing 9521072349~1048576
 writing 9522120925~1048576
 writing 9523169501~1048576
 writing 9524218077~1048576
 writing 9525266653~1048576
 writing 9526315229~1048576
 writing 9527363805~1048576
 writing 9528412381~1048576
 writing 9529460957~1048576
 writing 9530509533~1048576
 writing 9531558109~1048576
 writing 9532606685~1048576
 writing 9533655261~1048576
 writing 9534703837~1048576
 writing 9535752413~1048576
 writing 9536800989~1048576
 writing 9537849565~1048576
 writing 9538898141~1048576
 writing 9539946717~1048576
 writing 9540995293~1048576
 writing 9542043869~1048576
 writing 9543092445~1048576
 writing 9544141021~1048576
 writing 9545189597~1048576
 writing 9546238173~1048576
 writing 9547286749~1048576
 writing 9548335325~1048576
 writing 9549383901~1048576
 writing 9550432477~1048576
 writing 9551481053~1048576
 writing 9552529629~1048576
 writing 9553578205~1048576
 writing 9554626781~1048576
 writing 9555675357~1048576
 writing 9556723933~1048576
 writing 9557772509~1048576
 writing 9558821085~1048576
 writing 9559869661~1048576
 writing 9560918237~1048576
 writing 9561966813~1048576
 writing 9563015389~1048576
 writing 9564063965~1048576
 writing 9565112541~1048576
 writing 9566161117~1048576
 writing 9567209693~1048576
 writing 9568258269~1048576
 writing 9569306845~1048576
 writing 9570355421~1048576
 writing 9571403997~1048576
 writing 9572452573~1048576
 writing 9573501149~1048576
 writing 9574549725~1048576
 writing 9575598301~1048576
 writing 9576646877~1048576
 writing 9577695453~1048576
 writing 9578744029~1048576
 writing 9579792605~1048576
 writing 9580841181~1048576
 writing 9581889757~1048576
 writing 9582938333~1048576
 writing 9583986909~1048576
 writing 9585035485~1048576
 writing 9586084061~1048576
 writing 9587132637~1048576
 writing 9588181213~1048576
 writing 9589229789~1048576
 writing 9590278365~1048576
 writing 9591326941~1048576
 writing 9592375517~1048576
 writing 9593424093~1048576
 writing 9594472669~1048576
 writing 9595521245~1048576
 writing 9596569821~1048576
 writing 9597618397~1048576
 writing 9598666973~1048576
 writing 9599715549~1048576
 writing 9600764125~1048576
 writing 9601812701~1048576
 writing 9602861277~1048576
 writing 9603909853~1048576
 writing 9604958429~1048576
 writing 9606007005~1048576
 writing 9607055581~1048576
 writing 9608104157~1048576
 writing 9609152733~1048576
 writing 9610201309~1048576
 writing 9611249885~1048576
 writing 9612298461~1048576
 writing 9613347037~1048576
 writing 9614395613~1048576
 writing 9615444189~1048576
 writing 9616492765~1044159
done.
[root@th1-mon001 ~]# service ceph start mds
=== mds.th1-mon001 === 
Starting Ceph mds.th1-mon001 on th1-mon001...
starting mds.th1-mon001 at :/0


The new logs:
http://pastebin.com/wqqjuEpy


Kind regards,

Jasper


From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory
Farnum [gfar...@redhat.com]
Sent: Tuesday, 28 October 2014 19:26
To: Jasper Siero
CC: John Spray; ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

You'll need to gather a log with the offsets visible; you can do this
with "debug ms = 1; debug mds = 20; debug journaler = 20".
-Greg

On Fri, Oct 24, 2014 at 7:03 AM, Jasper Siero
 wrote:
> Hello Greg and John,
>
> I used the patch on the ceph cluster a

Re: [ceph-users] OSD process exhausting server memory

2014-10-29 Thread Michael J. Kidd
Hello Lukas,
  Please try the following process for getting all your OSDs up and
operational...

* Set the following flags: noup, noin, noscrub, nodeep-scrub, norecover,
nobackfill
for i in noup noin noscrub nodeep-scrub norecover nobackfill; do ceph osd
set $i; done

* Stop all OSDs (I know, this seems counter productive)
* Set all OSDs down / out
for i in $(ceph osd tree | grep osd | awk '{print $3}'); do ceph osd down
$i; ceph osd out $i; done
* Set recovery / backfill throttles as well as heartbeat and OSD map
processing tweaks in the /etc/ceph/ceph.conf file under the [osd] section:
[osd]
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_backfill_scan_min = 8
osd_heartbeat_interval = 36
osd_heartbeat_grace = 240
osd_map_message_max = 1000
osd_map_cache_size = 3136

* Start all OSDs
* Monitor 'top' for 0% CPU on all OSD processes.. it may take a while..  I
usually issue 'top' then, the keys M c
 - M = Sort by memory usage
 - c = Show command arguments
 - This allows to easily monitor the OSD process and know which OSDs have
settled, etc..
* Once all OSDs have hit 0% CPU utilization, remove the 'noup' flag
 - ceph osd unset noup
* Again, wait for 0% CPU utilization (may  be immediate, may take a while..
just gotta wait)
* Once all OSDs have hit 0% CPU again, remove the 'noin' flag
 - ceph osd unset noin
 - All OSDs should now appear up/in, and will go through peering..
* Once ceph -s shows no further activity, and OSDs are back at 0% CPU
again, unset 'nobackfill'
 - ceph osd unset nobackfill
* Once ceph -s shows no further activity, and OSDs are back at 0% CPU
again, unset 'norecover'
 - ceph osd unset norecover
* Monitor OSD memory usage... some OSDs may get killed off again, but their
subsequent restart should consume less memory and allow more recovery to
occur between each step above.. and ultimately, hopefully... your entire
cluster will come back online and be usable.

## Clean-up:
* Remove all of the above set options from ceph.conf
* Reset the running OSDs to their defaults:
ceph tell osd.\* injectargs '--osd_max_backfills 10
--osd_recovery_max_active 15 --osd_recovery_max_single_start 5
--osd_backfill_scan_min 64 --osd_heartbeat_interval 6 --osd_heartbeat_grace
36 --osd_map_message_max 100 --osd_map_cache_size 500'
* Unset the noscrub and nodeep-scrub flags:
 - ceph osd unset noscrub
 - ceph osd unset nodeep-scrub


## For help identifying why memory usage was so high, please provide:
* ceph osd dump | grep pool
* ceph osd crush rule dump

Let us know if this helps... I know it looks extreme, but it's worked for
me in the past..


Michael J. Kidd
Sr. Storage Consultant
Inktank Professional Services
 - by Red Hat

On Wed, Oct 29, 2014 at 8:51 AM, Lukáš Kubín  wrote:

> Hello,
> I've found my ceph v 0.80.3 cluster in a state with 5 of 34 OSDs being
> down through night after months of running without change. From Linux logs
> I found out the OSD processes were killed because they consumed all
> available memory.
>
> Those 5 failed OSDs were from different hosts of my 4-node cluster (see
> below). Two hosts act as SSD cache tier in some of my pools. The other two
> hosts are the default rotational drives storage.
>
> After checking the Linux was not out of memory I've attempted to restart
> those failed OSDs. Most of those OSD daemon exhaust all memory in seconds
> and got killed by Linux again:
>
> Oct 28 22:16:34 q07 kernel: Out of memory: Kill process 24207 (ceph-osd)
> score 867 or sacrifice child
> Oct 28 22:16:34 q07 kernel: Killed process 24207, UID 0, (ceph-osd)
> total-vm:59974412kB, anon-rss:59076880kB, file-rss:512kB
>
>
> On the host I've found lots of similar "slow request" messages preceding
> the crash:
>
> 2014-10-28 22:11:20.885527 7f25f84d1700  0 log [WRN] : slow request
> 31.117125 seconds old, received at 2014-10-28 22:10:49.768291:
> osd_sub_op(client.168752.0:2197931 14.2c7
> 888596c7/rbd_data.293272f8695e4.006f/head//14 [] v 1551'377417
> snapset=0=[]:[] snapc=0=[]) v10 currently no flag points reached
> 2014-10-28 22:11:21.885668 7f25f84d1700  0 log [WRN] : 67 slow requests, 1
> included below; oldest blocked for > 9879.304770 secs
>
>
> Apparently I can't get the cluster fixed by restarting the OSDs all over
> again. Is there any other option then?
>
> Thank you.
>
> Lukas Kubin
>
>
>
> [root@q04 ~]# ceph -s
> cluster ec433b4a-9dc0-4d08-bde4-f1657b1fdb99
>  health HEALTH_ERR 9 pgs backfill; 1 pgs backfilling; 521 pgs
> degraded; 425 pgs incomplete; 13 pgs inconsistent; 20 pgs recovering; 50
> pgs recovery_wait; 151 pgs stale; 425 pgs stuck inactive; 151 pgs stuck
> stale; 1164 pgs stuck unclean; 12070270 requests are blocked > 32 sec;
> recovery 887322/35206223 objects degraded (2.520%); 119/17131232 unfound
> (0.001%); 13 scrub errors
>  monmap e2: 3 mons at {q03=
> 10.255.253.33:6789/0,q04=10.255.253.34:6789/0,q05=10.255.253.35:6789/0},
> election epoch 90, quorum 0,1,2 q03,q04,q05
>  osdmap e2194: 34 osds

Re: [ceph-users] use ZFS for OSDs

2014-10-29 Thread Sage Weil
On Wed, 29 Oct 2014, Kenneth Waegeman wrote:
> Hi,
> 
> We are looking to use ZFS for our OSD backend, but I have some questions.
> 
> My main question is: Does Ceph already supports the writeparallel mode for ZFS
> ? (as described here:
> http://www.sebastien-han.fr/blog/2013/12/02/ceph-performance-interesting-things-going-on/)
> I've found this, but I suppose it is outdated:
> https://wiki.ceph.com/Planning/Blueprints/Emperor/osd%3A_ceph_on_zfs

All of the code is there, but it is almost completely untested.
 
> Should Ceph be build with ZFS support? I found a --with-zfslib option
> somewhere, but can someone verify this, or better has instructions for it?:-)
>
> What parameters should be tuned to use this?
> I found these :
>filestore zfs_snap = 1

Yes

>journal_aio = 0
>journal_dio = 0

Maybe, if ZFS doesn't support directio or aio.
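
Putting those together, the [osd] section would look something like this
(an untested sketch, per the caveat below):

    [osd]
        filestore zfs_snap = 1
        journal aio = 0
        journal dio = 0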

Curious to hear how it goes!  Wouldn't recommend this for production 
though without significant testing.

sage

> 
> Are there other things we need for it?
> 
> Many thanks!!
> Kenneth
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Delete pools with low priority?

2014-10-29 Thread Daniel Schneller

Bump :-)

Any ideas on this? They would be much appreciated.

Also: Sorry for a possible double post, client had forgotten its email config.

On 2014-10-22 21:21:54 +, Daniel Schneller said:


We have been running several rounds of benchmarks through the Rados
Gateway. Each run creates several hundred thousand objects and similarly
many containers.

The cluster consists of 4 machines with 12 OSD disks each (spinning, 4TB) — 48
OSDs total.

After running a set of benchmarks we renamed the pools used by the
gateway to get a clean baseline. In total we now have several
million objects and containers in 3 pools. Redundancy for all pools is
set to 3.

Today we started deleting the benchmark data. Once the deletion of the first
renamed set of RGW pools was issued, cluster performance started to go down the
drain. Using iotop we can see that the disks are all working furiously.
As the command to delete the pools came back very quickly, the
assumption is that we are now seeing the effects of the actual objects
being removed, causing lots and lots of IO activity on the disks,
negatively impacting regular operations.

We are running OpenStack on top of Ceph, and we see drastic reduction in
responsiveness of these machines as well as in CephFS.

Fortunately this is still a test setup, so no production systems are
affected. Nevertheless I would like to ask a few questions:

1) Is it possible to have the object deletion run in some low-prio mode?
2) If not, is there another way to delete lots and lots of objects without
   affecting the rest of the cluster so badly?
3) Can we somehow determine the progress of the deletion so far? We would like
   to estimate if this is going to take hours, days or weeks. (A rough way to
   watch this is sketched below.)
4) Even if not possible for the already running deletion, could we get a
   progress estimate for the remaining pools we still want to delete?
5) Are there any parameters that we might tune, even if just temporarily, to
   speed this up?
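
For the progress question, one rough way to watch it (a sketch, run from any
client node; the per-pool object counts should shrink as the deletion
proceeds):

    # per-pool KB used and object counts, refreshed every minute
    watch -n 60 rados df
    # coarse cluster-wide view of the space being freed
    ceph df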

Slide 18 of http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern
describes a very similar situation.

Thanks, Daniel

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] use ZFS for OSDs

2014-10-29 Thread Michal Kozanecki
Hi Kenneth,

I run a small ceph test cluster using ZoL (ZFS on Linux) ontop of CentOS 7, so 
I'll try and answer any questions. :) 

Yes, ZFS writeparallel support is there, but NOT compiled in by default. You'll 
need to configure it with --with-libzfs, but that by itself will fail to compile 
the ZFS support, as I found out. You need to ensure you have ZoL installed and 
working, and then pass the location of libzfs to ceph at compile time. 
Personally I just set my environment variables before compiling like so;

ldconfig
export LIBZFS_LIBS="/usr/include/libzfs/"
export LIBZFS_CFLAGS="-I/usr/include/libzfs -I/usr/include/libspl"
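
With that in place, the configure step looks something like this (just a
sketch; the exact flag name is worth confirming against ./configure --help on
your source tree):

    ./autogen.sh                # only needed for a git checkout, not a release tarball
    ./configure --with-libzfs
    make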

However, the writeparallel performance isn't all that great. The writeparallel 
mode makes heavy use of ZFS's (and BtrFS's for that matter) snapshotting 
capability, and the snap performance on ZoL, at least when I last tested it, is 
pretty terrible. You lose any performance benefits you gain with writeparallel 
to the poor snap performance. 

If you decide that you don't need writeparallel mode, you can use the prebuilt 
packages (or compile with default options) without issue. Ceph (without libzfs 
support compiled in) will detect ZFS as a generic/ext4 file system and work 
accordingly. 

As far as performance tweaking (ZIL, write journals, etc.) goes, I found that the 
performance difference between using a ZIL and using a ceph write journal is about 
the same. I also found that doing both (ZIL AND write journal) didn't give me much 
of a performance benefit. In my small test cluster I decided after testing to 
forego the ZIL and only use an SSD-backed ceph write journal on each OSD, with 
each OSD being a single ZFS dataset/vdev (no zraid or mirroring). With Ceph 
handling the redundancy at the OSD level I saw no need for ZFS mirroring or 
zraid; instead, if ZFS detects corruption, rather than self-healing it returns a 
read failure for the pg file to ceph, and ceph's scrub mechanisms should then 
repair/replace the pg file using a good replica elsewhere on the cluster. 
ZFS + ceph are a beautiful bitrot-fighting match!

Let me know if there's anything else I can answer. 

Cheers

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Kenneth Waegeman
Sent: October-29-14 6:09 AM
To: ceph-users
Subject: [ceph-users] use ZFS for OSDs

Hi,

We are looking to use ZFS for our OSD backend, but I have some questions.

My main question is: Does Ceph already supports the writeparallel mode for ZFS 
? (as described here:  
http://www.sebastien-han.fr/blog/2013/12/02/ceph-performance-interesting-things-going-on/)
I've found this, but I suppose it is outdated:  
https://wiki.ceph.com/Planning/Blueprints/Emperor/osd%3A_ceph_on_zfs

Should Ceph be build with ZFS support? I found a --with-zfslib option 
somewhere, but can someone verify this, or better has instructions for
it?:-)

What parameters should be tuned to use this?
I found these :
 filestore zfs_snap = 1
 journal_aio = 0
 journal_dio = 0

Are there other things we need for it?

Many thanks!!
Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-announce list

2014-10-29 Thread Brian Rak
Would it be possible to establish an announcement mailing list, used 
only for announcing new versions?


Many other projects have similar lists, and they're very helpful for 
keeping up on changes, while not being particularly noisy.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HTTP Get returns 404 Not Found for Swift API

2014-10-29 Thread Pedro Miranda
I'm finishing a Masters degree and using Ceph for the first time, and would
really appreciate any help because I'm really stuck with this.

Thank you.

On Tue, Oct 28, 2014 at 9:23 PM, Pedro Miranda  wrote:

> Hi I'm new using Ceph and I have a very basic Ceph cluster with 1 mon in
> one node and 2 OSDs in two separate nodes (all CentOS 7). I followed the
> quick-ceph-deploy 
>  tutorial.
> All went well.
>
> Then I started the quick-rgw
>  tutorial. I installed the
> optimized apache2 and fastcgi packages. I skipped the add-wildcard-to-dns
> 
>  because
> I'm want the Swift API and not the S3 style subdomains.
>
> Finnally I created a user johndoe and a subuser johndoe:swift
>
> { "user_id": "johndoe",
>   "display_name": "John Doe",
>   "email": "j...@example.com",
>   "suspended": 0,
>   "max_buckets": 1000,
>   "auid": 0,
>   "subusers": [
> { "id": "johndoe:swift",
>   "permissions": "full-control"}],
>   "keys": [
> { "user": "johndoe",
>   "access_key": "11BS02LGFB6AL6H1ADMW",
>   "secret_key": "vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANY"}],
>   "swift_keys": [
> { "user": "johndoe:swift",
>   "secret_key": "vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANX"}],
>   "caps": [],
>   "op_mask": "read, write, delete",
>   "default_placement": "",
>   "placement_tags": [],
>   "bucket_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1},
>   "user_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1},
>   "temp_url_keys": []}
>
>
> And when I make an HTTP call to authenticate I get the 404 Not Found for
> the /auth
>
> [root@ceph04 ~]# curl -v -i https://ceph04.ncg.ingrid.pt/auth -X GET -H
> "X-Auth-User: johndoe:swift" -H "X-Auth-Key:
> vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANX"
> * About to connect() to ceph04.ncg.ingrid.pt port 443 (#0)
> *   Trying 10.193.50.4...
> * Connected to ceph04.ncg.ingrid.pt (10.193.50.4) port 443 (#0)
> * Initializing NSS with certpath: sql:/etc/pki/nssdb
> *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
>   CApath: none
> * SSL connection using TLS_DHE_RSA_WITH_AES_128_CBC_SHA
> * Server certificate:
> * subject: CN=ceph04.ncg.ingrid.pt,O=Default Company
> Ltd,L=lisbon,ST=lisbon,C=pt
> * start date: Out 27 13:41:53 2014 GMT
> * expire date: Out 27 13:41:53 2015 GMT
> * common name: ceph04.ncg.ingrid.pt
> * issuer: CN=ceph04.ncg.ingrid.pt,O=Default Company
> Ltd,L=lisbon,ST=lisbon,C=pt
> > GET /auth HTTP/1.1
> > User-Agent: curl/7.29.0
> > Host: ceph04.ncg.ingrid.pt
> > Accept: */*
> > X-Auth-User: johndoe:swift
> > X-Auth-Key: vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANX
> >
> < HTTP/1.1 404 Not Found
> HTTP/1.1 404 Not Found
> < Date: Tue, 28 Oct 2014 21:10:14 GMT
> Date: Tue, 28 Oct 2014 21:10:14 GMT
> < Server: Apache/2.4.6 (Red Hat) mod_fastcgi/mod_fastcgi-SNAP-0910052141
> OpenSSL/1.0.1e-fips
> Server: Apache/2.4.6 (Red Hat) mod_fastcgi/mod_fastcgi-SNAP-0910052141
> OpenSSL/1.0.1e-fips
> < Content-Length: 202
> Content-Length: 202
> < Content-Type: text/html; charset=iso-8859-1
> Content-Type: text/html; charset=iso-8859-1
>
> <
> 
> 
> 404 Not Found
> 
> Not Found
> The requested URL /auth was not found on this server.
> 
> * Connection #0 to host ceph04.ncg.ingrid.pt left intact
>
> I don't understand what I am doing wrong!!
>
> --
> ===
> Pedro Miranda| Tel: +351
> 212 969 519
> Aluno de MIEI | Telem: 969 559 560
> Departamento de Informática | 910 642 753
> Faculdade de Ciências e Tecnologia | e-mail: potter...@gmail.com
> Universidade Nova de Lisboa | e-mail: p.mira...@campus.fct.unl.pt
> 2829-516 Caparica, Portugal | LinkedIn: pt.linkedin.com/in/pmsmiranda/
>



-- 
===
Pedro Miranda| Tel: +351
212 969 519
Aluno de MIEI | Telem: 969 559 560
Departamento de Informática | 910 642 753
Faculdade de Ciências e Tecnologia | e-mail: potter...@gmail.com
Universidade Nova de Lisboa | e-mail: p.mira...@campus.fct.unl.pt
2829-516 Caparica, Portugal | LinkedIn: pt.linkedin.com/in/pmsmiranda/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HTTP Get returns 404 Not Found for Swift API

2014-10-29 Thread Yehuda Sadeh
On Tue, Oct 28, 2014 at 2:23 PM, Pedro Miranda  wrote:
> Hi I'm new using Ceph and I have a very basic Ceph cluster with 1 mon in one
> node and 2 OSDs in two separate nodes (all CentOS 7). I followed the
> quick-ceph-deploy tutorial.
> All went well.
>
> Then I started the quick-rgw tutorial. I installed the optimized apache2 and
> fastcgi packages. I skipped the add-wildcard-to-dns because I'm want the
> Swift API and not the S3 style subdomains.
>
> Finnally I created a user johndoe and a subuser johndoe:swift
>
> { "user_id": "johndoe",
>   "display_name": "John Doe",
>   "email": "j...@example.com",
>   "suspended": 0,
>   "max_buckets": 1000,
>   "auid": 0,
>   "subusers": [
> { "id": "johndoe:swift",
>   "permissions": "full-control"}],
>   "keys": [
> { "user": "johndoe",
>   "access_key": "11BS02LGFB6AL6H1ADMW",
>   "secret_key": "vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANY"}],
>   "swift_keys": [
> { "user": "johndoe:swift",
>   "secret_key": "vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANX"}],
>   "caps": [],
>   "op_mask": "read, write, delete",
>   "default_placement": "",
>   "placement_tags": [],
>   "bucket_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1},
>   "user_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1},
>   "temp_url_keys": []}
>
>
> And when I make an HTTP call to authenticate I get the 404 Not Found for the
> /auth
>
> [root@ceph04 ~]# curl -v -i https://ceph04.ncg.ingrid.pt/auth -X GET -H
> "X-Auth-User: johndoe:swift" -H "X-Auth-Key:
> vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANX"
> * About to connect() to ceph04.ncg.ingrid.pt port 443 (#0)
> *   Trying 10.193.50.4...
> * Connected to ceph04.ncg.ingrid.pt (10.193.50.4) port 443 (#0)
> * Initializing NSS with certpath: sql:/etc/pki/nssdb
> *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
>   CApath: none
> * SSL connection using TLS_DHE_RSA_WITH_AES_128_CBC_SHA
> * Server certificate:
> * subject: CN=ceph04.ncg.ingrid.pt,O=Default Company
> Ltd,L=lisbon,ST=lisbon,C=pt
> * start date: Out 27 13:41:53 2014 GMT
> * expire date: Out 27 13:41:53 2015 GMT
> * common name: ceph04.ncg.ingrid.pt
> * issuer: CN=ceph04.ncg.ingrid.pt,O=Default Company
> Ltd,L=lisbon,ST=lisbon,C=pt
>> GET /auth HTTP/1.1
>> User-Agent: curl/7.29.0
>> Host: ceph04.ncg.ingrid.pt
>> Accept: */*
>> X-Auth-User: johndoe:swift
>> X-Auth-Key: vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANX
>>
> < HTTP/1.1 404 Not Found
> HTTP/1.1 404 Not Found
> < Date: Tue, 28 Oct 2014 21:10:14 GMT
> Date: Tue, 28 Oct 2014 21:10:14 GMT
> < Server: Apache/2.4.6 (Red Hat) mod_fastcgi/mod_fastcgi-SNAP-0910052141
> OpenSSL/1.0.1e-fips
> Server: Apache/2.4.6 (Red Hat) mod_fastcgi/mod_fastcgi-SNAP-0910052141
> OpenSSL/1.0.1e-fips
> < Content-Length: 202
> Content-Length: 202
> < Content-Type: text/html; charset=iso-8859-1
> Content-Type: text/html; charset=iso-8859-1
>
> <
> 
> 
> 404 Not Found
> 
> Not Found
> The requested URL /auth was not found on this server.
> 
> * Connection #0 to host ceph04.ncg.ingrid.pt left intact
>

I was just about to say that it looks like issue #9155 (for which you
just need to recreate the user), but looking at it again, it seems
that your apache is misconfigured. Can you verify your apache config
again? Make sure you don't have the default apache site enabled.
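
A couple of quick checks for that (assuming the stock httpd layout used by
the quick-rgw guide; paths are examples):

    # list the virtual hosts apache actually serves, and in which order
    apachectl -S
    # confirm the rgw vhost with its FastCGI/rewrite rules is loaded,
    # and that no default site shadows it
    grep -ril fastcgi /etc/httpd/conf.d/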

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ERROR: error converting store /var/lib/ceph/osd/ceph-176: (28) No space left on device

2014-10-29 Thread David Z
Hi Guys,

Recently we have encountered an XFS issue where "no space available" is reported 
on XFS although df shows 15% disk space available (total space is 4TB).

This error has been seen before by some other users. Now we reproduce it even 
when using inode64 on XFS with an inode size of 2048.

It looks the same as this one - 
https://www.novell.com/support/kb/doc.php?id=7014318. And from its “Additional 
Information”, I think our problem could be like below.

“In the case of a filesystem already running inode64, this error can be 
encountered if the filesystem does run low/out of disk space -- or if it cannot 
allocate 4 contiguous blocks for additional inodes.  So if the filesystem is 
severely fragmented -- meaning only 1-3 blocks are together -- you will not be 
able to create new files but can write additional data to older files."

I have tried running xfs_repair, which didn't work, and freed some space by moving 
out a few files, but once this has happened it is only a matter of time before the 
issue hits again. I didn't find much useful information on fixing this issue via 
Google, either.
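
In case it helps confirm the fragmentation theory, free-space fragmentation
and inode headroom can be inspected like this (the device and mount point
below are examples):

    # summary of free-space extents; mostly 1-3 block extents would match the
    # description quoted above
    xfs_db -r -c "freesp -s" /dev/sdX
    # inode usage on the OSD filesystem
    df -i /var/lib/ceph/osd/ceph-176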

This is a known XFS issue, but we still want to check with you guys whether 
you know of any XFS tunings that could help, or whether a recent fix or 
enhancement in XFS might resolve it.

Thanks.
David Zhang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CDS Hammer (Day 1) Videos Posted

2014-10-29 Thread Patrick McGarry
Hey cephers,

For those of you unable to attend yesterday (or those who would like a
replay), the video sessions from yesterday's Ceph Developer Summit:
Hammer have been posted to YouTube.  You can access them directly from
the playlist at:

https://www.youtube.com/watch?v=PlEUxK7KrBA&list=PLrBUGiINAakNjna2bJrGcc72udDz4Y-rj

They have also been linked from the wiki page so you have the context
of blueprints and the etherpad notes:

https://wiki.ceph.com/Planning/CDS/Hammer_(Oct_2014)

While some of them are still processing for the thumbnails they should
all be more or less intact.  If you see any errors or have any
questions, please let me know.

See you all at the APAC-friendly session later today at 16:00 PDT.  Thanks!


Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD process exhausting server memory

2014-10-29 Thread Lukáš Kubín
I've ended up at step "ceph osd unset noin". My OSDs are up, but not in,
even after an hour:

[root@q04 ceph-recovery]# ceph osd stat
 osdmap e2602: 34 osds: 34 up, 0 in
flags nobackfill,norecover,noscrub,nodeep-scrub


There seems to be no activity generated by the OSD processes; occasionally they
show 0.3% CPU, which I believe is just some basic communication processing. No
load on the network interfaces.

Is there some other step needed to bring the OSDs in?

Thank you.

Lukas

On Wed, Oct 29, 2014 at 3:58 PM, Michael J. Kidd 
wrote:

> Hello Lukas,
>   Please try the following process for getting all your OSDs up and
> operational...
>
> * Set the following flags: noup, noin, noscrub, nodeep-scrub, norecover,
> nobackfill
> for i in noup noin noscrub nodeep-scrub norecover nobackfill; do ceph osd
> set $i; done
>
> * Stop all OSDs (I know, this seems counter productive)
> * Set all OSDs down / out
> for i in $(ceph osd tree | grep osd | awk '{print $3}'); do ceph osd down
> $i; ceph osd out $i; done
> * Set recovery / backfill throttles as well as heartbeat and OSD map
> processing tweaks in the /etc/ceph/ceph.conf file under the [osd] section:
> [osd]
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_recovery_max_single_start = 1
> osd_backfill_scan_min = 8
> osd_heartbeat_interval = 36
> osd_heartbeat_grace = 240
> osd_map_message_max = 1000
> osd_map_cache_size = 3136
>
> * Start all OSDs
> * Monitor 'top' for 0% CPU on all OSD processes.. it may take a while..  I
> usually issue 'top' then, the keys M c
>  - M = Sort by memory usage
>  - c = Show command arguments
>  - This allows to easily monitor the OSD process and know which OSDs have
> settled, etc..
> * Once all OSDs have hit 0% CPU utilization, remove the 'noup' flag
>  - ceph osd unset noup
> * Again, wait for 0% CPU utilization (may  be immediate, may take a
> while.. just gotta wait)
> * Once all OSDs have hit 0% CPU again, remove the 'noin' flag
>  - ceph osd unset noin
>  - All OSDs should now appear up/in, and will go through peering..
> * Once ceph -s shows no further activity, and OSDs are back at 0% CPU
> again, unset 'nobackfill'
>  - ceph osd unset nobackfill
> * Once ceph -s shows no further activity, and OSDs are back at 0% CPU
> again, unset 'norecover'
>  - ceph osd unset norecover
> * Monitor OSD memory usage... some OSDs may get killed off again, but
> their subsequent restart should consume less memory and allow more recovery
> to occur between each step above.. and ultimately, hopefully... your entire
> cluster will come back online and be usable.
>
> ## Clean-up:
> * Remove all of the above set options from ceph.conf
> * Reset the running OSDs to their defaults:
> ceph tell osd.\* injectargs '--osd_max_backfills 10
> --osd_recovery_max_active 15 --osd_recovery_max_single_start 5
> --osd_backfill_scan_min 64 --osd_heartbeat_interval 6 --osd_heartbeat_grace
> 36 --osd_map_message_max 100 --osd_map_cache_size 500'
> * Unset the noscrub and nodeep-scrub flags:
>  - ceph osd unset noscrub
>  - ceph osd unset nodeep-scrub
>
>
> ## For help identifying why memory usage was so high, please provide:
> * ceph osd dump | grep pool
> * ceph osd crush rule dump
>
> Let us know if this helps... I know it looks extreme, but it's worked for
> me in the past..
>
>
> Michael J. Kidd
> Sr. Storage Consultant
> Inktank Professional Services
>  - by Red Hat
>
> On Wed, Oct 29, 2014 at 8:51 AM, Lukáš Kubín 
> wrote:
>
>> Hello,
>> I've found my ceph v 0.80.3 cluster in a state with 5 of 34 OSDs being
>> down through night after months of running without change. From Linux logs
>> I found out the OSD processes were killed because they consumed all
>> available memory.
>>
>> Those 5 failed OSDs were from different hosts of my 4-node cluster (see
>> below). Two hosts act as SSD cache tier in some of my pools. The other two
>> hosts are the default rotational drives storage.
>>
>> After checking the Linux was not out of memory I've attempted to restart
>> those failed OSDs. Most of those OSD daemon exhaust all memory in seconds
>> and got killed by Linux again:
>>
>> Oct 28 22:16:34 q07 kernel: Out of memory: Kill process 24207 (ceph-osd)
>> score 867 or sacrifice child
>> Oct 28 22:16:34 q07 kernel: Killed process 24207, UID 0, (ceph-osd)
>> total-vm:59974412kB, anon-rss:59076880kB, file-rss:512kB
>>
>>
>> On the host I've found lots of similar "slow request" messages preceding
>> the crash:
>>
>> 2014-10-28 22:11:20.885527 7f25f84d1700  0 log [WRN] : slow request
>> 31.117125 seconds old, received at 2014-10-28 22:10:49.768291:
>> osd_sub_op(client.168752.0:2197931 14.2c7
>> 888596c7/rbd_data.293272f8695e4.006f/head//14 [] v 1551'377417
>> snapset=0=[]:[] snapc=0=[]) v10 currently no flag points reached
>> 2014-10-28 22:11:21.885668 7f25f84d1700  0 log [WRN] : 67 slow requests,
>> 1 included below; oldest blocked for > 9879.304770 secs
>>
>>
>> Apparently I can't get the cluster fixed by restarting the OSDs all over
>> a

[ceph-users] ceph status 104 active+degraded+remapped 88 creating+incomplete

2014-10-29 Thread Thomas Alrin
Hi all,
I'm new to Ceph. What is wrong with this cluster? How can I make the status 
change to HEALTH_OK? Please help

$ceph status
cluster 62e2f40c-401b-4b3e-804a-cebbec1016c5
 health HEALTH_WARN 104 pgs degraded; 88 pgs incomplete; 88 pgs stuck 
inactive; 192 pgs stuck unclean
 monmap e1: 1 mons at {megamubuntu=192.168.2.10:6789/0}, election epoch 
1, quorum 0 megamubuntu
 osdmap e26: 2 osds: 2 up, 2 in
  pgmap v144: 192 pgs, 3 pools, 0 bytes data, 0 objects
80631 MB used, 777 GB / 901 GB avail
 104 active+degraded+remapped
  88 creating+incomplete

$ ceph osd stat
 osdmap e26: 2 osds: 2 up, 2 in

$ ceph osd tree
# id    weight  type name       up/down reweight
-1      0.88    root default
-2      0.88            host alrin
0       0.44                    osd.0   up      1
1       0.44                    osd.1   up      1

$ cat ceph.conf 
[global]
fsid = 62e2f40c-401b-4b3e-804a-cebbec1016c5
mon_initial_members = megamubuntu, alrin
mon_host = 192.168.2.10
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_mkfs_type = ext4


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD process exhausting server memory

2014-10-29 Thread Michael J. Kidd
Ah, sorry... since they were set out manually, they'll need to be set in
manually..

for i in $(ceph osd tree | grep osd | awk '{print $3}'); do ceph osd in $i;
done



Michael J. Kidd
Sr. Storage Consultant
Inktank Professional Services
 - by Red Hat

On Wed, Oct 29, 2014 at 12:33 PM, Lukáš Kubín  wrote:

> I've ended up at step "ceph osd unset noin". My OSDs are up, but not in,
> even after an hour:
>
> [root@q04 ceph-recovery]# ceph osd stat
>  osdmap e2602: 34 osds: 34 up, 0 in
> flags nobackfill,norecover,noscrub,nodeep-scrub
>
>
> There seems to be no activity generated by OSD processes, occasionally
> they show 0,3% which I believe is just some basic communication processing.
> No load in network interfaces.
>
> Is there some other step needed to bring the OSDs in?
>
> Thank you.
>
> Lukas
>
> On Wed, Oct 29, 2014 at 3:58 PM, Michael J. Kidd  > wrote:
>
>> Hello Lukas,
>>   Please try the following process for getting all your OSDs up and
>> operational...
>>
>> * Set the following flags: noup, noin, noscrub, nodeep-scrub, norecover,
>> nobackfill
>> for i in noup noin noscrub nodeep-scrub norecover nobackfill; do ceph osd
>> set $i; done
>>
>> * Stop all OSDs (I know, this seems counter productive)
>> * Set all OSDs down / out
>> for i in $(ceph osd tree | grep osd | awk '{print $3}'); do ceph osd down
>> $i; ceph osd out $i; done
>> * Set recovery / backfill throttles as well as heartbeat and OSD map
>> processing tweaks in the /etc/ceph/ceph.conf file under the [osd] section:
>> [osd]
>> osd_max_backfills = 1
>> osd_recovery_max_active = 1
>> osd_recovery_max_single_start = 1
>> osd_backfill_scan_min = 8
>> osd_heartbeat_interval = 36
>> osd_heartbeat_grace = 240
>> osd_map_message_max = 1000
>> osd_map_cache_size = 3136
>>
>> * Start all OSDs
>> * Monitor 'top' for 0% CPU on all OSD processes.. it may take a while..
>> I usually issue 'top' then, the keys M c
>>  - M = Sort by memory usage
>>  - c = Show command arguments
>>  - This allows to easily monitor the OSD process and know which OSDs have
>> settled, etc..
>> * Once all OSDs have hit 0% CPU utilization, remove the 'noup' flag
>>  - ceph osd unset noup
>> * Again, wait for 0% CPU utilization (may  be immediate, may take a
>> while.. just gotta wait)
>> * Once all OSDs have hit 0% CPU again, remove the 'noin' flag
>>  - ceph osd unset noin
>>  - All OSDs should now appear up/in, and will go through peering..
>> * Once ceph -s shows no further activity, and OSDs are back at 0% CPU
>> again, unset 'nobackfill'
>>  - ceph osd unset nobackfill
>> * Once ceph -s shows no further activity, and OSDs are back at 0% CPU
>> again, unset 'norecover'
>>  - ceph osd unset norecover
>> * Monitor OSD memory usage... some OSDs may get killed off again, but
>> their subsequent restart should consume less memory and allow more recovery
>> to occur between each step above.. and ultimately, hopefully... your entire
>> cluster will come back online and be usable.
>>
>> ## Clean-up:
>> * Remove all of the above set options from ceph.conf
>> * Reset the running OSDs to their defaults:
>> ceph tell osd.\* injectargs '--osd_max_backfills 10
>> --osd_recovery_max_active 15 --osd_recovery_max_single_start 5
>> --osd_backfill_scan_min 64 --osd_heartbeat_interval 6 --osd_heartbeat_grace
>> 36 --osd_map_message_max 100 --osd_map_cache_size 500'
>> * Unset the noscrub and nodeep-scrub flags:
>>  - ceph osd unset noscrub
>>  - ceph osd unset nodeep-scrub
>>
>>
>> ## For help identifying why memory usage was so high, please provide:
>> * ceph osd dump | grep pool
>> * ceph osd crush rule dump
>>
>> Let us know if this helps... I know it looks extreme, but it's worked for
>> me in the past..
>>
>>
>> Michael J. Kidd
>> Sr. Storage Consultant
>> Inktank Professional Services
>>  - by Red Hat
>>
>> On Wed, Oct 29, 2014 at 8:51 AM, Lukáš Kubín 
>> wrote:
>>
>>> Hello,
>>> I've found my ceph v 0.80.3 cluster in a state with 5 of 34 OSDs being
>>> down through night after months of running without change. From Linux logs
>>> I found out the OSD processes were killed because they consumed all
>>> available memory.
>>>
>>> Those 5 failed OSDs were from different hosts of my 4-node cluster (see
>>> below). Two hosts act as SSD cache tier in some of my pools. The other two
>>> hosts are the default rotational drives storage.
>>>
>>> After checking the Linux was not out of memory I've attempted to restart
>>> those failed OSDs. Most of those OSD daemon exhaust all memory in seconds
>>> and got killed by Linux again:
>>>
>>> Oct 28 22:16:34 q07 kernel: Out of memory: Kill process 24207 (ceph-osd)
>>> score 867 or sacrifice child
>>> Oct 28 22:16:34 q07 kernel: Killed process 24207, UID 0, (ceph-osd)
>>> total-vm:59974412kB, anon-rss:59076880kB, file-rss:512kB
>>>
>>>
>>> On the host I've found lots of similar "slow request" messages preceding
>>> the crash:
>>>
>>> 2014-10-28 22:11:20.885527 7f25f84d1700  0 log [WRN] : slow request
>>> 31.117125 seconds old, received

[ceph-users] Micro Ceph and OpenStack Design Summit November 3rd, 2014 11:40am

2014-10-29 Thread Loic Dachary
Hi Ceph,

TL;DR: Register for the Micro Ceph and OpenStack Design Summit November 3rd, 
2014 11:40am 
http://kilodesignsummit.sched.org/event/f2e49f4547a757cc3d51f5641b2000cb

November 3rd, 2014 11:40am, during the OpenStack summit in Paris[1], the 
present and the future of Ceph and OpenStack will be discussed with Ceph and 
OpenStack developers. The following agenda was prepared based on feedback from 
the participants[3]. It is however more of a guideline than a fixed plan and 
everyone will be invited to participate. 

* Ceph and OpenStack state & roadmap (30min)
  Josh Durgin and Sébastien Han

  * Glance
  * Nova
  * Cinder
  * Kilo Roadmap

* Ceph Performance tuning (30min)
  Josh Durgin

  Improved performance has become an increasingly important topic for Ceph and 
it is discussed weekly with the community[4]. A short summary of each ongoing 
discussion topic will be presented. The latest action items decided during the 
Ceph Developer Summit[5] that happens this week will also be discussed.

  * Async Messenger
  * XIO Messenger
  * RBD IOPS Performance
  * OSD Performance
  * Cache Tiering Performance
  * Monitor Performances
  * Sequential/Random Disk Throughput
  * LTTNG tracepoints
  * Erasure Code LRC, ISA and NEON

* OpenStack and Ceph Deployment automation (30min)
  Sébastien Han and Loïc Dachary

  Ceph was improved to make it easier to write modules/recipes for all kinds of 
deployment tools. They are at various stages of maturity and it is often quite 
difficult to figure out which one is right in a given context. We will give a 
short tour of the current state of each tool, the development effort behind 
them and what we can expect in the near future.

  * ansible
  * ceph-deploy
  * juju
  * puppet
  * chef

Cheers

[1] OpenStack Summit Paris 
https://www.openstack.org/summit/openstack-paris-summit-2014/
[2] Ceph for OpenStack Design Session 
http://kilodesignsummit.sched.org/event/f2e49f4547a757cc3d51f5641b2000cb
[3] Pad preparation https://etherpad.openstack.org/p/kilo-ceph
[4] Weekly Ceph Performance Meeting 
https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg20744.html and 
http://pad.ceph.com/p/performance_weekly
[5] Ceph Developer Summit 
https://wiki.ceph.com/Planning/CDS/Hammer_%28Oct_2014%29

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] use ZFS for OSDs

2014-10-29 Thread Michal Kozanecki
Forgot to mention, when you create the ZFS/ZPOOL datasets, make sure to set the 
xattr property to sa

e.g.
  
zpool create osd01 -O xattr=sa -O compression=lz4 sdb

OR if zpool/zfs dataset already created

zfs set xattr=sa osd01
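
You can double-check that the properties took effect with something like the 
following (dataset name taken from the example above):

zfs get xattr,compression osd01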

Cheers



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Michal 
Kozanecki
Sent: October-29-14 11:33 AM
To: Kenneth Waegeman; ceph-users
Subject: Re: [ceph-users] use ZFS for OSDs

Hi Kenneth,

I run a small ceph test cluster using ZoL (ZFS on Linux) ontop of CentOS 7, so 
I'll try and answer any questions. :) 

Yes, ZFS writeparallel support is there, but NOT compiled in by default. You'll 
need to compile it with --with-zlib, but that by itself will fail to compile 
the ZFS support as I found out. You need to ensure you have ZoL installed and 
working, and then pass the location of libzfs to ceph at compile time. 
Personally I just set my environment variables before compiling like so;

ldconfig
export LIBZFS_LIBS="/usr/include/libzfs/"
export LIBZFS_CFLAGS="-I/usr/include/libzfs -I/usr/include/libspl"
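
Roughly, the build then looks like this (the name of the configure switch 
differs between releases; --with-libzfs is an assumption here, check 
./configure --help against your source tree):

./autogen.sh
./configure --with-libzfs   # switch name assumed, verify before building
make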

However, the writeparallel performance isn't all that great. The writeparallel 
mode makes heavy use of ZFS's (and BtrFS's for that matter) snapshotting 
capability, and the snap performance on ZoL, at least when I last tested it, is 
pretty terrible. You lose any performance benefits you gain with writeparallel 
to the poor snap performance. 

If you decide that you don't need writeparallel mode, you can use the prebuilt 
packages (or compile with default options) without issue. Ceph (without zlib 
support compiled in) will detect ZFS as a generic/ext4 file system and work 
accordingly. 

As far as performance tweaking (ZIL, write journals, etc.), I found that the 
performance difference between using a ZIL and a ceph write journal is about the 
same. I also found that doing both (ZIL AND write journal) didn't give me much 
of a performance benefit. In my small test cluster I decided after testing to 
forego the ZIL and only use an SSD-backed ceph write journal on each OSD, with 
each OSD being a single ZFS dataset/vdev (no zraid or mirroring). With Ceph 
handling the redundancy at the OSD level I saw no need for ZFS mirroring or 
zraid: if ZFS detects corruption, instead of self-healing it returns a read 
failure for the pg file to ceph, and ceph's scrub mechanisms should then 
repair/replace the pg file using a good replica elsewhere on the cluster. 
ZFS + ceph are a beautiful bitrot-fighting match!

Let me know if there's anything else I can answer. 

Cheers

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Kenneth Waegeman
Sent: October-29-14 6:09 AM
To: ceph-users
Subject: [ceph-users] use ZFS for OSDs

Hi,

We are looking to use ZFS for our OSD backend, but I have some questions.

My main question is: Does Ceph already supports the writeparallel mode for ZFS 
? (as described here:  
http://www.sebastien-han.fr/blog/2013/12/02/ceph-performance-interesting-things-going-on/)
I've found this, but I suppose it is outdated:  
https://wiki.ceph.com/Planning/Blueprints/Emperor/osd%3A_ceph_on_zfs

Should Ceph be built with ZFS support? I found a --with-zfslib option 
somewhere, but can someone verify this, or, better, have instructions for
it? :-)

What parameters should be tuned to use this?
I found these :
 filestore zfs_snap = 1
 journal_aio = 0
 journal_dio = 0

Are there other things we need for it?

Many thanks!!
Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD process exhausting server memory

2014-10-29 Thread Lukáš Kubín
I should have figured that out myself since I did that recently. Thanks.

Unfortunately, I'm still at the step "ceph osd unset noin". After setting
all the OSDs in, the original issue reappears, preventing me from proceeding with
recovery. It now appears mostly at a single OSD - osd.10 - which consumes ~200%
CPU and all memory within 45 seconds and is then killed by Linux:

Oct 29 18:24:38 q09 kernel: Out of memory: Kill process 17202 (ceph-osd)
score 912 or sacrifice child
Oct 29 18:24:38 q09 kernel: Killed process 17202, UID 0, (ceph-osd)
total-vm:62713176kB, anon-rss:62009772kB, file-rss:328kB


I've tried to restart it several times with same result. Similar situation
with OSDs 0 and 13.

Also, I've noticed one of the SSD cache tier's OSDs - osd.29 - generating high CPU
utilization, around 180%.

All the problematic OSDs have been the same ones all the time - OSDs
0, 8, 10, 13 and 29 - they are the ones which I found to be down this morning.

There is some minor load coming from clients - OpenStack instances which I
preferred not to kill:

[root@q04 ceph-recovery]# ceph -s
cluster ec433b4a-9dc0-4d08-bde4-f1657b1fdb99
 health HEALTH_ERR 31 pgs backfill; 241 pgs degraded; 62 pgs down; 193
pgs incomplete; 13 pgs inconsistent; 62 pgs peering; 12 pgs recovering; 205
pgs recovery_wait; 93 pgs stuck inactive; 608 pgs stuck unclean; 381138
requests are blocked > 32 sec; recovery 1162468/35207488 objects degraded
(3.302%); 466/17112963 unfound (0.003%); 13 scrub errors; 1/34 in osds are
down; nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
 monmap e2: 3 mons at {q03=
10.255.253.33:6789/0,q04=10.255.253.34:6789/0,q05=10.255.253.35:6789/0},
election epoch 92, quorum 0,1,2 q03,q04,q05
 osdmap e2782: 34 osds: 33 up, 34 in
flags nobackfill,norecover,noscrub,nodeep-scrub
  pgmap v7440374: 5632 pgs, 7 pools, 1449 GB data, 16711 kobjects
3148 GB used, 15010 GB / 18158 GB avail
1162468/35207488 objects degraded (3.302%); 466/17112963
unfound (0.003%)
  13 active
  22 active+recovery_wait+remapped
   1 active+recovery_wait+inconsistent
4794 active+clean
 193 incomplete
  62 down+peering
   9 active+degraded+remapped+wait_backfill
 182 active+recovery_wait
  74 active+remapped
  12 active+recovering
  12 active+clean+inconsistent
  22 active+remapped+wait_backfill
   4 active+clean+replay
 232 active+degraded
  client io 0 B/s rd, 1048 kB/s wr, 184 op/s


Below I'm sending the requested output.

Do you have any other ideas how to recover from this?

Thanks a lot.

Lukas




[root@q04 ceph-recovery]# ceph osd crush rule dump
[
{ "rule_id": 0,
  "rule_name": "replicated_ruleset",
  "ruleset": 0,
  "type": 1,
  "min_size": 1,
  "max_size": 10,
  "steps": [
{ "op": "take",
  "item": -1,
  "item_name": "default"},
{ "op": "chooseleaf_firstn",
  "num": 0,
  "type": "host"},
{ "op": "emit"}]},
{ "rule_id": 1,
  "rule_name": "ssd",
  "ruleset": 1,
  "type": 1,
  "min_size": 1,
  "max_size": 10,
  "steps": [
{ "op": "take",
  "item": -5,
  "item_name": "ssd"},
{ "op": "chooseleaf_firstn",
  "num": 0,
  "type": "host"},
{ "op": "emit"}]},
{ "rule_id": 2,
  "rule_name": "sata",
  "ruleset": 2,
  "type": 1,
  "min_size": 1,
  "max_size": 10,
  "steps": [
{ "op": "take",
  "item": -4,
  "item_name": "sata"},
{ "op": "chooseleaf_firstn",
  "num": 0,
  "type": "host"},
{ "op": "emit"}]}]

[root@q04 ceph-recovery]# ceph osd dump | grep pool
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 2 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 630 flags hashpspool
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 2 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 632 flags hashpspool
stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 2 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 634 flags hashpspool
stripe_width 0
pool 7 'volumes' replicated size 2 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 1517 flags hashpspool tiers
14 read_tier 14 write_tier 14 stripe_width 0
pool 8 'images' replicated size 2 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 1519 flags hashpspool
stripe_width 0
pool 12 'backups' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 862 flags hashpspool
stripe_width 0
pool 14 'volumes-cache' replicated size 2 

Re: [ceph-users] use ZFS for OSDs

2014-10-29 Thread Stijn De Weirdt


hi michal,

thanks for the info. we will certainly try it and see if we come to the 
same conclusions ;)


one small detail: since you were using centos7, i'm assuming you were 
using ZoL 0.6.3?


stijn

On 10/29/2014 08:03 PM, Michal Kozanecki wrote:

Forgot to mention, when you create the ZFS/ZPOOL datasets, make sure to set the 
xattr property to sa

e.g.

zpool create osd01 -O xattr=sa -O compression=lz4 sdb

OR if zpool/zfs dataset already created

zfs set xattr=sa osd01

Cheers



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Michal 
Kozanecki
Sent: October-29-14 11:33 AM
To: Kenneth Waegeman; ceph-users
Subject: Re: [ceph-users] use ZFS for OSDs

Hi Kenneth,

I run a small ceph test cluster using ZoL (ZFS on Linux) ontop of CentOS 7, so 
I'll try and answer any questions. :)

Yes, ZFS writeparallel support is there, but NOT compiled in by default. You'll 
need to compile it with --with-zlib, but that by itself will fail to compile 
the ZFS support as I found out. You need to ensure you have ZoL installed and 
working, and then pass the location of libzfs to ceph at compile time. 
Personally I just set my environment variables before compiling like so;

ldconfig
export LIBZFS_LIBS="/usr/include/libzfs/"
export LIBZFS_CFLAGS="-I/usr/include/libzfs -I/usr/include/libspl"

However, the writeparallel performance isn't all that great. The writeparallel 
mode makes heavy use of ZFS's (and BtrFS's for that matter) snapshotting 
capability, and the snap performance on ZoL, at least when I last tested it, is 
pretty terrible. You lose any performance benefits you gain with writeparallel 
to the poor snap performance.

If you decide that you don't need writeparallel mode you, can use the prebuilt 
packages (or compile with default options) without issue. Ceph (without zlib 
support compiled in) will detect ZFS as a generic/ext4 file system and work 
accordingly.

As far as performance tweaking, ZIL, write journals and etc, I found that the 
performance difference between using a ZIL vs ceph write journal is about the 
same. I also found that doing both (ZIL AND writejournal) didn't give me much 
of a performance benefit. In my small test cluster I decided after testing to 
forego the ZIL and only use a SSD backed ceph write journal on each OSD, with 
each OSD being a single ZFS dataset/vdev(no zraid or mirroring). With Ceph 
handling the redundancy at the OSD level I saw no need for using ZFS mirroring 
or zraid, instead if ZFS detects corruption instead of self-healing it sends a 
read failure of the pg file to ceph, and then ceph's scrub mechanisms should 
then repair/replace the pg file using a good replica elsewhere on the cluster. 
ZFS + ceph are a beautiful bitrot fighting match!

Let me know if there's anything else I can answer.

Cheers

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Kenneth Waegeman
Sent: October-29-14 6:09 AM
To: ceph-users
Subject: [ceph-users] use ZFS for OSDs

Hi,

We are looking to use ZFS for our OSD backend, but I have some questions.

My main question is: Does Ceph already supports the writeparallel mode for ZFS 
? (as described here:
http://www.sebastien-han.fr/blog/2013/12/02/ceph-performance-interesting-things-going-on/)
I've found this, but I suppose it is outdated:
https://wiki.ceph.com/Planning/Blueprints/Emperor/osd%3A_ceph_on_zfs

Should Ceph be build with ZFS support? I found a --with-zfslib option 
somewhere, but can someone verify this, or better has instructions for
it?:-)

What parameters should be tuned to use this?
I found these :
  filestore zfs_snap = 1
  journal_aio = 0
  journal_dio = 0

Are there other things we need for it?

Many thanks!!
Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] use ZFS for OSDs

2014-10-29 Thread Michal Kozanecki
Hi Stijn,

Yes, on my cluster I am running; CentOS 7, ZoL 0.6.3, Ceph 80.5.

Cheers


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Stijn 
De Weirdt
Sent: October-29-14 3:49 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] use ZFS for OSDs


hi michal,

thanks for the info. we will certainly try it and see if we come to the same 
conclusions ;)

one small detail: since you were using centos7, i'm assuming you were using ZoL 
0.6.3?

stijn

On 10/29/2014 08:03 PM, Michal Kozanecki wrote:
> Forgot to mention, when you create the ZFS/ZPOOL datasets, make sure 
> to set the xattr property to sa
>
> e.g.
>
> zpool create osd01 -O xattr=sa -O compression=lz4 sdb
>
> OR if zpool/zfs dataset already created
>
> zfs set xattr=sa osd01
>
> Cheers
>
>
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Michal Kozanecki
> Sent: October-29-14 11:33 AM
> To: Kenneth Waegeman; ceph-users
> Subject: Re: [ceph-users] use ZFS for OSDs
>
> Hi Kenneth,
>
> I run a small ceph test cluster using ZoL (ZFS on Linux) ontop of 
> CentOS 7, so I'll try and answer any questions. :)
>
> Yes, ZFS writeparallel support is there, but NOT compiled in by 
> default. You'll need to compile it with --with-zlib, but that by 
> itself will fail to compile the ZFS support as I found out. You need 
> to ensure you have ZoL installed and working, and then pass the 
> location of libzfs to ceph at compile time. Personally I just set my 
> environment variables before compiling like so;
>
> ldconfig
> export LIBZFS_LIBS="/usr/include/libzfs/"
> export LIBZFS_CFLAGS="-I/usr/include/libzfs -I/usr/include/libspl"
>
> However, the writeparallel performance isn't all that great. The 
> writeparallel mode makes heavy use of ZFS's (and BtrFS's for that matter) 
> snapshotting capability, and the snap performance on ZoL, at least when I 
> last tested it, is pretty terrible. You lose any performance benefits you 
> gain with writeparallel to the poor snap performance.
>
> If you decide that you don't need writeparallel mode you, can use the 
> prebuilt packages (or compile with default options) without issue. Ceph 
> (without zlib support compiled in) will detect ZFS as a generic/ext4 file 
> system and work accordingly.
>
> As far as performance tweaking, ZIL, write journals and etc, I found that the 
> performance difference between using a ZIL vs ceph write journal is about the 
> same. I also found that doing both (ZIL AND writejournal) didn't give me much 
> of a performance benefit. In my small test cluster I decided after testing to 
> forego the ZIL and only use a SSD backed ceph write journal on each OSD, with 
> each OSD being a single ZFS dataset/vdev(no zraid or mirroring). With Ceph 
> handling the redundancy at the OSD level I saw no need for using ZFS 
> mirroring or zraid, instead if ZFS detects corruption instead of self-healing 
> it sends a read failure of the pg file to ceph, and then ceph's scrub 
> mechanisms should then repair/replace the pg file using a good replica 
> elsewhere on the cluster. ZFS + ceph are a beautiful bitrot fighting match!
>
> Let me know if there's anything else I can answer.
>
> Cheers
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Kenneth Waegeman
> Sent: October-29-14 6:09 AM
> To: ceph-users
> Subject: [ceph-users] use ZFS for OSDs
>
> Hi,
>
> We are looking to use ZFS for our OSD backend, but I have some questions.
>
> My main question is: Does Ceph already supports the writeparallel mode for 
> ZFS ? (as described here:
> http://www.sebastien-han.fr/blog/2013/12/02/ceph-performance-interesti
> ng-things-going-on/) I've found this, but I suppose it is outdated:
> https://wiki.ceph.com/Planning/Blueprints/Emperor/osd%3A_ceph_on_zfs
>
> Should Ceph be build with ZFS support? I found a --with-zfslib option 
> somewhere, but can someone verify this, or better has instructions for
> it?:-)
>
> What parameters should be tuned to use this?
> I found these :
>   filestore zfs_snap = 1
>   journal_aio = 0
>   journal_dio = 0
>
> Are there other things we need for it?
>
> Many thanks!!
> Kenneth
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http:

[ceph-users] how to check real rados read speed

2014-10-29 Thread VELARTIS Philipp Dürhammer
Hi,

With ceph -w I can see ceph writes, reads and IO.
But the reads shown seem to be only those which are not served from the OSD or 
monitor cache.
As we have 128 GB in every ceph server, our monitors and OSDs are set to use a 
lot of RAM.
Monitoring only very rarely shows some ceph reads... but a lot more writes 
(although it should be more reads).
Even when I run a benchmark inside a virtual machine with 2 GB, I will see 2 GB 
of writes but no reads...

Is there any way to monitor real reads from RADOS and not only OSD reads?
Btw, where can I check the reads of the scrub process?

Thank you
philipp

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] journal on entire ssd device

2014-10-29 Thread Cristian Falcas
Hello,

Will there be any benefit in making the journal the size of an entire ssd disk?

I was also thinking on increasing "journal max write entries" and
"journal queue max ops".

But will it matter, or it will have the same effect as a 4gb journal
on the same ssd?

Thank you,
Cristian Falcas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] journal on entire ssd device

2014-10-29 Thread Craig Lewis
Journals that are too small can cause performance problems; it
basically takes away the SSD journal speedup, and forces all writes to
go at the speed of the HDD.

Once you make the journal big enough to prevent that, there is no
benefit to making it larger.

There might be a slight performance penalty to making it too large.
SSD wear leveling can take advantage of under-provisioning.  It
should make things slightly faster when the SSD needs to execute wear
leveling operations (i.e., not often).  Under-provisioning also extends
the life of the SSD.


Putting numbers to those statements... I really couldn't say.



On Wed, Oct 29, 2014 at 2:01 PM, Cristian Falcas
 wrote:
> Hello,
>
> Will there be any benefit in making the journal the size of an entire ssd 
> disk?
>
> I was also thinking on increasing "journal max write entries" and
> "journal queue max ops".
>
> But will it matter, or it will have the same effect as a 4gb journal
> on the same ssd?
>
> Thank you,
> Cristian Falcas
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.87 Giant released

2014-10-29 Thread Sage Weil
This release will form the basis for the Giant stable series,
v0.87.x.  Highlights for Giant include:

* *RADOS Performance*: a range of improvements have been made in the
  OSD and client-side librados code that improve the throughput on
  flash backends and improve parallelism and scaling on fast machines.
* *CephFS*: we have fixed a raft of bugs in CephFS and built some
  basic journal recovery and diagnostic tools.  Stability and
  performance of single-MDS systems is vastly improved in Giant.
  Although we do not yet recommend CephFS for production deployments,
  we do encourage testing for non-critical workloads so that we can
  better gauge the feature, usability, performance, and stability
  gaps.
* *Local Recovery Codes*: the OSDs now support an erasure-coding scheme
  that stores some additional data blocks to reduce the IO required to
  recover from single OSD failures.
* *Degraded vs misplaced*: the Ceph health reports from 'ceph -s' and
  related commands now make a distinction between data that is
  degraded (there are fewer than the desired number of copies) and
  data that is misplaced (stored in the wrong location in the
  cluster).  The distinction is important because the latter does not
  compromise data safety.
* *Tiering improvements*: we have made several improvements to the
  cache tiering implementation that improve performance.  Most
  notably, objects are not promoted into the cache tier by a single
  read; they must be found to be sufficiently hot before that happens.
* *Monitor performance*: the monitors now perform writes to the local
  data store asynchronously, improving overall responsiveness.
* *Recovery tools*: the ceph_objectstore_tool is greatly expanded to
  allow manipulation of an individual OSDs data store for debugging
  and repair purposes.  This is most heavily used by our QA
  infrastructure to exercise recovery code.

Upgrade Sequencing
--

* If your existing cluster is running a version older than v0.80.x
  Firefly, please first upgrade to the latest Firefly release before
  moving on to Giant.  We have not tested upgrades directly from
  Emperor, Dumpling, or older releases.

  We *have* tested:

   * Firefly to Giant
   * Dumpling to Firefly to Giant

* Please upgrade daemons in the following order:

   #. Monitors
   #. OSDs
   #. MDSs and/or radosgw

  Note that the relative ordering of OSDs and monitors should not matter, but
  we primarily tested upgrading monitors first.

Upgrading from v0.80.x Firefly
--

* The client-side caching for librbd is now enabled by default (rbd
  cache = true).  A safety option (rbd cache writethrough until flush
  = true) is also enabled so that writeback caching is not used until
  the library observes a 'flush' command, indicating that the librbd
  user is passing that operation through from the guest VM.  This
  avoids potential data loss when used with older versions of qemu
  that do not support flush.
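
  Spelled out in ceph.conf, the new defaults correspond to:

    [client]
    rbd cache = true
    rbd cache writethrough until flush = true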

* The 'rados getxattr ...' command used to add a gratuitous newline to the attr
  value; it now does not.

* The ``*_kb perf`` counters on the monitor have been removed.  These are
  replaced with a new set of ``*_bytes`` counters (e.g., ``cluster_osd_kb`` is
  replaced by ``cluster_osd_bytes``).

* The ``rd_kb`` and ``wr_kb`` fields in the JSON dumps for pool stats (accessed
  via the ``ceph df detail -f json-pretty`` and related commands) have been
  replaced with corresponding ``*_bytes`` fields.  Similarly, the
  ``total_space``, ``total_used``, and ``total_avail`` fields are replaced with
  ``total_bytes``, ``total_used_bytes``,  and ``total_avail_bytes`` fields.

* The ``rados df --format=json`` output ``read_bytes`` and ``write_bytes``
  fields were incorrectly reporting ops; this is now fixed.

* The ``rados df --format=json`` output previously included ``read_kb`` and
  ``write_kb`` fields; these have been removed.  Please use ``read_bytes`` and
  ``write_bytes`` instead (and divide by 1024 if appropriate).

* The experimental keyvaluestore-dev OSD backend had an on-disk format
  change that prevents existing OSD data from being upgraded.  This
  affects developers and testers only.

* mon-specific and osd-specific leveldb options have been removed.
  From this point onward users should use the `leveldb_*` generic
  options and add the options in the appropriate sections of their
  configuration files.  Monitors will still maintain the following
  monitor-specific defaults:

    leveldb_write_buffer_size = 32*1024*1024  = 33554432  // 32MB
    leveldb_cache_size        = 512*1024*1204 = 536870912 // 512MB
    leveldb_block_size        = 64*1024       = 65536     // 64KB
    leveldb_compression       = false
    leveldb_log               = ""

  OSDs will still maintain the following osd-specific defaults:

    leveldb_log               = ""

[ceph-users] Rbd cache severely inhibiting read performance (Giant)

2014-10-29 Thread Mark Kirkwood

I am doing some testing on our new ceph cluster:

- 3 ceph nodes (8 cpu 128G, Ubuntu 12.04 + 3.13 kernel)
- 8 osd on each (i.e 24 in total)
- 4 compute nodes (ceph clients)
- 10G networking
- ceph 0.86 (97dcc0539dfa7dac3de74852305d51580b7b1f82)

I'm using one of the compute nodes to run some fio tests from, and 
initial runs had us scratching our heads:


- 4M write 700 MB/s (175 IOPS)
- 4M read 320 MB/s (81 IOPS)
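
For anyone wanting to reproduce: a 4M sequential read job against an RBD image 
looks roughly like the following (all parameters here are illustrative, not the 
exact job file we used):

fio --name=seqread --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test \
    --rw=read --bs=4M --iodepth=16 --runtime=60 --time_based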

Why was reading so much slower than writing? The [client] section of the 
compute nodes ceph.conf looks like:


rbd_cache = true
rbd_cache_size = 67108864
rbd_cache_max_dirty = 33554432
rbd_cache_writethrough_until_flush = true

After quite a lot of mucking about with different cache sizes (up to 2G), 
we thought to switch the rbd cache off:


rbd_cache = false

...and rerun the read test:

- 4M read 1450 MB/s (362 IOPS)

That is more like it! So I am confused about what is happening here. I 
put 'Giant' in the subject, but I'm wondering if our current (Emperor) 
cluster is also suffering from this too (we are seeing what we think is 
poor read performance).


I redid the read test, running fio from one of the ceph nodes - to try 
to mitigate network latency to some extent - same result. I'm *thinking* 
that rbd cache management is not keeping up with the read stream.


FWIW I can't reproduce this with master (0.86-672-g5c051f5 
5c051f5c0c640ddc9b27b7cab3860a899dc185cb) on a dev setup, but the 
topology is very different (4 osds on 4 vms and all of 'em are actually 
on one host with no real network), so may not be a relevant data point.


Regards

Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] When will Ceph 0.72.3?

2014-10-29 Thread Craig Lewis
Probably never.

I'm having trouble finding documentation, but my understanding is that
dumpling and firefly are the only supported releases.  I believe
emperor became unsupported when firefly came out.  Similarly, giant
will be supported until hammer comes out.  Once hammer comes out,
dumpling and giant become unsupported.


I just upgraded to firefly, specifically for the scrubbing and
backfilling sleeps.


On Wed, Oct 29, 2014 at 12:01 AM, Irek Fasikhov  wrote:
> Dear developers.
>
> Very much want io priorities ;)
> During the execution of Snap roollback appear slow queries.
>
> Thanks
> --
> С уважением, Фасихов Ирек Нургаязович
> Моб.: +79229045757
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rbd cache severely inhibiting read performance (Giant)

2014-10-29 Thread Mark Kirkwood

On 30/10/14 11:16, Mark Kirkwood wrote:

I am doing some testing on our new ceph cluster:

- 3 ceph nodes (8 cpu 128G, Ubuntu 12.04 + 3.13 kernel)
- 8 osd on each (i.e 24 in total)
- 4 compute nodes (ceph clients)
- 10G networking
- ceph 0.86 (97dcc0539dfa7dac3de74852305d51580b7b1f82)

I'm using one of the compute nodes to run some fio tests from, and
initial runs had us scratching our heads:

- 4M write 700 MB/s (175 IOPS)
- 4M read 320 MB/s (81 IOPS)

Why was reading so much slower than writing? The [client] section of the
compute nodes ceph.conf looks like:

rbd_cache = true
rbd_cache_size = 67108864
rbd_cache_max_dirty = 33554432
rbd_cache_writethrough_until_flush = true

After quite a lot of mucking about with different cache sizes (upto 2G),
we thought to switch the rbd cache off:

rbd_cache = false

...and rerun the read test:

- 4M read 1450 MB/s (362 IOPS)

That is more like it! So I am confused about what is happening here. I
put 'Giant' in the subject, but I'm wondering if our current (Emperor)
cluster is also suffering from this too (are seeing what we think is
poor read performance).

I redid the read test, running fio from one of the ceph nodes - to try
to mitigate network latency to some extent - same result. I'm *thinking*
that rbd cache management is not keeping up with the read stream.

FWIW I can't reproduce this with master (0.86-672-g5c051f5
5c051f5c0c640ddc9b27b7cab3860a899dc185cb) on a dev setup, but the
topology is very different (4 osds on 4 vms and all of 'em are actually
on one host with no real network), so may not be a relevant data point.



Now that 0.87 is out, upgrading and retesting with cache back on:

- 4M read 1250 MB/s (300 IOPS)

So looking like this is pretty much resolved in this release (could 
probably get closer to 1450 MB/s with cache bigger than 64MB).


Good timing guys!

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.87 Giant released

2014-10-29 Thread Nigel Williams

On 30/10/2014 8:56 AM, Sage Weil wrote:

* *Degraded vs misplaced*: the Ceph health reports from 'ceph -s' and
   related commands now make a distinction between data that is
   degraded (there are fewer than the desired number of copies) and
   data that is misplaced (stored in the wrong location in the
   cluster).


Is someone able to briefly describe how/why misplaced happens please? Is it repaired 
eventually? I've not seen misplaced (yet).




 leveldb_write_buffer_size = 32*1024*1024  = 33554432  // 32MB
 leveldb_cache_size= 512*1024*1204 = 536870912 // 512MB


I noticed the typo, wondered about the code, but I'm not seeing the same values 
anyway?

https://github.com/ceph/ceph/blob/giant/src/common/config_opts.h

OPTION(leveldb_write_buffer_size, OPT_U64, 8 *1024*1024) // leveldb write 
buffer size
OPTION(leveldb_cache_size, OPT_U64, 128 *1024*1024) // leveldb cache size





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding a monitor to

2014-10-29 Thread Gregory Farnum
[Re-adding the list, so this is archived for future posterity.]

On Wed, Oct 29, 2014 at 6:11 AM, Patrick Darley
 wrote:
>
> Thanks again for the reply Greg!
>
> On 2014-10-28 17:39, Gregory Farnum wrote:
>>
>> I'm sorry, you're right — I misread it. :(
>
>
> No worries, I had included some misleading words like generate in my rough
> description where retrive would have been more appropriate. Sorry!
>
>> But indeed step 6 is the crucial one, which tells the existing
>> monitors to accept the new one into the cluster. You'll need to run it
>> with an admin client keyring that can connect to the existing cluster;
>> that's probably the part that has gone wrong. You don't need to run it
>> from the new monitor,
>
>
> I think, in order to carry out the 5th step you also need the client.admin
> keyring present, that'd be "preparing the monitors data directory". I had
> scp-ed it across to the monitor along with the ceph.conf file and pu them in
> the expected location, /etc/ceph/, prior to running that command.
>
>> so if you're having trouble getting the keys to
>> behave I'd just run it from an existing system. :)
>
>
> I tried running this command, step 6, from the admin node of my ubuntu ceph
> cluster.
> As I had experienced before, the command hung. Then trying to run any ceph
> commands on the
> rest of the cluster I get a long hang then the following error:
>
> cc@ucc01:~$ ceph -s
> 2014-10-29 10:40:33.748334 7ffaec051700  0 monclient(hunting):
> authenticate timed out after 300
> 2014-10-29 10:40:33.748499 7ffaec051700  0 librados: client.admin
> authentication error (110) Connection timed out
> Error connecting to cluster: TimedOut
>
>
> The monitor that I was trying to add can be started ok after this (once I
> have touched the done and sysvinit files) but also gives the
> above error when attempting to run the ceph -s. Checking the log file I see
> the following lines repeated:
>
>
> 2014-10-29 10:01:01.721905 7ffd548ac700  0 mon.bcc07@-1(probing) e0
> handle_probe ignoring fsid 5021163c-3c0b-4ec5-83fe-f0622c0e9447 !=
> f2d609ef-2065-4862-a821-55c484d61dca
> 2014-10-29 10:01:01.809991 7ffd550ad700  1
> mon.bcc07@-1(probing).paxos(paxos recovering c 0..0) is_readable
> now=2014-10-29 10:01:01.809996 lease_expire=0.00 has v0 lc 0
> 2014-10-29 10:01:03.721559 7ffd548ac700  0 mon.bcc07@-1(probing) e0
> handle_probe ignoring fsid 5021163c-3c0b-4ec5-83fe-f0622c0e9447 !=
> f2d609ef-2065-4862-a821-55c484d61dca
> 2014-10-29 10:01:03.810466 7ffd550ad700  1
> mon.bcc07@-1(probing).paxos(paxos recovering c 0..0) is_readable
> now=2014-10-29 10:01:03.810467 lease_expire=0.00 has v0 lc 0
>
>
> The initial monitor has the following log at around a similar time:
>
>
> 2014-10-29 10:01:02.169655 7f52e7408700  0 mon.ucc01@1(probing) e2
> handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca !=
> 5021163c-3c0b-4ec5-83fe-f0622c0e9447
> 2014-10-29 10:01:04.170153 7f52e7408700  0 mon.ucc01@1(probing) e2
> handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca !=
> 5021163c-3c0b-4ec5-83fe-f0622c0e9447
> 2014-10-29 10:01:06.169300 7f52e7408700  0 mon.ucc01@1(probing) e2
> handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca !=
> 5021163c-3c0b-4ec5-83fe-f0622c0e9447
>
>
> It looks to me like there might be conflicting fsid values being compared
> somewhere, but checking the ceph.conf files on the
> nodes I found them to be declared as the same. The log files recorded a
> similar output on both monitors for some time.
>
> I then turned off the monitor I was attempting to add at approximately
> 12:39:30 and the log file of the initial
> monitor has the following output around this time:
>
>
> 2014-10-29 12:39:30.304639 7f52e7408700  0 mon.ucc01@1(probing) e2
> handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca !=
> 5021163c-3c0b-4ec5-83fe-f0622c0e9447

Okay, that's indeed not right. I suspect this is your issue but I'm
not entirely certain because your other symptoms are a bit weird. I
bet Joao can help though; he maintains the monitor and deals with
these issues a lot more often than I do. :)
-Greg
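
A quick way to compare the two fsids (commands are illustrative; run them with
the new mon stopped):

ceph fsid                                        # fsid of the running cluster
ceph-mon -i bcc07 --extract-monmap /tmp/monmap   # monmap the new mon was built with
monmaptool --print /tmp/monmap | grep fsid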

> 2014-10-29 12:39:32.023964 7f52e7c09700  0
> mon.ucc01@1(probing).data_health(1) update_stats avail 68% total 14318640
> used 3748076 avail 9820180
> 2014-10-29 12:39:32.303740 7f52e7408700  0 mon.ucc01@1(probing) e2
> handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca !=
> 5021163c-3c0b-4ec5-83fe-f0622c0e9447
> 2014-10-29 12:39:32.394606 7f52e53fd700  0 -- 192.168.122.95:6789/0 >>
> 192.168.122.42:6789/0 pipe(0x55e5180 sd=24 :6789 s=2 pgs=1 cs=1 l=0
> c=0x39bfde0).fault with nothing to send, going to standby
> 2014-10-29 12:39:33.862400 7f52e5902700  0 -- 192.168.122.95:6789/0 >>
> 192.168.122.42:6789/0 pipe(0x55e5180 sd=13 :6789 s=1 pgs=1 cs=2 l=0
> c=0x39bfde0).fault
> 2014-10-29 12:40:32.024807 7f52e7c09700  0
> mon.ucc01@1(probing).data_health(1) update_stats avail 68% total 14318640
> used 3

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-29 Thread Gregory Farnum
On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
 wrote:
> Hello Greg,
>
> I added the debug options which you mentioned and started the process again:
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
> /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
> --reset-journal 0
> old journal was 9483323613~134233517
> new journal start will be 9621733376 (4176246 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf 
> --cluster ceph --undump-journal 0 journaldumptgho-mon001
> undump journaldumptgho-mon001
> start 9483323613 len 134213311
> writing header 200.
>  writing 9483323613~1048576
>  writing 9484372189~1048576
>  writing 9485420765~1048576
>  writing 9486469341~1048576
>  writing 9487517917~1048576
>  writing 9488566493~1048576
>  writing 9489615069~1048576
>  writing 9490663645~1048576
>  writing 9491712221~1048576
>  writing 9492760797~1048576
>  writing 9493809373~1048576
>  writing 9494857949~1048576
>  writing 9495906525~1048576
>  writing 9496955101~1048576
>  writing 9498003677~1048576
>  writing 9499052253~1048576
>  writing 9500100829~1048576
>  writing 9501149405~1048576
>  writing 9502197981~1048576
>  writing 9503246557~1048576
>  writing 9504295133~1048576
>  writing 9505343709~1048576
>  writing 9506392285~1048576
>  writing 9507440861~1048576
>  writing 9508489437~1048576
>  writing 9509538013~1048576
>  writing 9510586589~1048576
>  writing 9511635165~1048576
>  writing 9512683741~1048576
>  writing 9513732317~1048576
>  writing 9514780893~1048576
>  writing 9515829469~1048576
>  writing 9516878045~1048576
>  writing 9517926621~1048576
>  writing 9518975197~1048576
>  writing 9520023773~1048576
>  writing 9521072349~1048576
>  writing 9522120925~1048576
>  writing 9523169501~1048576
>  writing 9524218077~1048576
>  writing 9525266653~1048576
>  writing 9526315229~1048576
>  writing 9527363805~1048576
>  writing 9528412381~1048576
>  writing 9529460957~1048576
>  writing 9530509533~1048576
>  writing 9531558109~1048576
>  writing 9532606685~1048576
>  writing 9533655261~1048576
>  writing 9534703837~1048576
>  writing 9535752413~1048576
>  writing 9536800989~1048576
>  writing 9537849565~1048576
>  writing 9538898141~1048576
>  writing 9539946717~1048576
>  writing 9540995293~1048576
>  writing 9542043869~1048576
>  writing 9543092445~1048576
>  writing 9544141021~1048576
>  writing 9545189597~1048576
>  writing 9546238173~1048576
>  writing 9547286749~1048576
>  writing 9548335325~1048576
>  writing 9549383901~1048576
>  writing 9550432477~1048576
>  writing 9551481053~1048576
>  writing 9552529629~1048576
>  writing 9553578205~1048576
>  writing 9554626781~1048576
>  writing 9555675357~1048576
>  writing 9556723933~1048576
>  writing 9557772509~1048576
>  writing 9558821085~1048576
>  writing 9559869661~1048576
>  writing 9560918237~1048576
>  writing 9561966813~1048576
>  writing 9563015389~1048576
>  writing 9564063965~1048576
>  writing 9565112541~1048576
>  writing 9566161117~1048576
>  writing 9567209693~1048576
>  writing 9568258269~1048576
>  writing 9569306845~1048576
>  writing 9570355421~1048576
>  writing 9571403997~1048576
>  writing 9572452573~1048576
>  writing 9573501149~1048576
>  writing 9574549725~1048576
>  writing 9575598301~1048576
>  writing 9576646877~1048576
>  writing 9577695453~1048576
>  writing 9578744029~1048576
>  writing 9579792605~1048576
>  writing 9580841181~1048576
>  writing 9581889757~1048576
>  writing 9582938333~1048576
>  writing 9583986909~1048576
>  writing 9585035485~1048576
>  writing 9586084061~1048576
>  writing 9587132637~1048576
>  writing 9588181213~1048576
>  writing 9589229789~1048576
>  writing 9590278365~1048576
>  writing 9591326941~1048576
>  writing 9592375517~1048576
>  writing 9593424093~1048576
>  writing 9594472669~1048576
>  writing 9595521245~1048576
>  writing 9596569821~1048576
>  writing 9597618397~1048576
>  writing 9598666973~1048576
>  writing 9599715549~1048576
>  writing 9600764125~1048576
>  writing 9601812701~1048576
>  writing 9602861277~1048576
>  writing 9603909853~1048576
>  writing 9604958429~1048576
>  writing 9606007005~1048576
>  writing 9607055581~1048576
>  writing 9608104157~1048576
>  writing 9609152733~1048576
>  writing 9610201309~1048576
>  writing 9611249885~1048576
>  writing 9612298461~1048576
>  writing 9613347037~1048576
>  writing 9614395613~1048576
>  writing 9615444189~1048576
>  writing 9616492765~1044159
> done.
> [root@th1-mon001 ~]# service ceph start mds
> === mds.th1-mon001 ===
> Starting Ceph mds.th1-mon001 on th1-mon001...
> starting mds.th1-mon001 at :/0
>
>
> The new logs:
> http://pastebin.com/wqqjuEpy

These don't have the increased debugging levels set. :( I'm not sure
where you could have put them that they didn't get picked up, but make
sure it's in the ceph.conf that this mds daemon is referring to. (You
can see the debug levels 
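
For reference, the sort of settings needed in the [mds] section of that
ceph.conf (levels illustrative) are:

[mds]
debug mds = 20
debug journaler = 20
debug ms = 1

Restart the MDS afterwards so the settings are picked up.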

Re: [ceph-users] v0.87 Giant released

2014-10-29 Thread Christian Balzer
On Thu, 30 Oct 2014 10:40:38 +1100 Nigel Williams wrote:

> On 30/10/2014 8:56 AM, Sage Weil wrote:
> > * *Degraded vs misplaced*: the Ceph health reports from 'ceph -s' and
> >related commands now make a distinction between data that is
> >degraded (there are fewer than the desired number of copies) and
> >data that is misplaced (stored in the wrong location in the
> >cluster).
> 
> Is someone able to briefly described how/why misplaced happens please,
> is it repaired eventually? I've not seen misplaced (yet).
> 
It happens any time there is a change in data placement, be it a change in the CRUSH
map like modifying the weight of an OSD or simply adding a new OSD.
Thus objects are (temporarily) not where they're supposed to be, but still
present in sufficient replication. 
It's a much more benign scenario than degraded, and I hope that this doesn't
even generate a WARN in the "ceph -s" report.
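
A quick way to see it (OSD id and weight are just examples):

ceph osd crush reweight osd.12 2.0   # change the CRUSH weight
ceph -s                              # now reports a percentage of objects misplaced until backfill finishes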

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.87 Giant released

2014-10-29 Thread Nigel Williams

On 30/10/2014 11:51 AM, Christian Balzer wrote:

Thus objects are (temporarily) not where they're supposed to be, but still
present in sufficient replication.


thanks for the reminder, I suppose that is obvious :-)


A much more benign scenario than degraded and I hope that this doesn't
even generate a WARN in the "ceph -s" report.


Better described as a transitory "hazardous" state, given that the PG distribution might 
not be optimal for a period of time and (inopportune) failures may tip the health into 
degraded.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] where to download 0.87 RPMS?

2014-10-29 Thread 廖建锋


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] journal on entire ssd device

2014-10-29 Thread Christian Balzer

Hello,

On Wed, 29 Oct 2014 23:01:55 +0200 Cristian Falcas wrote:

> Hello,
> 
> Will there be any benefit in making the journal the size of an entire
> ssd disk?
> 
Not really, Craig already pointed out a number of things.

To put numbers on things, I size my journals so they can at least hold 20
seconds worth of writes (reasoning below) at full speed. 
For example a OSD server with a single 10Gb/s link, 8 OSDs and 4 journal
SSDs (2 journals per SSD), each SSD capable of writing 250MB/s (or faster).
So 20 seconds worth through that 10Gb/s link make 20GB, divided by 8
results in a mere 2.5GB journal. 
Now I still deployed machines like the above with 10GB journals because
I had the space, but that's about it.

> I was also thinking on increasing "journal max write entries" and
> "journal queue max ops".
> 
> But will it matter, or it will have the same effect as a 4gb journal
> on the same ssd?
> 

Find the "How to detect journal problems" thread in the ML archives from
April this year for details.
In short the journal* options don't necessarily do what you think they do,
if you want to max out journal utilization you have to play with the
filestore min and max sync intervals. 
I have set the max to 10 seconds and since Ceph also starts flushing the
journal when it becomes half full there's the above goal of having 20
seconds worth of space.
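
In ceph.conf terms that amounts to something like this (the min value is just an
example, the rest follows from the numbers above):

[osd]
filestore max sync interval = 10     # no more than 10s between filestore syncs
filestore min sync interval = 0.1
osd journal size = 10240             # 10 GB, comfortably above the ~2.5 GB computed above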

That all said, I'd be very happy about some journal perf counters in Ceph
that show how effective and utilized it is.

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw issues

2014-10-29 Thread yuelongguang
ceph-users list, hi:
How did you solve this issue? I ran into it when trying to deploy 2 RGWs on one 
ceph cluster in the default region and default zone.
 
 
thanks








At 2014-07-01 09:06:24, "Brian Rak"  wrote:
>That sounds like you have some kind of odd situation going on.  We only 
>use radosgw with nginx/tengine so I can't comment on the apache part of it.
>
>My understanding is this:
>
>You start ceph-radosgw, this creates a fastcgi socket somewhere (verify 
>this is created with lsof, there are some permission problems that will 
>result in radosgw running, but not opening the socket).
>
>Apache is configured to connect to this socket, and forward any incoming 
>requests.  Apache should not be launching things.
>
>I did set Apache up once to test a bug, so I took a look at my config.  
>I do *not* have a s3gw.fcgi file on disk.  Have you tried removing 
>that?  I think that with FastCgiExternalServer, you don't need 
>s3gw.fcgi.  The other thing that got me was socket path in 
>FastCgiExternalServer is relative to whatever you have FastCgiIpcDir set 
>to (which the Ceph docs don't seem to take into account).
>
>
>On 6/30/2014 8:40 PM, lists+c...@deksai.com wrote:
>> On 2014-06-16 13:16, lists+c...@deksai.com wrote:
>>> I've just tried setting up the radosgw on centos6 according to
>>> http://ceph.com/docs/master/radosgw/config/
>>
>>> While I can run the admin commands just fine to create users etc.,
>>> making a simple wget request to the domain I set up returns a 500 due
>>> to a timeout.  Every request I make results in another radosgw process
>>> being created, which seems to start even more processes itself. I
>>> only have to make a few requests to have about 60 radosgw processes.
>>>
>>
>> Guess I'll try again.  I gave this another shot, following the 
>> documentation, and still end up with basically a fork bomb rather than 
>> the nice ListAllMyBucketsResult output that the docs say I should 
>> get.  Everything else about the cluster works fine, and I see others 
>> talking about the gateway as if it just worked, so I'm led to believe 
>> that I'm probably doing something stupid.  Has anybody else run into 
>> the situation where apache times out while fastcgi just launches more 
>> and more processes?
>>
>> The init script launches a process, and the webserver seems to launch 
>> the same thing, so I'm not clear on what should be happening here.  
>> Either way, I get nothing back when making a simple GET request to the 
>> domain.
>>
>> If anybody has suggestions, even if they are "You nincompoop! 
>> Everybody knows that you need to do such and such", that would be 
>> helpful.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Delete pools with low priority?

2014-10-29 Thread Gregory Farnum
Dan (who wrote that slide deck) is probably your best bet here, but I
believe pool deletion is not very configurable and fairly expensive
right now. I suspect that it will get better in Hammer or Infernalis,
once we have a unified op work queue that we can independently
prioritize all IO through (this was a blueprint in CDS today!).
Similar problems with snap trimming and scrubbing were resolved by
introducing sleeps between ops, but that's a bit of a hack itself and
should be going away once proper IO prioritization is available.
-Greg
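
For reference, those sleeps are plain config options (names as of firefly,
values purely illustrative):

osd snap trim sleep = 0.05    # seconds to pause between snap trim operations
osd scrub sleep = 0.1         # seconds to pause between scrub chunks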

On Wed, Oct 29, 2014 at 8:19 AM, Daniel Schneller
 wrote:
> Bump :-)
>
> Any ideas on this? They would be much appreciated.
>
> Also: Sorry for a possible double post, client had forgotten its email
> config.
>
> On 2014-10-22 21:21:54 +, Daniel Schneller said:
>
>> We have been running several rounds of benchmarks through the Rados
>> Gateway. Each run creates several hundred thousand objects and similarly
>> many containers.
>>
>> The cluster consists of 4 machines, 12 OSD disks (spinning, 4TB) — 48
>> OSDs total.
>>
>> After running a set of benchmarks we renamed the pools used by the
>> gateway pools to get a clean baseline. In total we now have several
>> million objects and containers in 3 pools. Redundancy for all pools is
>> set to 3.
>>
>> Today we started deleting the benchmark data. Once the first renamed set
>> of RGW pools was executed, cluster performance started to go down the
>> drain. Using iotop we can see that the disks are all working furiously.
>> As the command to delete the pools came back very quickly, the
>> assumption is that we are now seeing the effects of the actual objects
>> being removed, causing lots and lots of IO activity on the disks,
>> negatively impacting regular operations.
>>
>> We are running OpenStack on top of Ceph, and we see drastic reduction in
>> responsiveness of these machines as well as in CephFS.
>>
>> Fortunately this is still a test setup, so no production systems are
>> affected. Nevertheless I would like to ask a few questions:
>>
>> 1) Is it possible to have the object deletion run in some low-prio mode?
>> 2) If not, is there another way to delete lots and lots of objects
>> without affecting the rest of the cluster so badly? 3) Can we somehow
>> determine the progress of the deletion so far? We would like to estimate
>> if this is going to take hours, days or weeks? 4) Even if not possible
>> for the already running deletion, could be get a progress for the
>> remaining pools we still want to delete? 5) Are there any parameters
>> that we might tune — even if just temporarily - to speed this up?
>>
>> Slide 18 of http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern
>> describes a very similar situation.
>>
>> Thanks, Daniel
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] where to download 0.87 RPMS?

2014-10-29 Thread Patrick McGarry
I have updated the http://ceph.com/get page to reflect a more generic
approach to linking.  It's also worth noting that the new
http://download.ceph.com/ infrastructure is available now.

To get to the rpms specifically you can either crawl the
download.ceph.com tree or use the symlink at
http://ceph.com/rpm-giant/

Hope that (and the updated linkage on ceph.com/get) helps.  Thanks!


Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


On Wed, Oct 29, 2014 at 9:15 PM, 廖建锋  wrote:
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crash with rados cppool and snapshots

2014-10-29 Thread Gregory Farnum
On Wed, Oct 29, 2014 at 7:49 AM, Daniel Schneller
 wrote:
> Hi!
>
> We are exploring options to regularly preserve (i.e. backup) the
> contents of the pools backing our rados gateways. For that we create
> nightly snapshots of all the relevant pools when there is no activity
> on the system to get consistent states.
>
> In order to restore the whole pools back to a specific snapshot state,
> we tried to use the rados cppool command (see below) to copy a snapshot
> state into a new pool. Unfortunately this causes a segfault. Are we
> doing anything wrong?
>
> This command:
>
> rados cppool --snap snap-1 deleteme.lp deleteme.lp2 2> segfault.txt
>
> Produces this output:
>
> *** Caught signal (Segmentation fault) ** in thread 7f8f49a927c0
>  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>  1: rados() [0x43eedf]
>  2: (()+0x10340) [0x7f8f48738340]
>  3: (librados::IoCtxImpl::snap_lookup(char const*, unsigned long*)+0x17) [0x7f8f48aff127]
>  4: (main()+0x1385) [0x411e75]
>  5: (__libc_start_main()+0xf5) [0x7f8f4795fec5]
>  6: rados() [0x41c6f7]
> 2014-10-29 12:03:22.761653 7f8f49a927c0 -1 *** Caught signal (Segmentation fault) ** in thread 7f8f49a927c0
>
>  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>  1: rados() [0x43eedf]
>  2: (()+0x10340) [0x7f8f48738340]
>  3: (librados::IoCtxImpl::snap_lookup(char const*, unsigned long*)+0x17) [0x7f8f48aff127]
>  4: (main()+0x1385) [0x411e75]
>  5: (__libc_start_main()+0xf5) [0x7f8f4795fec5]
>  6: rados() [0x41c6f7]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
>
> Full segfault file and the objdump output for the rados command can be
> found here:
>
> - https://public.centerdevice.de/53bddb80-423e-4213-ac62-59fe8dbb9bea
> - https://public.centerdevice.de/50b81566-41fb-439a-b58b-e1e32d75f32a
>
> We updated to the 0.80.7 release (saw the issue with 0.80.5 before and
> had hoped that the long list of bugfixes in the release notes would
> include a fix for this) but are still seeing it. Rados gateways, OSDs,
> MONs etc. have all been restarted after the update. Package versions
> as follows:
>
> daniel.schneller@node01 [~] $
> ➜  dpkg -l | grep ceph
> ii  ceph0.80.7-1trusty
> ii  ceph-common 0.80.7-1trusty
> ii  ceph-fs-common  0.80.7-1trusty
> ii  ceph-fuse   0.80.7-1trusty
> ii  ceph-mds0.80.7-1trusty
> ii  libcephfs1  0.80.7-1trusty
> ii  python-ceph 0.80.7-1trusty
>
> daniel.schneller@node01 [~] $
> ➜  uname -a
> Linux node01 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:06:16
>UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> Copying without the snapshot works. Should this work at least in
> theory?

Well, that's interesting. I'm not sure if this can be expected to work
properly, but it certainly shouldn't crash there. Looking at it a bit,
you can make it not crash by specifying "-p deleteme.lp" as well, but
it simply copies the current state of the pool, not the snapped state.
If you could generate a ticket or two at tracker.ceph.com, that would
be helpful!
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw issues

2014-10-29 Thread yuelongguang
I had overlooked a detail: FastCgiWrapper needs to be set to Off.
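
In the Apache vhost fronting radosgw that boils down to the following (paths
here are assumptions, adjust to your setup):

FastCgiWrapper Off
FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock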
thanks






At 2014-10-30 10:01:19, "yuelongguang"  wrote:

lists ceph, hi:
how do you solve this issue? i run into it  when i tryy to deploy 2 rgws on one 
ceph cluster in default region and default zone.
 
 
thanks








At 2014-07-01 09:06:24, "Brian Rak"  wrote:
>That sounds like you have some kind of odd situation going on.  We only 
>use radosgw with nginx/tengine so I can't comment on the apache part of it.
>
>My understanding is this:
>
>You start ceph-radosgw, this creates a fastcgi socket somewhere (verify 
>this is created with lsof, there are some permission problems that will 
>result in radosgw running, but not opening the socket).
>
>Apache is configured to connect to this socket, and forward any incoming 
>requests.  Apache should not be launching things.
>
>I did set Apache up once to test a bug, so I took a look at my config.  
>I do *not* have a s3gw.fcgi file on disk.  Have you tried removing 
>that?  I think that with FastCgiExternalServer, you don't need 
>s3gw.fcgi.  The other thing that got me was socket path in 
>FastCgiExternalServer is relative to whatever you have FastCgiIpcDir set 
>to (which the Ceph docs don't seem to take into account).
>
>
>On 6/30/2014 8:40 PM, lists+c...@deksai.com wrote:
>> On 2014-06-16 13:16, lists+c...@deksai.com wrote:
>>> I've just tried setting up the radosgw on centos6 according to
>>> http://ceph.com/docs/master/radosgw/config/
>>
>>> While I can run the admin commands just fine to create users etc.,
>>> making a simple wget request to the domain I set up returns a 500 due
>>> to a timeout.  Every request I make results in another radosgw process
>>> being created, which seems to start even more processes itself. I
>>> only have to make a few requests to have about 60 radosgw processes.
>>>
>>
>> Guess I'll try again.  I gave this another shot, following the 
>> documentation, and still end up with basically a fork bomb rather than 
>> the nice ListAllMyBucketsResult output that the docs say I should 
>> get.  Everything else about the cluster works fine, and I see others 
>> talking about the gateway as if it just worked, so I'm led to believe 
>> that I'm probably doing something stupid.  Has anybody else run into 
>> the situation where apache times out while fastcgi just launches more 
>> and more processes?
>>
>> The init script launches a process, and the webserver seems to launch 
>> the same thing, so I'm not clear on what should be happening here.  
>> Either way, I get nothing back when making a simple GET request to the 
>> domain.
>>
>> If anybody has suggestions, even if they are "You nincompoop! 
>> Everybody knows that you need to do such and such", that would be 
>> helpful.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Anyone deploying Ceph on Docker?

2014-10-29 Thread Patrick McGarry
Hey cephers,

Given some of the recent interest in utilizing Docker with Ceph I'm
taking another survey of the landscape. I know that Loic recently got
Teuthology running with Docker (http://dachary.org/?p=3330)  but I'd
like to look at running a containerized Ceph setup as well.

So far I see that Sebastien did an experiment back in 2013:

http://www.sebastien-han.fr/blog/2013/09/19/how-I-barely-got-my-first-ceph-mon-running-in-docker/

and Lorieri had a CoreOS experiment:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-August/042063.html

but there have been a few people that mentioned their experiments to
me in passing at cons and other places.  I'd love to gather any
experience that people have gleaned in this area by aggregating blog
entries and other notes.  So, if you have a successful Ceph+Docker
setup and would be willing to write a short doc via email/blog/wiki I
would greatly appreciate it.  Thanks!


Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Anyone deploying Ceph on Docker?

2014-10-29 Thread Christopher Armstrong
Also to note - we're running on CoreOS and making use of the etcd
distributed key value store to store configuration data, and confd to
template out some of the configuration from etcd. So it's a cool marriage
of various tools in the ecosystem.


*Chris Armstrong*
Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Wed, Oct 29, 2014 at 10:16 PM, Christopher Armstrong 
wrote:

> Hey Patrick,
>
> We recently added a new component to Deis which is based entirely on
> running Ceph in containers. We're running mons, OSDs, and MDSes in
> containers, and consuming from containers with radosgw as well as CephFS.
> See the source here: https://github.com/deis/deis/tree/master/store
>
> I'm pretty proud of the work, and would be more than happy to write a blog
> post about it if you'd like.
>
> Chris
>
>
> *Chris Armstrong*
> Head of Services
> OpDemand / Deis.io
>
> GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/
>
>
> On Wed, Oct 29, 2014 at 8:26 PM, Patrick McGarry 
> wrote:
>
>> Hey cephers,
>>
>> Given some of the recent interest in utilizing Docker with Ceph I'm
>> taking another survey of the landscape. I know that Loic recently got
>> Teuthology running with Docker (http://dachary.org/?p=3330)  but I'd
>> like to look at running a containerized Ceph setup as well.
>>
>> So far I see that Sebastien did an experiment back in 2013:
>>
>>
>> http://www.sebastien-han.fr/blog/2013/09/19/how-I-barely-got-my-first-ceph-mon-running-in-docker/
>>
>> and Lorieri had a CoreOS experiment:
>>
>>
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-August/042063.html
>>
>> but there have been a few people that mentioned their experiments to
>> me in passing at cons and other places.  I'd love to gather any
>> experience that people have gleaned in this area by aggregating blog
>> entries and other notes.  So, if you have a successful Ceph+Docker
>> setup and would be willing to write a short doc via email/blog/wiki I
>> would greatly appreciate it.  Thanks!
>>
>>
>> Best Regards,
>>
>> Patrick McGarry
>> Director Ceph Community || Red Hat
>> http://ceph.com  ||  http://community.redhat.com
>> @scuttlemonkey || @ceph
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Anyone deploying Ceph on Docker?

2014-10-29 Thread Christopher Armstrong
Sure thing. I'll work on something and send it over early next week.


*Chris Armstrong*
Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Wed, Oct 29, 2014 at 10:24 PM, Patrick McGarry 
wrote:

> Christopher,
>
> I would definitely welcome a writeup for the ceph.com blog!  Feel free
> to send something my way as soon as is convenient. :)
>
>
> As an aside to anyone doing fun/cool/interesting/wacky things...I'm
> always looking for ceph.com blog content and love to feature
> everything from small experiments to huge performance datasets.  Feel
> free to contact me directly and we'll get you slotted in.  Thanks!
>
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
>
>
> On Thu, Oct 30, 2014 at 1:18 AM, Christopher Armstrong
>  wrote:
> > Also to note - we're running on CoreOS and making use of the etcd
> > distributed key value store to store configuration data, and confd to
> > template out some of the configuration from etcd. So it's a cool
> marriage of
> > various tools in the ecosystem.
> >
> > Chris Armstrong
> > Head of Services
> > OpDemand / Deis.io
> >
> > GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/
> >
> >
> > On Wed, Oct 29, 2014 at 10:16 PM, Christopher Armstrong <
> ch...@opdemand.com>
> > wrote:
> >>
> >> Hey Patrick,
> >>
> >> We recently added a new component to Deis which is based entirely on
> >> running Ceph in containers. We're running mons, OSDs, and MDSes in
> >> containers, and consuming from containers with radosgw as well as
> CephFS.
> >> See the source here: https://github.com/deis/deis/tree/master/store
> >>
> >> I'm pretty proud of the work, and would be more than happy to write a
> blog
> >> post about it if you'd like.
> >>
> >> Chris
> >>
> >> Chris Armstrong
> >> Head of Services
> >> OpDemand / Deis.io
> >>
> >> GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/
> >>
> >>
> >> On Wed, Oct 29, 2014 at 8:26 PM, Patrick McGarry 
> >> wrote:
> >>>
> >>> Hey cephers,
> >>>
> >>> Given some of the recent interest in utilizing Docker with Ceph I'm
> >>> taking another survey of the landscape. I know that Loic recently got
> >>> Teuthology running with Docker (http://dachary.org/?p=3330)  but I'd
> >>> like to look at running a containerized Ceph setup as well.
> >>>
> >>> So far I see that Sebastien did an experiment back in 2013:
> >>>
> >>>
> >>>
> http://www.sebastien-han.fr/blog/2013/09/19/how-I-barely-got-my-first-ceph-mon-running-in-docker/
> >>>
> >>> and Lorieri had a CoreOS experiment:
> >>>
> >>>
> >>>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-August/042063.html
> >>>
> >>> but there have been a few people that mentioned their experiments to
> >>> me in passing at cons and other places.  I'd love to gather any
> >>> experience that people have gleaned in this area by aggregating blog
> >>> entries and other notes.  So, if you have a successful Ceph+Docker
> >>> setup and would be willing to write a short doc via email/blog/wiki I
> >>> would greatly appreciate it.  Thanks!
> >>>
> >>>
> >>> Best Regards,
> >>>
> >>> Patrick McGarry
> >>> Director Ceph Community || Red Hat
> >>> http://ceph.com  ||  http://community.redhat.com
> >>> @scuttlemonkey || @ceph
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Anyone deploying Ceph on Docker?

2014-10-29 Thread Patrick McGarry
Christopher,

I would definitely welcome a writeup for the ceph.com blog!  Feel free
to send something my way as soon as is convenient. :)


As an aside to anyone doing fun/cool/interesting/wacky things...I'm
always looking for ceph.com blog content and love to feature
everything from small experiments to huge performance datasets.  Feel
free to contact me directly and we'll get you slotted in.  Thanks!


Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


On Thu, Oct 30, 2014 at 1:18 AM, Christopher Armstrong
 wrote:
> Also to note - we're running on CoreOS and making use of the etcd
> distributed key value store to store configuration data, and confd to
> template out some of the configuration from etcd. So it's a cool marriage of
> various tools in the ecosystem.
>
> Chris Armstrong
> Head of Services
> OpDemand / Deis.io
>
> GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/
>
>
> On Wed, Oct 29, 2014 at 10:16 PM, Christopher Armstrong 
> wrote:
>>
>> Hey Patrick,
>>
>> We recently added a new component to Deis which is based entirely on
>> running Ceph in containers. We're running mons, OSDs, and MDSes in
>> containers, and consuming from containers with radosgw as well as CephFS.
>> See the source here: https://github.com/deis/deis/tree/master/store
>>
>> I'm pretty proud of the work, and would be more than happy to write a blog
>> post about it if you'd like.
>>
>> Chris
>>
>> Chris Armstrong
>> Head of Services
>> OpDemand / Deis.io
>>
>> GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/
>>
>>
>> On Wed, Oct 29, 2014 at 8:26 PM, Patrick McGarry 
>> wrote:
>>>
>>> Hey cephers,
>>>
>>> Given some of the recent interest in utilizing Docker with Ceph I'm
>>> taking another survey of the landscape. I know that Loic recently got
>>> Teuthology running with Docker (http://dachary.org/?p=3330)  but I'd
>>> like to look at running a containerized Ceph setup as well.
>>>
>>> So far I see that Sebastien did an experiment back in 2013:
>>>
>>>
>>> http://www.sebastien-han.fr/blog/2013/09/19/how-I-barely-got-my-first-ceph-mon-running-in-docker/
>>>
>>> and Lorieri had a CoreOS experiment:
>>>
>>>
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-August/042063.html
>>>
>>> but there have been a few people that mentioned their experiments to
>>> me in passing at cons and other places.  I'd love to gather any
>>> experience that people have gleaned in this area by aggregating blog
>>> entries and other notes.  So, if you have a successful Ceph+Docker
>>> setup and would be willing to write a short doc via email/blog/wiki I
>>> would greatly appreciate it.  Thanks!
>>>
>>>
>>> Best Regards,
>>>
>>> Patrick McGarry
>>> Director Ceph Community || Red Hat
>>> http://ceph.com  ||  http://community.redhat.com
>>> @scuttlemonkey || @ceph
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Anyone deploying Ceph on Docker?

2014-10-29 Thread Christopher Armstrong
Hey Patrick,

We recently added a new component to Deis which is based entirely on
running Ceph in containers. We're running mons, OSDs, and MDSes in
containers, and consuming from containers with radosgw as well as CephFS.
See the source here: https://github.com/deis/deis/tree/master/store

I'm pretty proud of the work, and would be more than happy to write a blog
post about it if you'd like.

Chris


*Chris Armstrong*
Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Wed, Oct 29, 2014 at 8:26 PM, Patrick McGarry 
wrote:

> Hey cephers,
>
> Given some of the recent interest in utilizing Docker with Ceph I'm
> taking another survey of the landscape. I know that Loic recently got
> Teuthology running with Docker (http://dachary.org/?p=3330)  but I'd
> like to look at running a containerized Ceph setup as well.
>
> So far I see that Sebastien did an experiment back in 2013:
>
>
> http://www.sebastien-han.fr/blog/2013/09/19/how-I-barely-got-my-first-ceph-mon-running-in-docker/
>
> and Lorieri had a CoreOS experiment:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-August/042063.html
>
> but there have been a few people that mentioned their experiments to
> me in passing at cons and other places.  I'd love to gather any
> experience that people have gleaned in this area by aggregating blog
> entries and other notes.  So, if you have a successful Ceph+Docker
> setup and would be willing to write a short doc via email/blog/wiki I
> would greatly appreciate it.  Thanks!
>
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.87 Giant released

2014-10-29 Thread Sage Weil
On Thu, 30 Oct 2014, Nigel Williams wrote:
> On 30/10/2014 8:56 AM, Sage Weil wrote:
> > * *Degraded vs misplaced*: the Ceph health reports from 'ceph -s' and
> >related commands now make a distinction between data that is
> >degraded (there are fewer than the desired number of copies) and
> >data that is misplaced (stored in the wrong location in the
> >cluster).
> 
> Is someone able to briefly describe how/why misplaced happens, please? Is it
> repaired eventually? I've not seen misplaced (yet).

Sure.  An easy way to get misplaced objects is to do 'ceph osd 
out N' on an OSD.  Nothing is down, we still have as many copies 
as we had before, but Ceph now wants to move them somewhere 
else. Starting with giant, you will see the misplaced % in 'ceph -s' and 
not degraded.
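
A minimal sketch of that experiment, assuming a test cluster and an arbitrary
OSD id:

$ ceph osd out 3     # the OSD stays up, so no copies are lost
$ ceph -s            # now reports a misplaced %, not degraded, while data moves
$ ceph osd in 3      # undo the experiment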

> >  leveldb_write_buffer_size = 32*1024*1024  = 33554432  // 32MB
> >  leveldb_cache_size= 512*1024*1204 = 536870912 // 512MB
> 
> I noticed the typo, wondered about the code, but I'm not seeing the same
> values anyway?
> 
> https://github.com/ceph/ceph/blob/giant/src/common/config_opts.h
> 
> OPTION(leveldb_write_buffer_size, OPT_U64, 8 *1024*1024) // leveldb write
> buffer size
> OPTION(leveldb_cache_size, OPT_U64, 128 *1024*1024) // leveldb cache size

Hmm!  Not sure where that 32MB number came from.  I'll fix it, thanks!

sage
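
For anyone who wants to verify what a running OSD is actually using, a minimal
sketch via the admin socket (the default socket path is assumed; adjust the
OSD id):

$ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep leveldb
# to override, set e.g. the following under [osd] in ceph.conf and restart:
#   leveldb_write_buffer_size = 33554432
#   leveldb_cache_size = 536870912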
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] survey: Ceph integration into auth security frameworks (AD/kerberos/etc.)

2014-10-29 Thread Sage Weil
I've created a survey to try to get a feel for what people are using in 
environments where Ceph is deployed today.  If you have a moment, please 
help us figure out where we should be spending our effort, here!  In 
particular, if you use kerberos or active directory or something similar, 
we want to hear from you.  (And if you don't, that is good to know, too!)

https://www.surveymonkey.com/s/DQJPJ9J

Thanks-
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Clone field from rados df command

2014-10-29 Thread Mallikarjun Biradar
What exactly is the "clone" field from "rados df" meant for?

Steps tried:
Created an rbd image & mapped it
Wrote some 1GB of data on /dev/rbd1 using fio
Unmapped rbd image
Took snapshot
Mapped rbd image again and overwrote 1GB of data using fio
Unmapped rbd image
Took snapshot
Mapped rbd image again and wrote 5GB of data by overwriting initial 1GB of
data using fio

After this, rados df shows 256 clones under the clone field.
Since I haven't created any rbd image clones, I want to clarify why
rados df is displaying these clones.
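
A shell sketch of the sequence described above (image, pool, and snapshot
names plus the fio arguments are illustrative; with the default 4 MB RBD
object size, overwriting 1 GB after a snapshot touches 256 objects, matching
the figure reported):

$ rbd create rbd/test --size 10240
$ rbd map rbd/test                       # shows up as e.g. /dev/rbd1
$ fio --name=fill --filename=/dev/rbd1 --rw=write --bs=4M --size=1G --direct=1
$ rbd unmap /dev/rbd1
$ rbd snap create rbd/test@snap1
$ rbd map rbd/test
$ fio --name=rewrite --filename=/dev/rbd1 --rw=write --bs=4M --size=1G --direct=1
$ rbd unmap /dev/rbd1
$ rbd snap create rbd/test@snap2
$ rbd map rbd/test
$ fio --name=big --filename=/dev/rbd1 --rw=write --bs=4M --size=5G --direct=1
$ rados df                               # "clones" counts per-object snapshot clones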

-Thanks & Regards,
Arjun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com