Re: [ceph-users] About the data movement in Ceph

2013-09-27 Thread Zh Chen
Thanks, Sage, for helping me understand Ceph much more deeply!

Recently I came up with a few more questions:

1. As we know, ceph -s gives a summary of the system's state. Is there any
tool to monitor the details of data flow when the CRUSH map is changed?

2. In my understanding, the mapping between an object and its PG is stable;
only the mapping between PGs and OSDs needs to change when the CRUSH
map is changed, right?
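
To state my understanding precisely, here is a toy sketch of the two-stage
mapping as I picture it (not Ceph's real code; the hash and pg_num below are
just stand-ins for illustration):

    /* toy_map.c - illustrative only: object -> PG is a stable hash,
     * PG -> OSD is computed by CRUSH and is the only step that moves
     * when the CRUSH map changes. */
    #include <stdio.h>
    #include <stdint.h>

    /* stand-in hash (FNV-1a); Ceph uses its own rjenkins-based hash */
    static uint32_t toy_hash(const char *s)
    {
        uint32_t h = 2166136261u;
        while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
        return h;
    }

    int main(void)
    {
        const char *oid = "rbd_data.1234.0000000000000001"; /* example object */
        uint32_t pg_num = 4500;                             /* example pg_num */

        /* step 1: object name -> PG; stable as long as pg_num is unchanged */
        uint32_t pg = toy_hash(oid) % pg_num;
        printf("object %s -> pg %u\n", oid, pg);

        /* step 2 (not shown): CRUSH maps (pool, pg) -> an ordered list of
         * OSDs; only this mapping changes when the CRUSH map changes. */
        return 0;
    }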

3. Suppose there are two pools in Ceph, and each root has its own OSDs as
leaves. If I move an OSD from one pool's root to the other's, will the PGs on
this OSD be migrated within its original pool, or rebalanced in the target
pool?

4. crushtool is a very cool tool for understanding CRUSH, but I don't know
how to use --show-utilization (show OSD usage). What arguments or action do I
need to add on the command line?
Is there any CLI command that can query each OSD's usage and statistics?

5. I see that librados offers the API rados_ioctx_pool_stat(
rados_ioctx_t io, struct rados_pool_stat_t *stats). If I want to query the
statistics of several pools, do I need to declare a separate rados_ioctx_t (or
a separate cluster handle) for each pool? I got a segmentation fault on the
return from rados_ioctx_pool_stat.
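
For reference, a minimal sketch of what I am trying to do (one cluster handle,
one ioctx per pool, and a caller-owned struct passed to rados_ioctx_pool_stat;
the pool names and config path here are just placeholders):

    /* pool_stats.c - sketch only; build with: gcc pool_stats.c -lrados */
    #include <stdio.h>
    #include <string.h>
    #include <rados/librados.h>

    int main(void)
    {
        const char *pools[] = { "data", "rbd" };  /* placeholder pool names */
        rados_t cluster;
        unsigned i;
        int err;

        if (rados_create(&cluster, "admin") < 0)              /* client.admin */
            return 1;
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf"); /* placeholder path */
        if ((err = rados_connect(cluster)) < 0) {
            fprintf(stderr, "connect: %s\n", strerror(-err));
            return 1;
        }

        /* one cluster handle is enough; open one ioctx per pool */
        for (i = 0; i < sizeof(pools) / sizeof(pools[0]); i++) {
            rados_ioctx_t io;
            struct rados_pool_stat_t st;

            if ((err = rados_ioctx_create(cluster, pools[i], &io)) < 0) {
                fprintf(stderr, "ioctx %s: %s\n", pools[i], strerror(-err));
                continue;
            }
            memset(&st, 0, sizeof(st));
            /* pass the address of a struct we own, and check the return code
             * before reading any field */
            if ((err = rados_ioctx_pool_stat(io, &st)) == 0)
                printf("%s: %llu objects, %llu bytes\n", pools[i],
                       (unsigned long long)st.num_objects,
                       (unsigned long long)st.num_bytes);
            else
                fprintf(stderr, "pool_stat %s: %s\n", pools[i], strerror(-err));
            rados_ioctx_destroy(io);
        }
        rados_shutdown(cluster);
        return 0;
    }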

Looking forward to your kind reply!




2013/9/11 Sage Weil 

> On Tue, 10 Sep 2013, atrmat wrote:
> > Hi all,
> > recently i read the source code and paper, and i have some questions
> about
> > the data movement:
> > 1. When OSDs are added or removed, how does Ceph do the data migration and
> > rebalance the CRUSH map? Is it RADOS that modifies the CRUSH map or cluster
> > map, and does the primary OSD do the data movement according to the cluster
> > map? How can I find the data migration in the source code?
>
> The OSDMap changes when the osd is added or removed (or some other event
> or administrator action happens).  In response, the OSDs recalculate where
> the PGs should be stored, and move data in response to that.
>
> > 2. When an OSD is down or has failed, how does Ceph recover the data onto
> > other OSDs? Is it the primary OSD that copies the PG to the newly located OSD?
>
> The (new) primary figures out where data is/was (peering) and then
> coordinates any data migration (recovery) to where the data should now be
> (according to the latest OSDMap and its embedded CRUSH map).
>
> > 3. The OSD has 4 status bits: up, down, in, out. But I can't find the defined
> > status CEPH_OSD_DOWN. Is it the OSD that calls the function mark_osd_down()
> > to modify the OSD status in the OSDMap?
>
> See OSDMap.h: is_up() and is_down().  For in/out, it is either binary
> (is_in() and is_out()) or can be somewhere in between; see get_weight().
>
> Hope that helps!
>
> sage
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OpenStack Grizzly Authentication (Keystone PKI) with RADOS Gateway

2013-09-27 Thread Amit Vijairania
Hello!

Does RADOS Gateway support or integrate with OpenStack (Grizzly)
Authentication (Keystone PKI)?

Can RADOS Gateway use PKI tokens to perform user token verification without
explicit calls to Keystone?

Thanks!
Amit

Amit Vijairania  |  978.319.3684
--*--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw swift subuser creation

2013-09-27 Thread Snider, Tim
Thanks, that worked - you were close.
This is another documentation issue on http://ceph.com/docs/next/radosgw/config/ ;
the --gen-secret parameter requirement isn't mentioned.
Enabling Swift Access
Allowing access to the object store with Swift (OpenStack Object Storage)
compatible clients requires an additional step; namely, the creation of a
subuser and a Swift access key.
  sudo radosgw-admin subuser create --uid=johndoe --subuser=johndoe:swift --access=full
  sudo radosgw-admin key create --subuser=johndoe:swift --key-type=swift

radosgw-admin key create   --subuser=rados:swift   --key-type=swift --gen-secret
2013-09-27 14:46:40.202708 7f25c5d70780  0 WARNING: cannot read region map
{ "user_id": "rados",
  "display_name": "rados",
  "email": "n...@none.com",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [
{ "id": "rados:swift",
  "permissions": "full-control"}],
  "keys": [
{ "user": "rados",
  "access_key": "R5F0D2UCSK3618DJ829A",
  "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}],
  "swift_keys": [
{ "user": "rados:swift",
  "secret_key": "77iJvemrxWvYk47HW7pxsL+eHdA53AtLl2T0OyuG"}],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": []}
-Original Message-
From: Matt McNulty [mailto:ma...@codero.com] 
Sent: Friday, September 27, 2013 4:40 PM
To: Snider, Tim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] radosgw swift subuser creation

Hi Tim,

Try adding --gen-key to your create command (you should be able to create a key 
for the subuser you already created).

Thanks,
Matt

On 9/27/13 4:35 PM, "Snider, Tim"  wrote:

>I created a radosgw user and a swift subuser and attempted to generate a
>key for the swift user, using the commands below. However, the swift key
>was empty when the command completed. What did I miss?
>
>root@controller21:/etc# radosgw-admin user create --uid=rados 
>--display-name=rados --email=n...@none.com
>2013-09-27 13:34:08.155162 7f984f0a5780  0 WARNING: cannot read region 
>map { "user_id": "rados",
>  "display_name": "rados",
>  "email": "n...@none.com",
>  "suspended": 0,
>  "max_buckets": 1000,
>  "auid": 0,
>  "subusers": [],
>  "keys": [
>{ "user": "rados",
>  "access_key": "R5F0D2UCSK3618DJ829A",
>  "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}],
>  "swift_keys": [],
>  "caps": [],
>  "op_mask": "read, write, delete",
>  "default_placement": "",
>  "placement_tags": []}
>
>root@controller21:/etc# radosgw-admin subuser create  --uid=rados
>--subuser=rados:swift   --key-type=swift --access=full
>2013-09-27 13:34:58.761911 7f5307c04780  0 WARNING: cannot read region 
>map { "user_id": "rados",
>  "display_name": "rados",
>  "email": "n...@none.com",
>  "suspended": 0,
>  "max_buckets": 1000,
>  "auid": 0,
>  "subusers": [
>{ "id": "rados:swift",
>  "permissions": "full-control"}],
>  "keys": [
>{ "user": "rados",
>  "access_key": "R5F0D2UCSK3618DJ829A",
>  "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}],
>  "swift_keys": [],
>  "caps": [],
>  "op_mask": "read, write, delete",
>  "default_placement": "",
>  "placement_tags": []}
>
>root@controller21:/etc# radosgw-admin key create   --subuser=rados:swift
> --key-type=swift
>2013-09-27 13:35:43.544005 7f599e672780  0 WARNING: cannot read region 
>map { "user_id": "rados",
>  "display_name": "rados",
>  "email": "n...@none.com",
>  "suspended": 0,
>  "max_buckets": 1000,
>  "auid": 0,
>  "subusers": [
>{ "id": "rados:swift",
>  "permissions": "full-control"}],
>  "keys": [
>{ "user": "rados",
>  "access_key": "R5F0D2UCSK3618DJ829A",
>  "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}],
>  "swift_keys": [
>{ "user": "rados:swift",
>  "secret_key": ""}],
>  "caps": [],
>  "op_mask": "read, write, delete",
>  "default_placement": "",
>  "placement_tags": []}
>
>Thanks,
>Tim
>
>Timothy Snider
>Strategic Planning & Architecture - Advanced Development NetApp
>316-636-8736 Direct Phone
>316-213-0223 Mobile Phone
>tim.sni...@netapp.com
>netapp.com
> 
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't mount CephFS - where to start troubleshooting?

2013-09-27 Thread Aaron Ten Clay
On Fri, Sep 27, 2013 at 2:44 PM, Gregory Farnum  wrote:

> What is the output of ceph -s? It could be something underneath the
> filesystem.
>
root@chekov:~# ceph -s
  cluster 18b7cba7-ccc3-4945-bb39-99450be81c98
   health HEALTH_OK
   monmap e3: 3 mons at {chekov=10.42.6.29:6789/0,laforge=10.42.5.30:6789/0,picard=10.42.6.21:6789/0}, election epoch 30, quorum 0,1,2 chekov,laforge,picard
   osdmap e387: 4 osds: 4 up, 4 in
   pgmap v1100: 320 pgs: 320 active+clean; 7568 MB data, 15445 MB used, 14873 GB / 14888 GB avail
   mdsmap e28: 1/1/1 up {0=1=up:active}, 2 up:standby



> What kernel version are you using?


I have two configurations I've tested:
root@chekov:~# uname -a
Linux chekov 3.5.0-40-generic #62~precise1-Ubuntu SMP Fri Aug 23 17:38:26
UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

aaron@seven ~ $ uname -a
Linux seven 3.10.7-gentoo-r1 #1 SMP PREEMPT Thu Sep 26 07:23:03 PDT 2013
x86_64 Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz GenuineIntel GNU/Linux



> Did you enable crush tunables?

I haven't edited the CRUSH map for this cluster after creating it... so I
don't think I've edited any of the tunables either.

> It could be that your kernel doesn't support all the options you enabled.
>
Well, on both my Ubuntu and Gentoo systems, I can mount the other cluster
just fine (the one that started as 0.61 and got upgraded):

seven ~ # mount -t ceph 10.42.100.20:/ /mnt/ceph -o name=admin,secret=...
seven ~ #

I should have mentioned in my initial email that with or without -o
name=,secret= mounting the new cluster fails with the same error 95 =
Operation not supported.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't mount CephFS - where to start troubleshooting?

2013-09-27 Thread Gregory Farnum
On Fri, Sep 27, 2013 at 2:12 PM, Aaron Ten Clay  wrote:
> Hi,
>
> I probably did something wrong setting up my cluster with 0.67.3. I
> previously built a cluster with 0.61 and everything went well, even after an
> upgrade to 0.67.3. Now I built a fresh 0.67.3 cluster and when I try to
> mount CephFS:
>
> aaron@seven ~ $ sudo mount -t ceph 10.42.6.21:/ /mnt/ceph
> mount error 95 = Operation not supported
>
> Nothing new shows in dmesg / syslog after this attempt, and I don't see
> anything telling in the mds or mon logs. Any pointers on where to look?
>
> I have three mons, three mds (2 standby), and four osds. I'm just doing more
> testing/learning Ceph, so if I did something wrong data loss is not a
> problem.

What is the output of ceph -s? It could be something underneath the filesystem.

What kernel version are you using? Did you enable crush tunables? It
could be that your kernel doesn't support all the options you enabled.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw swift subuser creation

2013-09-27 Thread Matt McNulty
Hi Tim,

Try adding --gen-key to your create command (you should be able to create
a key for the subuser you already created).

Thanks,
Matt

On 9/27/13 4:35 PM, "Snider, Tim"  wrote:

>I created a radosgw user and a swift subuser and attempted to generate a
>key for the swift user, using the commands below. However, the swift key
>was empty when the command completed. What did I miss?
>
>root@controller21:/etc# radosgw-admin user create --uid=rados
>--display-name=rados --email=n...@none.com
>2013-09-27 13:34:08.155162 7f984f0a5780  0 WARNING: cannot read region map
>{ "user_id": "rados",
>  "display_name": "rados",
>  "email": "n...@none.com",
>  "suspended": 0,
>  "max_buckets": 1000,
>  "auid": 0,
>  "subusers": [],
>  "keys": [
>{ "user": "rados",
>  "access_key": "R5F0D2UCSK3618DJ829A",
>  "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}],
>  "swift_keys": [],
>  "caps": [],
>  "op_mask": "read, write, delete",
>  "default_placement": "",
>  "placement_tags": []}
>
>root@controller21:/etc# radosgw-admin subuser create  --uid=rados
>--subuser=rados:swift   --key-type=swift --access=full
>2013-09-27 13:34:58.761911 7f5307c04780  0 WARNING: cannot read region map
>{ "user_id": "rados",
>  "display_name": "rados",
>  "email": "n...@none.com",
>  "suspended": 0,
>  "max_buckets": 1000,
>  "auid": 0,
>  "subusers": [
>{ "id": "rados:swift",
>  "permissions": "full-control"}],
>  "keys": [
>{ "user": "rados",
>  "access_key": "R5F0D2UCSK3618DJ829A",
>  "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}],
>  "swift_keys": [],
>  "caps": [],
>  "op_mask": "read, write, delete",
>  "default_placement": "",
>  "placement_tags": []}
>
>root@controller21:/etc# radosgw-admin key create   --subuser=rados:swift
> --key-type=swift
>2013-09-27 13:35:43.544005 7f599e672780  0 WARNING: cannot read region map
>{ "user_id": "rados",
>  "display_name": "rados",
>  "email": "n...@none.com",
>  "suspended": 0,
>  "max_buckets": 1000,
>  "auid": 0,
>  "subusers": [
>{ "id": "rados:swift",
>  "permissions": "full-control"}],
>  "keys": [
>{ "user": "rados",
>  "access_key": "R5F0D2UCSK3618DJ829A",
>  "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}],
>  "swift_keys": [
>{ "user": "rados:swift",
>  "secret_key": ""}],
>  "caps": [],
>  "op_mask": "read, write, delete",
>  "default_placement": "",
>  "placement_tags": []}
>
>Thanks,
>Tim
>
>Timothy Snider
>Strategic Planning & Architecture - Advanced Development
>NetApp
>316-636-8736 Direct Phone
>316-213-0223 Mobile Phone
>tim.sni...@netapp.com
>netapp.com
> 
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw swift subuser creation

2013-09-27 Thread Snider, Tim
I created a radosgw user and a swift subuser and attempted to generate a key for
the swift user, using the commands below. However, the swift key was empty when
the command completed. What did I miss?

root@controller21:/etc# radosgw-admin user create --uid=rados 
--display-name=rados --email=n...@none.com
2013-09-27 13:34:08.155162 7f984f0a5780  0 WARNING: cannot read region map
{ "user_id": "rados",
  "display_name": "rados",
  "email": "n...@none.com",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [],
  "keys": [
{ "user": "rados",
  "access_key": "R5F0D2UCSK3618DJ829A",
  "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}],
  "swift_keys": [],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": []}

root@controller21:/etc# radosgw-admin subuser create  --uid=rados  
--subuser=rados:swift   --key-type=swift --access=full
2013-09-27 13:34:58.761911 7f5307c04780  0 WARNING: cannot read region map
{ "user_id": "rados",
  "display_name": "rados",
  "email": "n...@none.com",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [
{ "id": "rados:swift",
  "permissions": "full-control"}],
  "keys": [
{ "user": "rados",
  "access_key": "R5F0D2UCSK3618DJ829A",
  "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}],
  "swift_keys": [],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": []}

root@controller21:/etc# radosgw-admin key create   --subuser=rados:swift   
--key-type=swift
2013-09-27 13:35:43.544005 7f599e672780  0 WARNING: cannot read region map
{ "user_id": "rados",
  "display_name": "rados",
  "email": "n...@none.com",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [
{ "id": "rados:swift",
  "permissions": "full-control"}],
  "keys": [
{ "user": "rados",
  "access_key": "R5F0D2UCSK3618DJ829A",
  "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}],
  "swift_keys": [
{ "user": "rados:swift",
  "secret_key": ""}],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": []}

Thanks,
Tim

Timothy Snider
Strategic Planning & Architecture - Advanced Development
NetApp
316-636-8736 Direct Phone
316-213-0223 Mobile Phone
tim.sni...@netapp.com
netapp.com
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can't mount CephFS - where to start troubleshooting?

2013-09-27 Thread Aaron Ten Clay
Hi,

I probably did something wrong setting up my cluster with 0.67.3. I
previously built a cluster with 0.61 and everything went well, even after
an upgrade to 0.67.3. Now I built a fresh 0.67.3 cluster and when I try to
mount CephFS:

aaron@seven ~ $ sudo mount -t ceph 10.42.6.21:/ /mnt/ceph
mount error 95 = Operation not supported

Nothing new shows in dmesg / syslog after this attempt, and I don't see
anything telling in the mds or mon logs. Any pointers on where to look?

I have three mons, three mds (2 standby), and four osds. I'm just doing
more testing/learning Ceph, so if I did something wrong data loss is not a
problem.

Thanks!
-Aaron
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD: Newbie question regarding ceph-deploy osd create

2013-09-27 Thread Piers Dawson-Damer
Hi,

I'm trying to set up my first cluster (I have never manually bootstrapped a
cluster).

Is ceph-deploy osd activate/prepare supposed to write specific entries for each
OSD to the master ceph.conf file, along the lines of
http://ceph.com/docs/master/rados/configuration/osd-config-ref/ ?

I appear to have the OSDs prepared without error, but then there are no OSD
entries in the master ceph.conf nor in the node's /etc/ceph.conf.

Am I missing something?

Thanks in advance,

Piers Dawson-Damer
Tasmania


2013-09-28 06:47:00,471 [ceph_deploy.sudo_pushy][DEBUG ] will use a remote 
connection with sudo
2013-09-28 06:47:01,205 [ceph_deploy.osd][INFO  ] Distro info: Ubuntu 12.04 
precise
2013-09-28 06:47:01,205 [ceph_deploy.osd][DEBUG ] Preparing host 
storage03-vs-e2 disk /dev/sdm journal /dev/mapper/ceph_journal-osd_12 activate 
True
2013-09-28 06:47:01,206 [storage03-vs-e2][INFO  ] Running command: 
ceph-disk-prepare --cluster ceph -- /dev/sdm /dev/mapper/ceph_journal-osd_12
2013-09-28 06:47:20,247 [storage03-vs-e2][INFO  ] Information: Moved requested 
sector from 4194338 to 4196352 in
2013-09-28 06:47:20,248 [storage03-vs-e2][INFO  ] order to align on 2048-sector 
boundaries.
2013-09-28 06:47:20,248 [storage03-vs-e2][INFO  ] Warning: The kernel is still 
using the old partition table.
2013-09-28 06:47:20,248 [storage03-vs-e2][INFO  ] The new table will be used at 
the next reboot.
2013-09-28 06:47:20,248 [storage03-vs-e2][INFO  ] The operation has completed 
successfully.
2013-09-28 06:47:20,248 [storage03-vs-e2][INFO  ] Information: Moved requested 
sector from 34 to 2048 in
2013-09-28 06:47:20,249 [storage03-vs-e2][INFO  ] order to align on 2048-sector 
boundaries.
2013-09-28 06:47:20,249 [storage03-vs-e2][INFO  ] The operation has completed 
successfully.
2013-09-28 06:47:20,249 [storage03-vs-e2][INFO  ] meta-data=/dev/sdm1   
   isize=2048   agcount=4, agsize=183105343 blks
2013-09-28 06:47:20,250 [storage03-vs-e2][INFO  ]  =
   sectsz=512   attr=2, projid32bit=0
2013-09-28 06:47:20,250 [storage03-vs-e2][INFO  ] data =
   bsize=4096   blocks=732421371, imaxpct=5
2013-09-28 06:47:20,250 [storage03-vs-e2][INFO  ]  =
   sunit=0  swidth=0 blks
2013-09-28 06:47:20,250 [storage03-vs-e2][INFO  ] naming   =version 2   
   bsize=4096   ascii-ci=0
2013-09-28 06:47:20,251 [storage03-vs-e2][INFO  ] log  =internal log
   bsize=4096   blocks=357627, version=2
2013-09-28 06:47:20,251 [storage03-vs-e2][INFO  ]  =
   sectsz=512   sunit=0 blks, lazy-count=1
2013-09-28 06:47:20,251 [storage03-vs-e2][INFO  ] realtime =none
   extsz=4096   blocks=0, rtextents=0
2013-09-28 06:47:20,251 [storage03-vs-e2][INFO  ] The operation has completed 
successfully.
2013-09-28 06:47:20,252 [storage03-vs-e2][ERROR ] WARNING:ceph-disk:OSD will 
not be hot-swappable if journal is not the same device as the osd data
2013-09-28 06:47:20,266 [storage03-vs-e2][INFO  ] Running command: udevadm 
trigger --subsystem-match=block --action=add
2013-09-28 06:47:20,413 [ceph_deploy.osd][DEBUG ] Host storage03-vs-e2 is now 
ready for osd use.





2013-09-27 10:13:25,349 [storage03-vs-e2][DEBUG ] status for monitor: 
mon.storage03-vs-e2
2013-09-27 10:13:25,349 [storage03-vs-e2][DEBUG ] { "name": "storage03-vs-e2",
2013-09-27 10:13:25,350 [storage03-vs-e2][DEBUG ]   "rank": 2,
2013-09-27 10:13:25,350 [storage03-vs-e2][DEBUG ]   "state": "electing",
2013-09-27 10:13:25,350 [storage03-vs-e2][DEBUG ]   "election_epoch": 1,
2013-09-27 10:13:25,351 [storage03-vs-e2][DEBUG ]   "quorum": [],
2013-09-27 10:13:25,351 [storage03-vs-e2][DEBUG ]   "outside_quorum": [],
2013-09-27 10:13:25,351 [storage03-vs-e2][DEBUG ]   "extra_probe_peers": [
2013-09-27 10:13:25,351 [storage03-vs-e2][DEBUG ] 
"172.17.181.47:6789\/0",
2013-09-27 10:13:25,352 [storage03-vs-e2][DEBUG ] 
"172.17.181.48:6789\/0"],
2013-09-27 10:13:25,352 [storage03-vs-e2][DEBUG ]   "sync_provider": [],
2013-09-27 10:13:25,352 [storage03-vs-e2][DEBUG ]   "monmap": { "epoch": 0,
2013-09-27 10:13:25,352 [storage03-vs-e2][DEBUG ]   "fsid": 
"28626c0a-0266-4b80-8c06-0562bf48b793",
2013-09-27 10:13:25,353 [storage03-vs-e2][DEBUG ]   "modified": "0.00",
2013-09-27 10:13:25,353 [storage03-vs-e2][DEBUG ]   "created": "0.00",
2013-09-27 10:13:25,353 [storage03-vs-e2][DEBUG ]   "mons": [
2013-09-27 10:13:25,353 [storage03-vs-e2][DEBUG ] { "rank": 0,
2013-09-27 10:13:25,354 [storage03-vs-e2][DEBUG ]   "name": 
"storage01-vs-e2",
2013-09-27 10:13:25,354 [storage03-vs-e2][DEBUG ]   "addr": 
"172.17.181.47:6789\/0"},
2013-09-27 10:13:25,354 [storage03-vs-e2][DEBUG ] { "rank": 1,
2013-09-27 10:13:25,354 [storage03-vs-e2][DEBUG ]   "name": 
"storage02-vs-e2",
2013-09-27 10:13:25,355 [storage03-vs-e2][DEBUG ]   "addr": 
"172.17.181.48:6789\/0"},
2013-09-27 10:13:25,355 [storag

Re: [ceph-users] RBD Snap removal priority

2013-09-27 Thread Travis Rhoden
Hi Mike,

Thanks for the info.  I had seen some of the previous reports of
reduced performance during various recovery tasks (and certainly
experienced them) but you summarized them all quite nicely.

Yes, I'm running XFS on the OSDs.  I checked fragmentation on a few of
my OSDs -- all came back ~38% (better than I thought!).

 - Travis

On Fri, Sep 27, 2013 at 2:05 PM, Mike Dawson  wrote:
> [cc ceph-devel]
>
> Travis,
>
> RBD doesn't behave well when Ceph maintenance operations create spindle
> contention (i.e. 100% util from iostat). More about that below.
>
> Do you run XFS under your OSDs? If so, can you check for extent
> fragmentation? Should be something like:
>
> xfs_db -c frag -r /dev/sdb1
>
> We recently saw a fragmentation factor of over 80%, with lots of inodes
> having hundreds of extents. After 24+ hours of defrag'ing, we got it under
> control, but we're seeing the fragmentation factor grow by ~1.5% daily. We
> experienced spindle contention issues even after the defrag.
>
>
>
> Sage, Sam, etc,
>
> I think the real issue is Ceph has several states where it performs what I
> would call "maintenance operations" that saturate the underlying storage
> without properly yielding to client i/o (which should have a higher
> priority).
>
> I have experienced or seen reports of Ceph maintenance affecting rbd client
> i/o in many ways:
>
> - QEMU/RBD Client I/O Stalls or Halts Due to Spindle Contention from Ceph
> Maintenance [1]
> - Recovery and/or Backfill Cause QEMU/RBD Reads to Hang [2]
> - rbd snap rm (Travis' report below)
>
> [1] http://tracker.ceph.com/issues/6278
> [2] http://tracker.ceph.com/issues/6333
>
> I think this family of issues speaks to the need for Ceph to have more
> visibility into the underlying storage's limitations (especially spindle
> contention) when performing known expensive maintenance operations.
>
> Thanks,
> Mike Dawson
>
>
> On 9/27/2013 12:25 PM, Travis Rhoden wrote:
>>
>> Hello everyone,
>>
>> I'm running a Cuttlefish cluster that hosts a lot of RBDs.  I recently
>> removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed
>> that all of the clients had markedly decreased performance.  Looking
>> at iostat on the OSD nodes had most disks pegged at 100% util.
>>
>> I know there are thread priorities that can be set for clients vs
>> recovery, but I'm not sure what deleting a snapshot falls under.  I
>> couldn't really find anything relevant.  Is there anything I can tweak
>> to lower the priority of such an operation?  I didn't need it to
>> complete fast, as "rbd snap rm" returns immediately and the actual
>> deletion is done asynchronously.  I'd be fine with it taking longer at
>> a lower priority, but as it stands now it brings my cluster to a crawl
>> and is causing issues with several VMs.
>>
>> I see an "osd snap trim thread timeout" option in the docs -- Is the
>> operation occurring here what you would call snap trimming?  If so, any
>> chance of adding an option for "osd snap trim priority" just like
>> there is for osd client op and osd recovery op?
>>
>> Hope what I am saying makes sense...
>>
>>   - Travis
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Scaling radosgw module

2013-09-27 Thread Mark Nelson
Likely on the radosgw side you are going to see the top consumers be 
malloc/free/memcpy/memcmp.  If you have kernel 3.9 or newer compiled 
with libunwind, you might get better callgraphs in perf which could be 
helpful.


Mark

On 09/27/2013 01:56 PM, Somnath Roy wrote:

Yes, I understand that..
I tried with a thread pool size of 300 (the default is 100, I believe). I am in the
process of running perf on radosgw as well as on the OSDs for profiling.
BTW, let me know if there is any particular Ceph component you want me to focus on.

Thanks & Regards
Somnath

-Original Message-
From: Mark Nelson [mailto:mark.nel...@inktank.com]
Sent: Friday, September 27, 2013 11:50 AM
To: Somnath Roy
Cc: Yehuda Sadeh; ceph-users@lists.ceph.com; Anirban Ray; 
ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] Scaling radosgw module

Hi Somnath,

With SSDs, you almost certainly are going to be running into bottlenecks on the 
RGW side... Maybe even fastcgi or apache depending on the machine and how 
things are configured.  Unfortunately this is probably one of the more complex 
performance optimization scenarios in the Ceph world and is going to require 
figuring out exactly where things are slowing down.

I don't remember if you've done this already, but you could try increasing the 
number of radosgw threads and try to throw more concurrency at the problem, but 
other than that it's probably going to come down to profiling, and lots of it. 
:)

Mark

On 09/26/2013 07:04 PM, Somnath Roy wrote:

Hi Yehuda,
With my 3 node cluster (30 OSDs in total, all in ssds), I am getting avg of 
~3000 Gets/s from a single swift-bench client hitting single radosgw instance. 
Put is ~1000/s. BTW, I am not able to generate very big load yet and as the 
server has ~140G RAM, all the GET requests are served from memory , no disk 
utilization here.

Thanks & Regards
Somnath

-Original Message-
From: Yehuda Sadeh [mailto:yeh...@inktank.com]
Sent: Thursday, September 26, 2013 4:48 PM
To: Somnath Roy
Cc: Mark Nelson; ceph-users@lists.ceph.com; Anirban Ray;
ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] Scaling radosgw module

You specify the relative performance, but what the actual numbers that you're 
seeing? How many GETs per second, and how many PUTs per second do you see?

On Thu, Sep 26, 2013 at 4:00 PM, Somnath Roy  wrote:

Mark,
One more thing, all my test is with rgw cache enabled , disabling the cache the 
performance is around 3x slower.

Thanks & Regards
Somnath

-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Thursday, September 26, 2013 3:59 PM
To: Mark Nelson
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban
Ray
Subject: RE: [ceph-users] Scaling radosgw module

Nope...With one client hitting the radaosgw , the daemon cpu usage is going up 
till 400-450% i.e taking in avg 4 core..In one client scenario, the server node 
(having radosgw + osds) cpu usage is ~80% idle and out of the 20% usage bulk is 
consumed by radosgw.

Thanks & Regards
Somnath

-Original Message-
From: Mark Nelson [mailto:mark.nel...@inktank.com]
Sent: Thursday, September 26, 2013 3:50 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban
Ray
Subject: Re: [ceph-users] Scaling radosgw module

Ah, that's very good to know!

And RGW CPU usage you said was low?

Mark

On 09/26/2013 05:40 PM, Somnath Roy wrote:

Mark,
I did set up 3 radosgw servers in 3 server nodes and the tested with 3 
swift-bench client hitting 3 radosgw in the same time. I saw the aggregated 
throughput is linearly scaling. But, as an individual radosgw performance is 
very low we need to put lots of radosgw/apache server combination to get very 
high throughput. I guess that will be a problem.
I will try to do some profiling and share the data.

Thanks & Regards
Somnath

-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Thursday, September 26, 2013 3:33 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban
Ray
Subject: Re: [ceph-users] Scaling radosgw module

It's kind of annoying, but it may be worth setting up a 2nd RGW server and 
seeing if having two copies of the benchmark going at the same time on two 
separate RGW servers increases aggregate throughput.

Also, it may be worth tracking down latencies with messenger
debugging enabled, but I'm afraid I'm pretty bogged down right now
and probably wouldn't be able to look at it for a while. :(

Mark

On 09/26/2013 05:15 PM, Somnath Roy wrote:

Hi Mark,
FYI, I tried with wip-6286-dumpling release and the results are the same for 
me. The radosgw throughput is around ~6x slower than the single rados bench 
output!
 Any other suggestion ?

Thanks & Regards
Somnath
-Original Message-
From: Somnath Roy
Sent: Friday, September 20, 2013 4:08 PM
To: 'Mark Nelson'
Cc: ceph-users@

Re: [ceph-users] Scaling radosgw module

2013-09-27 Thread Somnath Roy
Yes, I understand that..
I tried with a thread pool size of 300 (the default is 100, I believe). I am in the
process of running perf on radosgw as well as on the OSDs for profiling.
BTW, let me know if there is any particular Ceph component you want me to focus on.

Thanks & Regards
Somnath

-Original Message-
From: Mark Nelson [mailto:mark.nel...@inktank.com] 
Sent: Friday, September 27, 2013 11:50 AM
To: Somnath Roy
Cc: Yehuda Sadeh; ceph-users@lists.ceph.com; Anirban Ray; 
ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] Scaling radosgw module

Hi Somnath,

With SSDs, you almost certainly are going to be running into bottlenecks on the 
RGW side... Maybe even fastcgi or apache depending on the machine and how 
things are configured.  Unfortunately this is probably one of the more complex 
performance optimization scenarios in the Ceph world and is going to require 
figuring out exactly where things are slowing down.

I don't remember if you've done this already, but you could try increasing the 
number of radosgw threads and try to throw more concurrency at the problem, but 
other than that it's probably going to come down to profiling, and lots of it. 
:)

Mark

On 09/26/2013 07:04 PM, Somnath Roy wrote:
> Hi Yehuda,
> With my 3 node cluster (30 OSDs in total, all in ssds), I am getting avg of 
> ~3000 Gets/s from a single swift-bench client hitting single radosgw 
> instance. Put is ~1000/s. BTW, I am not able to generate very big load yet 
> and as the server has ~140G RAM, all the GET requests are served from memory 
> , no disk utilization here.
>
> Thanks & Regards
> Somnath
>
> -Original Message-
> From: Yehuda Sadeh [mailto:yeh...@inktank.com]
> Sent: Thursday, September 26, 2013 4:48 PM
> To: Somnath Roy
> Cc: Mark Nelson; ceph-users@lists.ceph.com; Anirban Ray; 
> ceph-de...@vger.kernel.org
> Subject: Re: [ceph-users] Scaling radosgw module
>
> You specify the relative performance, but what the actual numbers that you're 
> seeing? How many GETs per second, and how many PUTs per second do you see?
>
> On Thu, Sep 26, 2013 at 4:00 PM, Somnath Roy  wrote:
>> Mark,
>> One more thing, all my test is with rgw cache enabled , disabling the cache 
>> the performance is around 3x slower.
>>
>> Thanks & Regards
>> Somnath
>>
>> -Original Message-
>> From: ceph-devel-ow...@vger.kernel.org 
>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy
>> Sent: Thursday, September 26, 2013 3:59 PM
>> To: Mark Nelson
>> Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban 
>> Ray
>> Subject: RE: [ceph-users] Scaling radosgw module
>>
>> Nope...With one client hitting the radaosgw , the daemon cpu usage is going 
>> up till 400-450% i.e taking in avg 4 core..In one client scenario, the 
>> server node (having radosgw + osds) cpu usage is ~80% idle and out of the 
>> 20% usage bulk is consumed by radosgw.
>>
>> Thanks & Regards
>> Somnath
>>
>> -Original Message-
>> From: Mark Nelson [mailto:mark.nel...@inktank.com]
>> Sent: Thursday, September 26, 2013 3:50 PM
>> To: Somnath Roy
>> Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban 
>> Ray
>> Subject: Re: [ceph-users] Scaling radosgw module
>>
>> Ah, that's very good to know!
>>
>> And RGW CPU usage you said was low?
>>
>> Mark
>>
>> On 09/26/2013 05:40 PM, Somnath Roy wrote:
>>> Mark,
>>> I did set up 3 radosgw servers in 3 server nodes and the tested with 3 
>>> swift-bench client hitting 3 radosgw in the same time. I saw the aggregated 
>>> throughput is linearly scaling. But, as an individual radosgw performance 
>>> is very low we need to put lots of radosgw/apache server combination to get 
>>> very high throughput. I guess that will be a problem.
>>> I will try to do some profiling and share the data.
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -Original Message-
>>> From: ceph-devel-ow...@vger.kernel.org 
>>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mark Nelson
>>> Sent: Thursday, September 26, 2013 3:33 PM
>>> To: Somnath Roy
>>> Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban 
>>> Ray
>>> Subject: Re: [ceph-users] Scaling radosgw module
>>>
>>> It's kind of annoying, but it may be worth setting up a 2nd RGW server and 
>>> seeing if having two copies of the benchmark going at the same time on two 
>>> separate RGW servers increases aggregate throughput.
>>>
>>> Also, it may be worth tracking down latencies with messenger 
>>> debugging enabled, but I'm afraid I'm pretty bogged down right now 
>>> and probably wouldn't be able to look at it for a while. :(
>>>
>>> Mark
>>>
>>> On 09/26/2013 05:15 PM, Somnath Roy wrote:
 Hi Mark,
 FYI, I tried with wip-6286-dumpling release and the results are the same 
 for me. The radosgw throughput is around ~6x slower than the single rados 
 bench output!
 Any other suggestion ?

 Thanks & Regards
 Somnath
 -Original Message-
 From: Somnath Roy
 Sent: Friday, September

Re: [ceph-users] Scaling radosgw module

2013-09-27 Thread Mark Nelson

Hi Somnath,

With SSDs, you almost certainly are going to be running into bottlenecks 
on the RGW side... Maybe even fastcgi or apache depending on the machine 
and how things are configured.  Unfortunately this is probably one of 
the more complex performance optimization scenarios in the Ceph world 
and is going to require figuring out exactly where things are slowing down.


I don't remember if you've done this already, but you could try 
increasing the number of radosgw threads and try to throw more 
concurrency at the problem, but other than that it's probably going to 
come down to profiling, and lots of it. :)


Mark

On 09/26/2013 07:04 PM, Somnath Roy wrote:

Hi Yehuda,
With my 3 node cluster (30 OSDs in total, all in ssds), I am getting avg of 
~3000 Gets/s from a single swift-bench client hitting single radosgw instance. 
Put is ~1000/s. BTW, I am not able to generate very big load yet and as the 
server has ~140G RAM, all the GET requests are served from memory , no disk 
utilization here.

Thanks & Regards
Somnath

-Original Message-
From: Yehuda Sadeh [mailto:yeh...@inktank.com]
Sent: Thursday, September 26, 2013 4:48 PM
To: Somnath Roy
Cc: Mark Nelson; ceph-users@lists.ceph.com; Anirban Ray; 
ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] Scaling radosgw module

You specify the relative performance, but what the actual numbers that you're 
seeing? How many GETs per second, and how many PUTs per second do you see?

On Thu, Sep 26, 2013 at 4:00 PM, Somnath Roy  wrote:

Mark,
One more thing, all my test is with rgw cache enabled , disabling the cache the 
performance is around 3x slower.

Thanks & Regards
Somnath

-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Thursday, September 26, 2013 3:59 PM
To: Mark Nelson
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban Ray
Subject: RE: [ceph-users] Scaling radosgw module

Nope...With one client hitting the radaosgw , the daemon cpu usage is going up 
till 400-450% i.e taking in avg 4 core..In one client scenario, the server node 
(having radosgw + osds) cpu usage is ~80% idle and out of the 20% usage bulk is 
consumed by radosgw.

Thanks & Regards
Somnath

-Original Message-
From: Mark Nelson [mailto:mark.nel...@inktank.com]
Sent: Thursday, September 26, 2013 3:50 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban Ray
Subject: Re: [ceph-users] Scaling radosgw module

Ah, that's very good to know!

And RGW CPU usage you said was low?

Mark

On 09/26/2013 05:40 PM, Somnath Roy wrote:

Mark,
I did set up 3 radosgw servers in 3 server nodes and the tested with 3 
swift-bench client hitting 3 radosgw in the same time. I saw the aggregated 
throughput is linearly scaling. But, as an individual radosgw performance is 
very low we need to put lots of radosgw/apache server combination to get very 
high throughput. I guess that will be a problem.
I will try to do some profiling and share the data.

Thanks & Regards
Somnath

-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Thursday, September 26, 2013 3:33 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban
Ray
Subject: Re: [ceph-users] Scaling radosgw module

It's kind of annoying, but it may be worth setting up a 2nd RGW server and 
seeing if having two copies of the benchmark going at the same time on two 
separate RGW servers increases aggregate throughput.

Also, it may be worth tracking down latencies with messenger
debugging enabled, but I'm afraid I'm pretty bogged down right now
and probably wouldn't be able to look at it for a while. :(

Mark

On 09/26/2013 05:15 PM, Somnath Roy wrote:

Hi Mark,
FYI, I tried with wip-6286-dumpling release and the results are the same for 
me. The radosgw throughput is around ~6x slower than the single rados bench 
output!
Any other suggestion ?

Thanks & Regards
Somnath
-Original Message-
From: Somnath Roy
Sent: Friday, September 20, 2013 4:08 PM
To: 'Mark Nelson'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Scaling radosgw module

Hi Mark,
It's a test cluster and I will try with the new release.
As I mentioned in the mail, I think number of rados client instance is the 
limitation. Could you please let me know how many rados client instance the 
radosgw daemon is instantiating ? Is it configurable somehow ?

Thanks & Regards
Somnath

-Original Message-
From: Mark Nelson [mailto:mark.nel...@inktank.com]
Sent: Friday, September 20, 2013 4:02 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Scaling radosgw module

On 09/20/2013 05:49 PM, Somnath Roy wrote:

Hi Mark,
Thanks for your quick response.
I tried adding the 'num_container = 100' in the job file and found that the 
performance actually decreasing with that option. I am get

Re: [ceph-users] RBD Snap removal priority

2013-09-27 Thread Mike Dawson

[cc ceph-devel]

Travis,

RBD doesn't behave well when Ceph maintenance operations create spindle
contention (i.e. 100% util from iostat). More about that below.


Do you run XFS under your OSDs? If so, can you check for extent 
fragmentation? Should be something like:


xfs_db -c frag -r /dev/sdb1

We recently saw a fragmentation factor of over 80%, with lots of inodes
having hundreds of extents. After 24+ hours of defrag'ing, we got it
under control, but we're seeing the fragmentation factor grow by ~1.5%
daily. We experienced spindle contention issues even after the defrag.




Sage, Sam, etc,

I think the real issue is Ceph has several states where it performs what
I would call "maintenance operations" that saturate the underlying
storage without properly yielding to client i/o (which should have a
higher priority).


I have experienced or seen reports of Ceph maintenance affecting rbd
client i/o in many ways:


- QEMU/RBD Client I/O Stalls or Halts Due to Spindle Contention from
Ceph Maintenance [1]

- Recovery and/or Backfill Cause QEMU/RBD Reads to Hang [2]
- rbd snap rm (Travis' report below)

[1] http://tracker.ceph.com/issues/6278
[2] http://tracker.ceph.com/issues/6333

I think this family of issues speaks to the need for Ceph to have more
visibility into the underlying storage's limitations (especially spindle
contention) when performing known expensive maintenance operations.


Thanks,
Mike Dawson

On 9/27/2013 12:25 PM, Travis Rhoden wrote:

Hello everyone,

I'm running a Cuttlefish cluster that hosts a lot of RBDs.  I recently
removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed
that all of the clients had markedly decreased performance.  Looking
at iostat on the OSD nodes had most disks pegged at 100% util.

I know there are thread priorities that can be set for clients vs
recovery, but I'm not sure what deleting a snapshot falls under.  I
couldn't really find anything relevant.  Is there anything I can tweak
to lower the priority of such an operation?  I didn't need it to
complete fast, as "rbd snap rm" returns immediately and the actual
deletion is done asynchronously.  I'd be fine with it taking longer at
a lower priority, but as it stands now it brings my cluster to a crawl
and is causing issues with several VMs.

I see an "osd snap trim thread timeout" option in the docs -- Is the
operation occurring here what you would call snap trimming?  If so, any
chance of adding an option for "osd snap trim priority" just like
there is for osd client op and osd recovery op?

Hope what I am saying makes sense...

  - Travis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] gateway instance

2013-09-27 Thread Yehuda Sadeh
On Fri, Sep 27, 2013 at 1:10 AM, lixuehui  wrote:
> Hi all
> Do gateway instances mean multiple processes of a gateway user for a Ceph
> cluster? Although they are configured independently in the configuration
> file, can they be configured with zones from different regions?


Not sure I follow your question, but basically each gateway process
(or 'instance') can be set to control a different zone. Moreover, a
single instance cannot control more than one zone.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance and disk usage of snapshots

2013-09-27 Thread Mark Nelson

Hi Corin!

On 09/24/2013 11:37 AM, Corin Langosch wrote:

Hi there,

do snapshots have an impact on write performance? I assume on each write
all snapshots have to get updated (COW), so the more snapshots exist, the
worse write performance will get?


I'll be honest, I haven't tested it so I'm not sure how much impact 
there actually is.  If you are really interested and wouldn't mind doing 
some testing, I would love to see the results!




Is there any way to see how much disk space a snapshot occupies? I
assume that, because of COW, snapshots start with 0 real disk usage and grow
over time as the underlying objects change?


I'm not an expert here, but does rbd ls -l help?

See: http://linux.die.net/man/8/rbd



Corin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] failure starting radosgw after setting up object storage

2013-09-27 Thread Yehuda Sadeh
On Wed, Sep 25, 2013 at 2:07 PM, Gruher, Joseph R
 wrote:
> Hi all-
>
>
>
> I am following the object storage quick start guide.  I have a cluster with
> two OSDs and have followed the steps on both.  Both are failing to start
> radosgw but each in a different manner.  All the previous steps in the quick
> start guide appeared to complete successfully.  Any tips on how to debug
> from here?  Thanks!
>
>
>
>
>
> OSD1:
>
>
>
> ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw start
>
> ceph@cephtest05:/etc/ceph$
>
>
>
> ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw status
>
> /usr/bin/radosgw is not running.
>
> ceph@cephtest05:/etc/ceph$
>
>
>
> ceph@cephtest05:/etc/ceph$ cat /var/log/ceph/radosgw.log
>
> ceph@cephtest05:/etc/ceph$
>
>
>
>
>
> OSD2:
>
>
>
> ceph@cephtest06:/etc/ceph$ sudo /etc/init.d/radosgw start
>
> Starting client.radosgw.gateway...
>
> 2013-09-25 14:03:01.235789 7f713d79d780 -1 WARNING: libcurl doesn't support
> curl_multi_wait()
>
> 2013-09-25 14:03:01.235797 7f713d79d780 -1 WARNING: cross zone / region
> transfer performance may be affected
>
> ceph@cephtest06:/etc/ceph$
>
>
>
> ceph@cephtest06:/etc/ceph$ sudo /etc/init.d/radosgw status
>
> /usr/bin/radosgw is not running.
>
> ceph@cephtest06:/etc/ceph$
>
>
>
> ceph@cephtest06:/etc/ceph$ cat /var/log/ceph/radosgw.log
>
> 2013-09-25 14:03:01.235760 7f713d79d780  0 ceph version 0.67.3
> (408cd61584c72c0d97b774b3d8f95c6b1b06341a), process radosgw, pid 13187
>
> 2013-09-25 14:03:01.235789 7f713d79d780 -1 WARNING: libcurl doesn't support
> curl_multi_wait()
>
> 2013-09-25 14:03:01.235797 7f713d79d780 -1 WARNING: cross zone / region
> transfer performance may be affected
>
> 2013-09-25 14:03:01.245786 7f713d79d780  0 librados: client.radosgw.gateway
> authentication error (1) Operation not permitted
>
> 2013-09-25 14:03:01.246526 7f713d79d780 -1 Couldn't init storage provider
> (RADOS)


This means that the radosgw process cannot connect to the cluster due
to user/key setup. Make sure that the user for radosgw exists, and
that the ceph keyring file (on the radosgw side) has the correct
credentials set.
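
For reference, a minimal librados sketch that fails in the same way when the
keyring does not contain valid credentials for that user (it assumes the
default config and keyring search paths; the client id matches the one in your
log):

    /* auth_check.c - sketch only; build with: gcc auth_check.c -lrados */
    #include <stdio.h>
    #include <string.h>
    #include <rados/librados.h>

    int main(void)
    {
        rados_t cluster;
        /* same entity as in the log; the "client." prefix is implied */
        int err = rados_create(&cluster, "radosgw.gateway");
        if (err < 0)
            return 1;
        rados_conf_read_file(cluster, NULL);  /* default config search path */

        err = rados_connect(cluster);
        if (err < 0) {
            /* -EPERM here is the same "authentication error (1) Operation not
             * permitted" that radosgw logs when the key is missing or wrong */
            fprintf(stderr, "connect failed: %s\n", strerror(-err));
        } else {
            printf("connected OK - credentials for this user are valid\n");
        }
        rados_shutdown(cluster);
        return 0;
    }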


Yehuda

>
> ceph@cephtest06:/etc/ceph$
>
>
>
>
>
> For reference, I think cluster health is OK:
>
>
>
> ceph@cephtest06:/etc/ceph$ sudo ceph status
>
>   cluster a45e6e54-70ef-4470-91db-2152965deec5
>
>health HEALTH_WARN clock skew detected on mon.cephtest03, mon.cephtest04
>
>monmap e1: 3 mons at
> {cephtest02=10.0.0.2:6789/0,cephtest03=10.0.0.3:6789/0,cephtest04=10.0.0.4:6789/0},
> election epoch 6, quorum 0,1,2 cephtest02,cephtest03,cephtest04
>
>osdmap e9: 2 osds: 2 up, 2 in
>
> pgmap v439: 192 pgs: 192 active+clean; 0 bytes data, 72548 KB used, 1998
> GB / 1999 GB avail
>
>mdsmap e1: 0/0/1 up
>
>
>
> ceph@cephtest06:/etc/ceph$ sudo ceph health
>
> HEALTH_WARN clock skew detected on mon.cephtest03, mon.cephtest04
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD Snap removal priority

2013-09-27 Thread Travis Rhoden
Hello everyone,

I'm running a Cuttlefish cluster that hosts a lot of RBDs.  I recently
removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed
that all of the clients had markedly decreased performance.  Looking
at iostat on the OSD nodes had most disks pegged at 100% util.

I know there are thread priorities that can be set for clients vs
recovery, but I'm not sure what deleting a snapshot falls under.  I
couldn't really find anything relevant.  Is there anything I can tweak
to lower the priority of such an operation?  I didn't need it to
complete fast, as "rbd snap rm" returns immediately and the actual
deletion is done asynchronously.  I'd be fine with it taking longer at
a lower priority, but as it stands now it brings my cluster to a crawl
and is causing issues with several VMs.

I see an "osd snap trim thread timeout" option in the docs -- Is the
operation occurring here what you would call snap trimming?  If so, any
chance of adding an option for "osd snap trim priority" just like
there is for osd client op and osd recovery op?

Hope what I am saying makes sense...

 - Travis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Pool Specification?

2013-09-27 Thread Aronesty, Erik
I see it's the undocumented "ceph.dir.layout.pool"

Something like:

setfattr -n ceph.dir.layout.pool -v mynewpool 

On an empty dir it should work.  I'd like one directory to be more heavily
mirrored so that a) objects are more likely to be on a less busy server and b)
availability increases (at the expense of size/write speed).


-Original Message-
From: Gregory Farnum [mailto:g...@inktank.com] 
Sent: Friday, September 27, 2013 11:14 AM
To: Aronesty, Erik
Cc: Aaron Ten Clay; Sage Weil; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS Pool Specification?

On Fri, Sep 27, 2013 at 7:10 AM, Aronesty, Erik
 wrote:
> > You can also create additional data pools and map directories to them, but
> > this probably isn't what you need (yet).
>
> Is there a link to a web page where you can read how to map a directory to a
> pool?  (I googled ceph map directory to pool ... and got this post)

Nothing official at this point. Sébastien wrote a short blog about it
earlier this year that will give you the basics:
http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/
But at this point it's easier to use the virtual xattrs at
"ceph.dir.layout", as shown in this ticket:
http://tracker.ceph.com/issues/4215
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Pool Specification?

2013-09-27 Thread Gregory Farnum
On Fri, Sep 27, 2013 at 7:10 AM, Aronesty, Erik
 wrote:
> > You can also create additional data pools and map directories to them, but
> > this probably isn't what you need (yet).
>
> Is there a link to a web page where you can read how to map a directory to a
> pool?  (I googled ceph map directory to pool … and got this post)

Nothing official at this point. Sébastien wrote a short blog about it
earlier this year that will give you the basics:
http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/
But at this point it's easier to use the virtual xattrs at
"ceph.dir.layout", as shown in this ticket:
http://tracker.ceph.com/issues/4215
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Pool Specification?

2013-09-27 Thread Aronesty, Erik
> You can also create additional data pools and map directories to them, but
> this probably isn't what you need (yet).

Is there a link to a web page where you can read how to map a directory to a 
pool?  (I googled ceph map directory to pool ... and got this post)

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Aaron Ten Clay
Sent: Thursday, September 26, 2013 5:15 PM
To: Sage Weil
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS Pool Specification?

On Wed, Sep 25, 2013 at 8:44 PM, Sage Weil <s...@inktank.com> wrote:
On Wed, 25 Sep 2013, Aaron Ten Clay wrote:
> Hi all,
>
> Does anyone know how to specify which pool the mds and CephFS data will be
> stored in?
>
> After creating a new cluster, the pools "data", "metadata", and "rbd" all
> exist but with pg count too small to be useful. The documentation indicates
> the pg count can be set only at pool creation time,
This is no longer true. Can you tell us where you read it so we can fix
the documentation?

 ceph osd pool set data pg_num 1234
 ceph osd pool set data pgp_num 1234

Repeat for metadata and/or rbd with an appropriate pg count.

Thanks! Maybe I just misinterpreted the documentation. The page

http://ceph.com/docs/master/rados/operations/placement-groups/
implies (to me, anyway) that the number of placement groups can't be changed 
once a pool is created. Under the "Set Pool Values" heading, pg_num isn't 
listed as an option.


> so I am working under the assumption I must create a new pool with a
> larger pg count and use that for CephFS and the mds storage.
You can also create additional data pools and map directories to them, but
this probably isn't what you need (yet).

sage

-Aaron
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy issues on RHEL6.4

2013-09-27 Thread Alfredo Deza
On Fri, Sep 27, 2013 at 3:30 AM, Guang  wrote:
> Hi ceph-users,
> I recently deployed a ceph cluster with use of *ceph-deploy* utility, on
> RHEL6.4, during the time, I came across a couple of issues / questions which
> I would like to ask for your help.
>
> 1. ceph-deploy does not help to install dependencies (snappy leveldb gdisk
> python-argparse gperftools-libs) on the target host, so I will need to
> manually install those dependencies before performing 'ceph-deploy install
> {host_name}'. I am investigating a way to deploy ceph onto a hundred nodes,
> and it is time-consuming to install those dependencies manually. Am
> I missing something here? I am thinking the dependency installation should
> be handled by *ceph-deploy* itself.
>
> 2. When performing 'ceph-deploy -v disk zap ceph.host.name:/dev/sdb', I have
> the following errors:
> [ceph_deploy.osd][DEBUG ] zapping /dev/sdc on ceph.host.name
> [ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
> Traceback (most recent call last):
>  File "/usr/bin/ceph-deploy", line 21, in 
>sys.exit(main())
>  File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py",
> line 83, in newfunc
>return f(*a, **kw)
>  File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 147, in
> main
>return args.func(args)
>  File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 381, in
> disk
>disk_zap(args)
>  File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 317, in
> disk_zap
>zap_r(disk)
>  File "/usr/lib/python2.6/site-packages/pushy/protocol/proxy.py", line 255,
> in 
>(conn.operator(type_, self, args, kwargs))
>  File "/usr/lib/python2.6/site-packages/pushy/protocol/connection.py", line
> 66, in operator
>return self.send_request(type_, (object, args, kwargs))
>  File "/usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py",
> line 329, in send_request
>return self.__handle(m)
>  File "/usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py",
> line 645, in __handle
>raise e
> pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory

ceph-deploy should handle this specific problem better; you've hit the
issue where $PATH is not set for a certain user over ssh, so ceph-deploy
will not be able to execute commands because it can't find the
executables.

A temporary workaround is to set the $PATH for all users explicitly
until we fix this issue (I opened: http://tracker.ceph.com/issues/6428
to track this).

>
> And then I logon to the host to perform 'ceph-disk zap /dev/sdb' and it can
> be successful without any issues.
>
> 3. When performing 'ceph-deploy -v disk activate  ceph.host.name:/dev/sdb',
> I have the following errors:
> ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
> ceph.host.name:/dev/sdb:
> [ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
> [ceph_deploy.osd][DEBUG ] Activating host ceph.host.name disk /dev/sdb
> [ceph_deploy.osd][DEBUG ] Distro RedHatEnterpriseServer codename Santiago,
> will use sysvinit
> Traceback (most recent call last):
>  File "/usr/bin/ceph-deploy", line 21, in 
>sys.exit(main())
>  File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py",
> line 83, in newfunc
>return f(*a, **kw)
>  File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 147, in
> main
>return args.func(args)
>  File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 379, in
> disk
>activate(args, cfg)
>  File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 271, in
> activate
>cmd=cmd, ret=ret, out=out, err=err)
> NameError: global name 'ret' is not defined

Ah, good find; somehow this error got past our checks. I just opened an issue to
fix this asap: http://tracker.ceph.com/issues/6427

>
> Also, when I log on to the host and perform 'ceph-disk activate /dev/sdb',
> it works fine.
>
> Any help is appreciated.
>
> Thanks,
> Guang
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG distribution scattered

2013-09-27 Thread Niklas Goerke

Sorry for replying only now, I did not get to try it earlier…

On Thu, 19 Sep 2013 08:43:11 -0500, Mark Nelson wrote:

On 09/19/2013 08:36 AM, Niklas Goerke wrote:

[…]

My Setup:
* Two Hosts with 45 Disks each --> 90 OSDs
* Only one newly created pool with 4500 PGs and a Replica Size of 2 -->
  should be about 100 PGs per OSD

What I found was that one OSD only had 72 PGs, while another had 123 PGs [1].
That means that (if I did the math correctly) I can only fill the cluster to
about 81%, because that's when the first OSD is completely full [2].


Does distribution improve if you make a pool with significantly more PGs?


Yes it does. I tried 45000 PGs and got a range from a minimum of 922 to a
maximum of 1066 PGs per OSD (the average is 1000). This is better, as I can
now fill my cluster up to 93.8% (theoretically), but I still don't get why I
would want to limit myself to that. Also, 1000 PGs is way too many for one
OSD (I think 100 is suggested). What should I do about this?
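
For clarity, the way I am computing those limits: data is spread roughly
evenly across the PGs, so the fullest OSD carries (max PGs / average PGs)
times its fair share, and the cluster is effectively full once that one OSD
is full:

   4500 PGs:  average  100 PGs/OSD, maximum  123  -->  100 /  123 ~=   81%
  45000 PGs:  average 1000 PGs/OSD, maximum 1066  --> 1000 / 1066 ~= 93.8%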


I did some experimenting and found that if I add another pool with 4500 PGs,
each OSD will have exactly double the number of PGs as with one pool. So this
is not an accident (I tried it multiple times). On another test cluster with
4 Hosts and 15 Disks each, the distribution was similarly bad.


This is a bug that causes each pool to more or less be distributed
the same way on the same hosts.  We have a fix, but it impacts
backwards compatibility so it's off by default.  If you set:

osd pool default flag hashpspool = true

Theoretically that will cause different pools to be distributed more
randomly.

I did not try this, because in my production scenario we will probably
only have one or two very large pools, so it does not matter all that
much to me.
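
For anyone who does want to try it: as far as I understand, the flag has to
be set before the pools are created, presumably in ceph.conf on the monitor
side, something like this (just a sketch, I have not tested it):

  [global]
      osd pool default flag hashpspool = true

Newly created pools should then get the hashpspool flag and be placed
independently of each other.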



[…]


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy issues on RHEL6.4

2013-09-27 Thread Mariusz Gronczewski
On 2013-09-27, at 15:30:21,
Guang  wrote:

> Hi ceph-users,
> I recently deployed a Ceph cluster with the *ceph-deploy* utility
> on RHEL 6.4, and during the process I came across a couple of issues /
> questions for which I would like to ask your help.
> 
> 1. ceph-deploy does not help to install dependencies (snappy leveldb
> gdisk python-argparse gperftools-libs) on the target host, so I will
> need to manually install those dependencies before performing
> 'ceph-deploy install {host_name}'. I am investigating a way to deploy
> Ceph onto a hundred nodes, and it is time-consuming to install those
> dependencies manually. Am I missing something here? I would think the
> dependency installation should be handled by *ceph-deploy* itself.


You might want to use some kind of configuration management system,
like Puppet, for that.

It is not Ceph-specific (so you can use it for everything), and there are
modules to manage Ceph.

It *is* harder to get started with than just using ceph-deploy, but if you
want to install anything more than only Ceph on the nodes, it is very
useful.

Of course, nothing stops you from just using Puppet to install the
dependencies and ceph-deploy for all the Ceph-related stuff.
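
If you just need something quick before a full Puppet setup, a plain ssh
loop is enough (a sketch, assuming a nodes.txt with one hostname per line,
the dependency list from your first mail, and passwordless sudo on the
nodes):

  # install the ceph-deploy dependencies on every node in one go
  for host in $(cat nodes.txt); do
      ssh "$host" "sudo yum install -y snappy leveldb gdisk python-argparse gperftools-libs"
  done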

Cheers
XANi


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] gateway instance

2013-09-27 Thread lixuehui
Hi all,
Do gateway instances mean multiple radosgw processes for one gateway user on a
Ceph cluster? Although they are configured independently in the configuration
file, can they be configured with zones across different regions?


lixuehui
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy issues on RHEL6.4

2013-09-27 Thread Guang
Hi ceph-users,
I recently deployed a Ceph cluster with the *ceph-deploy* utility on RHEL 6.4, 
and during the process I came across a couple of issues / questions for which 
I would like to ask your help.

1. ceph-deploy does not help to install dependencies (snappy leveldb gdisk 
python-argparse gperftools-libs) on the target host, so I will need to manually 
install those dependencies before performing 'ceph-deploy install {host_name}'. 
I am investigating a way to deploy Ceph onto a hundred nodes, and it is 
time-consuming to install those dependencies manually. Am I missing something 
here? I would think the dependency installation should be handled by 
*ceph-deploy* itself.

2. When performing 'ceph-deploy -v disk zap ceph.host.name:/dev/sdb', I have 
the following errors:
[ceph_deploy.osd][DEBUG ] zapping /dev/sdc on ceph.host.name
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
Traceback (most recent call last):
  File "/usr/bin/ceph-deploy", line 21, in 
sys.exit(main())
  File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py", line 
83, in newfunc
return f(*a, **kw)
  File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 147, in main
return args.func(args)
  File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 381, in disk
disk_zap(args)
  File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 317, in 
disk_zap
zap_r(disk)
  File "/usr/lib/python2.6/site-packages/pushy/protocol/proxy.py", line 255, in 

(conn.operator(type_, self, args, kwargs))
  File "/usr/lib/python2.6/site-packages/pushy/protocol/connection.py", line 
66, in operator
return self.send_request(type_, (object, args, kwargs))
  File "/usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py", 
line 329, in send_request
return self.__handle(m)
  File "/usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py", 
line 645, in __handle
raise e
pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory 

And then I log on to the host, perform 'ceph-disk zap /dev/sdb', and it 
succeeds without any issues.

3. When performing 'ceph-deploy -v disk activate  ceph.host.name:/dev/sdb', I 
have the following errors:
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks ceph.host.name:/dev/sdb:
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
[ceph_deploy.osd][DEBUG ] Activating host ceph.host.name disk /dev/sdb
[ceph_deploy.osd][DEBUG ] Distro RedHatEnterpriseServer codename Santiago, will 
use sysvinit
Traceback (most recent call last):
  File "/usr/bin/ceph-deploy", line 21, in 
sys.exit(main())
  File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py", line 
83, in newfunc
return f(*a, **kw)
  File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 147, in main
return args.func(args)
  File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 379, in disk
activate(args, cfg)
  File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 271, in 
activate
cmd=cmd, ret=ret, out=out, err=err)
NameError: global name 'ret' is not defined

Also, when I log on to the host and perform 'ceph-disk activate /dev/sdb', it 
works fine.

Any help is appreciated.

Thanks,
Guang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com