[ceph-users] ceph df full allocation

2015-02-27 Thread pixelfairy
Is there a way to see how much data is allocated, as opposed to just
what was used? For example, this 20 GB image is only taking up 8 GB.
I'd like to see a df with the full allocation of images.

root@ceph1:~# rbd --image vm-101-disk-1 info
rbd image 'vm-101-disk-1':
size 20480 MB in 5120 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.105c2ae8944a
format: 2
features: layering
root@ceph1:~# ceph df
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
66850G 66842G 8136M    0.01
POOLS:
NAME ID USED  %USED MAX AVAIL OBJECTS
rbd  2  2563M 0     22280G    671
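For what it's worth, one way to measure actual allocation per image (a hedged sketch, not from this thread: it assumes the plain-text `rbd diff` output format of one `offset length type` extent per line) is to sum the extent lengths and compare against the provisioned size from `rbd info`:

```python
def allocated_bytes(rbd_diff_output):
    """Sum the 'length' column of plain-text `rbd diff` output.

    Each line is assumed to look like "<offset> <length> <type>",
    e.g. "4194304 4194304 data".
    """
    total = 0
    for line in rbd_diff_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            total += int(fields[1])
    return total

# Hypothetical extents for a thin-provisioned image:
sample = """4194304 4194304 data
12582912 4194304 data
20971520 1048576 data"""

print(allocated_bytes(sample) / (1024 * 1024))  # MB actually written
```

Running that over the real `rbd diff vm-101-disk-1` output would give the ~8 GB actually consumed, versus the 20480 MB provisioned size shown by `rbd info`.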
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] old osds take much longer to start than newer osd

2015-02-27 Thread Corin Langosch
Hi guys,

I've been using Ceph for a long time now, since bobtail. I always upgraded every few
weeks/months to the latest stable release. Of course I also removed some osds and
added new ones. Now during the last few upgrades (I just upgraded from 80.6 to 80.8)
I noticed that old osds take much longer to start up than comparable newer osds
(same amount of data/disk usage, same kind of storage+journal backing device (ssd),
same weight, same number of pgs, ...). I observed the same behavior earlier but just
didn't really care about it. Here are the relevant log entries (the host of osd.0
and osd.15 has less CPU power than the others):

old osds (average pgs load time: 1.5 minutes)

2015-02-27 13:44:23.134086 7ffbfdcbe780  0 osd.0 19323 load_pgs
2015-02-27 13:49:21.453186 7ffbfdcbe780  0 osd.0 19323 load_pgs opened 824 pgs

2015-02-27 13:41:32.219503 7f197b0dd780  0 osd.3 19317 load_pgs
2015-02-27 13:42:56.310874 7f197b0dd780  0 osd.3 19317 load_pgs opened 776 pgs

2015-02-27 13:38:43.909464 7f450ac90780  0 osd.6 19309 load_pgs
2015-02-27 13:40:40.080390 7f450ac90780  0 osd.6 19309 load_pgs opened 806 pgs

2015-02-27 13:36:14.451275 7f3c41d33780  0 osd.9 19301 load_pgs
2015-02-27 13:37:22.446285 7f3c41d33780  0 osd.9 19301 load_pgs opened 795 pgs

new osds (average pgs load time: 3 seconds)

2015-02-27 13:44:25.529743 7f2004617780  0 osd.15 19325 load_pgs
2015-02-27 13:44:36.197221 7f2004617780  0 osd.15 19325 load_pgs opened 873 pgs

2015-02-27 13:41:29.176647 7fb147fb3780  0 osd.16 19315 load_pgs
2015-02-27 13:41:31.681722 7fb147fb3780  0 osd.16 19315 load_pgs opened 848 pgs

2015-02-27 13:38:41.470761 7f9c404be780  0 osd.17 19307 load_pgs
2015-02-27 13:38:43.737473 7f9c404be780  0 osd.17 19307 load_pgs opened 821 pgs

2015-02-27 13:36:10.997766 7f7315e99780  0 osd.18 19299 load_pgs
2015-02-27 13:36:13.511898 7f7315e99780  0 osd.18 19299 load_pgs opened 815 pgs
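For reference, the load_pgs durations quoted above can be computed mechanically from the paired log lines; a small Python sketch (assuming only the timestamp format shown in the logs):

```python
from datetime import datetime

def load_pgs_seconds(start_line, end_line):
    """Seconds between an OSD's 'load_pgs' line and its 'load_pgs opened' line."""
    def stamp(line):
        date, clock = line.split()[:2]
        return datetime.strptime(date + " " + clock, "%Y-%m-%d %H:%M:%S.%f")
    return (stamp(end_line) - stamp(start_line)).total_seconds()

old = load_pgs_seconds(
    "2015-02-27 13:44:23.134086 7ffbfdcbe780  0 osd.0 19323 load_pgs",
    "2015-02-27 13:49:21.453186 7ffbfdcbe780  0 osd.0 19323 load_pgs opened 824 pgs",
)
new = load_pgs_seconds(
    "2015-02-27 13:36:10.997766 7f7315e99780  0 osd.18 19299 load_pgs",
    "2015-02-27 13:36:13.511898 7f7315e99780  0 osd.18 19299 load_pgs opened 815 pgs",
)
print(round(old, 1), round(new, 1))  # osd.0: ~298 s vs osd.18: ~2.5 s
```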

The old osds also take more memory, here's an example:

root 15700 22.8  0.7 1423816 485552 ?  Ssl  13:36   4:55 
/usr/bin/ceph-osd -i 9 --pid-file
/var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph
root 15270 15.4  0.4 1227140 297032 ?  Ssl  13:36   3:20 
/usr/bin/ceph-osd -i 18 --pid-file
/var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph


It seems to me there is still some old data around on the old osds which was
not properly migrated/cleaned up during the upgrades. The cluster is healthy,
no problems at all during the last few weeks. Is there any way to clean this up?

Thanks
Corin


[ceph-users] What does the parameter journal_align_min_size mean?

2015-02-27 Thread Mark Wu
I am wondering how the value of journal_align_min_size affects
journal padding. Is there any document describing the disk layout of
the journal?

Thanks for help!


[ceph-users] Ceph - networking question

2015-02-27 Thread Tony Harris
Hi all,
I've only been using Ceph for a few months now and currently have a small
cluster (3 nodes, 18 OSDs). I get decent performance based upon the
configuration.
My question is: should I have a larger pipe on the client/public network or on
the Ceph cluster private network? I can only have a larger pipe on one of the
two. The most Ceph nodes we'd have in the foreseeable future is 7; the current
client VM host count is 3, with a max of 5 in the future.
Currently I can nearly max out the throughput on the larger pipe with reads,
but not even close with writes, when the larger pipe is connected to the
public/client side and I benchmark with rados. With the smaller pipe on the
client/public side I still max out reads, but writes are still not close
(well, "close" being relative to the number of replications: with 2
replications I can get 60% of the theoretical max, with 3 replications about
40%).
Basically, I'm not sure how to determine when the back-end cluster private
network starts to become the bottleneck that needs to be expanded.
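As a back-of-the-envelope sketch (my own rule of thumb, not an official Ceph formula): every client write arrives once over the public network, and the primary OSD then forwards replicas - 1 copies over the private cluster network, so replication traffic scales with the pool size:

```python
def cluster_write_traffic(client_write, replicas):
    """Approximate private-network traffic generated by replication:
    the primary OSD forwards (replicas - 1) copies of each client write."""
    return client_write * (replicas - 1)

# With 1000 Mb/s of client writes, size=2 generates ~1000 Mb/s and size=3
# ~2000 Mb/s of replication traffic; the private network becomes the
# bottleneck roughly when this exceeds its capacity (recovery/backfill
# traffic comes on top of this).
for replicas in (2, 3):
    print(replicas, cluster_write_traffic(1000, replicas))
```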
-Tony


Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Steffen W Sørensen

On 27/02/2015, at 17.20, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:
 I'd look at two things first. One is the '{fqdn}' string, which I'm not sure 
 whether that's the actual string that you have, or whether you just replaced 
 it for the sake of anonymity. The second is the port number, which should be 
 fine, but maybe the fact that it appears as part of the script uri triggers 
 some issue.
When launching radosgw it logs this:

...
2015-02-27 18:33:58.663960 7f200b67a8a0 20 rados->read obj-ofs=0 read_ofs=0 
read_len=524288
2015-02-27 18:33:58.675821 7f200b67a8a0 20 rados->read r=0 bl.length=678
2015-02-27 18:33:58.676532 7f200b67a8a0 10 cache put: 
name=.rgw.root+zone_info.default
2015-02-27 18:33:58.676573 7f200b67a8a0 10 moving .rgw.root+zone_info.default 
to cache LRU end
2015-02-27 18:33:58.677415 7f200b67a8a0  2 zone default is master
2015-02-27 18:33:58.677666 7f200b67a8a0 20 get_obj_state: rctx=0x2a85cd0 
obj=.rgw.root:region_map state=0x2a86498 s->prefetch_data=0
2015-02-27 18:33:58.677760 7f200b67a8a0 10 cache get: name=.rgw.root+region_map 
: miss
2015-02-27 18:33:58.709411 7f200b67a8a0 10 cache put: name=.rgw.root+region_map
2015-02-27 18:33:58.709846 7f200b67a8a0 10 adding .rgw.root+region_map to cache 
LRU end
2015-02-27 18:33:58.957336 7f1ff17f2700  2 garbage collection: start
2015-02-27 18:33:58.959189 7f1ff0df1700 20 BucketsSyncThread: start
2015-02-27 18:33:58.985486 7f200b67a8a0  0 framework: fastcgi
2015-02-27 18:33:58.985778 7f200b67a8a0  0 framework: civetweb
2015-02-27 18:33:58.985879 7f200b67a8a0  0 framework conf key: port, val: 7480
2015-02-27 18:33:58.986462 7f200b67a8a0  0 starting handler: civetweb
2015-02-27 18:33:59.032173 7f1fc3fff700 20 UserSyncThread: start
2015-02-27 18:33:59.214739 7f200b67a8a0  0 starting handler: fastcgi
2015-02-27 18:33:59.286723 7f1fb59e8700 10 allocated request req=0x2aa1b20
2015-02-27 18:34:00.533188 7f1fc3fff700 20 RGWRados::pool_iterate: got {my user 
name}
2015-02-27 18:34:01.038190 7f1ff17f2700  2 garbage collection: stop
2015-02-27 18:34:01.670780 7f1fc3fff700 20 RGWUserStatsCache: sync user={my 
user name}
2015-02-27 18:34:01.687730 7f1fc3fff700  0 ERROR: can't read user header: ret=-2
2015-02-27 18:34:01.689734 7f1fc3fff700  0 ERROR: sync_user() failed, user={my 
user name} ret=-2

Why does it seem to treat my radosgw-defined user name as a pool, and what might 
cause it to fail to read the user header?

/Steffen




Re: [ceph-users] old osds take much longer to start than newer osd

2015-02-27 Thread Robert LeBlanc
Does deleting/reformatting the old osds improve the performance?

On Fri, Feb 27, 2015 at 6:02 AM, Corin Langosch
corin.lango...@netskin.com wrote:
 Hi guys,

 I'm using ceph for a long time now, since bobtail. I always upgraded every 
 few weeks/ months to the latest stable
 release. Of course I also removed some osds and added new ones. Now during 
 the last few upgrades (I just upgraded from
 80.6 to 80.8) I noticed that old osds take much longer to startup than equal 
 newer osds (same amount of data/ disk
 usage, same kind of storage+journal backing device (ssd), same weight, same 
 number of pgs, ...). I know I observed the
 same behavior earlier but just didn't really care about it. Here are the 
 relevant log entries (host of osd.0 and osd.15
 has less cpu power than the others):

 old osds (average pgs load time: 1.5 minutes)

 2015-02-27 13:44:23.134086 7ffbfdcbe780  0 osd.0 19323 load_pgs
 2015-02-27 13:49:21.453186 7ffbfdcbe780  0 osd.0 19323 load_pgs opened 824 pgs

 2015-02-27 13:41:32.219503 7f197b0dd780  0 osd.3 19317 load_pgs
 2015-02-27 13:42:56.310874 7f197b0dd780  0 osd.3 19317 load_pgs opened 776 pgs

 2015-02-27 13:38:43.909464 7f450ac90780  0 osd.6 19309 load_pgs
 2015-02-27 13:40:40.080390 7f450ac90780  0 osd.6 19309 load_pgs opened 806 pgs

 2015-02-27 13:36:14.451275 7f3c41d33780  0 osd.9 19301 load_pgs
 2015-02-27 13:37:22.446285 7f3c41d33780  0 osd.9 19301 load_pgs opened 795 pgs

 new osds (average pgs load time: 3 seconds)

 2015-02-27 13:44:25.529743 7f2004617780  0 osd.15 19325 load_pgs
 2015-02-27 13:44:36.197221 7f2004617780  0 osd.15 19325 load_pgs opened 873 
 pgs

 2015-02-27 13:41:29.176647 7fb147fb3780  0 osd.16 19315 load_pgs
 2015-02-27 13:41:31.681722 7fb147fb3780  0 osd.16 19315 load_pgs opened 848 
 pgs

 2015-02-27 13:38:41.470761 7f9c404be780  0 osd.17 19307 load_pgs
 2015-02-27 13:38:43.737473 7f9c404be780  0 osd.17 19307 load_pgs opened 821 
 pgs

 2015-02-27 13:36:10.997766 7f7315e99780  0 osd.18 19299 load_pgs
 2015-02-27 13:36:13.511898 7f7315e99780  0 osd.18 19299 load_pgs opened 815 
 pgs

 The old osds also take more memory, here's an example:

 root 15700 22.8  0.7 1423816 485552 ?  Ssl  13:36   4:55 
 /usr/bin/ceph-osd -i 9 --pid-file
 /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph
 root 15270 15.4  0.4 1227140 297032 ?  Ssl  13:36   3:20 
 /usr/bin/ceph-osd -i 18 --pid-file
 /var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph


 It seems to me there is still some old data around for the old osds which was 
 not properly migrated/ cleaned up during
 the upgrades. The cluster is healthy, no problems at all the last few weeks. 
 Is there any way to clean this up?

 Thanks
 Corin


Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Steffen W Sørensen
 That's the old way of defining pools. The new way involves in defining a zone 
 and placement targets for that zone. Then you can have different default 
 placement targets for different users.
Any URLs/pointers to better understand such matters?

 Do you have any special config in your ceph.conf? E.g., did you modify the 
 rgw_enable_apis configurable by any chance?

# tail -20 /etc/ceph/ceph.conf 

[client.radosgw.owmblob]
 keyring = /etc/ceph/ceph.client.radosgw.keyring
 host = rgw
 user = apache
 rgw data = /var/lib/ceph/radosgw/ceph-rgw
 log file = /var/log/radosgw/client.radosgw.owmblob.log
 debug rgw = 20
 rgw enable log rados = true
 rgw enable ops log = true
 rgw enable apis = s3
 rgw cache enabled = true
 rgw cache lru size = 1
 rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock
 ;#rgw host = localhost
 ;#rgw port = 8004
 rgw dns name = {fqdn}
 rgw print continue = true
 rgw thread pool size = 20


What is the purpose of the rgw data directory, btw?

/Steffen




[ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Steffen W Sørensen
Sorry forgot to send to the list...

Begin forwarded message:

 From: Steffen W Sørensen ste...@me.com
 Subject: Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
 Date: 27. feb. 2015 18.29.51 CET
 To: Yehuda Sadeh-Weinraub yeh...@redhat.com
 
 
 It seems that your request did find its way to the gateway, but the question 
 here is why doesn't it match to a known operation. This really looks like a 
 valid list all buckets request, so I'm not sure what's happening.
 I'd look at two things first. One is the '{fqdn}' string, which I'm not sure 
 whether that's the actual string that you have, or whether you just replaced 
 it for the sake of anonymity.
 I replaced it for anonymity, though I run on a private IP, but still :)
 
 The second is the port number, which should be fine, but maybe the fact that 
 it appears as part of the script uri triggers some issue.
 Hmm, will try with the default port 80... though I would assume that anything 
 before the 'slash' gets cut off as part of the hostname[:port] portion.
 Makes no difference using port 80.
 
 ...
 2015-02-27 18:15:43.402729 7f37889e0700 20 SERVER_PORT=80
 2015-02-27 18:15:43.402747 7f37889e0700 20 SERVER_PROTOCOL=HTTP/1.1
 2015-02-27 18:15:43.402765 7f37889e0700 20 SERVER_SIGNATURE=
 2015-02-27 18:15:43.402783 7f37889e0700 20 SERVER_SOFTWARE=Apache/2.2.22 
 (Fedora)
 2015-02-27 18:15:43.402814 7f37889e0700  1 == starting new request 
 req=0x7f37b80083d0 =
 2015-02-27 18:15:43.403157 7f37889e0700  2 req 1:0.000345::GET /::initializing
 2015-02-27 18:15:43.403491 7f37889e0700 10 host={fqdn} rgw_dns_name={fqdn}
 2015-02-27 18:15:43.404624 7f37889e0700  2 req 1:0.001816::GET /::http 
 status=405
 2015-02-27 18:15:43.404676 7f37889e0700  1 == req done req=0x7f37b80083d0 
 http_status=405 ==
 2015-02-27 18:15:43.404901 7f37889e0700 20 process_request() returned -2003
 
 
 
 I'm not sure how to define my radosgw user; I made one with full rights & key 
 type s3:
 
 # radosgw-admin user info --uid='{user name}'
 { "user_id": "{user name}",
   "display_name": "test user for testlab",
   "email": "{email}",
   "suspended": 0,
   "max_buckets": 1000,
   "auid": 0,
   "subusers": [],
   "keys": [
     { "user": "{user name}",
       "access_key": "WL4EJJYTLVYXEHNR6QSA",
       "secret_key": "{secret}"}],
   "swift_keys": [],
   "caps": [],
   "op_mask": "read, write, delete",
   "default_placement": "",
   "placement_tags": [],
   "bucket_quota": { "enabled": false,
       "max_size_kb": -1,
       "max_objects": -1},
   "user_quota": { "enabled": false,
       "max_size_kb": -1,
       "max_objects": -1},
   "temp_url_keys": []}
 
 When authenticating to the S3 API, should I then use the unencrypted access 
 key string or the encoded one seen above, plus my secret?
 How do I verify whether I authenticate successfully through S3? Maybe this is 
 my problem?
 
 test example:
 
 #!/usr/bin/python
 
 import boto
 import boto.s3.connection
 access_key = 'WL4EJJYTLVYXEHNR6QSA'
 secret_key = '{secret}'
 
 conn = boto.connect_s3(
     aws_access_key_id = access_key,
     aws_secret_access_key = secret_key,
     host = '{fqdn}', port = 8005, debug = 1,
     is_secure = False,   # uncomment if you are not using ssl
     calling_format = boto.s3.connection.OrdinaryCallingFormat(),
     )
 
 ## Any access on the conn object fails with 405 not allowed
 for bucket in conn.get_all_buckets():
     print "{name}\t{created}".format(
         name = bucket.name,
         created = bucket.creation_date,
         )
 bucket = conn.create_bucket('my-new-bucket')
 
 
 
 How does one, btw, control/map a user to/with a Ceph pool, or will a user 
 with full rights be able to create Ceph pools through the admin API?
 
 I've added a pool to radosgw before creating my user with the --pool=owmblob 
 option, though I'm not sure that this will 'limit' a user to a default pool 
 like that. I would have thought that this would set the default_placement 
 attribute on the user.
 Any good URLs to docs on understanding matters such as ACLs, users and pool 
 mapping etc. in a gateway are also appreciated.
 
 # radosgw-admin pools list
 [
 { "name": "owmblob"}]
 





Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Yehuda Sadeh-Weinraub
- Original Message -

 From: Steffen W Sørensen ste...@me.com
 To: Yehuda Sadeh-Weinraub yeh...@redhat.com
 Cc: ceph-users@lists.ceph.com
 Sent: Friday, February 27, 2015 9:39:46 AM
 Subject: Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

 On 27/02/2015, at 17.20, Yehuda Sadeh-Weinraub  yeh...@redhat.com  wrote:

  I'd look at two things first. One is the '{fqdn}' string, which I'm not
  sure
  whether that's the actual string that you have, or whether you just
  replaced
  it for the sake of anonymity. The second is the port number, which should
  be
  fine, but maybe the fact that it appears as part of the script uri triggers
  some issue.
 

 When launching radosgw it logs this:

 ...
 2015-02-27 18:33:58.663960 7f200b67a8a0 20 rados->read obj-ofs=0 read_ofs=0
 read_len=524288
 2015-02-27 18:33:58.675821 7f200b67a8a0 20 rados->read r=0 bl.length=678
 2015-02-27 18:33:58.676532 7f200b67a8a0 10 cache put:
 name=.rgw.root+zone_info.default
 2015-02-27 18:33:58.676573 7f200b67a8a0 10 moving .rgw.root+zone_info.default
 to cache LRU end
 2015-02-27 18:33:58.677415 7f200b67a8a0 2 zone default is master
 2015-02-27 18:33:58.677666 7f200b67a8a0 20 get_obj_state: rctx=0x2a85cd0
 obj=.rgw.root:region_map state=0x2a86498 s->prefetch_data=0
 2015-02-27 18:33:58.677760 7f200b67a8a0 10 cache get:
 name=.rgw.root+region_map : miss
 2015-02-27 18:33:58.709411 7f200b67a8a0 10 cache put:
 name=.rgw.root+region_map
 2015-02-27 18:33:58.709846 7f200b67a8a0 10 adding .rgw.root+region_map to
 cache LRU end
 2015-02-27 18:33:58.957336 7f1ff17f2700 2 garbage collection: start
 2015-02-27 18:33:58.959189 7f1ff0df1700 20 BucketsSyncThread: start
 2015-02-27 18:33:58.985486 7f200b67a8a0 0 framework: fastcgi
 2015-02-27 18:33:58.985778 7f200b67a8a0 0 framework: civetweb
 2015-02-27 18:33:58.985879 7f200b67a8a0 0 framework conf key: port, val: 7480
 2015-02-27 18:33:58.986462 7f200b67a8a0 0 starting handler: civetweb
 2015-02-27 18:33:59.032173 7f1fc3fff700 20 UserSyncThread: start
 2015-02-27 18:33:59.214739 7f200b67a8a0 0 starting handler: fastcgi
 2015-02-27 18:33:59.286723 7f1fb59e8700 10 allocated request req=0x2aa1b20
 2015-02-27 18:34:00.533188 7f1fc3fff700 20 RGWRados::pool_iterate: got {my
 user name}
 2015-02-27 18:34:01.038190 7f1ff17f2700 2 garbage collection: stop
 2015-02-27 18:34:01.670780 7f1fc3fff700 20 RGWUserStatsCache: sync user={my
 user name}
 2015-02-27 18:34:01.687730 7f1fc3fff700 0 ERROR: can't read user header:
 ret=-2
 2015-02-27 18:34:01.689734 7f1fc3fff700 0 ERROR: sync_user() failed, user={my
 user name} ret=-2

 Why does it seem to find my radosgw defined user name as a pool and what
 might bring it to fail to read user header?

That's just a red herring. It tries to sync the user stats, but it can't 
because quota is not enabled (iirc). We should probably get rid of these 
messages as they're pretty confusing. 

Yehuda 


Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Steffen W Sørensen
  rgw enable apis = s3
Commenting this out makes it work :)

[root@rgw tests3]# ./lsbuckets.py 
[root@rgw tests3]# ./lsbuckets.py 
my-new-bucket   2015-02-27T17:49:04.000Z
[root@rgw tests3]#

...
2015-02-27 18:49:22.601578 7f48f2bdd700 20 rgw_create_bucket returned ret=-17 
bucket=my-new-bucket(@{i=.rgw.buckets.index,e=.rgw.buckets.extra}.rgw.buckets[default.5234475.2])
2015-02-27 18:49:22.625672 7f48f2bdd700  2 req 4:0.350444:s3:PUT 
/my-new-bucket/:create_bucket:http status=200
2015-02-27 18:49:22.625758 7f48f2bdd700  1 == req done req=0x7f4938007810 
http_status=200 ==
...

Why? I just want the S3 API available, not the admin API.
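A guess at the cause (an assumption on my part, not confirmed in this thread): `rgw enable apis` replaces the gateway's default API list rather than adding to it, so listing only `s3` may also switch off handlers the gateway needs internally. Leaving the option unset keeps the defaults; an explicit equivalent of the assumed default list would look like:

```ini
[client.radosgw.owmblob]
 ;# assumed default API list; listing only "s3" replaces, not extends, it
 rgw enable apis = s3, swift, swift_auth, admin
```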

/Steffen




[ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Steffen W Sørensen
Hi,

Newbie to RadosGW+Ceph, but learning...
I've got a running Ceph cluster working with rbd+CephFS clients. Now I'm trying to 
verify the RadosGW S3 API, but seem to have an issue with RadosGW access.

I get this error (haven't found anything on it searching so far...):

S3ResponseError: 405 Method Not Allowed

when trying to access the rgw.

Apache vhost access log file says:

10.20.0.29 - - [27/Feb/2015:14:09:04 +0100] "GET / HTTP/1.1" 405 27 "-" 
"Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64"

and Apache's general error_log file says:

[Fri Feb 27 14:09:04 2015] [warn] FastCGI: 10.20.0.29 GET http://{fqdn}:8005/ 
auth AWS WL4EJJYTLVYXEHNR6QSA:X6XR4z7Gr9qTMNDphTNlRUk3gfc=


RadosGW seems to launch and run fine, though /var/log/messages at launches says:

Feb 27 14:12:34 rgw kernel: radosgw[14985]: segfault at e0 ip 003fb36cb1dc 
sp 7fffde221410 error 4 in librados.so.2.0.0[3fb320+6d]

# ps -fuapache
UIDPID  PPID  C STIME TTY  TIME CMD
apache   15113 15111  0 14:07 ?00:00:00 /usr/sbin/fcgi-
apache   15114 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
apache   15115 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
apache   15116 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
apache   15117 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
apache   15118 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
apache   15119 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
apache   15120 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
apache   15121 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
apache   15224 1  1 14:12 ?00:00:25 /usr/bin/radosgw -n 
client.radosgw.owmblob

RadosGW creates my FastCGI socket and a default .asok (not sure why/what the 
default socket is meant for), as well as the configured log file, though it 
never logs anything...

# tail -18 /etc/ceph/ceph.conf:

[client.radosgw.owmblob]
 keyring = /etc/ceph/ceph.client.radosgw.keyring
 host = rgw
 rgw data = /var/lib/ceph/radosgw/ceph-rgw
 log file = /var/log/radosgw/client.radosgw.owmblob.log
 debug rgw = 20
 rgw enable log rados = true
 rgw enable ops log = true
 rgw enable apis = s3
 rgw cache enabled = true
 rgw cache lru size = 1
 rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock
 ;#rgw host = localhost
 ;#rgw port = 8004
 rgw dns name = {fqdn}
 rgw print continue = true
 rgw thread pool size = 20

It turned out /etc/init.d/ceph-radosgw didn't chown the log file to $USER even 
when it didn't exist. Radosgw creates this log file when opening it, but as 
root, not $USER, hence no output; manually chowning it and restarting the GW 
gives output like:

2015-02-27 15:25:14.464112 7fef463e9700 20 enqueued request req=0x25dea40
2015-02-27 15:25:14.465750 7fef463e9700 20 RGWWQ:
2015-02-27 15:25:14.465786 7fef463e9700 20 req: 0x25dea40
2015-02-27 15:25:14.465864 7fef463e9700 10 allocated request req=0x25e3050
2015-02-27 15:25:14.466214 7fef431e4700 20 dequeued request req=0x25dea40
2015-02-27 15:25:14.466677 7fef431e4700 20 RGWWQ: empty
2015-02-27 15:25:14.467888 7fef431e4700 20 CONTENT_LENGTH=0
2015-02-27 15:25:14.467922 7fef431e4700 20 DOCUMENT_ROOT=/var/www/html
2015-02-27 15:25:14.467941 7fef431e4700 20 FCGI_ROLE=RESPONDER
2015-02-27 15:25:14.467958 7fef431e4700 20 GATEWAY_INTERFACE=CGI/1.1
2015-02-27 15:25:14.467976 7fef431e4700 20 HTTP_ACCEPT_ENCODING=identity
2015-02-27 15:25:14.469476 7fef431e4700 20 HTTP_AUTHORIZATION=AWS 
WL4EJJYTLVYXEHNR6QSA:OAT0zVItGyp98T5mALeHz4p1fcg=
2015-02-27 15:25:14.469516 7fef431e4700 20 HTTP_DATE=Fri, 27 Feb 2015 14:25:14 
GMT
2015-02-27 15:25:14.469533 7fef431e4700 20 HTTP_HOST={fqdn}:8005
2015-02-27 15:25:14.469550 7fef431e4700 20 HTTP_USER_AGENT=Boto/2.34.0 
Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64
2015-02-27 15:25:14.469571 7fef431e4700 20 PATH=/sbin:/usr/sbin:/bin:/usr/bin
2015-02-27 15:25:14.469589 7fef431e4700 20 QUERY_STRING=
2015-02-27 15:25:14.469607 7fef431e4700 20 REMOTE_ADDR=10.20.0.29
2015-02-27 15:25:14.469624 7fef431e4700 20 REMOTE_PORT=34386
2015-02-27 15:25:14.469641 7fef431e4700 20 REQUEST_METHOD=GET
2015-02-27 15:25:14.469658 7fef431e4700 20 REQUEST_URI=/
2015-02-27 15:25:14.469677 7fef431e4700 20 
SCRIPT_FILENAME=/var/www/html/s3gw.fcgi
2015-02-27 15:25:14.469694 7fef431e4700 20 SCRIPT_NAME=/
2015-02-27 15:25:14.469711 7fef431e4700 20 SCRIPT_URI=http://{fqdn}:8005/
2015-02-27 15:25:14.469730 7fef431e4700 20 SCRIPT_URL=/
2015-02-27 15:25:14.469748 7fef431e4700 20 SERVER_ADDR=10.20.0.29
2015-02-27 15:25:14.469765 7fef431e4700 20 SERVER_ADMIN={email}
2015-02-27 15:25:14.469782 7fef431e4700 20 SERVER_NAME={fqdn}
2015-02-27 15:25:14.469801 7fef431e4700 20 SERVER_PORT=8005
2015-02-27 15:25:14.469818 7fef431e4700 20 SERVER_PROTOCOL=HTTP/1.1
2015-02-27 15:25:14.469835 7fef431e4700 20 SERVER_SIGNATURE=
2015-02-27 15:25:14.469852 7fef431e4700 20 SERVER_SOFTWARE=Apache/2.2.22 
(Fedora)
2015-02-27 

[ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings

2015-02-27 Thread Florian Haas
Hi everyone,

I always have a bit of trouble wrapping my head around how libvirt seems
to ignore ceph.conf options while qemu/kvm does not, so I thought I'd
ask. Maybe Josh, Wido or someone else can clarify the following.

http://ceph.com/docs/master/rbd/qemu-rbd/ says:

Important: If you set rbd_cache=true, you must set cache=writeback or
risk data loss. Without cache=writeback, QEMU will not send flush
requests to librbd. If QEMU exits uncleanly in this configuration,
filesystems on top of rbd can be corrupted.

Now this refers to explicitly setting rbd_cache=true on the qemu command
line, not having rbd_cache=true in the [client] section in ceph.conf,
and I'm not even sure whether qemu supports that anymore.

Even if it does, I'm still not sure whether the statement is accurate.

qemu has, for some time, had a cache=directsync mode which is intended
to be used as follows (from
http://lists.nongnu.org/archive/html/qemu-devel/2011-08/msg00020.html):

This mode is useful when guests may not be sending flushes when
appropriate and therefore leave data at risk in case of power failure.
When cache=directsync is used, write operations are only completed to
the guest when data is safely on disk.

So even if there are no flush requests to librbd, users should still be
safe from corruption when using cache=directsync, no?

So in summary, I *think* the following considerations apply, but I'd be
grateful if someone could confirm or refute them:

cache = writethrough
Maps to rbd_cache=true, rbd_cache_max_dirty=0. Read cache only, safe to
use whether or not guest I/O stack sends flushes.

cache = writeback
Maps to rbd_cache=true, rbd_cache_max_dirty > 0. Safe to use only if the
guest I/O stack sends flushes. Maps to cache = writethrough until the first
flush if rbd_cache_writethrough_until_flush = true (default in master).

cache = none
Maps to rbd_cache=false. No caching, safe to use regardless of guest I/O
stack flush support.

cache = unsafe
Maps to rbd_cache=true, rbd_cache_max_dirty > 0, but also *ignores* all
flush requests from the guest. Not safe to use (except in the unlikely
case that your guest never-ever writes).

cache=directsync
Maps to rbd_cache=true, rbd_cache_max_dirty=0. Bypasses the host page
cache altogether, which I think would be meaningless with the rbd
storage driver because it doesn't use the host page cache (unlike
qcow2). Read cache only, safe to use whether or not guest I/O stack
sends flushes.
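Restating the proposed mapping as a small lookup table (my reading of the summary above, with one correction drawn from the qemu rbd.c snippet quoted in a reply later in this thread: directsync and none both disable rbd_cache):

```python
# Per qemu cache mode: (rbd_cache, dirty_write_cache, safe_without_guest_flushes).
# "dirty_write_cache" stands in for rbd_cache_max_dirty > 0.
CACHE_MODES = {
    "writethrough": (True,  False, True),
    "writeback":    (True,  True,  False),
    "none":         (False, False, True),
    "unsafe":       (True,  True,  False),
    "directsync":   (False, False, True),
}

for mode in sorted(CACHE_MODES):
    rbd_cache, dirty, safe = CACHE_MODES[mode]
    print("%-12s rbd_cache=%-5s dirty=%-5s safe_without_flushes=%s"
          % (mode, rbd_cache, dirty, safe))
```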

Is the above an accurate summary? If so, I'll be happy to send a doc patch.

Cheers,
Florian


Re: [ceph-users] multiple CephFS filesystems on the same pools

2015-02-27 Thread Florian Haas
On 02/27/2015 11:37 AM, Blair Bethwaite wrote:
 Sorry if this is actually documented somewhere,

It is. :)

 but is it possible to
 create and use multiple filesystems on the same data and metadata
 pools? I'm guessing yes, but requires multiple MDSs?

Nope. Every fs needs one data and one metadata pool, which (as of 0.84)
can be arbitrarily named, but as yet there's no support for multiple
filesystems on a single cluster.

http://ceph.com/docs/master/cephfs/createfs/

Cheers,
Florian



Re: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings

2015-02-27 Thread Mark Wu
2015-02-27 20:56 GMT+08:00 Alexandre DERUMIER aderum...@odiso.com:

 Hi,

 from qemu rbd.c

 if (flags & BDRV_O_NOCACHE) {
     rados_conf_set(s->cluster, "rbd_cache", "false");
 } else {
     rados_conf_set(s->cluster, "rbd_cache", "true");
 }

 and
 block.c

 int bdrv_parse_cache_flags(const char *mode, int *flags)
 {
     *flags &= ~BDRV_O_CACHE_MASK;
 
     if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
         *flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
     } else if (!strcmp(mode, "directsync")) {
         *flags |= BDRV_O_NOCACHE;
     } else if (!strcmp(mode, "writeback")) {
         *flags |= BDRV_O_CACHE_WB;
     } else if (!strcmp(mode, "unsafe")) {
         *flags |= BDRV_O_CACHE_WB;
         *flags |= BDRV_O_NO_FLUSH;
     } else if (!strcmp(mode, "writethrough")) {
         /* this is the default */
     } else {
         return -1;
     }
 
     return 0;
 }


 So rbd_cache is

 disabled for cache=directsync|none

 and enabled for writethrough|writeback|unsafe


 so directsync or none should be safe if guest does not send flush.



 - Mail original -
 De: Florian Haas flor...@hastexo.com
 À: ceph-users ceph-users@lists.ceph.com
 Envoyé: Vendredi 27 Février 2015 13:38:13
 Objet: [ceph-users] Possibly misleading/outdated documentation about
 qemu/kvm and rbd cache settings

 Hi everyone,

 I always have a bit of trouble wrapping my head around how libvirt seems
 to ignore ceph.conf option while qemu/kvm does not, so I thought I'd
 ask. Maybe Josh, Wido or someone else can clarify the following.

 http://ceph.com/docs/master/rbd/qemu-rbd/ says:

 Important: If you set rbd_cache=true, you must set cache=writeback or
 risk data loss. Without cache=writeback, QEMU will not send flush
 requests to librbd. If QEMU exits uncleanly in this configuration,
 filesystems on top of rbd can be corrupted.

 Now this refers to explicitly setting rbd_cache=true on the qemu command
 line, not having rbd_cache=true in the [client] section in ceph.conf,
 and I'm not even sure whether qemu supports that anymore.

 Even if it does, I'm still not sure whether the statement is accurate.

 qemu has, for some time, had a cache=directsync mode which is intended
 to be used as follows (from
 http://lists.nongnu.org/archive/html/qemu-devel/2011-08/msg00020.html):

 This mode is useful when guests may not be sending flushes when
 appropriate and therefore leave data at risk in case of power failure.
 When cache=directsync is used, write operations are only completed to
 the guest when data is safely on disk.

 So even if there are no flush requests to librbd, users should still be
 safe from corruption when using cache=directsync, no?

 So in summary, I *think* the following considerations apply, but I'd be
 grateful if someone could confirm or refute them:


 cache = writethrough
 Maps to rbd_cache=true, rbd_cache_max_dirty=0. Read cache only, safe to

 Actually, qemu doesn't care about the rbd_cache_max_dirty setting. In
writethrough mode, qemu always sends a flush following every write request.

 use whether or not guest I/O stack sends flushes.

 cache = writeback
 Maps to rbd_cache=true, rbd_cache_max_dirty > 0. Safe to use only if
 guest I/O stack sends flushes. Maps to cache = writethrough until first

Qemu can report to the guest whether the write cache is enabled, and the
guest kernel can then manage the cache the same way it manages a volatile
writeback cache on a physical storage controller (please see
https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt).
As long as filesystem barriers are not disabled in the guest, this avoids
data corruption.

 flush if rbd_cache_writethrough_until_flush = true (default in master).

 cache = none
 Maps to rbd_cache=false. No caching, safe to use regardless of guest I/O
 stack flush support.

 cache = unsafe
 Maps to rbd_cache=true, rbd_cache_max_dirty > 0, but also *ignores* all
 flush requests from the guest. Not safe to use (except in the unlikely
 case that your guest never-ever writes).

 cache=directsync
 Maps to rbd_cache=true, rbd_cache_max_dirty=0. Bypasses the host page
 cache altogether, which I think would be meaningless with the rbd
 storage driver because it doesn't use the host page cache (unlike
 qcow2). Read cache only, safe to use whether or not guest I/O stack
 sends flushes.

 Is the above an accurate summary? If so, I'll be happy to send a doc patch.

 Cheers,
 Florian
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings

2015-02-27 Thread Alexandre DERUMIER
Hi,

from qemu rbd.c

if (flags & BDRV_O_NOCACHE) {
    rados_conf_set(s->cluster, "rbd_cache", "false");
} else {
    rados_conf_set(s->cluster, "rbd_cache", "true");
}

and
block.c

int bdrv_parse_cache_flags(const char *mode, int *flags)
{
    *flags &= ~BDRV_O_CACHE_MASK;

    if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
        *flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
    } else if (!strcmp(mode, "directsync")) {
        *flags |= BDRV_O_NOCACHE;
    } else if (!strcmp(mode, "writeback")) {
        *flags |= BDRV_O_CACHE_WB;
    } else if (!strcmp(mode, "unsafe")) {
        *flags |= BDRV_O_CACHE_WB;
        *flags |= BDRV_O_NO_FLUSH;
    } else if (!strcmp(mode, "writethrough")) {
        /* this is the default */
    } else {
        return -1;
    }

    return 0;
}


So rbd_cache is 

disabled for cache=directsync|none

and enabled for writethrough|writeback|unsafe


so directsync or none should be safe if guest does not send flush.
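Combining the two excerpts: the cache= mode alone decides whether librbd caching is on. As a quick sanity check, the mapping can be sketched as a small shell function (the helper name is mine for illustration, not part of qemu):

```shell
# Sketch of the cache= mode -> rbd_cache mapping implied by the qemu
# code above (illustrative helper, not part of qemu itself).
rbd_cache_for_mode() {
    case "$1" in
        off|none|directsync)           echo "rbd_cache=false" ;; # BDRV_O_NOCACHE set
        writethrough|writeback|unsafe) echo "rbd_cache=true"  ;; # BDRV_O_NOCACHE clear
        *) echo "invalid cache mode: $1" >&2; return 1 ;;
    esac
}

rbd_cache_for_mode directsync   # rbd_cache=false
rbd_cache_for_mode writeback    # rbd_cache=true
```

Note that writethrough also lands in the rbd_cache=true branch; its safety comes from qemu flushing after every write, not from the cache being off.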



- Mail original -
De: Florian Haas flor...@hastexo.com
À: ceph-users ceph-users@lists.ceph.com
Envoyé: Vendredi 27 Février 2015 13:38:13
Objet: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm 
and rbd cache settings

Hi everyone, 

I always have a bit of trouble wrapping my head around how libvirt seems 
to ignore ceph.conf option while qemu/kvm does not, so I thought I'd 
ask. Maybe Josh, Wido or someone else can clarify the following. 

http://ceph.com/docs/master/rbd/qemu-rbd/ says: 

Important: If you set rbd_cache=true, you must set cache=writeback or 
risk data loss. Without cache=writeback, QEMU will not send flush 
requests to librbd. If QEMU exits uncleanly in this configuration, 
filesystems on top of rbd can be corrupted. 

Now this refers to explicitly setting rbd_cache=true on the qemu command 
line, not having rbd_cache=true in the [client] section in ceph.conf, 
and I'm not even sure whether qemu supports that anymore. 

Even if it does, I'm still not sure whether the statement is accurate. 

qemu has, for some time, had a cache=directsync mode which is intended 
to be used as follows (from 
http://lists.nongnu.org/archive/html/qemu-devel/2011-08/msg00020.html): 

This mode is useful when guests may not be sending flushes when 
appropriate and therefore leave data at risk in case of power failure. 
When cache=directsync is used, write operations are only completed to 
the guest when data is safely on disk. 

So even if there are no flush requests to librbd, users should still be 
safe from corruption when using cache=directsync, no? 

So in summary, I *think* the following considerations apply, but I'd be 
grateful if someone could confirm or refute them: 

cache = writethrough 
Maps to rbd_cache=true, rbd_cache_max_dirty=0. Read cache only, safe to 
use whether or not guest I/O stack sends flushes. 

cache = writeback 
Maps to rbd_cache=true, rbd_cache_max_dirty > 0. Safe to use only if 
guest I/O stack sends flushes. Maps to cache = writethrough until first 
flush if rbd_cache_writethrough_until_flush = true (default in master). 

cache = none 
Maps to rbd_cache=false. No caching, safe to use regardless of guest I/O 
stack flush support. 

cache = unsafe 
Maps to rbd_cache=true, rbd_cache_max_dirty > 0, but also *ignores* all 
flush requests from the guest. Not safe to use (except in the unlikely 
case that your guest never-ever writes). 

cache=directsync 
Maps to rbd_cache=true, rbd_cache_max_dirty=0. Bypasses the host page 
cache altogether, which I think would be meaningless with the rbd 
storage driver because it doesn't use the host page cache (unlike 
qcow2). Read cache only, safe to use whether or not guest I/O stack 
sends flushes. 

Is the above an accurate summary? If so, I'll be happy to send a doc patch. 

Cheers, 
Florian 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings

2015-02-27 Thread Florian Haas
On 02/27/2015 01:56 PM, Alexandre DERUMIER wrote:
 Hi,
 
 from qemu rbd.c
 
 if (flags & BDRV_O_NOCACHE) {
     rados_conf_set(s->cluster, "rbd_cache", "false");
 } else {
     rados_conf_set(s->cluster, "rbd_cache", "true");
 }
 
 and
 block.c
 
 int bdrv_parse_cache_flags(const char *mode, int *flags)
 {
     *flags &= ~BDRV_O_CACHE_MASK;
 
     if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
         *flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
     } else if (!strcmp(mode, "directsync")) {
         *flags |= BDRV_O_NOCACHE;
     } else if (!strcmp(mode, "writeback")) {
         *flags |= BDRV_O_CACHE_WB;
     } else if (!strcmp(mode, "unsafe")) {
         *flags |= BDRV_O_CACHE_WB;
         *flags |= BDRV_O_NO_FLUSH;
     } else if (!strcmp(mode, "writethrough")) {
         /* this is the default */
     } else {
         return -1;
     }
 
     return 0;
 }
 
 
 So rbd_cache is 
 
 disabled for cache=directsync|none
 
 and enabled for writethrough|writeback|unsafe
 
 
 so directsync or none should be safe if guest does not send flush.

That's what I figured too, but then where does the important warning
in the documentation come from that implores people to always set
writeback? As per git blame it came directly from Josh. If anyone's an
authority on RBD, it would be him. :)

Cheers,
Florian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Steffen W Sørensen

On 27/02/2015, at 18.51, Steffen W Sørensen ste...@me.com wrote:

 rgw enable apis = s3
 Commenting this out makes it work :)
Thanks for helping on this initial issue!

 [root@rgw tests3]# ./lsbuckets.py 
 [root@rgw tests3]# ./lsbuckets.py 
 my-new-bucket   2015-02-27T17:49:04.000Z
 [root@rgw tests3]#
 
 ...
 2015-02-27 18:49:22.601578 7f48f2bdd700 20 rgw_create_bucket returned ret=-17 
 bucket=my-new-bucket(@{i=.rgw.buckets.index,e=.rgw.buckets.extra}.rgw.buckets[default.5234475.2])
 2015-02-27 18:49:22.625672 7f48f2bdd700  2 req 4:0.350444:s3:PUT 
 /my-new-bucket/:create_bucket:http status=200
 2015-02-27 18:49:22.625758 7f48f2bdd700  1 == req done req=0x7f4938007810 
 http_status=200 ==
 ...
Into which pool does such user data (buckets and objects) get stored, and 
how might one direct user data into a dedicated pool?

[root@rgw ~]# rados df
pool name   category KB  objects   clones 
degraded  unfound   rdrd KB   wrwr KB
.intent-log -  000  
  0   00000
.log-  110  
  0   00022
.rgw-  140  
  0   0   17   14   104
.rgw.buckets-  000  
  0   00000
.rgw.buckets.extra -  000   
 0   00000
.rgw.buckets.index -  010   
 0   02030
.rgw.control-  080  
  0   00000
.rgw.gc -  0   320  
  0   0 8302 8302 55560
.rgw.root   -  130  
  0   0  929  61833
.usage  -  000  
  0   00000
.users  -  110  
  0   06453
.users.email-  110  
  0   03253
.users.swift-  000  
  0   00000
.users.uid  -  120  
  0   0   65   54   164

I assume a bucket is a naming container for objects in a pool, maybe similar 
to a directory with files.

/Steffen


signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Lost Object

2015-02-27 Thread Daniel Takatori Ohara
Anyone help me, please?

In the attach, the log of mds with debug = 20.

Thanks,

Att.

---
Daniel Takatori Ohara.
System Administrator - Lab. of Bioinformatics
Molecular Oncology Center
Instituto Sírio-Libanês de Ensino e Pesquisa
Hospital Sírio-Libanês
Phone: +55 11 3155-0200 (extension 1927)
R: Cel. Nicolau dos Santos, 69
São Paulo-SP. 01308-060
http://www.bioinfo.mochsl.org.br


On Thu, Feb 26, 2015 at 4:21 PM, Daniel Takatori Ohara 
dtoh...@mochsl.org.br wrote:

 Hello,

 I have a problem. I tried to make a symbolic link for a file, but it
 returned the message: ln: failed to create symbolic link
 ‘./M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam’: File exists

 When I run the ls command, the result is

 l? ? ?  ?   ??
 M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam

 But when I run ls a second time, the result does not show the
 file.

 Anyone help me, please?

 Thank you,

 Att.

 ---
 Daniel Takatori Ohara.
 System Administrator - Lab. of Bioinformatics
 Molecular Oncology Center
 Instituto Sírio-Libanês de Ensino e Pesquisa
 Hospital Sírio-Libanês
 Phone: +55 11 3155-0200 (extension 1927)
 R: Cel. Nicolau dos Santos, 69
 São Paulo-SP. 01308-060
 http://www.bioinfo.mochsl.org.br




log_mds.gz
Description: GNU Zip compressed data
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Clarification of SSD journals for BTRFS rotational HDD

2015-02-27 Thread Robert LeBlanc
Also sending to the devel list to see if they have some insight.

On Wed, Feb 25, 2015 at 3:01 PM, Robert LeBlanc rob...@leblancnet.us wrote:
 I tried finding an answer to this on Google, but couldn't find it.

 Since BTRFS can parallel the journal with the write, does it make
 sense to have the journal on the SSD (because then we are forcing two
 writes instead of one)?

 Our plan is to have a caching tier of SSDs in front of our rotational
 HDDs and it sounds like the improvements in Hammer will really help
 here. If we can take the journals off the SSDs, that just opens up a
 bit more space for caching (albeit not much). It specifically makes
 the configuration of the host much simpler and a single SSD doesn't
 take out 5 HHDs.

 Thanks,
 Robert LeBlanc
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] too few pgs in cache tier

2015-02-27 Thread Udo Lembke
Hi all,
we use an EC pool with a small cache tier in front of it for our
archive data (4 * 16TB VM disks).

The EC pool has k=3;m=2 because we started with 5 nodes, and we want to
migrate to a new EC pool with k=5;m=2. Therefore we migrated one VM disk
(16TB) from the Ceph cluster to an FC RAID with the Proxmox VE "move
disk" interface.

The move finished, but while the ceph VM file was being removed, the
warnings "'ssd-archiv' at/near target max" and "pool ssd-archiv has too
few pgs" appeared.

Some hours later only the second warning remained.

ceph health detail
HEALTH_WARN pool ssd-archiv has too few pgs
pool ssd-archiv objects per pg (51196) is more than 14.7709 times
cluster average (3466)
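For reference, the factor in that warning is simply the pool's objects-per-PG count divided by the cluster-wide average, which can be reproduced from the numbers above:

```shell
# Reproduce the "too few pgs" factor from the health detail above:
# objects per pg in the pool (51196) vs. the cluster average (3466).
pool_objs_per_pg=51196
cluster_avg=3466
ratio=$(awk "BEGIN { printf \"%.4f\", $pool_objs_per_pg / $cluster_avg }")
echo "$ratio"   # 14.7709 -- the factor reported in the warning
```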

info about the image, which was deleted:
rbd image 'vm-409-disk-1':
size 16384 GB in 4194304 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.2b8fda574b0dc51
format: 2
features: layering

I think we hit http://tracker.ceph.com/issues/8103
but normally a single read should not put the data into the cache tier,
should it? Does deleting count as a second read?

Our ceph version: 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)


Regards

Udo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Minor flaw in /etc/init.d/ceph-radsgw script

2015-02-27 Thread Steffen W Sørensen
Hi

Seems there's a minor flaw in the CentOS/RHEL init script:

line 91 reads:

   daemon --user=$user $RADOSGW -n $name

and should IMHO be:

   daemon --user=$user "$RADOSGW -n $name"

to avoid the complaint from dirname in
/etc/rc.d/init.d/functions:__pids_var_run (line 151) :)

/Steffen


signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Yehuda Sadeh-Weinraub


- Original Message -
 From: Steffen W Sørensen ste...@me.com
 To: ceph-users@lists.ceph.com
 Sent: Friday, February 27, 2015 6:40:01 AM
 Subject: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
 
 Hi,
 
 Newbie to RadosGW+Ceph, but learning...
 Got a running Ceph Cluster working with rbd+CephFS clients. Now I'm trying to
 verify a RadosGW S3 api, but seems to have an issue with RadosGW access.
 
 I get the error (not found anything searching so far...):
 
 S3ResponseError: 405 Method Not Allowed
 
 when trying to access the rgw.
 
 Apache vhost access log file says:
 
 10.20.0.29 - - [27/Feb/2015:14:09:04 +0100] "GET / HTTP/1.1" 405 27 "-"
 "Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64"
 
 and Apache's general error_log file says:
 
 [Fri Feb 27 14:09:04 2015] [warn] FastCGI: 10.20.0.29 GET http://{fqdn}:8005/
 auth AWS WL4EJJYTLVYXEHNR6QSA:X6XR4z7Gr9qTMNDphTNlRUk3gfc=
 
 
 RadosGW seems to launch and run fine, though /var/log/messages at launches
 says:
 
 Feb 27 14:12:34 rgw kernel: radosgw[14985]: segfault at e0 ip
 003fb36cb1dc sp 7fffde221410 error 4 in
 librados.so.2.0.0[3fb320+6d]
 
 # ps -fuapache
 UIDPID  PPID  C STIME TTY  TIME CMD
 apache   15113 15111  0 14:07 ?00:00:00 /usr/sbin/fcgi-
 apache   15114 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
 apache   15115 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
 apache   15116 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
 apache   15117 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
 apache   15118 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
 apache   15119 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
 apache   15120 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
 apache   15121 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
 apache   15224 1  1 14:12 ?00:00:25 /usr/bin/radosgw -n
 client.radosgw.owmblob
 
 RadosGW creates my FastCGI socket and a default .asok (not sure what the
 default socket is meant for), as well as the configured log file, though it
 never logs anything...
 
 # tail -18 /etc/ceph/ceph.conf:
 
 [client.radosgw.owmblob]
  keyring = /etc/ceph/ceph.client.radosgw.keyring
  host = rgw
  rgw data = /var/lib/ceph/radosgw/ceph-rgw
  log file = /var/log/radosgw/client.radosgw.owmblob.log
  debug rgw = 20
  rgw enable log rados = true
  rgw enable ops log = true
  rgw enable apis = s3
  rgw cache enabled = true
  rgw cache lru size = 1
  rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock
  ;#rgw host = localhost
  ;#rgw port = 8004
  rgw dns name = {fqdn}
  rgw print continue = true
  rgw thread pool size = 20
 
 It turned out /etc/init.d/ceph-radosgw didn't chown the log file to $USER
 when it didn't exist yet; radosgw creates the log file when opening it, but
 as root rather than $USER, hence no output. Manually chowning it and
 restarting the GW gives output like:
 
 2015-02-27 15:25:14.464112 7fef463e9700 20 enqueued request req=0x25dea40
 2015-02-27 15:25:14.465750 7fef463e9700 20 RGWWQ:
 2015-02-27 15:25:14.465786 7fef463e9700 20 req: 0x25dea40
 2015-02-27 15:25:14.465864 7fef463e9700 10 allocated request req=0x25e3050
 2015-02-27 15:25:14.466214 7fef431e4700 20 dequeued request req=0x25dea40
 2015-02-27 15:25:14.466677 7fef431e4700 20 RGWWQ: empty
 2015-02-27 15:25:14.467888 7fef431e4700 20 CONTENT_LENGTH=0
 2015-02-27 15:25:14.467922 7fef431e4700 20 DOCUMENT_ROOT=/var/www/html
 2015-02-27 15:25:14.467941 7fef431e4700 20 FCGI_ROLE=RESPONDER
 2015-02-27 15:25:14.467958 7fef431e4700 20 GATEWAY_INTERFACE=CGI/1.1
 2015-02-27 15:25:14.467976 7fef431e4700 20 HTTP_ACCEPT_ENCODING=identity
 2015-02-27 15:25:14.469476 7fef431e4700 20 HTTP_AUTHORIZATION=AWS
 WL4EJJYTLVYXEHNR6QSA:OAT0zVItGyp98T5mALeHz4p1fcg=
 2015-02-27 15:25:14.469516 7fef431e4700 20 HTTP_DATE=Fri, 27 Feb 2015
 14:25:14 GMT
 2015-02-27 15:25:14.469533 7fef431e4700 20 HTTP_HOST={fqdn}:8005
 2015-02-27 15:25:14.469550 7fef431e4700 20 HTTP_USER_AGENT=Boto/2.34.0
 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64
 2015-02-27 15:25:14.469571 7fef431e4700 20 PATH=/sbin:/usr/sbin:/bin:/usr/bin
 2015-02-27 15:25:14.469589 7fef431e4700 20 QUERY_STRING=
 2015-02-27 15:25:14.469607 7fef431e4700 20 REMOTE_ADDR=10.20.0.29
 2015-02-27 15:25:14.469624 7fef431e4700 20 REMOTE_PORT=34386
 2015-02-27 15:25:14.469641 7fef431e4700 20 REQUEST_METHOD=GET
 2015-02-27 15:25:14.469658 7fef431e4700 20 REQUEST_URI=/
 2015-02-27 15:25:14.469677 7fef431e4700 20
 SCRIPT_FILENAME=/var/www/html/s3gw.fcgi
 2015-02-27 15:25:14.469694 7fef431e4700 20 SCRIPT_NAME=/
 2015-02-27 15:25:14.469711 7fef431e4700 20 SCRIPT_URI=http://{fqdn}:8005/
 2015-02-27 15:25:14.469730 7fef431e4700 20 SCRIPT_URL=/
 2015-02-27 15:25:14.469748 7fef431e4700 20 SERVER_ADDR=10.20.0.29
 2015-02-27 15:25:14.469765 7fef431e4700 20 SERVER_ADMIN={email}
 2015-02-27 15:25:14.469782 7fef431e4700 

Re: [ceph-users] old osds take much longer to start than newer osd

2015-02-27 Thread Corin Langosch
I'd guess so, but that's not what I want to do ;)

Am 27.02.2015 um 18:43 schrieb Robert LeBlanc:
 Does deleting/reformatting the old osds improve the performance?
 
 On Fri, Feb 27, 2015 at 6:02 AM, Corin Langosch
 corin.lango...@netskin.com wrote:
 Hi guys,

 I'm using ceph for a long time now, since bobtail. I always upgraded every 
 few weeks/ months to the latest stable
 release. Of course I also removed some osds and added new ones. Now during 
 the last few upgrades (I just upgraded from
 80.6 to 80.8) I noticed that old osds take much longer to startup than equal 
 newer osds (same amount of data/ disk
 usage, same kind of storage+journal backing device (ssd), same weight, same 
 number of pgs, ...). I know I observed the
 same behavior earlier but just didn't really care about it. Here are the 
 relevant log entries (host of osd.0 and osd.15
 has less cpu power than the others):

 old osds (average pgs load time: 1.5 minutes)

 2015-02-27 13:44:23.134086 7ffbfdcbe780  0 osd.0 19323 load_pgs
 2015-02-27 13:49:21.453186 7ffbfdcbe780  0 osd.0 19323 load_pgs opened 824 
 pgs

 2015-02-27 13:41:32.219503 7f197b0dd780  0 osd.3 19317 load_pgs
 2015-02-27 13:42:56.310874 7f197b0dd780  0 osd.3 19317 load_pgs opened 776 
 pgs

 2015-02-27 13:38:43.909464 7f450ac90780  0 osd.6 19309 load_pgs
 2015-02-27 13:40:40.080390 7f450ac90780  0 osd.6 19309 load_pgs opened 806 
 pgs

 2015-02-27 13:36:14.451275 7f3c41d33780  0 osd.9 19301 load_pgs
 2015-02-27 13:37:22.446285 7f3c41d33780  0 osd.9 19301 load_pgs opened 795 
 pgs

 new osds (average pgs load time: 3 seconds)

 2015-02-27 13:44:25.529743 7f2004617780  0 osd.15 19325 load_pgs
 2015-02-27 13:44:36.197221 7f2004617780  0 osd.15 19325 load_pgs opened 873 
 pgs

 2015-02-27 13:41:29.176647 7fb147fb3780  0 osd.16 19315 load_pgs
 2015-02-27 13:41:31.681722 7fb147fb3780  0 osd.16 19315 load_pgs opened 848 
 pgs

 2015-02-27 13:38:41.470761 7f9c404be780  0 osd.17 19307 load_pgs
 2015-02-27 13:38:43.737473 7f9c404be780  0 osd.17 19307 load_pgs opened 821 
 pgs

 2015-02-27 13:36:10.997766 7f7315e99780  0 osd.18 19299 load_pgs
 2015-02-27 13:36:13.511898 7f7315e99780  0 osd.18 19299 load_pgs opened 815 
 pgs

 The old osds also take more memory, here's an example:

 root 15700 22.8  0.7 1423816 485552 ?  Ssl  13:36   4:55 
 /usr/bin/ceph-osd -i 9 --pid-file
 /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph
 root 15270 15.4  0.4 1227140 297032 ?  Ssl  13:36   3:20 
 /usr/bin/ceph-osd -i 18 --pid-file
 /var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph


 It seems to me there is still some old data around for the old osds which 
 was not properly migrated/ cleaned up during
 the upgrades. The cluster is healthy, no problems at all the last few weeks. 
 Is there any way to clean this up?

 Thanks
 Corin
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] too few pgs in cache tier

2015-02-27 Thread Steffen W Sørensen
On 27/02/2015, at 17.04, Udo Lembke ulem...@polarzone.de wrote:

 ceph health detail
 HEALTH_WARN pool ssd-archiv has too few pgs
A slightly different case: I had an issue with my Ceph cluster underneath a
PVE cluster yesterday.

Had two Ceph pools for RBD virt disks, vm_images (boot hdd images) + rbd_data 
(extra hdd images).

Then while adding pools for a rados GW (.rgw.*), suddenly the health status
said that my vm_images pool had too few PGs, thus I ran:

ceph osd pool set vm_images pg_num <larger_number>
ceph osd pool set vm_images pgp_num <larger_number>

That kicked off a 20 min rebalancing with a lot of IO in the Ceph cluster.
Eventually the cluster was fine again, but almost all my PVE VMs ended up in
a stopped state; wondering why, a watchdog thingy maybe...

/Steffen




signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Steffen W Sørensen
On 27/02/2015, at 19.02, Steffen W Sørensen ste...@me.com wrote:
 Into which pool does such user data (buckets and objects) gets stored and 
 possible howto direct user data into a dedicated pool?
 
 [root@rgw ~]# rados df
 pool name   category KB  objects   clones 
 degraded  unfound   rdrd KB   wrwr KB
 .intent-log -  000
 0   00000
 .log-  110
 0   00022
 .rgw-  140
 0   0   17   14   104
 .rgw.buckets-  000
 0   00000
 .rgw.buckets.extra -  000 
0   00000
 .rgw.buckets.index -  010 
0   02030
 .rgw.control-  080
 0   00000
 .rgw.gc -  0   320
 0   0 8302 8302 55560
 .rgw.root   -  130
 0   0  929  61833
 .usage  -  000
 0   00000
 .users  -  110
 0   06453
 .users.email-  110
 0   03253
 .users.swift-  000
 0   00000
 .users.uid  -  120
 0   0   65   54   164
So it's mapped into a zone (at least on my Giant version 0.87)
and in my simple non-federated config it's in the default region+zone:

[root@rgw ~]# radosgw-admin region list
{ "default_info": { "default_region": "default"},
  "regions": [
        "default"]}
[root@rgw ~]# radosgw-admin zone list
{ "zones": [
        "default"]}

[root@rgw ~]# radosgw-admin region get
{ "name": "default",
  "api_name": "",
  "is_master": "true",
  "endpoints": [],
  "master_zone": "",
  "zones": [
        { "name": "default",
          "endpoints": [],
          "log_meta": "false",
          "log_data": "false"}],
  "placement_targets": [
        { "name": "default-placement",
          "tags": []}],
  "default_placement": "default-placement"}

[root@rgw ~]# radosgw-admin zone get
{ "domain_root": ".rgw",
  "control_pool": ".rgw.control",
  "gc_pool": ".rgw.gc",
  "log_pool": ".log",
  "intent_log_pool": ".intent-log",
  "usage_log_pool": ".usage",
  "user_keys_pool": ".users",
  "user_email_pool": ".users.email",
  "user_swift_pool": ".users.swift",
  "user_uid_pool": ".users.uid",
  "system_key": { "access_key": "",
      "secret_key": ""},
  "placement_pools": [
        { "key": "default-placement",
          "val": { "index_pool": ".rgw.buckets.index",
              "data_pool": ".rgw.buckets",
              "data_extra_pool": ".rgw.buckets.extra"}}]}

and my user is associated with the default region+zone, thus its data goes
into .rgw.buckets + .rgw.buckets.index [+ .rgw.buckets.extra].
A bucket seems to be a naming container at the radosgw level, above the
underlying Ceph pool abstraction, 'just' providing object persistence for
the radosgw abstraction/object FS on top of Ceph pools... I think.

So can multiple users associated with the same region+zone share
buckets+objects?

It would be nice to have a drawing showing the abstractions at the different
levels, possibly with links to details on administration at each level :)
A lot of stuff to grasp for a newbie who just needs an S3 service for an app :)

/Steffen


signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph and docker

2015-02-27 Thread Sage Weil
The online Ceph Developer Summit is next week, and there is a session 
proposed for discussing ongoing Ceph and Docker integration efforts:


https://wiki.ceph.com/Planning/Blueprints/Infernalis/Continue_Ceph%2F%2FDocker_integration_work

Right now there is mostly a catalog of existing efforts.  It would be 
great to come out of this discussion with a more consolidated view of what 
the requirements are and what direction we should be going in.  If 
anyone is interested, please add your name to the blueprint and/or comment 
and edit as you see fit.  And join the discussion next week.. it's all 
video chat and irc and etherpad based.

Thanks!
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.93 Hammer release candidate released

2015-02-27 Thread Sage Weil
This is the first release candidate for Hammer, and includes all of
the features that will be present in the final release.  We welcome
and encourage any and all testing in non-production clusters to identify
any problems with functionality, stability, or performance before the
final Hammer release.

We suggest some caution in one area: librbd.  There is a lot of new
functionality around object maps and locking that is disabled by
default but may still affect stability for existing images.  We are
continuing to shake out those bugs so that the final Hammer release
(probably v0.94) will be stable.

Major features since Giant include:

* cephfs: journal scavenger repair tool (John Spray)
* crush: new and improved straw2 bucket type (Sage Weil, Christina 
  Anderson, Xiaoxi Chen)
* doc: improved guidance for CephFS early adopters (John Spray)
* librbd: add per-image object map for improved performance (Jason 
  Dillaman)
* librbd: copy-on-read (Min Chen, Li Wang, Yunchuan Wen, Cheng Cheng)
* librados: fadvise-style IO hints (Jianpeng Ma)
* mds: many many snapshot-related fixes (Yan, Zheng)
* mon: new 'ceph osd df' command (Mykola Golub)
* mon: new 'ceph pg ls ...' command (Xinxin Shu)
* osd: improved performance for high-performance backends
* osd: improved recovery behavior (Samuel Just)
* osd: improved cache tier behavior with reads (Zhiqiang Wang)
* rgw: S3-compatible bucket versioning support (Yehuda Sadeh)
* rgw: large bucket index sharding (Guang Yang, Yehuda Sadeh)
* RDMA xio messenger support (Matt Benjamin, Vu Pham)

Upgrading
-

* No special restrictions when upgrading from firefly or giant

Notable Changes
---

* build: CMake support (Ali Maredia, Casey Bodley, Adam Emerson, Marcus 
  Watts, Matt Benjamin)
* ceph-disk: do not re-use partition if encryption is required (Loic 
  Dachary)
* ceph-disk: support LUKS for encrypted partitions (Andrew Bartlett, Loic 
  Dachary)
* ceph-fuse,libcephfs: add support for O_NOFOLLOW and O_PATH (Greg Farnum)
* ceph-fuse,libcephfs: resend requests before completing cap reconnect 
  (#10912 Yan, Zheng)
* ceph-fuse: select kernel cache invalidation mechanism based on kernel 
  version (Greg Farnum)
* ceph-objectstore-tool: improved import (David Zafman)
* ceph-objectstore-tool: misc improvements, fixes (#9870 #9871 David 
  Zafman)
* ceph: add 'ceph osd df [tree]' command (#10452 Mykola Golub)
* ceph: fix 'ceph tell ...' command validation (#10439 Joao Eduardo Luis)
* ceph: improve 'ceph osd tree' output (Mykola Golub)
* cephfs-journal-tool: add recover_dentries function (#9883 John Spray)
* common: add newline to flushed json output (Sage Weil)
* common: filtering for 'perf dump' (John Spray)
* common: fix Formatter factory breakage (#10547 Loic Dachary)
* common: make json-pretty output prettier (Sage Weil)
* crush: new and improved straw2 bucket type (Sage Weil, Christina 
  Anderson, Xiaoxi Chen)
* crush: update tries stats for indep rules (#10349 Loic Dachary)
* crush: use larger choose_tries value for erasure code rulesets (#10353 
  Loic Dachary)
* debian,rpm: move RBD udev rules to ceph-common (#10864 Ken Dreyer)
* debian: split python-ceph into python-{rbd,rados,cephfs} (Boris Ranto)
* doc: CephFS disaster recovery guidance (John Spray)
* doc: CephFS for early adopters (John Spray)
* doc: fix OpenStack Glance docs (#10478 Sebastien Han)
* doc: misc updates (#9793 #9922 #10204 #10203 Travis Rhoden, Hazem, 
  Ayari, Florian Coste, Andy Allan, Frank Yu, Baptiste Veuillez-Mainard, 
  Yuan Zhou, Armando Segnini, Robert Jansen, Tyler Brekke, Viktor Suprun)
* doc: replace cloudfiles with swiftclient Python Swift example (Tim 
  Freund)
* erasure-code: add mSHEC erasure code support (Takeshi Miyamae)
* erasure-code: improved docs (#10340 Loic Dachary)
* erasure-code: set max_size to 20 (#10363 Loic Dachary)
* libcephfs,ceph-fuse: fix getting zero-length xattr (#10552 Yan, Zheng)
* librados: add blacklist_add convenience method (Jason Dillaman)
* librados: expose rados_{read|write}_op_assert_version in C API (Kim 
  Vandry)
* librados: fix pool name caching (#10458 Radoslaw Zarzynski)
* librados: fix resource leak, misc bugs (#10425 Radoslaw Zarzynski)
* librados: fix some watch/notify locking (Jason Dillaman, Josh Durgin)
* libradosstriper: fix write_full when ENOENT (#10758 Sebastien Ponce)
* librbd: CRC protection for RBD image map (Jason Dillaman)
* librbd: add per-image object map for improved performance (Jason 
  Dillaman)
* librbd: add support for an object map indicating which objects exist 
  (Jason Dillaman)
* librbd: adjust internal locking (Josh Durgin, Jason Dillaman)
* librbd: better handling of watch errors (Jason Dillaman)
* librbd: coordinate maint operations through lock owner (Jason Dillaman)
* librbd: copy-on-read (Min Chen, Li Wang, Yunchuan Wen, Cheng Cheng, 
  Jason Dillaman)
* librbd: enforce write ordering with a snapshot (Jason Dillaman)
* librbd: fadvise-style hints; add misc hints for certain operations 
  (Jianpeng 

Re: [ceph-users] Possibly misleading/outdated documentation about qemu/kvm and rbd cache settings

2015-02-27 Thread Florian Haas
On 02/27/2015 02:46 PM, Mark Wu wrote:
 
 
 2015-02-27 20:56 GMT+08:00 Alexandre DERUMIER aderum...@odiso.com
 mailto:aderum...@odiso.com:
 
 Hi,
 
 from qemu rbd.c
 
 if (flags & BDRV_O_NOCACHE) {
     rados_conf_set(s->cluster, "rbd_cache", "false");
 } else {
     rados_conf_set(s->cluster, "rbd_cache", "true");
 }
 
 and
 block.c
 
 int bdrv_parse_cache_flags(const char *mode, int *flags)
 {
     *flags &= ~BDRV_O_CACHE_MASK;
 
     if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
         *flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
     } else if (!strcmp(mode, "directsync")) {
         *flags |= BDRV_O_NOCACHE;
     } else if (!strcmp(mode, "writeback")) {
         *flags |= BDRV_O_CACHE_WB;
     } else if (!strcmp(mode, "unsafe")) {
         *flags |= BDRV_O_CACHE_WB;
         *flags |= BDRV_O_NO_FLUSH;
     } else if (!strcmp(mode, "writethrough")) {
         /* this is the default */
     } else {
         return -1;
     }
 
     return 0;
 }
 
 
 So rbd_cache is
 
 disabled for cache=directsync|none
 
 and enabled for writethrough|writeback|unsafe
 
 
 so directsync or none should be safe if guest does not send flush.
 
 
 
 - Mail original -
 From: Florian Haas flor...@hastexo.com
 To: ceph-users ceph-users@lists.ceph.com
 Sent: Friday, 27 February 2015 13:38:13
 Subject: [ceph-users] Possibly misleading/outdated documentation about
 qemu/kvm and rbd cache settings
 
 Hi everyone,
 
 I always have a bit of trouble wrapping my head around how libvirt seems
 to ignore ceph.conf options while qemu/kvm does not, so I thought I'd
 ask. Maybe Josh, Wido or someone else can clarify the following.
 
 http://ceph.com/docs/master/rbd/qemu-rbd/ says:
 
 Important: If you set rbd_cache=true, you must set cache=writeback or
 risk data loss. Without cache=writeback, QEMU will not send flush
 requests to librbd. If QEMU exits uncleanly in this configuration,
 filesystems on top of rbd can be corrupted.
 
 Now this refers to explicitly setting rbd_cache=true on the qemu command
 line, not having rbd_cache=true in the [client] section in ceph.conf,
 and I'm not even sure whether qemu supports that anymore.
 
 Even if it does, I'm still not sure whether the statement is accurate.
 
 qemu has, for some time, had a cache=directsync mode which is intended
 to be used as follows (from
 http://lists.nongnu.org/archive/html/qemu-devel/2011-08/msg00020.html):
 
 This mode is useful when guests may not be sending flushes when
 appropriate and therefore leave data at risk in case of power failure.
 When cache=directsync is used, write operations are only completed to
 the guest when data is safely on disk.
 
 So even if there are no flush requests to librbd, users should still be
 safe from corruption when using cache=directsync, no?
 
 So in summary, I *think* the following considerations apply, but I'd be
 grateful if someone could confirm or refute them: 
 
 
 cache = writethrough
 Maps to rbd_cache=true, rbd_cache_max_dirty=0. Read cache only, safe to
 
  Actually, qemu doesn't care about the rbd_cache_max_dirty setting. In
 writethrough mode, qemu always sends a flush after every write request.

So how exactly is that functionally different from rbd_cache_max_dirty=0?


 use whether or not guest I/O stack sends flushes.
 
 cache = writeback
 Maps to rbd_cache=true, rbd_cache_max_dirty > 0. Safe to use only if
 guest I/O stack sends flushes. Maps to cache = writethrough until first 
 
  Qemu can report to the guest whether the write cache is enabled, and
 the guest kernel can manage the cache as it does for a volatile
 writeback cache on a physical storage controller (please see
 https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt).
 If filesystem barriers are not disabled in the guest, this avoids data
 corruption.

You mean block barriers? I thought those were killed upstream like 4
years ago.

Cheers,
Florian



Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-27 Thread Chris Murray
That's interesting, it seems to be alternating between two lines, but only one 
thread this time? I'm guessing the 62738 is the osdmap, which is much behind 
where it should be? Osd.0 and osd.3 are on 63675, if I'm understanding that 
correctly.

2015-02-27 08:18:48.724645 7f2fbd1e8700 20 osd.11 62738 update_osd_stat 
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 08:18:48.724683 7f2fbd1e8700  5 osd.11 62738 heartbeat: 
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 08:19:00.025003 7f2fbd1e8700 20 osd.11 62738 update_osd_stat 
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 08:19:00.025040 7f2fbd1e8700  5 osd.11 62738 heartbeat: 
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 08:19:04.125395 7f2fbd1e8700 20 osd.11 62738 update_osd_stat 
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 08:19:04.125431 7f2fbd1e8700  5 osd.11 62738 heartbeat: 
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 08:19:26.225763 7f2fbd1e8700 20 osd.11 62738 update_osd_stat 
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 08:19:26.225797 7f2fbd1e8700  5 osd.11 62738 heartbeat: 
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 08:19:26.726140 7f2fbd1e8700 20 osd.11 62738 update_osd_stat 
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])
2015-02-27 08:19:26.726177 7f2fbd1e8700  5 osd.11 62738 heartbeat: 
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist [])

Activity on /dev/sdb looks similar to how it did without debugging:

sdb   5.95 0.00   701.20  0  42072
sdb   5.10 0.00   625.60  0  37536
sdb   4.97 0.00   611.33  0  36680
sdb   5.77 0.00   701.20  0  42072


Some Googling reveals references to log files which have very similar entries, 
but I can't see anything that just repeats like mine does.

-Original Message-
From: Gregory Farnum [mailto:g...@gregs42.com] 
Sent: 26 February 2015 22:37
To: Chris Murray
Cc: ceph-users
Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the 
cluster recover without help?

If you turn up debug osd = 20 or something it'll apply a good bit more disk 
load but give you more debugging logs about what's going on.
It could be that you're in enough of a mess now that it's stuck trying to 
calculate past intervals for a bunch of PGs across so many maps that it's 
swapping things in and out of memory and going slower (if that's the case, then 
increasing the size of your map cache will help).
-Greg
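
For reference, those two knobs would look roughly like this in ceph.conf
(illustrative values; check the defaults for your release before changing
them):

```ini
[osd]
    ; verbose per-OSD debug logging, as suggested above
    debug osd = 20
    ; cache more osdmap epochs in memory (illustrative value)
    osd map cache size = 1000
```

The debug level can also be raised at runtime without a restart, e.g. with
ceph tell osd.11 injectargs '--debug-osd 20'.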

On Thu, Feb 26, 2015 at 1:56 PM, Chris Murray chrismurra...@gmail.com wrote:
 Tackling this on a more piecemeal basis, I've stopped all OSDs, and 
 started just the three which exist on the first host.

 osd.0 comes up without complaint: osd.0 63675 done with init, 
 starting boot process
 osd.3 comes up without complaint: osd.3 63675 done with init, 
 starting boot process
 osd.11 is a problematic one.

 It does something like this ...

 2015-02-26 10:44:50.260593 7f7e23551780  0 ceph version 0.80.8 
 (69eaad7f8308f21573c604f121956e64679a52a7), process ceph-osd, pid 
 305080
 2015-02-26 10:44:50.265525 7f7e23551780  0
 filestore(/var/lib/ceph/osd/ceph-11) mount detected btrfs
 2015-02-26 10:44:51.155501 7f7e23551780  0
 genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features:
 FIEMAP ioctl is supported and appears to work
 2015-02-26 10:44:51.155536 7f7e23551780  0
 genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features:
 FIEMAP ioctl is disabled via 'filestore fiemap' config option
 2015-02-26 10:44:51.433239 7f7e23551780  0
 genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features:
 syscall(SYS_syncfs, fd) fully supported
 2015-02-26 10:44:51.433467 7f7e23551780  0
 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature:
 CLONE_RANGE ioctl is supported
 2015-02-26 10:44:51.644373 7f7e23551780  0
 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature:
 SNAP_CREATE is supported
 2015-02-26 10:44:51.668424 7f7e23551780  0
 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature:
 SNAP_DESTROY is supported
 2015-02-26 10:44:51.668741 7f7e23551780  0
 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature:
 START_SYNC is supported (transid 43205)
 2015-02-26 10:44:51.766577 7f7e23551780  0
 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature:
 WAIT_SYNC is supported
 2015-02-26 10:44:51.814761 7f7e23551780  0
 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_feature:
 SNAP_CREATE_V2 is supported
 2015-02-26 10:44:52.181382 7f7e23551780  0
 filestore(/var/lib/ceph/osd/ceph-11) mount: 

[ceph-users] multiple CephFS filesystems on the same pools

2015-02-27 Thread Blair Bethwaite
Sorry if this is actually documented somewhere, but is it possible to
create and use multiple filesystems on the same data and metadata
pools? I'm guessing yes, but does it require multiple MDSs?

-- 
Cheers,
~Blairo


Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-27 Thread Chris Murray
A little further logging:

2015-02-27 10:27:15.745585 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:15.745619 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:23.530913 7fe8e8536700  1 -- 192.168.12.25:6800/673078
--> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26380 con 0xe1f0cc60
2015-02-27 10:27:30.645902 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:30.645938 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:33.531142 7fe8e8536700  1 -- 192.168.12.25:6800/673078
--> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26540 con 0xe1f0cc60
2015-02-27 10:27:43.531333 7fe8e8536700  1 -- 192.168.12.25:6800/673078
--> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26700 con 0xe1f0cc60
2015-02-27 10:27:45.546275 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:45.546311 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:53.531564 7fe8e8536700  1 -- 192.168.12.25:6800/673078
--> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f268c0 con 0xe1f0cc60
2015-02-27 10:27:56.846593 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:56.846627 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:57.346965 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:27:57.347001 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:28:03.531785 7fe8e8536700  1 -- 192.168.12.25:6800/673078
--> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26a80 con 0xe1f0cc60
2015-02-27 10:28:13.532027 7fe8e8536700  1 -- 192.168.12.25:6800/673078
--> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26c40 con 0xe1f0cc60
2015-02-27 10:28:23.047382 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:28:23.047419 7fe8e3f2f700  5 osd.11 62839 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 10:28:23.532271 7fe8e8536700  1 -- 192.168.12.25:6800/673078
--> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26e00 con 0xe1f0cc60
2015-02-27 10:28:33.532496 7fe8e8536700  1 -- 192.168.12.25:6800/673078
--> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
v2 -- ?+0 0xe5f26fc0 con 0xe1f0cc60

62839? But it was 62738 earlier, so it is actually advancing toward the
63675? If what I've assumed about the osd map numbers is true.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Chris Murray
Sent: 27 February 2015 08:33
To: Gregory Farnum
Cc: ceph-users
Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy;will
the cluster recover without help?

That's interesting, it seems to be alternating between two lines, but
only one thread this time? I'm guessing the 62738 is the osdmap, which
is much behind where it should be? Osd.0 and osd.3 are on 63675, if I'm
understanding that correctly.

2015-02-27 08:18:48.724645 7f2fbd1e8700 20 osd.11 62738 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 08:18:48.724683 7f2fbd1e8700  5 osd.11 62738 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 08:19:00.025003 7f2fbd1e8700 20 osd.11 62738 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 08:19:00.025040 7f2fbd1e8700  5 osd.11 62738 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 08:19:04.125395 7f2fbd1e8700 20 osd.11 62738 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 08:19:04.125431 7f2fbd1e8700  5 osd.11 62738 heartbeat:
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 08:19:26.225763 7f2fbd1e8700 20 osd.11 62738 update_osd_stat
osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
[])
2015-02-27 08:19:26.225797 7f2fbd1e8700  5 osd.11 62738 heartbeat:
osd_stat(1305 GB used, 

Re: [ceph-users] Cluster never reaching clean after osd out

2015-02-27 Thread Yves Kretzschmar
Hi Stéphane,
 
I think I got it.
I purged my complete cluster, set up the new one like the old one, and got 
exactly the same problem again.
Then I did ceph osd crush tunables optimal, which added the option 
chooseleaf_vary_r 1 to the crushmap.
After that everything works fine.
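For reference, you can confirm the change by decompiling the crushmap
(ceph osd getcrushmap -o cm.bin; crushtool -d cm.bin -o cm.txt); the
tunables section at the top should then look roughly like this:

```text
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
```

Note that switching to the optimal tunables can trigger significant data 
movement on a cluster that already holds data.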

Try it at your cluster.

Greetings
Yves
 

Sent: Tuesday, 24 February 2015 at 10:49
From: Stéphane DUGRAVOT stephane.dugra...@univ-lorraine.fr
To: Yves Kretzschmar yveskretzsch...@web.de, ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cluster never reaching clean after osd out

 
 
 

I have a Cluster of 3 hosts, running Debian wheezy and Backports Kernel 
3.16.0-0.bpo.4-amd64.
For testing I did a 
~# ceph osd out 20
from a clean state.
Ceph starts rebalancing; watching ceph -w, one sees the number of pgs stuck 
unclean go up and then come down to about 11.

Short after that the cluster keeps stuck forever in this state:
health HEALTH_WARN 68 pgs stuck unclean; recovery 450/169647 objects degraded 
(0.265%); 3691/169647 objects misplaced (2.176%)

According to the documentation at 
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ the Cluster should 
reach a clean state after an osd out.

What am I doing wrong?
 
Hi Yves and Cephers,
 
I have a cluster with 6 nodes and 36 OSDs. I have the same problem:
 
    cluster 1d0503fb-36d0-4dbc-aabe-a2a0709163cd
 health HEALTH_WARN 76 pgs stuck unclean; recovery 1/624 objects degraded 
(0.160%); 7/624 objects misplaced (1.122%)
 monmap e6: 6 mons
 osdmap e616: 36 osds: 36 up, 35 in
  pgmap v16344: 2048 pgs, 1 pools, 689 MB data, 208 objects
    178 GB used, 127 TB / 127 TB avail
    1/624 objects degraded (0.160%); 7/624 objects misplaced (1.122%)
  76 active+remapped
    1972 active+clean
 
After marking osd.15 'out', ceph didn't return to HEALTH_OK, and I get 
misplaced objects ... :-/
I noticed that this happens when I use a replica-3 pool. When the pool uses 
replica 2, ceph returned to HEALTH_OK... Have you tried with a replica-2 
pool?
 
Likewise, I wonder why it does not return to HEALTH_OK.
 
 
CEPH OSD TREE
 
# id    weight    type name    up/down    reweight
-1000    144    root default
-200    48        datacenter mo
-133    48            rack mom02
-4    24                host mom02h01
12    4                    osd.12    up    1    
13    4                    osd.13    up    1    
14    4                    osd.14    up    1    
16    4                    osd.16    up    1    
17    4                    osd.17    up    1    
15    4                    osd.15    up    0    
-5    24                host mom02h02
18    4                    osd.18    up    1    
19    4                    osd.19    up    1    
20    4                    osd.20    up    1    
21    4                    osd.21    up    1    
22    4                    osd.22    up    1    
23    4                    osd.23    up    1    
-202    48        datacenter me
-135    48            rack mem04
-6    24                host mem04h01
24    4                    osd.24    up    1    
25    4                    osd.25    up    1    
26    4                    osd.26    up    1    
27    4                    osd.27    up    1    
28    4                    osd.28    up    1    
29    4                    osd.29    up    1    
-7    24                host mem04h02
30    4                    osd.30    up    1    
31    4                    osd.31    up    1    
32    4                    osd.32    up    1    
33    4                    osd.33    up    1    
34    4                    osd.34    up    1    
35    4                    osd.35    up    1    
-201    48        datacenter li
-134    48            rack lis04
-2    24                host lis04h01
0    4                    osd.0    up    1    
2    4                    osd.2    up    1    
3    4                    osd.3    up    1    
4    4                    osd.4    up    1    
5    4                    osd.5    up    1    
1    4                    osd.1    up    1    
-3    24                host lis04h02
6    4                    osd.6    up    1    
7    4                    osd.7    up    1    
8    4                    osd.8    up    1    
9    4                    osd.9    up    1    
10    4                    osd.10    up    1    
11    4                    osd.11    up    1   
 
 
Crushmap
 
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22