[ceph-users] radosgw + s3 + keystone + Browser-Based POST problem

2015-01-29 Thread Valery Tschopp

Hi guys,

We have integrated our radosgw (v0.80.7) with our OpenStack Keystone 
server (icehouse) successfully.


The normal S3 operations can be executed with the Keystone user's EC2 
credentials (EC2_ACCESS_KEY, EC2_SECRET_KEY). The radosgw correctly 
handles these user credentials, asks Keystone to validate them, and the 
resulting objects belong to the Keystone tenant/project of the user (the 
user is a member of the tenant/project).


But the browser-based upload POST [1] doesn't work! The user is 
not correctly resolved, and the radosgw returns a 403.


It looks like the S3/Keystone integration doesn't work correctly when an 
S3 browser-based upload POST is used.
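
For reference, here is a minimal sketch (not our actual code) of how the 
browser-based POST form fields are built and signed with the EC2 
credentials; the bucket name, key prefix and expiry below are placeholders:

import base64, hmac, json
from hashlib import sha1
from datetime import datetime, timedelta

EC2_ACCESS_KEY = 'EC2_ACCESS_KEY'   # placeholder Keystone EC2 credential
EC2_SECRET_KEY = 'EC2_SECRET_KEY'   # placeholder Keystone EC2 credential

expiration = (datetime.utcnow() + timedelta(hours=1)).strftime('%Y-%m-%dT%H:%M:%SZ')
policy = {
    "expiration": expiration,
    "conditions": [
        {"bucket": "switch-original-staging"},        # placeholder bucket
        ["starts-with", "$key", "uploads/"],          # placeholder key prefix
        {"acl": "private"},
        ["content-length-range", 0, 1073741824],
    ],
}
policy_b64 = base64.b64encode(json.dumps(policy).encode('utf-8'))
signature = base64.b64encode(
    hmac.new(EC2_SECRET_KEY.encode('utf-8'), policy_b64, sha1).digest())

# Hidden form fields of the HTML upload form:
#   AWSAccessKeyId = EC2_ACCESS_KEY
#   policy         = policy_b64
#   signature      = signature
#   plus key, acl and the file field itself.

The radosgw should look up the AWSAccessKeyId in Keystone just as it does 
for the Authorization header of a normal request; that lookup is what 
appears to fail in the log below.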


See the attached log file (radosgw.log); you can clearly see the user 
lookup failing and the status being set to 403:



2015-01-29 15:11:30.151157 7f25616fa700  0 User lookup failed!
2015-01-29 15:11:30.151171 7f25616fa700 15 Read RGWCORSConfiguration <CORSConfiguration><CORSRule><AllowedMethod>POST</AllowedMethod><AllowedOrigin>https://staging.tube.switch.ch</AllowedOrigin><AllowedHeader>*</AllowedHeader></CORSRule></CORSConfiguration>

2015-01-29 15:11:30.151184 7f25616fa700 10 Method POST is supported
2015-01-29 15:11:30.151195 7f25616fa700  2 req 1123:0.013204:s3:POST 
/:post_obj:http status=403



Is this a bug? Or did we miss something else?

Cheers,
Valery

[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingHTTPPOST.html
--
SWITCH
--
Valery Tschopp, Software Engineer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
email: valery.tsch...@switch.ch phone: +41 44 268 1544

2015-01-29 15:11:30.130054 7f2634cef700 20 enqueued request req=0x7f26040838d0
2015-01-29 15:11:30.130084 7f2634cef700 20 RGWWQ:
2015-01-29 15:11:30.130086 7f2634cef700 20 req: 0x7f26040838d0
2015-01-29 15:11:30.130108 7f2634cef700 10 allocated request req=0x7f26040c58d0
2015-01-29 15:11:30.130200 7f2454ce1700 20 dequeued request req=0x7f26040838d0
2015-01-29 15:11:30.130208 7f2454ce1700 20 RGWWQ: empty
2015-01-29 15:11:30.130303 7f2454ce1700 20 CONTEXT_DOCUMENT_ROOT=/var/www
2015-01-29 15:11:30.130305 7f2454ce1700 20 CONTEXT_PREFIX=
2015-01-29 15:11:30.130306 7f2454ce1700 20 DOCUMENT_ROOT=/var/www
2015-01-29 15:11:30.130307 7f2454ce1700 20 FCGI_ROLE=RESPONDER
2015-01-29 15:11:30.130308 7f2454ce1700 20 GATEWAY_INTERFACE=CGI/1.1
2015-01-29 15:11:30.130308 7f2454ce1700 20 HTTP_ACCEPT=*/*
2015-01-29 15:11:30.130309 7f2454ce1700 20 HTTP_ACCEPT_ENCODING=gzip, deflate, 
sdch
2015-01-29 15:11:30.130310 7f2454ce1700 20 
HTTP_ACCEPT_LANGUAGE=en-US,en;q=0.8,it;q=0.6
2015-01-29 15:11:30.130311 7f2454ce1700 20 
HTTP_ACCESS_CONTROL_REQUEST_HEADERS=content-type
2015-01-29 15:11:30.130312 7f2454ce1700 20 
HTTP_ACCESS_CONTROL_REQUEST_METHOD=POST
2015-01-29 15:11:30.130312 7f2454ce1700 20 HTTP_AUTHORIZATION=
2015-01-29 15:11:30.130313 7f2454ce1700 20 HTTP_CACHE_CONTROL=no-cache
2015-01-29 15:11:30.130314 7f2454ce1700 20 HTTP_CONNECTION=keep-alive
2015-01-29 15:11:30.130314 7f2454ce1700 20 
HTTP_HOST=switch-original-staging.os.zhdk.cloud.switch.ch
2015-01-29 15:11:30.130315 7f2454ce1700 20 
HTTP_ORIGIN=https://staging.tube.switch.ch
2015-01-29 15:11:30.130316 7f2454ce1700 20 HTTP_PRAGMA=no-cache
2015-01-29 15:11:30.130317 7f2454ce1700 20 
HTTP_REFERER=https://staging.tube.switch.ch/channels/04238519/videos
2015-01-29 15:11:30.130318 7f2454ce1700 20 HTTP_USER_AGENT=Mozilla/5.0 
(Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/40.0.2214.93 Safari/537.36
2015-01-29 15:11:30.130320 7f2454ce1700 20 HTTPS=on
2015-01-29 15:11:30.130321 7f2454ce1700 20 
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2015-01-29 15:11:30.130322 7f2454ce1700 20 QUERY_STRING=
2015-01-29 15:11:30.130322 7f2454ce1700 20 REMOTE_ADDR=130.59.17.201
2015-01-29 15:11:30.130323 7f2454ce1700 20 REMOTE_PORT=53901
2015-01-29 15:11:30.130324 7f2454ce1700 20 REQUEST_METHOD=OPTIONS
2015-01-29 15:11:30.130325 7f2454ce1700 20 REQUEST_SCHEME=https
2015-01-29 15:11:30.130326 7f2454ce1700 20 REQUEST_URI=/
2015-01-29 15:11:30.130327 7f2454ce1700 20 SCRIPT_FILENAME=/var/www/radosgw.fcgi
2015-01-29 15:11:30.130328 7f2454ce1700 20 SCRIPT_NAME=/
2015-01-29 15:11:30.130329 7f2454ce1700 20 
SCRIPT_URI=https://switch-original-staging.os.zhdk.cloud.switch.ch/
2015-01-29 15:11:30.130330 7f2454ce1700 20 SCRIPT_URL=/
2015-01-29 15:11:30.130331 7f2454ce1700 20 SERVER_ADDR=86.119.32.13
2015-01-29 15:11:30.130332 7f2454ce1700 20 SERVER_ADMIN=cl...@switch.ch
2015-01-29 15:11:30.130333 7f2454ce1700 20 
SERVER_NAME=switch-original-staging.os.zhdk.cloud.switch.ch
2015-01-29 15:11:30.130334 7f2454ce1700 20 SERVER_PORT=443
2015-01-29 15:11:30.130334 7f2454ce1700 20 SERVER_PROTOCOL=HTTP/1.1
2015-01-29 15:11:30.130335 7f2454ce1700 20 SERVER_SIGNATURE=
2015-01-29 15:11:30.130350 7f2454ce1700 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
2015-01-29 15:11:30.130351 7f2454ce1700 20 
SSL_TLS_SNI=switch-original-staging.os.zhdk.cloud.switch.ch
2015-01-29 15:11:30.130352 

Re: [ceph-users] Sizing SSD's for ceph

2015-01-29 Thread Udo Lembke
Hi,

Am 29.01.2015 07:53, schrieb Christian Balzer:
 On Thu, 29 Jan 2015 01:30:41 + Ramakrishna Nishtala (rnishtal) wrote:

 * Per my understanding once writes are complete to journal then
 it is read again from the journal before writing to data disk. Does this
 mean, we have to do, not just sync/async writes but also reads
 ( random/seq ? ) in order to correctly size them?

 You might want to read this thread:
 https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg12952.html
 
 Assuming this didn't change (and just looking at my journal SSDs and OSD
 HDDs with atop I don't think so) your writes go to the HDDs pretty much in
 parallel.
 
 In either case, an SSD that can _write_ fast enough to satisfy your needs
 will definitely have no problems reading fast enough. 
 

Because the data is still in the cache (RAM), there are only marginal reads
from the journal SSD!

iostat from a journal SSD:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read        kB_wrtn
sdc             304,45         0,16     82750,46      29544    15518960008

I would say: if you see many more reads than that, you have too little memory.
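
A back-of-the-envelope sketch of the sizing argument (the example numbers 
are assumptions, not measurements; the journal-size rule of thumb is the 
one from the Ceph docs):

# Every client write hits the journal once, so a journal SSD must sustain
# the combined write rate of the OSDs it fronts; journal reads stay
# marginal as long as the data is still in the page cache, as the iostat
# output above shows.
osds_per_ssd = 4                   # HDD-backed OSDs sharing one journal SSD (assumption)
hdd_write_mb_s = 120               # sustained sequential write per HDD (assumption)
filestore_max_sync_interval = 5    # seconds, the default

ssd_write_mb_s = osds_per_ssd * hdd_write_mb_s                        # -> 480 MB/s needed
journal_size_mb = 2 * hdd_write_mb_s * filestore_max_sync_interval    # -> 1200 MB per OSD journal
print(ssd_write_mb_s, journal_size_mb)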


Udo


Re: [ceph-users] RGW region metadata sync prevents writes to non-master region

2015-01-29 Thread Yehuda Sadeh
On Wed, Jan 28, 2015 at 8:04 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
 On 29/01/15 13:58, Mark Kirkwood wrote:


 However if I
 try to write to eu-west I get:


 Sorry - that should have said:

 However if I try to write to eu-*east* I get:

 The actual code is (see below) connecting to the endpoint for eu-east
 (ceph4:80), so seeing it redirected to us-*west* is pretty strange!

Bucket creation is synchronous and is sent to the master region for
completion. I'm not sure why it actually fails; that error is what the
master region sends back. What does the corresponding log at the master
region show?

Yehuda


 --- code ---
 import boto
 import boto.s3.connection

 access_key = 'the key'
 secret_key = 'the secret'

 conn = boto.connect_s3(
 aws_access_key_id = access_key,
 aws_secret_access_key = secret_key,
 host = 'ceph4',
 is_secure=False,   # uncomment if you are not using ssl
 calling_format = boto.s3.connection.OrdinaryCallingFormat(),
 )

 bucket = conn.create_bucket('bucket1', location='eu')
 key = bucket.new_key('hello.txt')
 key.set_contents_from_string('Hello World!')




Re: [ceph-users] No auto-mount of OSDs after server reboot

2015-01-29 Thread Lindsay Mathieson
On Thu, 29 Jan 2015 03:05:41 PM Alexis KOALLA wrote:
 Hi,
 Today we encountered an issue in our Ceph cluster in the lab.
 Issue: the servers that host the OSDs have rebooted and we have observed
 that after the reboot the OSD devices are not auto-mounted, and we need
 to perform the mount manually and then start the OSD as below:
 
 1- [root@osd.0] mount /dev/sdb2 /var/lib/ceph/osd/ceph-0
 2- [root@osd.0] start ceph-osd id=0


As far as I'm aware, Ceph does not handle mounting of the base filesystem - it's 
up to you to create an fstab entry for it.

The OSD should autostart, but it will of course fail if the filesystem is not 
mounted.
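
For reference, a minimal fstab sketch; the device path, filesystem type and 
mount options below are assumptions and need adjusting to your setup:

# /etc/fstab -- example only
/dev/sdb2   /var/lib/ceph/osd/ceph-0   xfs   defaults,noatime,inode64   0  0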


[ceph-users] No auto-mount of OSDs after server reboot

2015-01-29 Thread Alexis KOALLA

Hi,
Today we encountered an issue in our Ceph cluster in the lab.
Issue: the servers that host the OSDs have rebooted and we have observed 
that after the reboot the OSD devices are not auto-mounted, and we need 
to perform the mount manually and then start the OSD as below:


1- [root@osd.0] mount /dev/sdb2 /var/lib/ceph/osd/ceph-0
2- [root@osd.0] start ceph-osd id=0

After performing the two commands above the OSD is up again.

The question: is this the normal behaviour of the OSD server, or is 
something wrong in our configuration?


Any help or ideas will be appreciated.
Thanks and regards
Alex



Re: [ceph-users] RGW region metadata sync prevents writes to non-master region

2015-01-29 Thread Mark Kirkwood

On 30/01/15 06:31, Yehuda Sadeh wrote:

On Wed, Jan 28, 2015 at 8:04 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:

On 29/01/15 13:58, Mark Kirkwood wrote:



However if I
try to write to eu-west I get:



Sorry - that should have said:

However if I try to write to eu-*east* I get:

The actual code is (see below) connecting to the endpoint for eu-east
(ceph4:80), so seeing it redirected to us-*west* is pretty strange!


The bucket creation is synchronous, and sent to the master region for
completion. Not sure why it actually fails, that's what the master
region sends back. What does the corresponding log at the master
region show?




The log from us-west (ceph1) is below. It looks to be failing because the 
user does not exist. That is reasonable - I've created the user in 
us-*east* and it has been replicated to eu-east...


What is puzzling is why it is going to that zone (instead of us-east). 
I'll include the region JSON below too (in case there is something 
obviously dumb in them)!


$ tail radosgw.log

2015-01-29 21:23:05.260158 7f9f66f7d700  1 == starting new request 
req=0x7f9fa802b390 =
2015-01-29 21:23:05.260173 7f9f66f7d700  2 req 1:0.15::PUT 
/bucket1/::initializing

2015-01-29 21:23:05.260178 7f9f66f7d700 10 host=ceph1 rgw_dns_name=ceph1
2015-01-29 21:23:05.260220 7f9f66f7d700 10 s->object=<NULL> s->bucket=bucket1
2015-01-29 21:23:05.260230 7f9f66f7d700  2 req 1:0.72:s3:PUT 
/bucket1/::getting op
2015-01-29 21:23:05.260241 7f9f66f7d700  2 req 1:0.83:s3:PUT 
/bucket1/:create_bucket:authorizing
2015-01-29 21:23:05.260282 7f9f66f7d700 20 get_obj_state: 
rctx=0x7f9fac0280a0 obj=.us-west.users:eu-east key state=0x7f9fac028380 
s->prefetch_data=0
2015-01-29 21:23:05.260291 7f9f66f7d700 10 cache get: 
name=.us-west.users+eu-east key : miss
2015-01-29 21:23:05.261188 7f9f66f7d700 10 cache put: 
name=.us-west.users+eu-east key
2015-01-29 21:23:05.261194 7f9f66f7d700 10 adding .us-west.users+eu-east 
key to cache LRU end
2015-01-29 21:23:05.261207 7f9f66f7d700  5 error reading user info, 
uid=eu-east key can't authenticate

2015-01-29 21:23:05.261210 7f9f66f7d700 10 failed to authorize request
2015-01-29 21:23:05.261237 7f9f66f7d700  2 req 1:0.001079:s3:PUT 
/bucket1/:create_bucket:http status=403
2015-01-29 21:23:05.261240 7f9f66f7d700  1 == req done 
req=0x7f9fa802b390 http_status=403 ==



$ cat us.json
{ "name": "us",
  "api_name": "us",
  "is_master": true,
  "endpoints": [
        "http:\/\/ceph2:80\/", "http:\/\/ceph1:80\/" ],
  "master_zone": "us-east",
  "zones": [
        { "name": "us-east",
          "endpoints": [
                "http:\/\/ceph2:80\/"],
          "log_meta": true,
          "log_data": true},
        { "name": "us-west",
          "endpoints": [
                "http:\/\/ceph1:80\/"],
          "log_meta": true,
          "log_data": true}],
  "placement_targets": [
        {
          "name": "default-placement",
          "tags": []
        }
  ],
  "default_placement": "default-placement"}

$ cat eu.json
{ "name": "eu",
  "api_name": "eu",
  "is_master": false,
  "endpoints": [
        "http:\/\/ceph4:80\/", "http:\/\/ceph3:80\/" ],
  "master_zone": "eu-east",
  "zones": [
        { "name": "eu-east",
          "endpoints": [
                "http:\/\/ceph4:80\/"],
          "log_meta": true,
          "log_data": true},
        { "name": "eu-west",
          "endpoints": [
                "http:\/\/ceph3:80\/"],
          "log_meta": true,
          "log_data": true}],
  "placement_targets": [
        {
          "name": "default-placement",
          "tags": []
        }
  ],
  "default_placement": "default-placement"}


Re: [ceph-users] RGW region metadata sync prevents writes to non-master region

2015-01-29 Thread Yehuda Sadeh
What does your regionmap look like? Is it updated correctly on all zones?

On Thu, Jan 29, 2015 at 1:42 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
 On 30/01/15 06:31, Yehuda Sadeh wrote:

 On Wed, Jan 28, 2015 at 8:04 PM, Mark Kirkwood
 mark.kirkw...@catalyst.net.nz wrote:

 On 29/01/15 13:58, Mark Kirkwood wrote:



 However if I
 try to write to eu-west I get:


 Sorry - that should have said:

 However if I try to write to eu-*east* I get:

 The actual code is (see below) connecting to the endpoint for eu-east
 (ceph4:80), so seeing it redirected to us-*west* is pretty strange!


 The bucket creation is synchronous, and sent to the master region for
 completion. Not sure why it actually fails, that's what the master
 region sends back. What does the corresponding log at the master
 region show?



 The log from us-west (ceph1) below. It looks to be failing because the user
 does not exist. That is reasonable - I've created the user in us-*east* and
 it has been replicated to eu-east...

 What is puzzling is why oit is going to that zone (instead of us-east). I'll
 include the region json below too (in case three is something obviously dumb
 in them)!

 $ tail radosgw.log

 2015-01-29 21:23:05.260158 7f9f66f7d700  1 == starting new request
 req=0x7f9fa802b390 =
 2015-01-29 21:23:05.260173 7f9f66f7d700  2 req 1:0.15::PUT
 /bucket1/::initializing
 2015-01-29 21:23:05.260178 7f9f66f7d700 10 host=ceph1 rgw_dns_name=ceph1
 2015-01-29 21:23:05.260220 7f9f66f7d700 10 s-object=NULL
 s-bucket=bucket1
 2015-01-29 21:23:05.260230 7f9f66f7d700  2 req 1:0.72:s3:PUT
 /bucket1/::getting op
 2015-01-29 21:23:05.260241 7f9f66f7d700  2 req 1:0.83:s3:PUT
 /bucket1/:create_bucket:authorizing
 2015-01-29 21:23:05.260282 7f9f66f7d700 20 get_obj_state:
 rctx=0x7f9fac0280a0 obj=.us-west.users:eu-east key state=0x7f9fac028380
 s-prefetch_data=0
 2015-01-29 21:23:05.260291 7f9f66f7d700 10 cache get:
 name=.us-west.users+eu-east key : miss
 2015-01-29 21:23:05.261188 7f9f66f7d700 10 cache put:
 name=.us-west.users+eu-east key
 2015-01-29 21:23:05.261194 7f9f66f7d700 10 adding .us-west.users+eu-east key
 to cache LRU end
 2015-01-29 21:23:05.261207 7f9f66f7d700  5 error reading user info,
 uid=eu-east key can't authenticate
 2015-01-29 21:23:05.261210 7f9f66f7d700 10 failed to authorize request
 2015-01-29 21:23:05.261237 7f9f66f7d700  2 req 1:0.001079:s3:PUT
 /bucket1/:create_bucket:http status=403
 2015-01-29 21:23:05.261240 7f9f66f7d700  1 == req done
 req=0x7f9fa802b390 http_status=403 ==


 $ cat us.json
 { name: us,
   api_name: us,
   is_master: true,
   endpoints: [
 http:\/\/ceph2:80\/, http:\/\/ceph1:80\/ ],
   master_zone: us-east,
   zones: [
 { name: us-east,
   endpoints: [
 http:\/\/ceph2:80\/],
   log_meta: true,
   log_data: true},
 { name: us-west,
   endpoints: [
 http:\/\/ceph1:80\/],
   log_meta: true,
   log_data: true}],
   placement_targets: [
{
  name: default-placement,
  tags: []
}
   ],
   default_placement: default-placement}

 $ cat eu.json
 { name: eu,
   api_name: eu,
   is_master: false,
   endpoints: [
 http:\/\/ceph4:80\/, http:\/\/ceph3:80\/ ],
   master_zone: eu-east,
   zones: [
 { name: eu-east,
   endpoints: [
 http:\/\/ceph4:80\/],
   log_meta: true,
   log_data: true},
 { name: eu-west,
   endpoints: [
 http:\/\/ceph3:80\/],
   log_meta: true,
   log_data: true}],
   placement_targets: [
{
  name: default-placement,
  tags: []
}
   ],
   default_placement: default-placement}


[ceph-users] keyvaluestore backend metadata overhead

2015-01-29 Thread Chris Pacejo
Hi, we've been experimenting with the keyvaluestore backend, and have found
that, on every object write (e.g. with `rados put`), a single transaction
is issued containing an additional 9 KeyValueDB writes, beyond those which
constitute the object data.  Given the key names, these are clearly all
metadata of some sort, but this poses a problem when the objects themselves
are very small.  Given the default strip block size of 4 KiB, with objects
of size 36 KiB or less, half or more of all key-value store writes are
metadata writes.  With objects of size 4 KiB or less, the metadata overhead
grows to 90%+.
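
For what it's worth, a small sketch reproducing those percentages from the 
numbers above (4 KiB strips, 9 extra metadata rows per write):

strip_kib = 4
metadata_rows = 9

def metadata_fraction(object_kib):
    # one key-value row per 4 KiB strip of object data, plus the fixed
    # metadata rows observed per write transaction
    data_rows = max(1, (object_kib + strip_kib - 1) // strip_kib)
    return float(metadata_rows) / (metadata_rows + data_rows)

print(metadata_fraction(36))   # 9 data strips -> 0.5, i.e. 50% metadata writes
print(metadata_fraction(4))    # 1 data strip  -> 0.9, i.e. 90% metadata writes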

Is there any way to reduce the number of metadata rows which must be
written with each object?

(Alternatively, if there is a way to convince the OSD to issue multiple
concurrent write transactions, that would also help.  But even with
keyvaluestore op threads set as high as 64, and `rados bench` issuing 64
concurrent writes, we never see more than a single active write transaction
on the (multithread-capable) backend.  Is there some other option we're
missing?)


[ceph-users] radosgw (0.87) and multipart upload (result object size = 0)

2015-01-29 Thread Gleb Borisov
Hi,

We're experiencing some issues with our radosgw setup. Today we tried to
copy a bunch of objects between two separate clusters (using our own tool
built on top of the Java S3 API).

All went smoothly until we started copying large objects (200 GB+). We can see
that our code handles this case correctly: it starts a multipart upload
(s3.initiateMultipartUpload), uploads all the parts serially
(s3.uploadPart), and finally completes the upload (s3.completeMultipartUpload).
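
For comparison, the same flow expressed with boto as a rough sketch; the 
endpoint, bucket, object name and part size are placeholders (our actual 
tool uses the Java SDK):

import os
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='the key',
    aws_secret_access_key='the secret',
    host='radosgw-host',                                   # placeholder endpoint
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

bucket = conn.get_bucket('target-bucket')                  # placeholder bucket
src = 'large-object.bin'                                   # placeholder source file
part_size = 64 * 1024 * 1024                               # 64 MiB parts (assumption)

mp = bucket.initiate_multipart_upload('large-object.bin')
with open(src, 'rb') as fp:
    remaining = os.path.getsize(src)
    part_num = 1
    while remaining > 0:
        size = min(part_size, remaining)                   # last part may be shorter
        mp.upload_part_from_file(fp, part_num=part_num, size=size)
        remaining -= size
        part_num += 1
mp.complete_upload()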

When we checked the consistency of the two clusters, we found that we have a lot
of zero-sized objects (which turn out to be our large objects).

I've captured a more verbose log from radosgw:

two requests (put_obj, complete_multipart) -
https://gist.github.com/anonymous/840e0aee5a7ce0326368 (all finished with
200)

radosgw-admin object stat output:
https://gist.github.com/anonymous/2b6771bbbad3021364e2

We've tried to upload these objects several times without any luck.

# radosgw --version
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

Thanks in advance.

-- 
Best regards,
Gleb M Borisov


Re: [ceph-users] error in sys.exitfunc

2015-01-29 Thread Blake, Karl D
Please advise.

Thanks,
-Karl

From: Blake, Karl D
Sent: Monday, January 19, 2015 7:23 PM
To: 'ceph-us...@ceph.com'
Subject: error in sys.exitfunc

Anytime I run Ceph-deploy I get the above error. Can you help resolve?

Thanks,
-Karl


Re: [ceph-users] error in sys.exitfunc

2015-01-29 Thread Blake, Karl D
Error is same as this posted link - 
http://www.spinics.net/lists/ceph-devel/msg21388.html

From: Blake, Karl D
Sent: Tuesday, January 20, 2015 4:29 AM
To: ceph-us...@ceph.com
Subject: RE: error in sys.exitfunc

Please advise.

Thanks,
-Karl

From: Blake, Karl D
Sent: Monday, January 19, 2015 7:23 PM
To: 'ceph-us...@ceph.com'
Subject: error in sys.exitfunc

Anytime I run Ceph-deploy I get the above error. Can you help resolve?

Thanks,
-Karl


[ceph-users] Deploying ceph using Dell equallogic storage arrays

2015-01-29 Thread Imran Khan
Dear concerned,

Can I use Dell EqualLogic storage arrays (model PS-4110) to configure
different OSDs on these storage arrays (maybe by creating different
volumes)? If this is possible, how should I set about deploying
Ceph in my system (a user guide or introductory document would be nice).

I am deploying my OpenStack cloud; currently these storage blades can be
configured as Cinder volumes, and I have iSCSI access to my storage arrays
from all my blades.

I don't have any real experience with Ceph but I know that normal blade
servers can easily be configured as Ceph storage clusters in HA mode with
monitors et al.

What I want is to use my storage arrays as Ceph Storage Clusters.

Warm regards,
Khan, Imran


[ceph-users] Help:mount error

2015-01-29 Thread 于泓海
Hi:


I have completed the installation of the Ceph cluster, and the cluster health is OK:


cluster 15ee68b9-eb3c-4a49-8a99-e5de64449910
 health HEALTH_OK
 monmap e1: 1 mons at {ceph01=10.194.203.251:6789/0}, election epoch 1, 
quorum 0 ceph01
 mdsmap e2: 0/0/1 up
 osdmap e16: 2 osds: 2 up, 2 in
  pgmap v729: 92 pgs, 4 pools, 136 MB data, 46 objects
23632 MB used, 31172 MB / 54805 MB avail
  92 active+clean


But when I mount from the client, the error is: mount error 5 = Input/output error.
I have tried lots of things, e.g. disabling SELinux and updating the kernel...
Could anyone help me resolve it? Thanks!




Jason



--



Re: [ceph-users] RGW region metadata sync prevents writes to non-master region

2015-01-29 Thread Mark Kirkwood

On 30/01/15 11:08, Yehuda Sadeh wrote:

What does your regionmap look like? Is it updated correctly on all zones?



The regionmap is listed below - checking it on all 4 zones produces exactly the 
same output (the md5sum is the same):


{
    "regions": [
        {
            "key": "eu",
            "val": {
                "name": "eu",
                "api_name": "eu",
                "is_master": false,
                "endpoints": [
                    "http:\/\/ceph4:80\/",
                    "http:\/\/ceph3:80\/"
                ],
                "master_zone": "eu-east",
                "zones": [
                    {
                        "name": "eu-east",
                        "endpoints": [
                            "http:\/\/ceph4:80\/"
                        ],
                        "log_meta": true,
                        "log_data": true,
                        "bucket_index_max_shards": 0
                    },
                    {
                        "name": "eu-west",
                        "endpoints": [
                            "http:\/\/ceph3:80\/"
                        ],
                        "log_meta": true,
                        "log_data": true,
                        "bucket_index_max_shards": 0
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": []
                    }
                ],
                "default_placement": "default-placement"
            }
        },
        {
            "key": "us",
            "val": {
                "name": "us",
                "api_name": "us",
                "is_master": true,
                "endpoints": [
                    "http:\/\/ceph2:80\/",
                    "http:\/\/ceph1:80\/"
                ],
                "master_zone": "us-east",
                "zones": [
                    {
                        "name": "us-east",
                        "endpoints": [
                            "http:\/\/ceph2:80\/"
                        ],
                        "log_meta": true,
                        "log_data": true,
                        "bucket_index_max_shards": 0
                    },
                    {
                        "name": "us-west",
                        "endpoints": [
                            "http:\/\/ceph1:80\/"
                        ],
                        "log_meta": true,
                        "log_data": true,
                        "bucket_index_max_shards": 0
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": []
                    }
                ],
                "default_placement": "default-placement"
            }
        }
    ],
    "master_region": "us",
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}



Re: [ceph-users] keyvaluestore backend metadata overhead

2015-01-29 Thread Sage Weil
Hi Chris,

[Moving this thread to ceph-devel, which is probably a bit more 
appropriate.]

On Thu, 29 Jan 2015, Chris Pacejo wrote:
 Hi, we've been experimenting with the keyvaluestore backend, and have found
 that, on every object write (e.g. with `rados put`), a single transaction is
 issued containing an additional 9 KeyValueDB writes, beyond those which
 constitute the object data.  Given the key names, these are clearly all
 metadata of some sort, but this poses a problem when the objects themselves
 are very small.  Given the default strip block size of 4 KiB, with objects
 of size 36 KiB or less, half or more of all key-value store writes are
 metadata writes.  With objects of size 4 KiB or less, the metadata overhead
 grows to 90%+.
 
 Is there any way to reduce the number of metadata rows which must be written
 with each object?

There is a level (or two) of indirection in KeyValueStore's 
GenericObjectMap that is there to allow object cloning.  I wonder if we 
will want to facilitate a backend that doesn't implement clone and can 
only be used for pools that disallow clone and snap operations.

There is also some key consolidation in the OSD layer that we talked about in 
the Wednesday performance call which will cut this down somewhat!

 (Alternatively, if there is a way to convince the OSD to issue multiple
 concurrent write transactions, that would also help.  But even with
 keyvaluestore op threads set as high as 64, and `rados bench` issuing 64
 concurrent writes, we never see more than a single active write transaction
 on the (multithread-capable) backend.  Is there some other option we're
 missing?)

sage


[ceph-users] mon leveldb loss

2015-01-29 Thread Mike Winfield
Hi, I'm hoping desperately that someone can help. I have a critical issue
with a tiny 'cluster'...

There was a power glitch earlier today (not an outage, might have been a
brownout; some things went down, others didn't) and I came home to a CPU
machine check exception on the single host on which I keep a trio of Ceph
monitors. There was no option but to hard reset. When the system came back
up, the monitors didn't.

Each mon is reporting possible corruption of its leveldb store: files are
missing, and one might surmise that an fsck decided to discard them. See the
attached text files for the ceph-mon output and the corresponding store.db
directory listings.

Is there any way to recover the leveldb for the monitors? I am more than
capable and willing to dig into the structure of these files - or take any
similar measures - if necessary. Perhaps correlate a complete picture from
the data files that are available?
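
For what it's worth, a rough sketch of the kind of inspection I have in 
mind (it assumes the plyvel LevelDB bindings, must only be run against a 
copy of the store, and will not bring back the missing .ldb files, only 
salvage whatever is still readable):

import shutil
import plyvel

src = '/var/lib/ceph/mon/unimatrix-0/store.db'
work = '/root/store.db.copy'                 # always work on a copy
shutil.copytree(src, work)

plyvel.repair_db(work)                       # rebuilds the MANIFEST from the
                                             # tables LevelDB can still read
db = plyvel.DB(work)
for key, value in db.iterator():
    print(key, len(value))                   # inventory of the surviving keys
db.close()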

I do have a relevant backup of the monitor data but it is now three months
old. I would prefer not to have to resort to this if there is any chance of
recovering monitor operability by other means.

Also, what would the consequences be of restoring such a backup when the
(12 TB worth of) OSDs are perfectly fine and contain the latest up-to-date
PG associations? Would there be a risk of data loss?

Unfortunately I don't have any backups of the actual user data (being poor,
scraping along on a shoestring budget, not exactly conducive to anything
approaching an ideal hardware setup), unless one counts a set of old disks
from a previously failed cluster from six months ago.

My last recourse will likely be to try to scavenge and piece together my
most important files from whatever I find on the OSDs. Far from an
exciting prospect, but I am seriously desperate.

I would be terribly grateful for any input.

Mike
2015-01-29 19:49:30.590913 7fa66458d7c0  0 ceph version 0.74 
(c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18788
Corruption: 10 missing files; e.g.: 
/var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
Corruption: 10 missing files; e.g.: 
/var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
2015-01-29 19:49:37.542790 7fa66458d7c0 -1 failed to create new leveldb store
2015-01-29 19:49:43.279940 7f03e8ec87c0  0 ceph version 0.74 
(c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18846
Corruption: 10 missing files; e.g.: 
/var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
Corruption: 10 missing files; e.g.: 
/var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
2015-01-29 19:49:50.708742 7f03e8ec87c0 -1 failed to create new leveldb store
2015-01-29 19:49:47.866736 7fb6aeebe7c0  0 ceph version 0.74 
(c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18869
Corruption: 10 missing files; e.g.: 
/var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
Corruption: 10 missing files; e.g.: 
/var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
2015-01-29 19:49:54.935436 7fb6aeebe7c0 -1 failed to create new leveldb store
mon/unimatrix-0/store.db/:
total 42160
-rw-r--r-- 1 root root       57 Aug 24 14:59 LOG
-rw-r--r-- 1 root root        0 Aug 24 14:59 LOCK
drwxr-xr-x 3 root root       80 Aug 24 14:59 ..
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051297.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054697.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054744.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054790.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054851.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054858.ldb
-rw-r--r-- 1 root root 42568979 Jan 29 14:23 MANIFEST-399002
drwxr-xr-x 2 root root      240 Jan 29 14:23 .

mon/unimatrix-2/store.db/:
total 42180
-rw-r--r-- 1 root root       57 Aug 24 15:09 LOG
-rw-r--r-- 1 root root        0 Aug 24 15:09 LOCK
drwxr-xr-x 3 root root       80 Aug 24 15:09 ..
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051311.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054711.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054758.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054804.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054865.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054872.ldb
-rw-r--r-- 1 root root 42589118 Jan 29 14:23 MANIFEST-399004
drwxr-xr-x 2 root root      240 Jan 29 14:23 .

mon/unimatrix-1/store.db/:
total 42180
-rw-r--r-- 1 root root        0 Aug 24 15:03 LOCK
drwxr-xr-x 3 root root       80 Aug 24 15:03 ..
-rw-r--r-- 1 root root       57 Aug 24 15:03 LOG
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051308.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054708.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054755.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054801.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054862.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054869.ldb
-rw-r--r-- 1 root root     4254 Jan 29 14:23 

[ceph-users] Question about ceph class usage

2015-01-29 Thread Dennis Chen
Hello,

I found that documentation on Ceph class usage is very scarce; below is the
only piece that comes close to addressing my needs:
http://ceph.com/rados/dynamic-object-interfaces-with-lua/

But still some questions confusing me left there:

1. How do I make the OSD load the class lib? What is the process
for an OSD daemon to load a customized class lib?
I checked my OSD log file (/var/log/ceph/ceph-osd.2.log) and I can't find
a log message about loading cls_hello. Does that mean the hello class
lib hasn't been loaded by the OSD daemon yet? But I can see that
'libcls_hello.so' really is under the /usr/lib64/rados-classes folder of
that OSD.

2. Suppose I have an object named "testobj" stored on OSD0 and OSD1.
What will happen if I call rados_exec(..., "testobj", "hello",
"say_hello", ...) on the client side? Will the say_hello() function be
called twice, on OSD0 and OSD1 respectively?

-- 
Den


[ceph-users] error in sys.exitfunc

2015-01-29 Thread Blake, Karl D
Anytime I run Ceph-deploy I get the above error. Can you help resolve?

Thanks,
-Karl


Re: [ceph-users] RGW region metadata sync prevents writes to non-master region

2015-01-29 Thread Yehuda Sadeh
On Thu, Jan 29, 2015 at 3:27 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
 On 30/01/15 11:08, Yehuda Sadeh wrote:

 What does your regionmap look like? Is it updated correctly on all zones?


 Regionmap listed below - checking it on all 4 zones produces exactly the
 same output (md5sum is same):

 {
 regions: [
 {
 key: eu,
 val: {
 name: eu,
 api_name: eu,
 is_master: false,
 endpoints: [
 http:\/\/ceph4:80\/,
 http:\/\/ceph3:80\/
 ],
 master_zone: eu-east,
 zones: [
 {
 name: eu-east,
 endpoints: [
 http:\/\/ceph4:80\/
 ],
 log_meta: true,
 log_data: true,
 bucket_index_max_shards: 0
 },
 {
 name: eu-west,
 endpoints: [
 http:\/\/ceph3:80\/
 ],
 log_meta: true,
 log_data: true,
 bucket_index_max_shards: 0
 }
 ],
 placement_targets: [
 {
 name: default-placement,
 tags: []
 }
 ],
 default_placement: default-placement
 }
 },
 {
 key: us,
 val: {
 name: us,
 api_name: us,
 is_master: true,
 endpoints: [
 http:\/\/ceph2:80\/,
 http:\/\/ceph1:80\/

Note that you have ceph1:80 specified as an endpoint of the region.
This list is then used for bucket creation; it should only include the
master endpoint.

Yehuda

 ],
 master_zone: us-east,
 zones: [
 {
 name: us-east,
 endpoints: [
 http:\/\/ceph2:80\/
 ],
 log_meta: true,
 log_data: true,
 bucket_index_max_shards: 0
 },
 {
 name: us-west,
 endpoints: [
 http:\/\/ceph1:80\/
 ],
 log_meta: true,
 log_data: true,
 bucket_index_max_shards: 0
 }
 ],
 placement_targets: [
 {
 name: default-placement,
 tags: []
 }
 ],
 default_placement: default-placement
 }
 }
 ],
 master_region: us,
 bucket_quota: {
 enabled: false,
 max_size_kb: -1,
 max_objects: -1
 },
 user_quota: {
 enabled: false,
 max_size_kb: -1,
 max_objects: -1
 }
 }



Re: [ceph-users] radosgw (0.87) and multipart upload (result object size = 0)

2015-01-29 Thread Dong Yuan
I am curious whether the object can be uploaded without multipart
upload, so we can determine which part is at fault.

On 21 January 2015 at 09:15, Gleb Borisov borisov.g...@gmail.com wrote:
 Hi,

 We're experiencing some issues with our radosgw setup. Today we tried to
 copy bunch of objects between two separate clusters (using our own tool
 built on top of java s3 api).

 All went smooth until we start copying large objects (200G+). We can see
 that our code handles this case correctly and started multipart upload
 (s3.initiateMultipartUpload), then it uploaded all the parts in serial mode
 (s3.uploadPart) and finally completed upload (s3.completeMultipartUpload).

 When we've checked consistency of two clusters we found that we have a lot
 of zero-sized objects (which turns to be our large objects).

 I've made more verbose log from radosgw:

 two requests (put_obj, complete_multipart) -
 https://gist.github.com/anonymous/840e0aee5a7ce0326368 (all finished with
 200)

 radosgw-admin object stat output:
 https://gist.github.com/anonymous/2b6771bbbad3021364e2

 We've tried to upload these objects several times without any luck.

 # radosgw --version
 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

 Thanks in advance.

 --
 Best regards,
 Gleb M Borisov





-- 
Dong Yuan
Email:yuandong1...@gmail.com


Re: [ceph-users] CEPH BackUPs

2015-01-29 Thread Christian Balzer
On Fri, 30 Jan 2015 01:22:53 +0200 Georgios Dimitrakakis wrote:

  Urged by a previous post by Mike Winfield where he suffered a leveldb 
  loss
  I would like to know which files are critical for CEPH operation and 
  must
  be backed-up regularly and how are you people doing it?
 
Aside from probably being quite hard/disruptive to back up a monitor
leveldb, it will also be quite pointless, as it constantly changes.

This is why one has at least 3 monitors on different machines, on
different UPS-backed circuits, storing things on SSDs that are also
power-failure proof.
And if a monitor gets destroyed like that, the official fix suggested by
the Ceph developers is to re-create it from scratch and let it catch up to
the good monitors. 

That being said, aside from a backup of the actual data on the cluster
(which is another challenge), one wonders if in Mike's case an RBD fsck
of sorts could be created that is capable of restoring things based on the
actual data still on the OSDs.
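
To make the idea a bit more concrete, a very rough sketch of that kind of
salvage. Assumptions, all of which may differ in Mike's setup: format-2 RBD
images whose data lives in RADOS objects named rbd_data.<image id>.<16-hex
object number> with the default 4 MiB object size, an image id recovered
from the corresponding rbd_header object, and monitors reachable again
(e.g. restored from the old backup) so that librados can connect. This is
an illustration, not a production tool:

import rados

IMAGE_ID = '1234567890ab'            # assumption: recovered from the rbd_header object
OBJ_SIZE = 4 * 1024 * 1024           # assumption: default 4 MiB RBD object size
PREFIX = 'rbd_data.%s.' % IMAGE_ID

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')    # assumption: image lives in the 'rbd' pool

with open('recovered.img', 'wb') as out:
    for obj in ioctx.list_objects():
        if not obj.key.startswith(PREFIX):
            continue
        objno = int(obj.key[len(PREFIX):], 16)   # object number encoded in the name
        size, _mtime = ioctx.stat(obj.key)
        out.seek(objno * OBJ_SIZE)               # leave holes for missing objects
        out.write(ioctx.read(obj.key, size))

ioctx.close()
cluster.shutdown()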

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] radosgw (0.87) and multipart upload (result object size = 0)

2015-01-29 Thread Yehuda Sadeh
I assume that the problem is not with the object itself, but with the
upload mechanism (either the client, or rgw, or both). I would be
curious, however, to see whether a different S3 client (not the homebrew
one) could upload the object correctly using multipart upload.

Yehuda

On Thu, Jan 29, 2015 at 7:54 PM, Dong Yuan yuandong1...@gmail.com wrote:
 I am curious whether the object can be uploaded without MultiUpload,
 so we can determine which part is wrong.

 On 21 January 2015 at 09:15, Gleb Borisov borisov.g...@gmail.com wrote:
 Hi,

 We're experiencing some issues with our radosgw setup. Today we tried to
 copy bunch of objects between two separate clusters (using our own tool
 built on top of java s3 api).

 All went smooth until we start copying large objects (200G+). We can see
 that our code handles this case correctly and started multipart upload
 (s3.initiateMultipartUpload), then it uploaded all the parts in serial mode
 (s3.uploadPart) and finally completed upload (s3.completeMultipartUpload).

 When we've checked consistency of two clusters we found that we have a lot
 of zero-sized objects (which turns to be our large objects).

 I've made more verbose log from radosgw:

 two requests (put_obj, complete_multipart) -
 https://gist.github.com/anonymous/840e0aee5a7ce0326368 (all finished with
 200)

 radosgw-admin object stat output:
 https://gist.github.com/anonymous/2b6771bbbad3021364e2

 We've tried to upload these objects several times without any luck.

 # radosgw --version
 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

 Thanks in advance.

 --
 Best regards,
 Gleb M Borisov





 --
 Dong Yuan
 Email:yuandong1...@gmail.com


Re: [ceph-users] radosgw (0.87) and multipart upload (result object size = 0)

2015-01-29 Thread Yehuda Sadeh
On Tue, Jan 20, 2015 at 5:15 PM, Gleb Borisov borisov.g...@gmail.com wrote:
 Hi,

 We're experiencing some issues with our radosgw setup. Today we tried to
 copy bunch of objects between two separate clusters (using our own tool
 built on top of java s3 api).

 All went smooth until we start copying large objects (200G+). We can see
 that our code handles this case correctly and started multipart upload
 (s3.initiateMultipartUpload), then it uploaded all the parts in serial mode
 (s3.uploadPart) and finally completed upload (s3.completeMultipartUpload).

 When we've checked consistency of two clusters we found that we have a lot
 of zero-sized objects (which turns to be our large objects).

 I've made more verbose log from radosgw:

 two requests (put_obj, complete_multipart) -
 https://gist.github.com/anonymous/840e0aee5a7ce0326368 (all finished with
 200)

 radosgw-admin object stat output:
 https://gist.github.com/anonymous/2b6771bbbad3021364e2

 We've tried to upload these objects several times without any luck.

 # radosgw --version
 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)


It's hard to say much from these specific logs. Maybe you could
provide some extra logging that includes the HTTP headers of the requests,
and also add 'debug ms = 1'.
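
Something along these lines in ceph.conf on the gateway host (the section
name client.radosgw.gateway is an assumption; use whatever your instance is
called), then restart radosgw:

[client.radosgw.gateway]
    debug rgw = 20
    debug ms = 1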

Thanks,
Yehuda


[ceph-users] CEPH BackUPs

2015-01-29 Thread Georgios Dimitrakakis
Prompted by a previous post by Mike Winfield, where he suffered a leveldb
loss, I would like to know which files are critical for Ceph operation and
must be backed up regularly, and how you people are doing it.

Any points much appreciated!

Regards,

G.


Re: [ceph-users] RGW region metadata sync prevents writes to non-master region

2015-01-29 Thread Mark Kirkwood

On 30/01/15 12:34, Yehuda Sadeh wrote:

On Thu, Jan 29, 2015 at 3:27 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:

On 30/01/15 11:08, Yehuda Sadeh wrote:


What does your regionmap look like? Is it updated correctly on all zones?



Regionmap listed below - checking it on all 4 zones produces exactly the
same output (md5sum is same):




 {
 key: us,
 val: {
 name: us,
 api_name: us,
 is_master: true,
 endpoints: [
 http:\/\/ceph2:80\/,
 http:\/\/ceph1:80\/


Note that you have ceph1:80 specified as an endpoint to the region.
This is then used for the bucket creation. This one should only
include the master endpoint.




Cool, thanks - I was unclear about which endpoint(s) should be listed 
for a region. I'll change 'em and try again.
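
For the archives, a sketch of the intended change, assuming the usual
federated-gateway workflow: trim the region endpoints to the master zone's
endpoint only, then re-apply the region and refresh the region map on the
gateways.

# in us.json (and analogously in eu.json), keep only the master zone's endpoint:
  "endpoints": [
        "http:\/\/ceph2:80\/" ],

$ radosgw-admin region set --infile us.json
$ radosgw-admin regionmap update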


Cheers

Mark
