Re: [ceph-users] Ceph osd is all up and in, but every pg is incomplete

2015-03-30 Thread Yueliang
I think there is no other way. :)
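(For reference, the usual custom-CRUSH approach for "two copies on one host, the third on the other" looks roughly like the rule below. This is only a sketch -- the rule name and ruleset number are made up; with size=3 it will pick two OSDs from the first host and one from the second:)

```
rule replicated_two_hosts {
    ruleset 1
    type replicated
    min_size 2
    max_size 3
    step take default
    # pick two hosts, then up to two OSDs inside each, giving
    # 2 + 2 candidates; with pool size=3 the first three are used
    step choose firstn 2 type host
    step chooseleaf firstn 2 type osd
    step emit
}
```

Compile the edited map with crushtool, load it with `ceph osd setcrushmap`, and point the pool at it with `ceph osd pool set <pool> crush_ruleset 1`.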

-- 
Yueliang
Sent with Airmail

On March 30, 2015 at 13:17:55, Kai KH Huang (huangk...@lenovo.com) wrote:

Thanks for the quick response, and it seems to work! But what I expect to have 
is replica number = 3 on two servers (one host will store two copies, and the 
other will store the third one -- to deal with disk failure, rather than only 
server failure). Is there a simple way to configure that, rather than building 
a custom CRUSH map?


From: Yueliang [yueliang9...@gmail.com]
Sent: Monday, March 30, 2015 12:04 PM
To: ceph-users@lists.ceph.com; Kai KH Huang
Subject: Re: [ceph-users] Ceph osd is all up and in, but every pg is incomplete

Hi  Kai KH

ceph -s reports "493 pgs undersized". I guess you created the pool with the 
default parameter size=3, but you only have two hosts, so there are not enough 
hosts to serve the pool. You should add a host, set size=2 when creating the 
pool, or modify the CRUSH rule.
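For example (commands as in 0.87; the pool name is from your log, the pg counts are just illustrative):

```
# create a pool that two hosts can serve, or shrink an existing one
ceph osd pool create test31 512 512
ceph osd pool set test31 size 2
ceph osd pool set test31 min_size 1
```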

-- 
Yueliang
Sent with Airmail

On March 30, 2015 at 11:16:38, Kai KH Huang (huangk...@lenovo.com) wrote:

Hi, all
    I'm a newbie to Ceph, and I just set up a brand-new Ceph cluster (0.87) 
with two servers. But its status is always a warning:

[root@serverA ~]# ceph osd tree
# id    weight  type name   up/down reweight
-1  62.04   root default
-2  36.4    host serverA
0   3.64    osd.0   up  1
2   3.64    osd.2   up  1
1   3.64    osd.1   up  1
3   3.64    osd.3   up  1
4   3.64    osd.4   up  1
5   3.64    osd.5   up  1
6   3.64    osd.6   up  1
7   3.64    osd.7   up  1
8   3.64    osd.8   up  1
9   3.64    osd.9   up  1
-3  25.64   host serverB
10  3.64    osd.10  up  1
11  2   osd.11  up  1
12  2   osd.12  up  1
13  2   osd.13  up  1
14  2   osd.14  up  1
15  2   osd.15  up  1
16  2   osd.16  up  1
17  2   osd.17  up  1
18  2   osd.18  up  1
19  2   osd.19  up  1
20  2   osd.20  up  1
21  2   osd.21  up  1


[root@serverA ~]# ceph -s
    cluster ???169715
 health HEALTH_WARN 493 pgs degraded; 19 pgs peering; 19 pgs stuck 
inactive; 512 pgs stuck unclean; 493 pgs undersized
 monmap e1: 2 mons at 
{serverB=10.??.78:6789/0,serverA=10.?.80:6789/0}, election epoch 10, 
quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
 osdmap e92634: 22 osds: 22 up, 22 in
  pgmap v189018: 512 pgs, 1 pools, 0 bytes data, 0 objects
    49099 MB used, 63427 GB / 63475 GB avail
 493 active+undersized+degraded
  19 creating+peering

[root@serverA ~]# rados -p test31 ls
2015-03-30 09:57:18.607143 7f5251fcf700  0 -- :/1005913  10.??.78:6789/0 
pipe(0x140a370 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x140a600).fault
2015-03-30 09:57:21.610994 7f52484ad700  0 -- 10..80:0/1005913  
10..78:6835/27111 pipe(0x140e010 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x140e2a0).fault
2015-03-30 10:02:21.650191 7f52482ab700  0 -- 10..80:0/1005913  
10.78:6835/27111 pipe(0x7f5238016c80 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f5238016f10).fault

* serverA is 10.???.80, serverB is 10..78
* ntpdate is updated
* I tried to remove the pool and re-create it, and clean up all objects inside, 
but no change at all
* firewalls are both shut off

Any clue is welcomed, thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph osd is all up and in, but every pg is incomplete

2015-03-30 Thread Kai KH Huang
Another strange thing is that the last few (24) PGs never seem to get ready and 
stay stuck in creating (after 6 hours of waiting):

[root@serverA ~]# ceph -s
2015-03-30 17:14:48.720396 7feb5bd7a700  0 -- :/1000964  10.???.78:6789/0 
pipe(0x7feb60026120 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7feb600263b0).fault
cluster c09277a4-0eb9-41b1-b27f-a345c0169715
 health HEALTH_WARN 24 pgs peering; 24 pgs stuck inactive; 24 pgs stuck 
unclean
 monmap e1: 2 mons at 
{mac0090fa6aaf7a=10.240.212.78:6789/0,mac0090fa6ab68a=10.???.80:6789/0}, 
election epoch 10, quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
 osdmap e102839: 22 osds: 22 up, 22 in
  pgmap v210270: 512 pgs, 1 pools, 0 bytes data, 0 objects
51633 MB used, 63424 GB / 63475 GB avail
  24 creating+peering
 488 active+clean

And I cannot retrieve at serverA the file that I put into the Ceph cluster at 
serverB:

[root@serverA ~]# rados -p test32 get test.txt test.txt
2015-03-30 17:15:44.014158 7f06951b6700  0 -- 10.???.80:0/1002224  
10.???.78:6867/29047 pipe(0x21e0f90 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x21e1220).fault

2015-03-30 17:16:36.066125 7f0694fb4700  0 -- 10.???.80:0/1002224  
10..78:6867/29047 pipe(0x7f068000d880 sd=6 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f068000db10).fault

It looks like it just hangs there forever. Is it waiting for all PGs to be 
ready? Or is the Ceph cluster in an error state?


[ceph-users] How to test rbd's Copy-on-Read Feature

2015-03-30 Thread Tanay Ganguly
Hello All,

I went through the link below and saw that Copy-on-Read is currently
supported only in librbd, not in the rbd kernel module.

https://wiki.ceph.com/Planning/Blueprints/Infernalis/rbd%3A_kernel_rbd_client_supports_copy-on-read

Can someone please let me know how to test Copy-on-Read using librbd?
What should the I/O pattern be to see the performance changes?

Any pointers appreciated.
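(Not authoritative, but a minimal sequence along these lines should exercise copy-on-read; the option name is taken from the blueprint era, and the image/snapshot names are made up:)

```
# client-side ceph.conf:
#   [client]
#   rbd clone copy on read = true
rbd snap create rbd/parent@snap
rbd snap protect rbd/parent@snap
rbd clone rbd/parent@snap rbd/child
# random reads on the clone: the first read of each object pulls it up
# from the parent; repeated reads should then be served from the child
fio --name=cor --ioengine=rbd --pool=rbd --rbdname=child \
    --rw=randread --bs=4k --iodepth=16 --runtime=60
```

Comparing the first randread pass against a second pass (and against a clone without the option set) should show the copy-up effect.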

Thank You,
Tanay


Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes

2015-03-30 Thread Gurvinder Singh
On 03/30/2015 01:29 PM, Mark Nelson wrote:
 This is definitely something that we've discussed, though I don't think
 anyone has really planned out what a complete solution would look like
 including processor affinity, etc.  Before I joined inktank I worked at
 a supercomputing institute and one of the projects we worked on was to
 develop grid computing tools for bioinformatics research.  Moving
 analytics rather than the data was a big topic for us too since genomics
 data at least tends to be pretty big.
Interestingly, I work for a supercomputing/education research company, and
we are considering a similar use case. So we are interested to hear how
other people are managing resources this way.

- Gurvinder
  Potentially ceph could be a very
 interesting solution for that kind of thing.
 
 Mark
 
 On 03/30/2015 06:20 AM, Gurvinder Singh wrote:
 One interesting use case of combining Ceph with computing is running big
 data jobs on Ceph itself. With CephFS coming along, you can run
 Hadoop/Spark jobs directly on Ceph, with data locality support, without
 needing to move your data to compute resources. I am wondering if anyone
 in the community is looking at combining storage and compute resources from
 this point of view.

 Regards,
 Gurvinder
 On 03/29/2015 09:19 PM, Nick Fisk wrote:
 There's probably a middle ground where you get the best of both worlds.
 Maybe 2-4 OSDs per compute node alongside dedicated Ceph nodes. That way
 you get a bit of extra storage and can still use lower-end CPUs, but don't
 have to worry so much about resource contention.

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
 Behalf Of
 Martin Millnert
 Sent: 29 March 2015 19:58
 To: Mark Nelson
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the
 same
 nodes

 On Thu, Mar 26, 2015 at 12:36:53PM -0500, Mark Nelson wrote:
 Having said that, small nodes are
 absolutely more expensive per OSD as far as raw hardware and
 power/cooling goes.

 The smaller the volume manufacturers have on the units, the worse the margin
 typically is (from the buyer's side). Also, CPUs typically run up a premium
 the higher you go. I've found a lot of local maxima, optimization-wise, over
 the past years, both in 12 OSD/U and 18 OSD/U dedicated storage node setups,
 for instance.

 There may be local maxima along colocated low-scale storage/compute
 nodes, but the one major problem with colocating storage with compute is
 that you can't scale compute independently from storage efficiently
 using that building block alone. There may be temporal optimizations in
 doing so, however (e.g. before you have reached sufficient scale).

 There's no single optimal answer when you're dealing with 20+ variables to
 consider... :)

 BR,
 Martin






Re: [ceph-users] Radosgw authorization failed

2015-03-30 Thread Neville
 
 Date: Wed, 25 Mar 2015 11:43:44 -0400
 From: yeh...@redhat.com
 To: neville.tay...@hotmail.co.uk
 CC: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Radosgw authorization failed
 
 
 
 - Original Message -
  From: Neville neville.tay...@hotmail.co.uk
  To: ceph-users@lists.ceph.com
  Sent: Wednesday, March 25, 2015 8:16:39 AM
  Subject: [ceph-users] Radosgw authorization failed
  
  Hi all,
  
  I'm testing a backup product which supports Amazon S3 as a target for archive
  storage, and I'm trying to set up a Ceph cluster configured with the S3 API to
  use as an internal target for backup archives instead of AWS.
  
  I've followed the online guide for setting up Radosgw and created a default
  region and zone based on the AWS naming convention US-East-1. I'm not sure
  if this is relevant but since I was having issues I thought it might need to
  be the same.
  
  I've tested the radosgw using boto.s3 and it seems to work OK, i.e. I can
  create a bucket, create a folder, list buckets, etc. The problem is that when
  the backup software tries to create an object I get an authorization failure.
  It's using the same user/access/secret as I'm using from boto.s3, and I'm
  sure the creds are right, as it lets me create the initial connection; it
  just fails when trying to create an object (backup folder).
  
  Here's the extract from the radosgw log:
  
  -
  2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET
  /:list_bucket:init op
  2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET
  /:list_bucket:verifying op mask
  2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1 user.op_mask=7
  2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET
  /:list_bucket:verifying op permissions
  2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for uid=test
  mask=49
  2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15
  2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for group=1
  mask=49
  2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15
  2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for group=2
  mask=49
  2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15
  2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test
  owner=test perm=1
  2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm (type)=1,
  policy perm=1, user_perm_mask=1, acl perm=1
  2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET
  /:list_bucket:verifying op params
  2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET
  /:list_bucket:executing
  2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list
  test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2])
  start num 1001
  2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET
  /:list_bucket:http status=200
  2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done req=0x7f107000e2e0
  http_status=200 ==
  2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request
  req=0x7f107000f0e0
  2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ:
  2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0
  2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request
  req=0x7f107000f6b0
  2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request
  req=0x7f107000f0e0
  2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty
  2015-03-25 15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88
  2015-03-25 15:07:26.517084 7f1058dd7700 20
  CONTENT_TYPE=application/octet-stream
  2015-03-25 15:07:26.517085 7f1058dd7700 20 CONTEXT_DOCUMENT_ROOT=/var/www
  2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX=
  2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www
  2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER
  2015-03-25 15:07:26.517089 7f1058dd7700 20 GATEWAY_INTERFACE=CGI/1.1
  2015-03-25 15:07:26.517090 7f1058dd7700 20 HTTP_AUTHORIZATION=AWS
  F79L68W19B3GCLOSE3F8:AcXqtvlBzBMpwdL+WuhDRoLT/Bs=
  2015-03-25 15:07:26.517091 7f1058dd7700 20 HTTP_CONNECTION=Keep-Alive
  2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_DATE=Wed, 25 Mar 2015
  15:07:26 GMT
  2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_EXPECT=100-continue
  2015-03-25 15:07:26.517093 7f1058dd7700 20
  HTTP_HOST=test1.devops-os-cog01.devops.local
  2015-03-25 15:07:26.517094 7f1058dd7700 20
  HTTP_USER_AGENT=aws-sdk-java/unknown-version Windows_Server_2008_R2/6.1
  Java_HotSpot(TM)_Client_VM/24.55-b03
  2015-03-25 15:07:26.517096 7f1058dd7700 20
  HTTP_X_AMZ_META_CREATIONTIME=2015-03-25T15:07:26
  2015-03-25 15:07:26.517097 7f1058dd7700 20 HTTP_X_AMZ_META_SIZE=88
  2015-03-25 15:07:26.517098 7f1058dd7700 20 HTTP_X_AMZ_STORAGE_CLASS=STANDARD
  2015-03-25 15:07:26.517099 7f1058dd7700 20 HTTPS=on
  2015-03-25 15:07:26.517100 7f1058dd7700 20
  

Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes

2015-03-30 Thread Haomai Wang
We have a related topic in CDS about
hadoop+ceph (https://wiki.ceph.com/Planning/Blueprints/Infernalis/rgw%3A_Hadoop_FileSystem_Interface_for_a_RADOS_Gateway_Caching_Tier).
It does not directly solve the data locality problem, but it tries to avoid
data migration between different storage clusters.

It would be great if big data frameworks like Hadoop and Spark could expose
an interface to let Ceph or other storage backends be aware of compute job
scheduling. A new project, Tachyon (tachyon-project.org), is doing
something like this.



On Mon, Mar 30, 2015 at 7:20 PM, Gurvinder Singh
gurvindersinghdah...@gmail.com wrote:



-- 
Best Regards,

Wheat


Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes

2015-03-30 Thread Mark Nelson
This is definitely something that we've discussed, though I don't think 
anyone has really planned out what a complete solution would look like 
including processor affinity, etc.  Before I joined inktank I worked at 
a supercomputing institute and one of the projects we worked on was to 
develop grid computing tools for bioinformatics research.  Moving 
analytics rather than the data was a big topic for us too since genomics 
data at least tends to be pretty big.  Potentially ceph could be a very 
interesting solution for that kind of thing.


Mark

On 03/30/2015 06:20 AM, Gurvinder Singh wrote:



Re: [ceph-users] running Qemu / Hypervisor AND Ceph on the same nodes

2015-03-30 Thread Gurvinder Singh
One interesting use case of combining Ceph with computing is running big
data jobs on Ceph itself. With CephFS coming along, you can run
Hadoop/Spark jobs directly on Ceph, with data locality support, without
needing to move your data to compute resources. I am wondering if anyone
in the community is looking at combining storage and compute resources from
this point of view.

Regards,
Gurvinder
On 03/29/2015 09:19 PM, Nick Fisk wrote:


[ceph-users] Creating and deploying OSDs in parallel

2015-03-30 Thread Somnath Roy
Hi,
I am planning to modify our deployment script so that it can create and deploy 
multiple OSDs in parallel, both on the same host and on different hosts.
Just wanted to check whether there is any problem running, say, 'ceph-deploy 
osd create' etc. in parallel while deploying the cluster.
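In case it helps, the sketch I have in mind is just backgrounding the per-host invocations -- hostnames and device paths below are placeholders:

```
# untested sketch: run one ceph-deploy per host concurrently
for host in osd-node1 osd-node2 osd-node3; do
    ceph-deploy osd create "$host":/dev/sdb &
done
wait   # block until every background deploy has finished
```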

Thanks & Regards
Somnath






Re: [ceph-users] CephFS Slow writes with 1MB files

2015-03-30 Thread Gregory Farnum
On Sat, Mar 28, 2015 at 10:12 AM, Barclay Jameson
almightybe...@gmail.com wrote:
 I redid my entire Ceph build going back to to CentOS 7 hoping to the
 get the same performance I did last time.
 The rados bench test was the best I have ever had with a time of 740
 MB wr and 1300 MB rd. This was even better than the first rados bench
 test that had performance equal to PanFS. I find that this does not
 translate to my CephFS. Even with the following tweaking it still at
 least twice as slow as PanFS and my first *Magical* build (that had
 absolutely no tweaking):

 OSD
  osd_op_threads 8
  /sys/block/sd*/queue/nr_requests 4096
  /sys/block/sd*/queue/read_ahead_kb 4096

 Client
  rsize=16777216
  readdir_max_bytes=16777216
  readdir_max_entries=16777216

 ~160 mins to copy 10 (1MB) files for CephFS vs ~50 mins for PanFS.
 Throughput on CephFS is about 10MB/s vs PanFS 30 MB/s.

 Strange thing is none of the resources are taxed.
 CPU, ram, network, disks, are not even close to being taxed on either
 the client,mon/mds, or the osd nodes.
 The PanFS client node was a 10Gb network the same as the CephFS client
 but you can see the huge difference in speed.

 As per Gregs questions before:
 There is only one client reading and writing (time cp Small1/*
 Small2/.) but three clients have cephfs mounted, although they aren't
 doing anything on the filesystem.

 I have done another test where I stream data info a file as fast as
 the processor can put it there.
 (for (i = 0; i < 11; i++) { fprintf(out_file, "I is : %d\n", i); }
 ) and it is faster than the PanFS. CephFS 16GB in 105 seconds with the
 above tuning vs 130 seconds for PanFS. Without the tuning it takes 230
 seconds for CephFS although the first build did it in 130 seconds
 without any tuning.

 This leads me to believe the bottleneck is the mds. Does anybody have
 any thoughts on this?
 Are there any tuning parameters that I would need to speed up the mds?

This is pretty likely, but 10 creates/second is just impossibly slow.
The only other thing I can think of is that you might have enabled
fragmentation but aren't now, which might make an impact on a
directory with 100k entries.

Or else your hardware is just totally wonky, which we've seen in the
past but your server doesn't look quite large enough to be hitting any
of the nasty NUMA stuff...but that's something else to look at which I
can't help you with, although maybe somebody else can.

If you're interested in diving into it and depending on the Ceph
version you're running you can also examine the mds perfcounters
(http://ceph.com/docs/master/dev/perf_counters/) and the op history
(dump_ops_in_flight etc) and look for any operations which are
noticeably slow.
-Greg
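Concretely, assuming the admin socket is in its default location and the MDS id is 0 (adjust both for your deployment), the counters and op history can be pulled like this:

```
# query the running MDS through its admin socket
ceph daemon mds.0 perf dump
ceph daemon mds.0 dump_ops_in_flight
# or hit the socket file directly:
ceph --admin-daemon /var/run/ceph/ceph-mds.0.asok perf dump
```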


 On Fri, Mar 27, 2015 at 4:50 PM, Gregory Farnum g...@gregs42.com wrote:
 On Fri, Mar 27, 2015 at 2:46 PM, Barclay Jameson
 almightybe...@gmail.com wrote:
 Yes it's the exact same hardware except for the MDS server (although I
 tried using the MDS on the old node).
 I have not tried moving the MON back to the old node.

 My default cache size is mds cache size = 1000
 The OSDs (3 of them) have 16 Disks with 4 SSD Journal Disks.
 I created 2048 for data and metadata:
 ceph osd pool create cephfs_data 2048 2048
 ceph osd pool create cephfs_metadata 2048 2048


 To your point on clients competing against each other... how would I check 
 that?

 Do you have multiple clients mounted? Are they both accessing files in
 the directory(ies) you're testing? Were they accessing the same
 pattern of files for the old cluster?

 If you happen to be running a hammer rc or something pretty new you
 can use the MDS admin socket to explore a bit what client sessions
 there are and what they have permissions on and check; otherwise
 you'll have to figure it out from the client side.
 -Greg


 Thanks for the input!


 On Fri, Mar 27, 2015 at 3:04 PM, Gregory Farnum g...@gregs42.com wrote:
 So this is exactly the same test you ran previously, but now it's on
 faster hardware and the test is slower?

 Do you have more data in the test cluster? One obvious possibility is
 that previously you were working entirely in the MDS' cache, but now
 you've got more dentries and so it's kicking data out to RADOS and
 then reading it back in.

 If you've got the memory (you appear to) you can pump up the mds
 cache size config option quite dramatically from its default 10.

 Other things to check are that you've got an appropriately-sized
 metadata pool, that you've not got clients competing against each
 other inappropriately, etc.
 -Greg

 On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson
 almightybe...@gmail.com wrote:
 Oops, I should have said that I am not just writing the data but copying it:

 time cp Small1/* Small2/*

 Thanks,

 BJ

 On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson
 almightybe...@gmail.com wrote:
 I did a Ceph cluster install 2 weeks ago where I was getting great
 performance (~= PanFS) where I could write 100,000 1MB files in 

Re: [ceph-users] SSD Journaling

2015-03-30 Thread Gregory Farnum
On Mon, Mar 30, 2015 at 1:01 PM, Garg, Pankaj
pankaj.g...@caviumnetworks.com wrote:
 Hi,

 I’m benchmarking my small cluster with HDDs vs HDDs with SSD Journaling. I
 am using both RADOS bench and Block device (using fio) for testing.

 I am seeing significant Write performance improvements, as expected. I am
 however seeing the Reads coming out a bit slower on the SSD Journaling side.
 They are not terribly different, but sometimes 10% slower.

 Is that something other folks have also seen, or do I need some settings to
 be tuned properly? I’m wondering if accessing 2 drives for reads, adds
 latency and hence the throughput suffers.

You're not reading off of the journal in any case (it's only read on restart).

If I were to guess then the SSD journaling is just building up enough
dirty data ahead of the backing filesystem that if you do a read it
takes a little longer for the data to be readable through the local
filesystem. There have been a number of threads here about configuring
the journal which you might want to grab out of an archiving system
and look at. :)
-Greg




 Thanks

 Pankaj


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SSD Journaling

2015-03-30 Thread Garg, Pankaj
Hi,
I'm benchmarking my small cluster with HDDs vs HDDs with SSD Journaling. I am 
using both RADOS bench and Block device (using fio) for testing.
I am seeing significant Write performance improvements, as expected. I am 
however seeing the Reads coming out a bit slower on the SSD Journaling side. 
They are not terribly different, but sometimes 10% slower.
Is that something other folks have also seen, or do I need some settings to be 
tuned properly? I'm wondering if accessing 2 drives for reads adds latency and 
hence the throughput suffers.

Thanks
Pankaj
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Journaling

2015-03-30 Thread Mark Nelson

On 03/30/2015 03:01 PM, Garg, Pankaj wrote:

Hi,

I’m benchmarking my small cluster with HDDs vs HDDs with SSD Journaling.
I am using both RADOS bench and Block device (using fio) for testing.

I am seeing significant Write performance improvements, as expected. I
am however seeing the Reads coming out a bit slower on the SSD
Journaling side. They are not terribly different, but sometimes 10% slower.

Is that something other folks have also seen, or do I need some settings
to be tuned properly? I’m wondering if accessing 2 drives for reads
adds latency and hence the throughput suffers.


Hi,

What kind of reads are you seeing the degradation with?  Is it 
consistent with different sizes and random/seq?  Any interesting spikes 
or valleys during the tests?




Thanks

Pankaj



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is it possible to change the MDS node after it's been created

2015-03-30 Thread Gregory Farnum
On Mon, Mar 30, 2015 at 3:15 PM, Francois Lafont flafdiv...@free.fr wrote:
 Hi,

 Gregory Farnum wrote:

 The MDS doesn't have any data tied to the machine you're running it
 on. You can either create an entirely new one on a different machine,
 or simply copy the config file and cephx keyring to the appropriate
 directories. :)

 Sorry to jump into this thread, but how can we *remove* an mds daemon from a
 ceph cluster?

 Are the commands below enough?

 stop the daemon
 rm -r /var/lib/ceph/mds/ceph-$id/
 ceph auth del mds.$id

 Should we edit something in the mds map to remove the mds once and for all?

As long as you turn on another MDS which takes over the logical rank
of the MDS you remove, you don't need to remove anything from the
cluster store.

Note that if you just copy the directory and keyring to the new
location you shouldn't do the ceph auth del bit either. ;)
-Greg


 --
 François Lafont

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is it possible to change the MDS node after it's been created

2015-03-30 Thread Gregory Farnum
On Mon, Mar 30, 2015 at 1:51 PM, Steve Hindle mech...@gmail.com wrote:

 Hi!

   I mistakenly created my MDS node on the 'wrong' server a few months back.
 Now I realized I placed it on a machine lacking IPMI and would like to move
 it to another node in my cluster.

   Is it possible to non-destructively move an MDS ?

The MDS doesn't have any data tied to the machine you're running it
on. You can either create an entirely new one on a different machine,
or simply copy the config file and cephx keyring to the appropriate
directories. :)
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Is it possible to change the MDS node after it's been created

2015-03-30 Thread Steve Hindle
Hi!

  I mistakenly created my MDS node on the 'wrong' server a few months
back.  Now I realized I placed it on a machine lacking IPMI and would like
to move it to another node in my cluster.

  Is it possible to non-destructively move an MDS ?

Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is it possible to change the MDS node after it's been created

2015-03-30 Thread Francois Lafont
Gregory Farnum wrote:

 Sorry to jump into this thread, but how can we *remove* an mds daemon from a
 ceph cluster?

 Are the commands below enough?

 stop the daemon
 rm -r /var/lib/ceph/mds/ceph-$id/
 ceph auth del mds.$id

 Should we edit something in the mds map to remove the mds once and for all?
 
 As long as you turn on another MDS which takes over the logical rank
 of the MDS you remove, you don't need to remove anything from the
 cluster store.

OK, and to just remove an mds I guess the commands above are enough. ;)

 Note that if you just copy the directory and keyring to the new
 location you shouldn't do the ceph auth del bit either. ;)

Yes, it seems logical. Thank you Greg. :)

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is it possible to change the MDS node after it's been created

2015-03-30 Thread Francois Lafont
Hi,

Gregory Farnum wrote:

 The MDS doesn't have any data tied to the machine you're running it
 on. You can either create an entirely new one on a different machine,
 or simply copy the config file and cephx keyring to the appropriate
 directories. :)

Sorry to jump into this thread, but how can we *remove* an mds daemon from a
ceph cluster?

Are the commands below enough?

stop the daemon
rm -r /var/lib/ceph/mds/ceph-$id/
ceph auth del mds.$id

Should we edit something in the mds map to remove the mds once and for all?

--
François Lafont

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] One host failure brings down the whole cluster

2015-03-30 Thread Kai KH Huang
Hi, all
I have a two-node Ceph cluster, and both nodes are monitors and OSD hosts. When they're 
both up, all OSDs are up and in, and everything is fine... almost:

[root~]# ceph -s

 health HEALTH_WARN 25 pgs degraded; 316 pgs incomplete; 85 pgs stale; 24 
pgs stuck degraded; 316 pgs stuck inactive; 85 pgs stuck stale; 343 pgs stuck 
unclean; 24 pgs stuck undersized; 25 pgs undersized; recovery 11/153 objects 
degraded (7.190%)
 monmap e1: 2 mons at 
{server_b=10.???.78:6789/0,server_a=10.???.80:6789/0}, election epoch 14, 
quorum 0,1 server_b,server_a
 osdmap e116375: 22 osds: 22 up, 22 in
  pgmap v238656: 576 pgs, 2 pools, 224 MB data, 59 objects
56175 MB used, 63420 GB / 63475 GB avail
11/153 objects degraded (7.190%)
  15 active+undersized+degraded
  75 stale+active+clean
   2 active+remapped
 158 active+clean
  10 stale+active+undersized+degraded
 316 incomplete


But if I bring down one server, the whole cluster stops functioning:

[root~]# ceph -s
2015-03-31 10:32:43.848125 7f57e4105700  0 -- :/1017540  10.???.78:6789/0 
pipe(0x7f57e0027120 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f57e00273b0).fault

This should not happen...Any thoughts?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One host failure brings down the whole cluster

2015-03-30 Thread Lindsay Mathieson
On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote:
 Hi, all
 I have a two-node Ceph cluster, and both nodes are monitors and OSD hosts. When
 they're both up, all OSDs are up and in, and everything is fine... almost:



Two things.

1 -  You *really* need a min of three monitors. Ceph cannot form a quorum with 
just two monitors and you run a risk of split brain.


2 - You also probably have a min size of two set (the default). This means 
that you need a minimum  of two copies of each data object for writes to work. 
So with just two nodes, if one goes down you can't write to the other.


So:
- Install an extra monitor node - it doesn't have to be powerful, we just use an 
Intel Celeron NUC for that.

- reduce your minimum size to 1 (One).
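A sketch of how those two changes might look, assuming new pools pick up defaults from ceph.conf; existing pools need the CLI forms shown in the comments, and the pool name and values are illustrative:

```ini
# ceph.conf defaults applied to newly created pools (sketch, adjust to taste)
[global]
    osd pool default size = 2      # two replicas total
    osd pool default min size = 1  # keep accepting writes with one replica up

# For pools that already exist, roughly:
#   ceph osd pool set <pool> size 2
#   ceph osd pool set <pool> min_size 1
```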
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Hi everyone: can Calamari manage multiple Ceph clusters?

2015-03-30 Thread robert
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cannot add OSD node into crushmap or all writes fail

2015-03-30 Thread Tyler Bishop
I have a ceph node that recovers into my ceph pool correctly, and performance 
looks normal for the rbd clients. However, a few minutes after recovery 
finishes, the rbd clients begin to fall over and cannot write data to the 
pool. 

I've been trying to figure this out for weeks! None of the logs contain 
anything relevant at all. 

If I disable the node in the crushmap the rbd clients immediately begin writing 
to the other nodes. 

Ideas? 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One host failure brings down the whole cluster

2015-03-30 Thread Gregory Farnum
On Mon, Mar 30, 2015 at 8:02 PM, Lindsay Mathieson
lindsay.mathie...@gmail.com wrote:
 On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote:
 Hi, all
 I have a two-node Ceph cluster, and both nodes are monitors and OSD hosts. When
 they're both up, all OSDs are up and in, and everything is fine... almost:



 Two things.

 1 -  You *really* need a min of three monitors. Ceph cannot form a quorum with
 just two monitors and you run a risk of split brain.

You can form quorums with an even number of monitors, and Ceph does so
— there's no risk of split brain.

The problem with 2 monitors is that a quorum is always 2 — which is
exactly what you're seeing right now. You can't run with only one
monitor up (assuming you have a non-zero number of them).
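Greg's point can be made concrete with a tiny illustrative sketch of the majority arithmetic (plain math, not actual Ceph code):

```python
def quorum_size(n_monitors: int) -> int:
    """Paxos-style majority: strictly more than half the monitors must agree."""
    return n_monitors // 2 + 1

def tolerated_failures(n_monitors: int) -> int:
    """How many monitors can be down while a quorum can still form."""
    return n_monitors - quorum_size(n_monitors)

# With 2 monitors the quorum is 2, so losing either one halts the cluster;
# with 3 monitors the quorum is also 2, so one failure is survivable.
print(quorum_size(2), tolerated_failures(2))  # 2 0
print(quorum_size(3), tolerated_failures(3))  # 2 1
```

This is why an even monitor count is not split-brain-prone, merely wasteful: 4 monitors tolerate the same single failure as 3.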

 2 - You also probably have a min size of two set (the default). This means
 that you need a minimum  of two copies of each data object for writes to work.
 So with just two nodes, if one goes down you can't write to the other.

Also this.



 So:
 - Install a extra monitor node - it doesn't have to be powerful, we just use a
 Intel Celeron NUC for that.

 - reduce your minimum size to 1 (One).

Yep.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Slow writes with 1MB files

2015-03-30 Thread Yan, Zheng
On Sun, Mar 29, 2015 at 1:12 AM, Barclay Jameson
almightybe...@gmail.com wrote:
 I redid my entire Ceph build, going back to CentOS 7, hoping to
 get the same performance I did last time.
 The rados bench test was the best I have ever had, with 740
 MB wr and 1300 MB rd. This was even better than the first rados bench
 test that had performance equal to PanFS. I find that this does not
 translate to my CephFS. Even with the following tweaking it is still at
 least twice as slow as PanFS and my first *Magical* build (that had
 absolutely no tweaking):

 OSD
  osd_op_threads 8
  /sys/block/sd*/queue/nr_requests 4096
  /sys/block/sd*/queue/read_ahead_kb 4096

 Client
  rsize=16777216
  readdir_max_bytes=16777216
  readdir_max_entries=16777216

 ~160 mins to copy 10 (1MB) files for CephFS vs ~50 mins for PanFS.
 Throughput on CephFS is about 10MB/s vs PanFS 30 MB/s.

 Strange thing is none of the resources are taxed.
 CPU, ram, network, disks, are not even close to being taxed on either
 the client,mon/mds, or the osd nodes.
 The PanFS client node was a 10Gb network the same as the CephFS client
 but you can see the huge difference in speed.

 As per Gregs questions before:
 There is only one client reading and writing (time cp Small1/*
 Small2/.) but three clients have cephfs mounted, although they aren't
 doing anything on the filesystem.

 I have done another test where I stream data info a file as fast as
 the processor can put it there.
 (for (i=0; i < 11; i++){ fprintf(out_file, "I is : %d\n", i); }
 ) and it is faster than the PanFS. CephFS 16GB in 105 seconds with the
 above tuning vs 130 seconds for PanFS. Without the tuning it takes 230
 seconds for CephFS although the first build did it in 130 seconds
 without any tuning.

 This leads me to believe the bottleneck is the mds. Does anybody have
 any thoughts on this?
 Are there any tuning parameters that I would need to speed up the mds?
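The streaming test described above can be reproduced with a short, self-contained sketch; the path and line count here are arbitrary examples, and since it measures whatever filesystem the file lands on, you would point it at a CephFS mount to compare:

```python
import os
import tempfile
import time

def stream_write(path: str, n_lines: int) -> float:
    """Write n_lines formatted lines as fast as possible; return MB written."""
    with open(path, "w") as out_file:
        for i in range(n_lines):
            out_file.write("I is : %d\n" % i)
        out_file.flush()
        os.fsync(out_file.fileno())  # make sure the data actually hit the filesystem
    return os.path.getsize(path) / (1024 * 1024)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "stream.txt")
    t0 = time.time()
    mb = stream_write(path, 100_000)
    print("wrote %.1f MB in %.2f s" % (mb, time.time() - t0))
```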

Could you enable mds debugging for a few seconds (ceph daemon mds.x
config set debug_mds 10; sleep 10; ceph daemon mds.x config set
debug_mds 0) and upload /var/log/ceph/mds.x.log somewhere?

Regards
Yan, Zheng


 On Fri, Mar 27, 2015 at 4:50 PM, Gregory Farnum g...@gregs42.com wrote:
 On Fri, Mar 27, 2015 at 2:46 PM, Barclay Jameson
 almightybe...@gmail.com wrote:
 Yes it's the exact same hardware except for the MDS server (although I
 tried using the MDS on the old node).
 I have not tried moving the MON back to the old node.

 My default cache size is mds cache size = 1000
 The OSDs (3 of them) have 16 Disks with 4 SSD Journal Disks.
 I created 2048 for data and metadata:
 ceph osd pool create cephfs_data 2048 2048
 ceph osd pool create cephfs_metadata 2048 2048


 To your point on clients competing against each other... how would I check 
 that?

 Do you have multiple clients mounted? Are they both accessing files in
 the directory(ies) you're testing? Were they accessing the same
 pattern of files for the old cluster?

 If you happen to be running a hammer rc or something pretty new you
 can use the MDS admin socket to explore a bit what client sessions
 there are and what they have permissions on and check; otherwise
 you'll have to figure it out from the client side.
 -Greg


 Thanks for the input!


 On Fri, Mar 27, 2015 at 3:04 PM, Gregory Farnum g...@gregs42.com wrote:
 So this is exactly the same test you ran previously, but now it's on
 faster hardware and the test is slower?

 Do you have more data in the test cluster? One obvious possibility is
 that previously you were working entirely in the MDS' cache, but now
 you've got more dentries and so it's kicking data out to RADOS and
 then reading it back in.

 If you've got the memory (you appear to) you can pump up the mds
 cache size config option quite dramatically from its default 10.

 Other things to check are that you've got an appropriately-sized
 metadata pool, that you've not got clients competing against each
 other inappropriately, etc.
 -Greg

 On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson
 almightybe...@gmail.com wrote:
 Oops, I should have said that I am not just writing the data but copying
 it:

 time cp Small1/* Small2/*

 Thanks,

 BJ

 On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson
 almightybe...@gmail.com wrote:
 I did a Ceph cluster install 2 weeks ago where I was getting great
 performance (~= PanFS) where I could write 100,000 1MB files in 61
 Mins (Took PanFS 59 Mins). I thought I could increase the performance
 by adding a better MDS server so I redid the entire build.

 Now it takes 4 times as long to write the same data as it did before.
 The only thing that changed was the MDS server. (I even tried moving
 the MDS back on the old slower node and the performance was the same.)

 The first install was on CentOS 7. I tried going down to CentOS 6.6
 and it's the same results.
 I use the same scripts to install the OSDs (which I created because I
 can never get ceph-deploy to behave correctly.) 

Re: [ceph-users] Where are the systemd files?

2015-03-30 Thread Ken Dreyer
The systemd service unit files were imported into the tree, but they
have not been added into any upstream packaging yet. See the discussion
at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=769593 or git log
-- systemd. I don't think there are any upstream tickets in Redmine for
this yet.

Since Hammer is very close to being released, the service unit files
will not be available in the Hammer packages. The earliest we would ship
them would be the Infernalis release series.

I've recently added a _with_systemd conditional to the RPM spec
(ceph.spec.in) in master in order to support socket directory creation
using tmpfiles.d. That same _with_systemd logic could be extended to
ship the service unit files on the relevant RPM-based platforms and ship
SysV-init scripts on the older platforms (eg RHEL 6).
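A hedged sketch of what extending that conditional in the %files section of ceph.spec.in could look like; the unit and init-script names here are illustrative (based on the files in the tree's systemd directory), not the actual packaging:

```spec
%if 0%{?_with_systemd}
%{_unitdir}/ceph-mon@.service
%{_unitdir}/ceph-osd@.service
%else
%{_initrddir}/ceph
%endif
```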

I'm not quite sure how we ought to handle that on Debian-based packages.
Is there a way to conditionalize the Debian packaging to use systemd on
some versions of the distro, and use upstart on other versions?

- Ken

On 03/26/2015 11:13 PM, Robert LeBlanc wrote:
 I understand that Giant should have systemd service files, but I don't
 see them in the CentOS 7 packages.
 
 https://github.com/ceph/ceph/tree/giant/systemd
 
 [ulhglive-root@mon1 systemd]# rpm -qa | grep --color=always ceph
 ceph-common-0.93-0.el7.centos.x86_64
 python-cephfs-0.93-0.el7.centos.x86_64
 libcephfs1-0.93-0.el7.centos.x86_64
 ceph-0.93-0.el7.centos.x86_64
 ceph-deploy-1.5.22-0.noarch
 [ulhglive-root@mon1 systemd]# for i in $(rpm -qa | grep ceph); do rpm
 -ql $i | grep -i --color=always systemd; done
 [nothing returned]
 
 Thanks,
 Robert LeBlanc

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw authorization failed

2015-03-30 Thread Yehuda Sadeh-Weinraub


- Original Message -
 From: Neville neville.tay...@hotmail.co.uk
 To: Yehuda Sadeh-Weinraub yeh...@redhat.com
 Cc: ceph-users@lists.ceph.com
 Sent: Monday, March 30, 2015 6:49:29 AM
 Subject: Re: [ceph-users] Radosgw authorization failed
 
 
  Date: Wed, 25 Mar 2015 11:43:44 -0400
  From: yeh...@redhat.com
  To: neville.tay...@hotmail.co.uk
  CC: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] Radosgw authorization failed
  
  
  
  - Original Message -
   From: Neville neville.tay...@hotmail.co.uk
   To: ceph-users@lists.ceph.com
   Sent: Wednesday, March 25, 2015 8:16:39 AM
   Subject: [ceph-users] Radosgw authorization failed
   
   Hi all,
   
   I'm testing a backup product which supports Amazon S3 as a target for archive
   storage, and I'm trying to set up a Ceph cluster configured with the S3 API
   to
   use as an internal target for backup archives instead of AWS.
   
   I've followed the online guide for setting up Radosgw and created a
   default
   region and zone based on the AWS naming convention US-East-1. I'm not
   sure
   if this is relevant but since I was having issues I thought it might need
   to
   be the same.
   
   I've tested the radosgw using boto.s3 and it seems to work OK, i.e. I can
   create a bucket, create a folder, list buckets, etc. The problem is that when
   the backup software tries to create an object I get an authorization failure.
   It's using the same user/access/secret as I'm using from boto.s3 and I'm
   sure the creds are right as it lets me create the initial connection, it
   just fails when trying to create an object (backup folder).
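An authorization failure with S3-style credentials usually means the client and the gateway computed different request signatures. A minimal sketch of the AWS signature-v2 scheme that the `Authorization: AWS access:signature` header in the log below carries; the credentials, resource path, and date are made-up examples, and canonicalized x-amz- headers are omitted for brevity:

```python
import base64
import hashlib
import hmac

def sign_v2(secret_key: str, method: str, content_md5: str, content_type: str,
            date: str, canonical_resource: str) -> str:
    """AWS signature v2: base64(HMAC-SHA1(secret, StringToSign))."""
    # StringToSign is the newline-joined request fields (x-amz- headers omitted here)
    string_to_sign = "\n".join([method, content_md5, content_type, date,
                                canonical_resource])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Made-up credentials and request: a mismatch in any of these fields
# (clock-skewed Date, virtual-host vs path-style resource, Content-Type)
# yields a different signature on the gateway side and an authorization failure.
sig = sign_v2("secret", "PUT", "", "application/octet-stream",
              "Wed, 25 Mar 2015 15:07:26 GMT", "/test1/archive/")
print("AWS ACCESSKEY:" + sig)
```

Comparing the StringToSign the backup software builds against what radosgw logs at high debug levels is a common way to spot which field differs.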
   
   Here's the extract from the radosgw log:
   
   -
   2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET
   /:list_bucket:init op
   2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET
   /:list_bucket:verifying op mask
   2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1
   user.op_mask=7
   2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET
   /:list_bucket:verifying op permissions
   2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for
   uid=test
   mask=49
   2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15
   2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for
   group=1
   mask=49
   2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15
   2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for
   group=2
   mask=49
   2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15
   2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test
   owner=test perm=1
   2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm
   (type)=1,
   policy perm=1, user_perm_mask=1, acl perm=1
   2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET
   /:list_bucket:verifying op params
   2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET
   /:list_bucket:executing
   2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list
   test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2])
   start num 1001
   2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET
   /:list_bucket:http status=200
   2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done
   req=0x7f107000e2e0
   http_status=200 ==
   2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request
   req=0x7f107000f0e0
   2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ:
   2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0
   2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request
   req=0x7f107000f6b0
   2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request
   req=0x7f107000f0e0
   2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty
   2015-03-25 15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88
   2015-03-25 15:07:26.517084 7f1058dd7700 20
   CONTENT_TYPE=application/octet-stream
   2015-03-25 15:07:26.517085 7f1058dd7700 20 CONTEXT_DOCUMENT_ROOT=/var/www
   2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX=
   2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www
   2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER
   2015-03-25 15:07:26.517089 7f1058dd7700 20 GATEWAY_INTERFACE=CGI/1.1
   2015-03-25 15:07:26.517090 7f1058dd7700 20 HTTP_AUTHORIZATION=AWS
   F79L68W19B3GCLOSE3F8:AcXqtvlBzBMpwdL+WuhDRoLT/Bs=
   2015-03-25 15:07:26.517091 7f1058dd7700 20 HTTP_CONNECTION=Keep-Alive
   2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_DATE=Wed, 25 Mar 2015
   15:07:26 GMT
   2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_EXPECT=100-continue
   2015-03-25 15:07:26.517093 7f1058dd7700 20
   HTTP_HOST=test1.devops-os-cog01.devops.local
   2015-03-25 15:07:26.517094 7f1058dd7700 20
   HTTP_USER_AGENT=aws-sdk-java/unknown-version Windows_Server_2008_R2/6.1