Re: [ceph-users] OSD size and performance

2016-01-04 Thread gjprabu
Hi Srinivas,

In our case OCFS2 is not directly interacting with SCSI. Here we have Ceph
storage that is mounted on many client systems using OCFS2. Moreover, OCFS2
does support SCSI:

https://blogs.oracle.com/wim/entry/what_s_up_with_ocfs2
http://www.linux-mag.com/id/7809/

Regards

Prabu





 On Mon, 04 Jan 2016 12:46:48 +0530, Srinivasula Maram wrote 

I doubt the rbd driver supports the SCSI reservations needed to mount the same
rbd across multiple clients with OCFS?

Generally the underlying device (here rbd) should have SCSI reservation support
for a cluster file system.

Thanks,
Srinivas

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Somnath Roy
Sent: Monday, January 04, 2016 12:29 PM
To: gjprabu
Cc: ceph-users; Siva Sokkumuthu
Subject: Re: [ceph-users] OSD size and performance

Hi Prabu,

Check the krbd version (and libceph) running in the kernel. You can try
building the latest krbd source for the 7.1 kernel if this is an option for you.

As I mentioned in my earlier mail, please isolate the problem the way I
suggested if that seems reasonable to you.

Thanks & Regards
Somnath

From: gjprabu [mailto:gjpr...@zohocorp.com]
Sent: Sunday, January 03, 2016 10:53 PM
To: gjprabu
Cc: Somnath Roy; ceph-users; Siva Sokkumuthu
Subject: Re: [ceph-users] OSD size and performance

Hi Somnath,

Just check the below details and let us know if you need any other information.

Regards
Prabu

 On Sat, 02 Jan 2016 08:47:05 +0530, gjprabu wrote 

Hi Somnath,

Please check the details and help me on this issue.

Regards
Prabu

 On Thu, 31 Dec 2015 12:50:36 +0530, gjprabu wrote 



Hi Somnath,

We are using RBD; please find the Linux and rbd versions below. I agree this is
related to a client-side issue. My thought went to the backup because we take a
full (not incremental) backup once a week, and we noticed the issue once at
that time, but I am not sure.

Linux version
CentOS Linux release 7.1.1503 (Core)
Kernel: 3.10.91

rbd --version
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)

rbd showmapped
id pool image      snap device
1  rbd  downloads  -    /dev/rbd1

rbd ls
downloads

Client server RBD is mounted using the ocfs2 file system:
/dev/rbd1  ocfs2 9.6T  2.6T  7.0T  27% /data/downloads

Client-level cluster configuration is done with 5 clients, and we use the
procedure below on the client nodes.

1) rbd map downloads --pool rbd --name client.admin -m
192.168.112.192,192.168.112.193,192.168.112.194 -k
/etc/ceph/ceph.client.admin.keyring

2) Format the rbd with ocfs2:
mkfs.ocfs2 -b4K -C 4K -L label -T mail -N5 /dev/rbd/rbd/downloads

3) Do the OCFS2 client-level cluster configuration and start the ocfs2 service
(see the cluster.conf sketch after this list).

4) mount /dev/rbd/rbd/downloads /data/downloads
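
(For reference: step 3 typically amounts to an /etc/ocfs2/cluster.conf listing
every client node, plus bringing o2cb online, e.g. via "service o2cb online"
and "service ocfs2 start". The node names, numbers and addresses below are
illustrative placeholders only, not taken from this cluster:)

cluster:
        node_count = 5
        name = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.112.201
        number = 0
        name = client1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.112.202
        number = 1
        name = client2
        cluster = ocfs2

(...and so on, one node stanza per client.)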


 


Please let me know if you need any other information.

Regards
Prabu

 On Thu, 31 Dec 2015 01:04:39 +0530, Somnath Roy wrote 



Prabu,

I assume you are using krbd then. Could you please let us know the Linux
version/flavor you are using?

krbd had some hang issues that are supposed to be fixed in the latest versions
available. It could also be due to the OCFS2 -> krbd integration (?). Handling
data consistency is the responsibility of OCFS2, as krbd doesn't guarantee
that. So, I would suggest doing the following to root-cause it, if your cluster
is not in production:

1. Do a synthetic fio run on krbd alone (or after creating a filesystem on top)
and see if you can reproduce the hang.

2. Try building the latest krbd, or upgrade your Linux version to get a newer
krbd, and see if it is still happening.

<< Also we are taking backup from client, we feel that could be the reason for
this hang

I assume this is a regular filesystem backup? Why do you think this could be a
problem?

I think it is a client-side issue; I doubt it could be because of the large OSD
size.

Thanks & Regards
Somnath

From: gjprabu [mailto:gjpr...@zohocorp.com]
Sent: Wednesday, December 30, 2015 4:29 AM
To: gjprabu
Cc: Somnath Roy; ceph-users; Siva Sokkumuthu
Subject: Re: [ceph-users] OSD size and performance

Hi Somnath,

Thanks for your reply. In our current setup we are having a client hang issue;
it hangs frequently and after a reboot it works again. The client is mounted
with the OCFS2 file system for mul

Re: [ceph-users] OSD size and performance

2016-01-04 Thread Srinivasula Maram
My point is that the rbd device should support SCSI reservations, so that OCFS2
can take a write lock while writing on a particular client, to avoid corruption.

Thanks,
Srinivas

From: gjprabu [mailto:gjpr...@zohocorp.com]
Sent: Monday, January 04, 2016 1:40 PM
To: Srinivasula Maram
Cc: Somnath Roy; ceph-users; Siva Sokkumuthu
Subject: RE: [ceph-users] OSD size and performance

Hi Srinivas,

In our case OCFS2 is not directly interacting with SCSI. Here we have Ceph
storage that is mounted on many client systems using OCFS2. Moreover, OCFS2
does support SCSI.

https://blogs.oracle.com/wim/entry/what_s_up_with_ocfs2
http://www.linux-mag.com/id/7809/

Regards
Prabu



Re: [ceph-users] OSD size and performance

2016-01-04 Thread Ric Wheeler


I am not sure why you want to layer a clustered file system (OCFS2) on top of 
Ceph RBD. Seems like a huge overhead and a ton of complexity.


Better to use CephFS if you want Ceph at the bottom or to just use iSCSI luns 
under ocfs2.


Regards,

Ric


On 01/04/2016 10:28 AM, Srinivasula Maram wrote:


My point is that the rbd device should support SCSI reservations, so that OCFS2
can take a write lock while writing on a particular client, to avoid corruption.


Thanks,

Srinivas


[ceph-users] How to run multiple RadosGW instances under the same zone

2016-01-04 Thread Joseph Yang

Hello,

How to run multiple RadosGW instances under the same zone?

Assume there are two hosts, HOST_1 and HOST_2. I want to run
two RadosGW instances on these two hosts for my zone ZONE_MULI.
So, when one of the radosgw instances is down, I can still access the zone.

There are some questions:
1. How many ceph users should I create?
2. How many rados users should I create?
3. How to set ZONE_MULI's access_key/secret_key?
4. How to set the 'host' section in the ceph conf file for these two 
   radosgw instances?
5. How to start the instances?
# radosgw --cluster My_Cluster -n ?_which_rados_user_?

I read http://docs.ceph.com/docs/master/radosgw/federated-config/, but
it does not seem to explain this.

Your answer is appreciated!

thx

Joseph




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bug 12200

2016-01-04 Thread HEWLETT, Paul (Paul)
Thanks...




On 23/12/2015, 21:33, "Gregory Farnum"  wrote:

>On Wed, Dec 23, 2015 at 5:20 AM, HEWLETT, Paul (Paul)
> wrote:
>> Seasons Greetings Cephers..
>>
>> Can I assume that http://tracker.ceph.com/issues/12200 is fixed in
>> Infernalis?
>>
>> Any chance that it can be back ported to Hammer ? (I don’t see it planned)
>>
>> We are hitting this bug more frequently than desired so would be keen to see
>> it fixed in Hammer
>
>David tells me the fix was fairly complicated, involved some encoding
>changes, and doesn't backport cleanly. So I guess it's not likely to
>happen.
>-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds complains about "wrong node", stuck in replay

2016-01-04 Thread John Spray
On Wed, Dec 30, 2015 at 5:06 PM, Bryan Wright  wrote:
> Hi folks,
>
> I have an mds cluster stuck in replay.  The mds log file is filled with
> errors like the following:
>
> 2015-12-30 12:00:25.912026 7f9f5b88b700  0 -- 192.168.1.31:6800/13093 >>
> 192.168.1.24:6823/31155 pipe(0x4ccc800 sd=18 :44201 s=1 pgs=0 cs=0 l=1
> c=0x4bb1e40).connect claims to be 192.168.1.24:6823/15059 not
> 192.168.1.24:6823/31155 - wrong node!
>
> Restarting all of the osds, mons, and mdss causes the error message
> to refer to a different osd.
>
> What's going on here?

What's the network between the MDS and the other daemons?  Messages
like that make me wonder if there is some NAT or other funky routing
going on.

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to run multiple RadosGW instances under the same zone

2016-01-04 Thread Srinivasula Maram
Hi Joseph,

You can try haproxy as proxy for load balancing and failover.

Thanks,
Srinivas

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Joseph 
Yang
Sent: Monday, January 04, 2016 2:09 PM
To: ceph-us...@ceph.com; Joseph Yang
Subject: [ceph-users] How to run multiple RadosGW instances under the same zone




Hello,

How to run multiple RadosGW instances under the same zone?

Assume there are two hosts, HOST_1 and HOST_2. I want to run
two RadosGW instances on these two hosts for my zone ZONE_MULI.
So, when one of the radosgw instances is down, I can still access the zone.

There are some questions:
1. How many ceph users should I create?
2. How many rados users should I create?
3. How to set ZONE_MULI's access_key/secret_key?
4. How to set the 'host' section in the ceph conf file for these two
   radosgw instances?
5. How to start the instances?
# radosgw --cluster My_Cluster -n ?_which_rados_user_?

I read http://docs.ceph.com/docs/master/radosgw/federated-config/, but
it does not seem to explain this.

Your answer is appreciated!

thx

Joseph


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] retrieve opstate issue on radosgw

2016-01-04 Thread Laurent Barbe

Hi,

I was doing some tests with opstate on radosgw, and I find the behavior
strange:
When I try to retrieve the status of a particular object by specifying
client_id, object and op_id, the return value is an empty array.

The behavior is identical using radosgw-admin and the REST API.
Is this the desired behavior?
My version is hammer (0.94.5); could someone try the same test with
another version?



For example :

$ radosgw-admin opstate set --client_id client1 --object 'myobject' 
--op_id '1000' --state 'error'



# When I try to retreive by object name, the value is returned :
$ radosgw-admin opstate list --object=myobject
[
{
"client_id": "client1",
"op_id": "1000",
"object": "myobject",
"timestamp": "2016-01-04 08:36:04.785597Z",
"state": "error"
}

]


# When I try to retreive by client_id, object and op_id, the array is 
empty :
$ radosgw-admin opstate list --client_id client1 --object 'myobject' 
--op_id '1000'

[
]



# Log trace for the first command : (radosgw-admin opstate list 
--object=myobject)
2016-01-04 09:40:34.281624 7f00d7e76800  1 -- 172.16.4.29:0/3173774305 
--> 172.16.4.68:6802/530 -- osd_op(client.23890090.0:23 
statelog.obj_opstate.126 [call statelog.list] 26.21370676 
ack+read+known_if_redirected e13127) v5 -- ?+0 0x45bb2c0 con 0x45b4ba0
2016-01-04 09:40:34.284476 7f00b9ae8700  1 -- 172.16.4.29:0/3173774305 
<== osd.36 172.16.4.68:6802/530 1  osd_op_reply(23 
statelog.obj_opstate.126 [call] v0'0 uv5613451 ondisk = 0) v6  
191+0+68 (3477643793 0 3202723411) 0x7f005cd0 con 0x45b4ba0

[
{
"client_id": "client1",
"op_id": "1000",
"object": "myobject",
"timestamp": "2016-01-04 08:36:04.785597Z",
"state": "error"
}

]

# Log trace for the second command : (radosgw-admin opstate list 
--client_id client1 --object 'myobject' --op_id '1000')
2016-01-04 09:40:48.071165 7f54153dc800  1 -- 172.16.4.29:0/2301029731 
--> 172.16.4.68:6802/530 -- osd_op(client.23978876.0:23 
statelog.obj_opstate.126 [call statelog.list] 26.21370676 
ack+read+known_if_redirected e13127) v5 -- ?+0 0x3aff330 con 0x3af8c40
2016-01-04 09:40:48.074318 7f53d6fef700  1 -- 172.16.4.29:0/2301029731 
<== osd.36 172.16.4.68:6802/530 1  osd_op_reply(23 
statelog.obj_opstate.126 [call] v0'0 uv5613451 ondisk = 0) v6  
191+0+15 (1417304820 0 2149983739) 0x7f5394000ca0 con 0x3af8c40

[
]


The behavior is same with the REST API :

/admin/opstate?client-id=client1&object=myobject&op-id=1000
[{"client_id":"client1","op_id":"1000","object":"myobject","timestamp":"2016-01-04 
08:36:04.785597Z","state":"error"}]


/admin/opstate?object=myobject
[]



$ rados -p .main.log listomapvals statelog.obj_opstate.126
1_client1_1000
value: (53 bytes) :
 : 01 01 2f 00 00 00 07 00 00 00 63 6c 69 65 6e 74 : ../...client
0010 : 31 04 00 00 00 31 30 30 30 08 00 00 00 6d 79 6f : 11000myo
0020 : 62 6a 65 63 74 f4 2e 8a 56 2f 44 d3 2e 00 00 00 : bject...V/D.
0030 : 00 03 00 00 00  : .

2_8_myobject_1000
value: (53 bytes) :
 : 01 01 2f 00 00 00 07 00 00 00 63 6c 69 65 6e 74 : ../...client
0010 : 31 04 00 00 00 31 30 30 30 08 00 00 00 6d 79 6f : 11000myo
0020 : 62 6a 65 63 74 f4 2e 8a 56 2f 44 d3 2e 00 00 00 : bject...V/D.
0030 : 00 03 00 00 00  : .


Thanks,
Laurent Barbe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD size and performance

2016-01-04 Thread gjprabu


Hi Srinivas,

I am not sure whether RBD supports SCSI reservations, but OCFS2 has the
capability to lock and unlock while writing, for example:



 (kworker/u192:5,71152,28):dlm_unlock_lock_handler:424 lvb: none

 (kworker/u192:5,71152,28):__dlm_lookup_lockres:232 
O00946c510c


 (kworker/u192:5,71152,28):__dlm_lookup_lockres_full:198 
O00946c510c

(kworker/u192:5,71152,28):dlmunlock_common:111 master_node = 1, valblk = 0

(kworker/u192:5,71152,28):dlmunlock_common:251 lock 4:7162177 should be gone 
now! refs=1

(kworker/u192:5,71152,28):__dlm_dirty_lockres:483 
A895BC216BE641A8A7E20AA89D57E051: res O00946c510c

 (kworker/u192:5,71152,28):dlm_lock_detach_lockres:393 removing lock's lockres 
reference

 (kworker/u192:5,71152,28):dlm_lock_release:371 freeing kernel-allocated lksb

(kworker/u192:5,71152,28):__dlm_lookup_lockres_full:198 
O00946c4fd2

(kworker/u192:5,71152,28):dlm_lockres_clear_refmap_bit:651 res 
O00946c4fd2, clr node 4, dlm_deref_lockres_handler()

 

Regards

Prabu




 On Mon, 04 Jan 2016 13:58:21 +0530, Srinivasula Maram wrote 

My point is that the rbd device should support SCSI reservations, so that OCFS2
can take a write lock while writing on a particular client, to avoid corruption.

 

Thanks,

Srinivas

 


Re: [ceph-users] How to run multiple RadosGW instances under the same zone

2016-01-04 Thread Daniel Schneller

On 2016-01-04 10:37:43 +, Srinivasula Maram said:


Hi Joseph,
 
You can try haproxy as proxy for load balancing and failover.
 
Thanks,
Srinivas 


We have 6 hosts running RadosGW with haproxy in front of them without problems.
Depending on your setup you might even consider running haproxy locally 
on your application servers, so that your application always connects 
to localhost. This saves you from having to set up highly available 
load balancers. It's strongly recommended, of course, to use some kind 
of automatic provisioning (Ansible, Puppet etc.) to roll out identical 
haproxy configuration on all these machines. 
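
(For illustration, a minimal haproxy snippet fronting two radosgw instances;
the hostnames and the civetweb port 7480 below are assumptions, not details
from this thread:)

frontend radosgw_http
    bind *:80
    mode http
    default_backend radosgw_nodes

backend radosgw_nodes
    mode http
    balance roundrobin
    option httpchk GET /
    server rgw1 host1.example.com:7480 check
    server rgw2 host2.example.com:7480 check

The same snippet can be rolled out to every application server when running
haproxy locally, as described above.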





--
Daniel Schneller
Principal Cloud Engineer

CenterDevice GmbH
https://www.centerdevice.de
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] lttng and Infernalis

2016-01-04 Thread HEWLETT, Paul (Paul)
Hi Cephers and Happy New Year

I am under the impression that ceph was refactored to allow dynamic enabling of 
lttng in Infernalis.

Is there any documentation on how to enable lttng  in Infernalis? (I cannot 
find anything…)

Regards
Paul
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs: large files hang

2016-01-04 Thread Gregory Farnum
On Fri, Jan 1, 2016 at 9:14 AM, Bryan Wright  wrote:
> Gregory Farnum  writes:
>
>> Or maybe it's 0.9a, or maybe I just don't remember at all. I'm sure
>> somebody recalls...
>>
>
> I'm still struggling with this.  When copying some files from the ceph file
> system, it hangs forever.  Here's some more data:
>
>
> * Attempt to copy file.  ceph --watch-warn shows:
>
> 2016-01-01 11:16:12.637932 osd.405 [WRN] slow request 480.160153 seconds
> old, received at 2016-01-01 11:08:12.477509: osd_op(client.46686461.1:11
> 1006479.0004 [read 2097152~2097152 [1@-1]] 0.ca710b7 read e367378)
> currently waiting for replay end
>
> * Look for client's entry in "ceph daemon mds.0 session ls".  Here it is:
>
> {
> "id": 46686461,
> "num_leases": 0,
> "num_caps": 10332,
> "state": "open",
> "replay_requests": 0,
> "reconnecting": false,
> "inst": "client.46686461 192.168.1.180:0\/2512587758",
> "client_metadata": {
> "entity_id": "",
> "hostname": "node80.galileo",
> "kernel_version": "4.3.3-1.el6.elrepo.i686"
> }
> },
>
> * Look for messages in /var/log/ceph/ceph.log referring to this client:
>
> 2016-01-01 11:16:12.637917 osd.405 192.168.1.23:6823/30938 142 : cluster
> [WRN] slow request 480.184693 seconds old, received at 2016-01-01
> 11:08:12.452970: osd_op(client.46686461.1:10 1006479.0004 [read
> 0~2097152 [1@-1]] 0.ca710b7 read e367378) currently waiting for replay end
> 2016-01-01 11:16:12.637932 osd.405 192.168.1.23:6823/30938 143 : cluster
> [WRN] slow request 480.160153 seconds old, received at 2016-01-01
> 11:08:12.477509: osd_op(client.46686461.1:11 1006479.0004 [read
> 2097152~2097152 [1@-1]] 0.ca710b7 read e367378) currently waiting for replay
> end
> 2016-01-01 11:23:11.298786 mds.0 192.168.1.31:6800/19945 64 : cluster [WRN]
> slow request 7683.077077 seconds old, received at 2016-01-01
> 09:15:08.221671: client_request(client.46686461:758 readdir #101913d
> 2016-01-01 09:15:08.222194) currently acquired locks
> 2016-01-01 11:24:12.728794 osd.405 192.168.1.23:6823/30938 145 : cluster
> [WRN] slow request 960.275521 seconds old, received at 2016-01-01
> 11:08:12.452970: osd_op(client.46686461.1:10 1006479.0004 [read
> 0~2097152 [1@-1]] 0.ca710b7 read e367378) currently waiting for replay end
> 2016-01-01 11:24:12.728814 osd.405 192.168.1.23:6823/30938 146 : cluster
> [WRN] slow request 960.250982 seconds old, received at 2016-01-01
> 11:08:12.477509: osd_op(client.46686461.1:11 1006479.0004 [read
> 2097152~2097152 [1@-1]] 0.ca710b7 read e367378) currently waiting for replay
> end
>
>
> * Seems to refer to "0.ca710b7", which I'm guessing is either pg 0.ca,
> 0.ca7, 0.7b, 0.7b0, 0.b7 or 0.0b7.  Look for these in "ceph health detail":
>
> ceph health detail | egrep '0\.ca|0\.7b|0\.b7|0\.0b'
> pg 0.7b2 is stuck inactive since forever, current state incomplete, last
> acting [307,206]
> pg 0.7b2 is stuck unclean since forever, current state incomplete, last
> acting [307,206]
> pg 0.7b2 is incomplete, acting [307,206]
>
> OK, so no "7b" or "7b0", but is "7b2" close enough?
>
> * Take a look at osd 307 and 206.  These are both online and show no errors
> in their logs.  Why then the "stuck"?
>
> * Look at filesystem on other OSDs for "7b".  Find this:
>
> osd 102 (defunct, offline OSD disk, appears as "DNE" in "ceph osd tree"):
> drwxr-xr-x  3 root root  4096 Dec 13 12:58 0.7b_head
> drwxr-xr-x  2 root root 6 Dec 13 12:43 0.7b_TEMP
>
> osd 103:
> drwxr-xr-x  3 root root  4096 Dec 18 12:04 0.7b0_head
>
> osd 110:
> drwxr-xr-x  3 root root  4096 Dec 20 09:06 0.7b_head
>
> osd 402:
> drwxr-xr-x  3 root root  4096 Jul  1  2014 0.7b_head
>
> All of these OSDs except 102 are up and heathy.
>
>
> Where do I go from here?

What's the output of ceph pg query on that PG — do the OSDs agree with
the monitor log that it's incomplete? They should have info about why,
if so (eg, known missing log).
Based on the slow client request message, it's stuck on a PG which is
still in the replay period. See http://tracker.ceph.com/issues/13116,
which was fixed for infernalis and backported to hammer but I think
not released yet. If you reboot one of the OSDs in the PG it should
recover (this is often a good band-aid when something is busted in
peering/recovery).
-Greg
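
(Concretely, assuming sysvinit-managed OSDs as on this Hammer-era cluster, the
band-aid might look like the following on the node hosting one of the acting
OSDs:)

ceph pg 0.7b2 query | less       # confirm the acting set and why it is stuck
service ceph restart osd.307     # kick one of the acting OSDs so the PG re-peers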
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is this pg incomplete?

2016-01-04 Thread Gregory Farnum
On Fri, Jan 1, 2016 at 12:15 PM, Bryan Wright  wrote:
> Hi folks,
>
> "ceph pg dump_stuck inactive" shows:
>
> 0.e8incomplete  [406,504]   406 [406,504]   406
>
> Each of the osds above is alive and well, and idle.
>
> The output of "ceph pg 0.e8 query" is shown below.  All of the osds it refers
> to are alive and well, with the exception of osd 102 which died and has been
> removed from the cluster.
>
> Can anyone look at this and tell me why this pg is incomplete?
>
> Bryan
>
> "ceph pg query" output is here, because it's so large:
>
> http://ayesha.phys.virginia.edu/~bryan/errant-pg.txt

I can't parse all of that output, but the most important and
easiest-to-understand bit is:
"blocked_by": [
102
],

And indeed in the past_intervals section there are a bunch where it's
just 102. You really want min_size >=2 for exactly this reason. :/ But
if you get 102 up stuff should recover; if you can't you can mark it
as "lost" and RADOS ought to resume processing, with potential
data/metadata loss...
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is this pg incomplete?

2016-01-04 Thread Bryan Wright
Gregory Farnum  writes:

> I can't parse all of that output, but the most important and
> easiest-to-understand bit is:
> "blocked_by": [
> 102
> ],
> 
> And indeed in the past_intervals section there are a bunch where it's
> just 102. You really want min_size >=2 for exactly this reason. :/ But
> if you get 102 up stuff should recover; if you can't you can mark it
> as "lost" and RADOS ought to resume processing, with potential
> data/metadata loss...
> -Greg
> 


Ack!  I thought min_size was 2, but I see:

ceph osd pool get data min_size
min_size: 1

Well that's a fine kettle of fish.

The osd in question (102) has actually already been marked as lost, via
"ceph osd lost 102 --yes-i-really-mean-it", and it shows up in "ceph osd
tree" as "DNE".  If I can manage to read the disk, how should I try to add
it back in?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] lttng and Infernalis

2016-01-04 Thread Jason Dillaman
LTTng tracing is now enabled via the following config file options:

  osd tracing = false # enable OSD tracing
  osd objectstore tracing = false # enable OSD object store tracing (only 
supported by FileStore)
  rados tracing = false   # enable librados LTTng tracing
  rbd tracing = false # enable librbd LTTng tracing

You can dynamically enable LTTng on a running process via the admin socket as 
well.  I created a tracker ticket for updating the documentation [1].  

[1] http://tracker.ceph.com/issues/14219
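
(For example, to flip one of these on a running daemon via its admin socket; a
sketch assuming an OSD named osd.0 with its default socket path:)

ceph daemon osd.0 config set osd_tracing true
ceph daemon osd.0 config get osd_tracing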

-- 

Jason Dillaman 


- Original Message - 

> From: "Paul HEWLETT (Paul)" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, January 4, 2016 10:03:07 AM
> Subject: [ceph-users] lttng and Infernalis

> Hi Cephers and Happy New Year

> I am under the impression that ceph was refactored to allow dynamic enabling
> of lttng in Infernalis.

> Is there any documentation on how to enable lttng in Infernalis? (I cannot
> find anything…)

> Regards
> Paul

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is this pg incomplete?

2016-01-04 Thread Michael Kidd
Bryan,
  If you can read the disk that was osd.102, you may wish to attempt this
process to recover your data:
https://ceph.com/community/incomplete-pgs-oh-my/
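
(In short, that procedure exports the PG from the dead disk and imports it into
a spare OSD with ceph-objectstore-tool; a rough sketch, with placeholder paths
and a placeholder destination OSD id NNN:)

# on a host with the old osd.102 filesystem mounted (OSD not running)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-102 \
    --journal-path /var/lib/ceph/osd/ceph-102/journal \
    --pgid 0.e8 --op export --file /tmp/pg0.e8.export

# on the destination OSD host, with that OSD stopped
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NNN \
    --journal-path /var/lib/ceph/osd/ceph-NNN/journal \
    --op import --file /tmp/pg0.e8.export

(Then start the destination OSD and let recovery run; the post above covers the
caveats.)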

Good luck!

Michael J. Kidd
Sr. Software Maintenance Engineer
Red Hat Ceph Storage

On Mon, Jan 4, 2016 at 8:32 AM, Bryan Wright  wrote:

> Gregory Farnum  writes:
>
> > I can't parse all of that output, but the most important and
> > easiest-to-understand bit is:
> > "blocked_by": [
> > 102
> > ],
> >
> > And indeed in the past_intervals section there are a bunch where it's
> > just 102. You really want min_size >=2 for exactly this reason. :/ But
> > if you get 102 up stuff should recover; if you can't you can mark it
> > as "lost" and RADOS ought to resume processing, with potential
> > data/metadata loss...
> > -Greg
> >
>
>
> Ack!  I thought min_size was 2, but I see:
>
> ceph osd pool get data min_size
> min_size: 1
>
> Well that's a fine kettle of fish.
>
> The osd in question (102) has actually already been marked as lost, via
> "ceph osd lost 102 --yes-i-really-mean-it", and it shows up in "ceph osd
> tree" as "DNE".  If I can manage to read the disk, how should I try to add
> it back in?
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to do quiesced rbd snapshot in libvirt?

2016-01-04 Thread Мистер Сёма
Hello,

Can anyone please tell me what is the right way to do quiesced RBD
snapshots in libvirt (OpenStack)?
My Ceph version is 0.94.3.

I found two possible ways, none of them is working for me. Wonder if
I'm doing something wrong:
1) Do a VM fsFreeze through the QEMU guest agent, perform the RBD snapshot, do
fsThaw (see the sketch after these two options). Looks good, but the bad thing
here is that libvirt uses an exclusive lock on the image, which results in
errors like this when taking a snapshot: " 7f359d304880 -1
librbd::ImageWatcher: no lock owners detected". It seems like the rbd client is
trying to take the snapshot on behalf of the exclusive lock owner but is unable
to find this owner. Without an exclusive lock everything works fine.

2)  Performing QEMU external snapshots with local QCOW2 file being
overlayed on top of RBD image. This seems really interesting but the
bad thing is that there is no way currently to remove this kind of
snapshot because active blockcommit is not currently working for RBD
images (https://bugzilla.redhat.com/show_bug.cgi?id=1189998).
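
(A minimal sketch of the freeze/snapshot/thaw sequence from option 1, using the
libvirt and rbd CLIs; the domain, pool and image names are placeholders, and
the guest needs the qemu-guest-agent channel configured:)

virsh domfsfreeze mydomain
rbd snap create rbd/myimage@quiesced-snap
virsh domfsthaw mydomain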

So again my question is: how do you guys take quiesced RBD snapshots in libvirt?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is this pg incomplete?

2016-01-04 Thread Bryan Wright
Michael Kidd  writes:

>   If you can read the disk that was osd.102, you may wish to attempt this
process to recover your data:https://ceph.com/community/incomplete-pgs-oh-my/
> Good luck!

Hi Michael,

Thanks for the pointer.  After looking at it, I'm wondering if the necessity
to copy the pgs to a new osd could be avoided it I can get the original disk
running again temporarily.  Is there a way to re-add an osd after it's been
removed?

Bryan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to do quiesced rbd snapshot in libvirt?

2016-01-04 Thread Jason Dillaman
I am surprised by the error you are seeing with exclusive lock enabled.  The 
rbd CLI should be able to send the 'snap create' request to QEMU without an 
error.  Are you able to provide "debug rbd = 20" logs from shortly before and 
after your snapshot attempt?

-- 

Jason Dillaman 


- Original Message -
> From: "Мистер Сёма" 
> To: "ceph-users" 
> Sent: Monday, January 4, 2016 12:37:07 PM
> Subject: [ceph-users] How to do quiesced rbd snapshot in libvirt?
> 
> Hello,
> 
> Can anyone please tell me what is the right way to do quiesced RBD
> snapshots in libvirt (OpenStack)?
> My Ceph version is 0.94.3.
> 
> I found two possible ways, none of them is working for me. Wonder if
> I'm doing something wrong:
> 1) Do VM fsFreeze through QEMU guest agent, perform RBD snapshot, do
> fsThaw. Looks good but the bad thing here is that libvirt uses
> exclusive lock on image, which results in errors like that when taking
> snapshot: " 7f359d304880 -1 librbd::ImageWatcher: no lock owners
> detected". It seems like rbd client is trying to take snapshot on
> behalf of exclusive lock owner but is unable to find this owner.
> Without an exclusive lock everything is working nice.
> 
> 2)  Performing QEMU external snapshots with local QCOW2 file being
> overlayed on top of RBD image. This seems really interesting but the
> bad thing is that there is no way currently to remove this kind of
> snapshot because active blockcommit is not currently working for RBD
> images (https://bugzilla.redhat.com/show_bug.cgi?id=1189998).
> 
> So again my question is: how do you guys take quiesced RBD snapshots in
> libvirt?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd bench-write vs dd performance confusion

2016-01-04 Thread Snyder, Emile
Hi all,

I'm trying to get comfortable with managing and benchmarking ceph clusters, and
I'm struggling to understand rbd bench-write results vs using dd against mounted
rbd images.

I have a 6 node test cluster running version 0.94.5, 2 nodes per rack, 20 OSDs 
per node. Write journals are on the same disk as their OSD. My rbd pool is set 
for 3 replicas, with 2 on different hosts in a given rack, and 3rd on some host 
in a different rack.


I created a test 100GB image with 4MB object size, created a VM client, and 
mounted the image at /dev/rbd1.

In a shell on one of my 6 storage nodes I have 'iostat 2' running.

Now my confusion; If I run on the client:

'sudo dd if=/dev/zero of=/dev/rbd1 bs=4M count=1000 iflag=fullblock 
oflag=direct'

I see '4194304000 bytes (4.2 GB) copied, 18.5798 s, 226 MB/s' and the iostat on 
the storage node shows almost all 20 disks sustaining 4-16MB/s writes.

However, if I run

'rbd --cluster  bench-write test-4m-image --io-size 400 
--io-threads 1 --io-total 400 --io-pattern rand'

I see 'elapsed:12  ops:1  ops/sec:   805.86  bytes/sec: 
3223441447.72' but the iostat shows the disks basically all at 0.00kb_wrtn/s 
for the duration of the run.

So that's bench-write reporting 3.2 GB/s with iostat showing *nothing* 
happening, while dd writes 226 MB/s and iostat lights up. Am I misunderstanding 
what rbd-bench is supposed to do?

Thanks,
-Emile
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Infernalis upgrade breaks when journal on separate partition

2016-01-04 Thread Stuart Longland
Hi all,

I just did an update of a storage cluster here, or rather, I've done one
node out of three updating to Infernalis from Hammer.

I shut down the daemons, as per the guide, then did a recursive chown of
the /var/lib/ceph directory, then struck the following when re-starting:

> 2016-01-05 07:32:09.114197 7f5b41d0f940  0 ceph version 9.2.0 
> (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 2899
> 2016-01-05 07:32:09.123740 7f5b41d0f940  0 
> filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2016-01-05 07:32:09.124047 7f5b41d0f940  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP 
> ioctl is disabl
> ed via 'filestore fiemap' config option
> 2016-01-05 07:32:09.124053 7f5b41d0f940  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: 
> SEEK_DATA/SEEK_HOLE is
>  disabled via 'filestore seek data hole' config option
> 2016-01-05 07:32:09.124066 7f5b41d0f940  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice is 
> supported
> 2016-01-05 07:32:09.156182 7f5b41d0f940  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) 
> syscall full
> y supported (by glibc and kernel)
> 2016-01-05 07:32:09.156301 7f5b41d0f940  0 
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: extsize is 
> supported and y
> our kernel >= 3.5
> 2016-01-05 07:32:09.232801 7f5b41d0f940  0 
> filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: 
> checkpoint i
> s not enabled
> 2016-01-05 07:32:09.253440 7f5b41d0f940 -1 
> filestore(/var/lib/ceph/osd/ceph-0) mount failed to open journal /dev/sdc5: 
> (13) Permissi
> on denied
> 2016-01-05 07:32:09.263646 7f5b41d0f940 -1 osd.0 0 OSD:init: unable to mount 
> object store
> 2016-01-05 07:32:09.263656 7f5b41d0f940 -1 ESC[0;31m ** ERROR: osd init 
> failed: (13) Permission deniedESC[0m

Things did not co-operate until I chown'ed /dev/sdc5 (and /dev/sdc6) to
ceph:ceph.  (-R in /var/lib/ceph was not sufficient).  Even adding ceph
to the 'disk' group (who owns /dev/sdc5) oddly enough, was not sufficient.

I have that node running, and will do the others, but I am concerned
about what happens after a reboot.  Is it necessary to configure udev to
chown /dev/sdc[56] at boot or is there some way to fix ceph's permissions?
-- 
 _ ___ Stuart Longland - Systems Engineer
\  /|_) |   T: +61 7 3535 9619
 \/ | \ | 38b Douglas StreetF: +61 7 3535 9699
   SYSTEMSMilton QLD 4064   http://www.vrt.com.au
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Infernalis upgrade breaks when journal on separate partition

2016-01-04 Thread Stillwell, Bryan
I ran into this same issue, and found that a reboot ended up setting the
ownership correctly.  If you look at /lib/udev/rules.d/95-ceph-osd.rules
you'll see the magic that makes it happen:

# JOURNAL_UUID
ACTION=="add", SUBSYSTEM=="block", \
  ENV{DEVTYPE}=="partition", \
  ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
  OWNER:="ceph", GROUP:="ceph", MODE:="660", \
  RUN+="/usr/sbin/ceph-disk --log-stdout -v trigger /dev/$name"
ACTION=="change", SUBSYSTEM=="block", \
  ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
  OWNER="ceph", GROUP="ceph", MODE="660"
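
(If you'd rather not reboot, re-triggering udev by hand should apply the same
rule; a sketch, assuming the journal partitions carry the ceph journal
partition-type GUID the rule matches on:)

udevadm trigger --subsystem-match=block --action=add
ls -l /dev/sdc5 /dev/sdc6    # should now show ceph:ceph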



Bryan

On 1/4/16, 2:39 PM, "ceph-users on behalf of Stuart Longland"
 wrote:

>Hi all,
>
>I just did an update of a storage cluster here, or rather, I've done one
>node out of three updating to Infernalis from Hammer.
>
>I shut down the daemons, as per the guide, then did a recursive chown of
>the /var/lib/ceph directory, then struck the following when re-starting:
>
>> 2016-01-05 07:32:09.114197 7f5b41d0f940  0 ceph version 9.2.0
>>(bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 2899
>> 2016-01-05 07:32:09.123740 7f5b41d0f940  0
>>filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
>> 2016-01-05 07:32:09.124047 7f5b41d0f940  0
>>genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
>>FIEMAP ioctl is disabl
>> ed via 'filestore fiemap' config option
>> 2016-01-05 07:32:09.124053 7f5b41d0f940  0
>>genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
>>SEEK_DATA/SEEK_HOLE is
>>  disabled via 'filestore seek data hole' config option
>> 2016-01-05 07:32:09.124066 7f5b41d0f940  0
>>genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
>>splice is supported
>> 2016-01-05 07:32:09.156182 7f5b41d0f940  0
>>genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
>>syncfs(2) syscall full
>> y supported (by glibc and kernel)
>> 2016-01-05 07:32:09.156301 7f5b41d0f940  0
>>xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: extsize
>>is supported and y
>> our kernel >= 3.5
>> 2016-01-05 07:32:09.232801 7f5b41d0f940  0
>>filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal
>>mode: checkpoint i
>> s not enabled
>> 2016-01-05 07:32:09.253440 7f5b41d0f940 -1
>>filestore(/var/lib/ceph/osd/ceph-0) mount failed to open journal
>>/dev/sdc5: (13) Permissi
>> on denied
>> 2016-01-05 07:32:09.263646 7f5b41d0f940 -1 osd.0 0 OSD:init: unable to
>>mount object store
>> 2016-01-05 07:32:09.263656 7f5b41d0f940 -1 ESC[0;31m ** ERROR: osd init
>>failed: (13) Permission deniedESC[0m
>
>Things did not co-operate until I chown'ed /dev/sdc5 (and /dev/sdc6) to
>ceph:ceph.  (-R in /var/lib/ceph was not sufficient).  Even adding ceph
>to the 'disk' group (who owns /dev/sdc5) oddly enough, was not sufficient.
>
>I have that node running, and will do the others, but I am concerned
>about what happens after a reboot.  Is it necessary to configure udev to
>chown /dev/sdc[56] at boot or is there some way to fix ceph's permissions?
>--
> _ ___ Stuart Longland - Systems Engineer
>\  /|_) |   T: +61 7 3535 9619
> \/ | \ | 38b Douglas StreetF: +61 7 3535 9699
>   SYSTEMSMilton QLD 4064   http://www.vrt.com.au
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Infernalis upgrade breaks when journal on separate partition

2016-01-04 Thread Stuart Longland
Hi Bryan,
On 05/01/16 07:45, Stillwell, Bryan wrote:
> I ran into this same issue, and found that a reboot ended up setting the
> ownership correctly.  If you look at /lib/udev/rules.d/95-ceph-osd.rules
> you'll see the magic that makes it happen

Ahh okay, good-o, so a reboot should be fine.  I guess adding chown-ing
of journal files would be a good idea (maybe it's version specific, but
chown -R did not follow the symlink and change ownership for me).

Might be worth a mention in the release notes.  At least I'm not going
mad. :-)

So the procedure I'm following, having installed the latest stable ceph
on all nodes (via ansible):

> root@bneprdsn1:~# stop ceph-all
> ceph-all stop/waiting
> root@bneprdsn1:~# gpasswd -a ceph disk
> Adding user ceph to group disk
> root@bneprdsn1:~# chown -R ceph:ceph /dev/sdc[56] /var/lib/ceph

then I'll re-start monitor (ceph-all doesn't stop this, oddly enough)
and OSDs, then wait for the recovery to complete before moving onto the
next (final) node.
-- 
 _ ___ Stuart Longland - Systems Engineer
\  /|_) |   T: +61 7 3535 9619
 \/ | \ | 38b Douglas StreetF: +61 7 3535 9699
   SYSTEMSMilton QLD 4064   http://www.vrt.com.au
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Combo for Reliable SSD testing

2016-01-04 Thread Wade Holler
All,

I am testing an all SSD and NVMe (journal) config for a customers first
endeavor investigating Ceph for performance oriented workloads.

Can someone recommend a good performance and reliable ( under high load )
combination?

Terrible high level question I know but we have had a number of issues
while stress testing.

Cent 7.1 / Infernalis / EXT4 appeared to be stable.
Cent 7.1 (229.20 kernel) / Infernalis / XFS suffered from some amount of
XFS issues, which I think were long-running / hung kernel tasks.
Cent 7.2 ( 327.3 kernel as I recall ) / Infernalis and Jewel / XFS and
BTRFS appeared to suffer from the highest frequency of hung kernel tasks /
false ENOSPC osd errors.
Cent 7.1 (229.20 ) / Jewel / btrfs seems to have some nice performance
characteristics but will hang a kernel task every few stress tests.

Should I just punt on Cent and go to Ubuntu 14.04 for my stated use case?

I know this is an open ended and poor question but maybe someone out there
has done something similar and seen similar issues.

Thanks for reading

Wade
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd bench-write vs dd performance confusion

2016-01-04 Thread Jason Dillaman
There was a bug in the rbd CLI bench-write tool that would result in the same 
offset being re-written [1].  Since writeback cache is enabled (by default), in 
your example only 4MB would be written to the OSD at the conclusion of the 
test.  The fix should have been scheduled for backport to Hammer but it looks 
like it was missed.  I will open a new tracker ticket to start that process.

[1] https://github.com/ceph/ceph/commit/333f3a01a9916c781f266078391c580efb81a0fc

-- 

Jason Dillaman 


- Original Message -
> From: "Emile Snyder" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, January 4, 2016 3:51:25 PM
> Subject: [ceph-users] rbd bench-write vs dd performance confusion
> 
> Hi all,
> 
> I'm trying to get comfortable with managing and benchmarking ceph clusters,
> and I'm struggling to understan rbd bench-write results vs using dd against
> mounted rbd images.
> 
> I have a 6 node test cluster running version 0.94.5, 2 nodes per rack, 20
> OSDs per node. Write journals are on the same disk as their OSD. My rbd pool
> is set for 3 replicas, with 2 on different hosts in a given rack, and 3rd on
> some host in a different rack.
> 
> 
> I created a test 100GB image with 4MB object size, created a VM client, and
> mounted the image at /dev/rbd1.
> 
> In a shell on one of my 6 storage nodes I have 'iostat 2' running.
> 
> Now my confusion; If I run on the client:
> 
> 'sudo dd if=/dev/zero of=/dev/rbd1 bs=4M count=1000 iflag=fullblock
> oflag=direct'
> 
> I see '4194304000 bytes (4.2 GB) copied, 18.5798 s, 226 MB/s' and the iostat
> on the storage node shows almost all 20 disks sustaining 4-16MB/s writes.
> 
> However, if I run
> 
> 'rbd --cluster  bench-write test-4m-image --io-size 400
> --io-threads 1 --io-total 400 --io-pattern rand'
> 
> I see 'elapsed:12  ops:1  ops/sec:   805.86  bytes/sec:
> 3223441447.72' but the iostat shows the disks basically all at 0.00kb_wrtn/s
> for the duration of the run.
> 
> So that's bench-write reporting 3.2 GB/s with iostat showing *nothing*
> happening, while dd writes 226 MB/s and iostat lights up. Am I
> misunderstanding what rbd-bench is supposed to do?
> 
> Thanks,
> -Emile
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Long peering - throttle at FileStore::queue_transactions

2016-01-04 Thread Guang Yang
Hi Cephers,
Happy New Year! I have a question regarding the long PG peering.

Over the last several days I have been looking into the *long peering*
problem when we start an OSD / OSD host. What I observed was that the
two peering worker threads were throttled (stuck) when trying to
queue new transactions (writing the pg log), so the peering process is
dramatically slowed down.

The first question came to me was, what were the transactions in the
queue? The major ones, as I saw, included:

- The osd_map and incremental osd_map, this happens if the OSD had
been down for a while (in a large cluster), or when the cluster got
upgrade, which made the osd_map epoch the down OSD had, was far behind
the latest osd_map epoch. During the OSD booting, it would need to
persist all those osd_maps and generate lots of filestore transactions
(linear with the epoch gap).
> As the PG was not involved in most of those epochs, could we only take and 
> persist those osd_maps which matter to the PGs on the OSD?

- There are lots of deletion transactions, and as the PG booting, it
needs to merge the PG log from its peers, and for the deletion PG
entry, it would need to queue the deletion transaction immediately.
> Could we delay the queue of the transactions until all PGs on the host are 
> peered?

Thanks,
Guang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd bench-write vs dd performance confusion

2016-01-04 Thread Snyder, Emile
Ah, thanks, that makes sense. I see bug 14225 opened for the backport.


I'm looking at 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_backport_commits, 
I'll see if I can get a PR up for that.

-emile

On 1/4/16, 3:11 PM, "Jason Dillaman"  wrote:

>There was a bug in the rbd CLI bench-write tool that would result in the same 
>offset being re-written [1].  Since writeback cache is enabled (by default), 
>in your example only 4MB would be written to the OSD at the conclusion of the 
>test.  The fix should have been scheduled for backport to Hammer but it looks 
>like it was missed.  I will open a new tracker ticket to start that process.
>
>[1] 
>https://github.com/ceph/ceph/commit/333f3a01a9916c781f266078391c580efb81a0fc
>
>-- 
>
>Jason Dillaman 
>
>
>- Original Message -
>> From: "Emile Snyder" 
>> To: ceph-users@lists.ceph.com
>> Sent: Monday, January 4, 2016 3:51:25 PM
>> Subject: [ceph-users] rbd bench-write vs dd performance confusion
>> 
>> Hi all,
>> 
>> I'm trying to get comfortable with managing and benchmarking ceph clusters,
>> and I'm struggling to understan rbd bench-write results vs using dd against
>> mounted rbd images.
>> 
>> I have a 6 node test cluster running version 0.94.5, 2 nodes per rack, 20
>> OSDs per node. Write journals are on the same disk as their OSD. My rbd pool
>> is set for 3 replicas, with 2 on different hosts in a given rack, and 3rd on
>> some host in a different rack.
>> 
>> 
>> I created a test 100GB image with 4MB object size, created a VM client, and
>> mounted the image at /dev/rbd1.
>> 
>> In a shell on one of my 6 storage nodes I have 'iostat 2' running.
>> 
>> Now my confusion; If I run on the client:
>> 
>> 'sudo dd if=/dev/zero of=/dev/rbd1 bs=4M count=1000 iflag=fullblock
>> oflag=direct'
>> 
>> I see '4194304000 bytes (4.2 GB) copied, 18.5798 s, 226 MB/s' and the iostat
>> on the storage node shows almost all 20 disks sustaining 4-16MB/s writes.
>> 
>> However, if I run
>> 
>> 'rbd --cluster  bench-write test-4m-image --io-size 400
>> --io-threads 1 --io-total 400 --io-pattern rand'
>> 
>> I see 'elapsed:12  ops:1  ops/sec:   805.86  bytes/sec:
>> 3223441447.72' but the iostat shows the disks basically all at 0.00kb_wrtn/s
>> for the duration of the run.
>> 
>> So that's bench-write reporting 3.2 GB/s with iostat showing *nothing*
>> happening, while dd writes 226 MB/s and iostat lights up. Am I
>> misunderstanding what rbd-bench is supposed to do?
>> 
>> Thanks,
>> -Emile
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Long peering - throttle at FileStore::queue_transactions

2016-01-04 Thread Samuel Just
We need every OSDMap persisted before persisting later ones because we
rely on there being no holes for a bunch of reasons.

The deletion transactions are more interesting.  It's not part of the
boot process, these are deletions resulting from merging in a log from
a peer which logically removed an object.  It's more noticeable on
boot because all PGs will see these operations at once (if there are a
bunch of deletes happening).  We need to process these transactions
before we can serve reads (before we activate) currently since we use
the on disk state (modulo the objectcontext locks) as authoritative.
That transaction iirc also contains the updated PGLog.  We can't avoid
writing down the PGLog prior to activation, but we *can* delay the
deletes (and even batch/throttle them) if we do some work:
1) During activation, we need to maintain a set of to-be-deleted
objects.  For each of these objects, we need to populate the
objectcontext cache with an exists=false objectcontext so that we
don't erroneously read the deleted data.  Each of the entries in the
to-be-deleted object set would have a reference to the context to keep
it alive until the deletion is processed.
2) Any write operation which references one of these objects needs to
be preceded by a delete if one has not yet been queued (and the
to-be-deleted set updated appropriately).  The tricky part is that the
primary and replicas may have different objects in this set...  The
replica would have to insert deletes ahead of any subop (or the ec
equivalent) it gets from the primary.  For that to work, it needs to
have something like the obc cache.  I have a wip-replica-read branch
which refactors object locking to allow the replica to maintain locks
(to avoid replica-reads conflicting with writes).  That machinery
would probably be the right place to put it.
3) We need to make sure that if a node restarts anywhere in this
process that it correctly repopulates the set of to be deleted
entries.  We might consider a deleted-to version in the log?  Not sure
about this one since it would be different on the replica and the
primary.

Anyway, it's actually more complicated than you'd expect and will
require more design (and probably depends on wip-replica-read
landing).
-Sam

On Mon, Jan 4, 2016 at 3:32 PM, Guang Yang  wrote:
> Hi Cephers,
> Happy New Year! I have a question regarding long PG peering.
>
> Over the last several days I have been looking into the *long peering*
> problem we see when we start an OSD / OSD host. What I observed was that
> the two peering worker threads were throttled (stuck) when trying to queue
> new transactions (writing the pg log), so the peering process was
> dramatically slowed down.
>
> The first question that came to me was: what were the transactions in the
> queue? The major ones, as far as I could see, included:
>
> - The osd_map and incremental osd_map. This happens if the OSD has been
> down for a while (in a large cluster), or when the cluster was upgraded,
> leaving the osd_map epoch the down OSD has far behind the latest osd_map
> epoch. While booting, the OSD needs to persist all of those osd_maps,
> generating lots of filestore transactions (linear in the epoch gap).
>> Since the PGs were not involved in most of those epochs, could we take and
>> persist only the osd_maps that matter to the PGs on the OSD?
>
> - There are lots of deletion transactions: as a PG boots, it needs to merge
> the PG log from its peers, and for each deletion PG log entry it has to
> queue the deletion transaction immediately.
>> Could we delay queueing those transactions until all PGs on the host have
>> peered?
>
> Thanks,
> Guang
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to run multiple RadosGW instances under the same zone

2016-01-04 Thread Yang Honggang

Hello Srinivas,

Yes, we can use Haproxy as a frontend. But the precondition is that multiple
RadosGW instances sharing the *SAME CEPH POOLS* are running. I only want the
master zone to keep one copy of all data, and I want to access the data
through *ANY* radosgw instance.
http://docs.ceph.com/docs/master/radosgw/federated-config/ says "zones may
have more than one Ceph Object Gateway instance per zone.", so I need the
*official way* to set up these radosgw instances.

thx

joseph

On 01/04/2016 06:37 PM, Srinivasula Maram wrote:


Hi Joseph,

You can try haproxy as proxy for load balancing and failover.

Thanks,

Srinivas

*From:*ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On 
Behalf Of *Joseph Yang

*Sent:* Monday, January 04, 2016 2:09 PM
*To:* ceph-us...@ceph.com; Joseph Yang
*Subject:* [ceph-users] How to run multiple RadosGW instances under 
the same zone




Hello,
How to run multiple RadosGW instances under the same zone?
Assume there are two hosts, HOST_1 and HOST_2. I want to run
two RadosGW instances on these two hosts for my zone ZONE_MULI,
so that when one of the radosgw instances is down, I can still access the zone.
There are some questions:
1. How many ceph users should I create?
2. How many rados users should I create?
3. How to set ZONE_MULI's access_key/secret_key?
4. How to set the 'host' section in the ceph conf file for these two
   radosgw instances?
5. How to start the instances?
   # radosgw --cluster My_Cluster -n ?_which_rados_user_?
I read http://docs.ceph.com/docs/master/radosgw/federated-config/, but
there seems to be no explanation there.
Your answer is appreciated!
thx
Joseph



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy create-initial errors out with "Some monitors have still not reached quorum"

2016-01-04 Thread Maruthi Seshidhar
Thank you Martin,

Yes, "nslookup " was not working.
After configuring DNS on all nodes, the nslookup issue got sorted out.

But the "some monitors have still not reach quorun" issue was still seen.
I was using user "ceph" for ceph deployment. The user "ceph" is reserved
for ceph internal use.
After creating a new user "cephdeploy", and running ceph-deploy commands
from this user, the cluster came up.
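
For reference, the deployment user can be created on each node along these
lines (username as above; the sudoers file name is just an assumption):

# Passwordless sudo for the ceph-deploy user on every node
sudo useradd -d /home/cephdeploy -m cephdeploy
sudo passwd cephdeploy
echo "cephdeploy ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephdeploy
sudo chmod 0440 /etc/sudoers.d/cephdeploy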

thanks & regards,
Maruthi.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Long peering - throttle at FileStore::queue_transactions

2016-01-04 Thread Sage Weil
On Mon, 4 Jan 2016, Guang Yang wrote:
> Hi Cephers,
> Happy New Year! I have a question regarding long PG peering.
> 
> Over the last several days I have been looking into the *long peering*
> problem we see when we start an OSD / OSD host. What I observed was that
> the two peering worker threads were throttled (stuck) when trying to queue
> new transactions (writing the pg log), so the peering process was
> dramatically slowed down.
> 
> The first question that came to me was: what were the transactions in the
> queue? The major ones, as far as I could see, included:
> 
> - The osd_map and incremental osd_map. This happens if the OSD has been
> down for a while (in a large cluster), or when the cluster was upgraded,
> leaving the osd_map epoch the down OSD has far behind the latest osd_map
> epoch. While booting, the OSD needs to persist all of those osd_maps,
> generating lots of filestore transactions (linear in the epoch gap).
> > Since the PGs were not involved in most of those epochs, could we take and
> > persist only the osd_maps that matter to the PGs on the OSD?

This part should happen before the OSD sends the MOSDBoot message, before 
anyone knows it exists.  There is a tunable threshold that controls how 
recent the map has to be before the OSD tries to boot.  If you're 
seeing this in the real world, we probably just need to adjust that value 
way down to something small(er).
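
A rough way to check how large that epoch gap actually is on a given OSD
(a sketch that assumes admin socket access; the exact output fields may vary
by release):

# Current cluster osdmap epoch
ceph osd stat

# Oldest/newest osdmap epochs this OSD has persisted (oldest_map / newest_map)
ceph daemon osd.0 status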

sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to run multiple RadosGW instances under the same zone

2016-01-04 Thread Ben Hines
It works fine. The federated config reference is not related to running
multiple instances on the same zone.

Just set up 2 radosgws and give each instance the exact same configuration.
(I use different client names in ceph.conf, but I bet it would work even if
the client names were identical.)

Official documentation on this very common use case would be a good idea; I
also figured this out on my own.

On Mon, Jan 4, 2016 at 6:21 PM, Yang Honggang 
wrote:

> Hello Srinivas,
>
> Yes, we can use Haproxy as a frontend. But the precondition is that multiple
> RadosGW instances sharing the *SAME CEPH POOLS* are running. I only want the
> master zone to keep one copy of all data, and I want to access the data
> through *ANY* radosgw instance.
> http://docs.ceph.com/docs/master/radosgw/federated-config/ says "zones may
> have more than one Ceph Object Gateway instance per zone.", so I need the
> *official way* to set up these radosgw instances.
>
> thx
>
> joseph
>
> On 01/04/2016 06:37 PM, Srinivasula Maram wrote:
>
> Hi Joseph,
>
> You can try haproxy as proxy for load balancing and failover.
>
> Thanks,
> Srinivas
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of*
> Joseph Yang
> *Sent:* Monday, January 04, 2016 2:09 PM
> *To:* ceph-us...@ceph.com; Joseph Yang
> *Subject:* [ceph-users] How to run multiple RadosGW instances under the
> same zone
>
> Hello,
>
> How to run multiple RadosGW instances under the same zone?
>
> Assume there are two hosts, HOST_1 and HOST_2. I want to run two RadosGW
> instances on these two hosts for my zone ZONE_MULI, so that when one of the
> radosgw instances is down, I can still access the zone.
>
> There are some questions:
> 1. How many ceph users should I create?
> 2. How many rados users should I create?
> 3. How to set ZONE_MULI's access_key/secret_key?
> 4. How to set the 'host' section in the ceph conf file for these two
>    radosgw instances?
> 5. How to start the instances?
>    # radosgw --cluster My_Cluster -n ?_which_rados_user_?
>
> I read http://docs.ceph.com/docs/master/radosgw/federated-config/, but
> there seems to be no explanation there.
>
> Your answer is appreciated!
>
> thx
>
> Joseph
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] systemd support?

2016-01-04 Thread Adam


On 01/01/2016 08:22 PM, Adam wrote:
> I'm running into the same install problem described here:
> https://www.spinics.net/lists/ceph-users/msg23533.html
> 
> I tried compiling from source (ceph-9.2.0) to see if it had been fixed
> in the latest code, but I got the same error as with the pre-compiled
> binaries.  Is there any solution or workaround to this?

I just learned that ceph-deploy isn't included with ceph.  I was using
the latest Ubuntu package (1.5.20-0ubuntu1).  I cloned the latest from
git (1.5.31).  Both of them have the exact same error.


Here's the exact error message:
[horde.diseasedmind.com][INFO  ] Running command: sudo initctl emit
ceph-mon cluster=ceph id=horde
[horde.diseasedmind.com][WARNIN] initctl: Unable to connect to Upstart:
Failed to connect to socket /com/ubuntu/upstart: Connection refused
[horde.diseasedmind.com][ERROR ] RuntimeError: command returned non-zero
exit status: 1
[ceph_deploy.mon][ERROR ] Failed to execute command: initctl emit
ceph-mon cluster=ceph id=horde
[ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors

I'm using Ubuntu 15.04 (Vivid Vervet).
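
Since Ubuntu 15.04 boots with systemd rather than Upstart, a possible
workaround (assuming the 9.2.0 packages shipped their systemd unit files,
which I have not verified) is to skip the initctl step and drive the monitor
with systemd directly:

# Confirm systemd is PID 1 (it should be on 15.04)
ps -p 1 -o comm=

# Start and enable the monitor unit by its id
sudo systemctl start ceph-mon@horde
sudo systemctl enable ceph-mon@horde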



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to run multiple RadosGW instances under the same zone

2016-01-04 Thread Srinivasula Maram
Yes, it should work. Even if you have multiple radosgws/instances, all
instances use the same pools.

Ceph.conf:
[client.radosgw.gateway-1]
host = host1
keyring = /etc/ceph/ceph.client.admin.keyring
rgw_socket_path = /var/log/ceph/radosgw1.sock
log_file = /var/log/ceph/radosgw-1.host1.log
rgw_max_chunk_size = 4194304
rgw_frontends = "civetweb port=8081"
rgw_dns_name = host1
rgw_ops_log_rados = false
rgw_enable_ops_log = false
rgw_cache_lru_size = 100
rgw_enable_usage_log = false
rgw_usage_log_tick_interval = 30
rgw_usage_log_flush_threshold = 1024
rgw_exit_timeout_secs = 600

[client.radosgw.gateway-2]
host = host2
keyring = /etc/ceph/ceph.client.admin.keyring
rgw_socket_path = /var/log/ceph/radosgw2.sock
log_file = /var/log/ceph/radosgw-2.host2.log
rgw_max_chunk_size = 4194304
rgw_frontends = "civetweb port=8082"
rgw_dns_name = host2
rgw_ops_log_rados = false
rgw_enable_ops_log = false
rgw_cache_lru_size = 100
rgw_enable_usage_log = false
rgw_usage_log_tick_interval = 30
rgw_usage_log_flush_threshold = 1024
rgw_exit_timeout_secs = 600
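
If you want a single endpoint in front of the two instances, a minimal
haproxy sketch along these lines (hostnames and ports taken from the config
above; global/defaults sections omitted, so treat it as a starting point
only) will balance across them:

frontend rgw_frontend
    bind *:80
    mode http
    default_backend rgw_backend

backend rgw_backend
    mode http
    balance roundrobin
    server rgw1 host1:8081 check
    server rgw2 host2:8082 check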
Thanks,
Srinivas

From: Ben Hines [mailto:bhi...@gmail.com]
Sent: Tuesday, January 05, 2016 10:07 AM
To: Yang Honggang
Cc: Srinivasula Maram; ceph-us...@ceph.com; Javen Wu
Subject: Re: [ceph-users] How to run multiple RadosGW instances under the same 
zone

It works fine. The federated config reference is not related to running 
multiple instances on the same zone.

Just set up 2 radosgws and give each instance the exact same configuration.
(I use different client names in ceph.conf, but I bet it would work even if
the client names were identical.)

Official documentation on this very common use case would be a good idea; I
also figured this out on my own.

On Mon, Jan 4, 2016 at 6:21 PM, Yang Honggang <joseph.y...@xtaotech.com> wrote:
Hello Srinivas,

Yes, we can use Haproxy as a frontend. But the precondition is that multiple
RadosGW instances sharing the SAME CEPH POOLS are running. I only want the
master zone to keep one copy of all data, and I want to access the data
through ANY radosgw instance.
http://docs.ceph.com/docs/master/radosgw/federated-config/ says "zones may
have more than one Ceph Object Gateway instance per zone.", so I need the
official way to set up these radosgw instances.

thx

joseph

On 01/04/2016 06:37 PM, Srinivasula Maram wrote:
Hi Joseph,

You can try haproxy as proxy for load balancing and failover.

Thanks,
Srinivas

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Joseph Yang
Sent: Monday, January 04, 2016 2:09 PM
To: ceph-us...@ceph.com; Joseph Yang
Subject: [ceph-users] How to run multiple RadosGW instances under the same zone

Hello,

How to run multiple RadosGW instances under the same zone?

Assume there are two hosts, HOST_1 and HOST_2. I want to run two RadosGW
instances on these two hosts for my zone ZONE_MULI, so that when one of the
radosgw instances is down, I can still access the zone.

There are some questions:
1. How many ceph users should I create?
2. How many rados users should I create?
3. How to set ZONE_MULI's access_key/secret_key?
4. How to set the 'host' section in the ceph conf file for these two
   radosgw instances?
5. How to start the instances?
   # radosgw --cluster My_Cluster -n ?_which_rados_user_?

I read http://docs.ceph.com/docs/master/radosgw/federated-config/, but
there seems to be no explanation there.

Your answer is appreciated!

thx

Joseph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to run multiple RadosGW instances under the same zone

2016-01-04 Thread Yang Honggang

It works. Thank you for your time (Srinivas and Ben).

Supplement:
client.radosgw.gateway-1 and client.radosgw.gateway-2 only need to share
the same ceph pools.
A keyring must be created for each of client.radosgw.gateway-1 and
client.radosgw.gateway-2.
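
A rough example of creating those keyrings and starting the two instances
(the capability set is the one from the standard radosgw setup docs, and the
default cluster name is assumed, so double-check against your own setup):

# One cephx identity and keyring per gateway instance
sudo ceph auth get-or-create client.radosgw.gateway-1 mon 'allow rwx' osd 'allow rwx' \
    -o /etc/ceph/ceph.client.radosgw.gateway-1.keyring
sudo ceph auth get-or-create client.radosgw.gateway-2 mon 'allow rwx' osd 'allow rwx' \
    -o /etc/ceph/ceph.client.radosgw.gateway-2.keyring

# Then, on each host, start its own instance by client name
sudo radosgw --cluster ceph -n client.radosgw.gateway-1   # on host1
sudo radosgw --cluster ceph -n client.radosgw.gateway-2   # on host2

The keyring path in each [client.radosgw.gateway-*] section should then
point at that instance's keyring instead of the admin keyring.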


thx

joseph

On 01/05/2016 01:26 PM, Srinivasula Maram wrote:

Yes, it should work. Even if you have multiple radosgws/instances, all
instances use the same pools.

Ceph.conf:
[client.radosgw.gateway-1]
host = host1
keyring = /etc/ceph/ceph.client.admin.keyring
rgw_socket_path = /var/log/ceph/radosgw1.sock
log_file = /var/log/ceph/radosgw-1.host1.log
rgw_max_chunk_size = 4194304
rgw_frontends = "civetweb port=8081"
rgw_dns_name = host1
rgw_ops_log_rados = false
rgw_enable_ops_log = false
rgw_cache_lru_size = 100
rgw_enable_usage_log = false
rgw_usage_log_tick_interval = 30
rgw_usage_log_flush_threshold = 1024
rgw_exit_timeout_secs = 600

[client.radosgw.gateway-2]
host = host2
keyring = /etc/ceph/ceph.client.admin.keyring
rgw_socket_path = /var/log/ceph/radosgw2.sock
log_file = /var/log/ceph/radosgw-2.host2.log
rgw_max_chunk_size = 4194304
rgw_frontends = "civetweb port=8082"
rgw_dns_name = host2
rgw_ops_log_rados = false
rgw_enable_ops_log = false
rgw_cache_lru_size = 100
rgw_enable_usage_log = false
rgw_usage_log_tick_interval = 30
rgw_usage_log_flush_threshold = 1024
rgw_exit_timeout_secs = 600

Thanks,
Srinivas

*From:* Ben Hines [mailto:bhi...@gmail.com]
*Sent:* Tuesday, January 05, 2016 10:07 AM
*To:* Yang Honggang
*Cc:* Srinivasula Maram; ceph-us...@ceph.com; Javen Wu
*Subject:* Re: [ceph-users] How to run multiple RadosGW instances under the
same zone

It works fine. The federated config reference is not related to running
multiple instances on the same zone.

Just set up 2 radosgws and give each instance the exact same configuration.
(I use different client names in ceph.conf, but I bet it would work even if
the client names were identical.)

Official documentation on this very common use case would be a good idea; I
also figured this out on my own.


On Mon, Jan 4, 2016 at 6:21 PM, Yang Honggang <joseph.y...@xtaotech.com> wrote:

Hello Srinivas,

Yes, we can use Haproxy as a frontend. But the precondition is that multiple
RadosGW instances sharing the *SAME CEPH POOLS* are running. I only want the
master zone to keep one copy of all data, and I want to access the data
through *ANY* radosgw instance.
http://docs.ceph.com/docs/master/radosgw/federated-config/ says "zones may
have more than one Ceph Object Gateway instance per zone.", so I need the
*official way* to set up these radosgw instances.

thx

joseph

On 01/04/2016 06:37 PM, Srinivasula Maram wrote:

Hi Joseph,

You can try haproxy as proxy for load balancing and failover.

Thanks,
Srinivas

*From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of*
Joseph Yang
*Sent:* Monday, January 04, 2016 2:09 PM
*To:* ceph-us...@ceph.com; Joseph Yang
*Subject:* [ceph-users] How to run multiple RadosGW instances under the
same zone

Hello,

How to run multiple RadosGW instances under the same zone?

Assume there are two hosts, HOST_1 and HOST_2. I want to run two RadosGW
instances on these two hosts for my zone ZONE_MULI, so that when one of the
radosgw instances is down, I can still access the zone.

There are some questions:
1. How many ceph users should I create?
2. How many rados users should I create?
3. How to set ZONE_MULI's access_key/secret_key?
4. How to set the 'host' section in the ceph conf file for these two
   radosgw instances?
5. How to start the instances?
   # radosgw --cluster My_Cluster -n ?_which_rados_user_?

I read http://docs.ceph.com/docs/master/radosgw/federated-config/, but
there seems to be no explanation there.

Your answer is appreciated!

thx

Joseph

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com