[ceph-users] blockdev --setro cannot set krbd to readonly

2013-09-08 Thread Da Chun Ng
I mapped an image to a system and used blockdev to make it readonly, but it failed:
[root@ceph0 mnt]# blockdev --setro /dev/rbd2
[root@ceph0 mnt]# blockdev --getro /dev/rbd2
0
It's on Centos 6.4 with kernel 3.10.6, Ceph 0.61.8.
Any idea?
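A hedged workaround sketch, assuming the rbd CLI of this release already supports mapping images read-only (the --read-only flag and the pool/image names below are assumptions/placeholders): instead of flipping the block device after the fact, map the image read-only in the first place:
rbd unmap /dev/rbd2
rbd map --read-only mypool/myimage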


[ceph-users] System reboot hangs when umounting filesystem on rbd

2013-09-08 Thread Da Chun Ng
Centos 6.4, kernel 3.10.6, Ceph 0.61.8.
My ceph cluster is deployed on three nodes. One rbd image was created, mapped to one of the three nodes, formatted with ext4, and mounted. When rebooting this node, it hung unmounting the file system on the rbd.
My guess about the root cause: when the system is shutting down, the services are stopped first, then the mounted file systems are unmounted. For the file system on rbd, if there are dirty pages, a flush will happen, but the ceph services have already been shut down, so it hangs.
Am I right? How can I work around this?
Thanks!
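A hedged sketch of the usual workaround, with the mount point and device below as placeholders: unmount the filesystem and unmap the rbd device explicitly before the ceph services are stopped, for example from a shutdown script ordered ahead of the ceph init script:
umount /mnt/rbd
rbd unmap /dev/rbd0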


[ceph-users] How to force a monitor time check?

2013-09-03 Thread Da Chun Ng
Time skews happen frequently when the systems running monitors are restarted. With an NTP server configured, the time skew between systems will be fixed after a while, but the ceph monitors won't notice it immediately if no time check messages are exchanged at that moment, so the ceph status will still show a clock skew warning. Sometimes it lasts for several hours. How can I trigger a monitor time check manually?
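A hedged workaround rather than a true forced check, assuming the sysvinit ceph script is in use and mon.ceph0 is a placeholder id: restarting the monitor that reports the skew triggers a new election and a fresh round of time checks, which usually clears the warning once NTP has actually converged:
service ceph restart mon.ceph0
ceph health detail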


Re: [ceph-users] Is it possible to change the pg number after adding new osds?

2013-09-02 Thread Da Chun Ng
Only pgp_num is listed in the reference. Though pg_num can be changed in the same way, is there any risk in doing that?

From: andreas.fu...@swisstxt.ch
To: dachun...@outlook.com; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Is it possible to change the pg number after adding   
new osds?
Date: Mon, 2 Sep 2013 09:02:15 +









You can change the pg numbers on the fly with
 
ceph osd pool set {pool_name} pg_num {value}
ceph osd pool set {pool_name} pgp_num {value}
 
reference: http://ceph.com/docs/master/rados/operations/pools/
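As a concrete sketch, assuming the commonly cited target of about 100 PGs per OSD divided by the replica count, a pool named data, 15 osds, and replica size 3 (all placeholders): (100 * 15) / 3 = 500, rounded up to the next power of two gives 512, so:
ceph osd pool set data pg_num 512
ceph osd pool set data pgp_num 512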
 


From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com]
On Behalf Of Da Chun Ng

Sent: Monday, 2 September 2013 04:49

To: ceph-users@lists.ceph.com

Subject: [ceph-users] Is it possible to change the pg number after adding new 
osds?


 

According to the doc, the pg numbers should be enlarged for better read/write 
balance if the osd number is increased.

But seems the pg number cannot be changed on the fly. It's fixed when the pool 
is created. Am I right?




[ceph-users] Is it possible to change the pg number after adding new osds?

2013-09-01 Thread Da Chun Ng
According to the doc, the pg numbers should be enlarged for better read/write balance if the osd number is increased. But it seems the pg number cannot be changed on the fly; it's fixed when the pool is created. Am I right?


[ceph-users] The whole cluster hangs when changing MTU to 9216

2013-08-26 Thread Da Chun Ng
Centos 6.4, Ceph Cuttlefish 0.61.7 or 0.61.8.
I changed the MTU to 9216 (or 9000), then restarted all the cluster nodes. The whole cluster hung, with messages in the mon log as below:
4048 2013-08-26 15:52:43.028554 7fd83f131700  1 mon.ceph0@0(electing).elector(15) init, last seen epoch 15
4049 2013-08-26 15:52:46.431842 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 30 bytes epoch 1) v1 and sending client elsewhere
4050 2013-08-26 15:52:46.431886 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 30 bytes epoch 1) v1 and sending client elsewhere
4051 2013-08-26 15:52:46.431899 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere
4052 2013-08-26 15:52:46.431911 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere
4053 2013-08-26 15:52:46.431923 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 30 bytes epoch 1) v1 and sending client elsewhere
4054 2013-08-26 15:52:46.431937 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 26 bytes epoch 0) v1 and sending client elsewhere
4055 2013-08-26 15:52:46.431948 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere
4056 2013-08-26 15:52:48.028808 7fd83f131700  1 mon.ceph0@0(electing).elector(15) init, last seen epoch 15
4057 2013-08-26 15:52:51.432073 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 26 bytes epoch 0) v1 and sending client elsewhere
4058 2013-08-26 15:52:51.432116 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere
4059 2013-08-26 15:52:51.432129 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere
4060 2013-08-26 15:52:51.432147 7fd83f131700  1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere
4061 2013-08-26 15:52:53.029037 7fd83f131700  1 mon.ceph0@0(electing).elector(15) init, last seen epoch 15
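A hedged first check, with addresses and interface names as placeholders: monitor elections that never settle after an MTU change usually mean large frames are being dropped somewhere (a switch port or NIC still at MTU 1500), so it is worth verifying jumbo frames end-to-end before digging into ceph itself. For a 9000-byte MTU, 8972 bytes of ICMP payload is the largest unfragmented ping (9000 minus 28 bytes of IP/ICMP headers):
ping -M do -s 8972 -c 3 <peer-node-ip>    # repeat between every pair of nodes
ip link show <interface> | grep mtu       # confirm each interface really took the new MTU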


Re: [ceph-users] Poor write/random read/random write performance

2013-08-21 Thread Da Chun Ng
Mark,
I tried with journal aio = true, and op threads = 4, but it made little difference. Then I tried to enlarge the read ahead value both on the osd block devices and the cephfs client. It did improve some overall performance, especially the sequential read performance, but it still didn't help the write/random read/random write performance much.
I tried to change the placement group number to (100 * osd_num)/replica_size; it does not decrease the overall performance this time, but it does not improve it either.
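For reference, a minimal sketch of the readahead change described above, with the device name and value as placeholder assumptions (the right value is hardware dependent, as discussed further down in this thread):
# on each OSD node, per data disk; the value is in 512-byte sectors, so 2048 = 1MB
blockdev --setra 2048 /dev/sdi
The cephfs kernel client's readahead is controlled by a mount.ceph option (see the man page linked later in this thread); the exact option name depends on the client version, so it is not spelled out here.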
 Date: Mon, 19 Aug 2013 12:31:07 -0500
 From: mark.nel...@inktank.com
 To: dachun...@outlook.com
 CC: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Poor write/random read/random write performance
 
 On 08/19/2013 12:05 PM, Da Chun Ng wrote:
  Thank you! Testing now.
  
  How about pg num? I'm using the default size 64, as I tried with (100 * 
  osd_num)/replica_size, but it decreased the performance surprisingly.
 
 Oh!  That's odd!  Typically you would want more than that.  Most likely
 you aren't distributing PGs very evenly across OSDs with 64.  More PGs
 shouldn't decrease performance unless the monitors are behaving badly.
 We saw some issues back in early cuttlefish but you should be fine with
 many more PGs.
 
 Mark
 
  
Date: Mon, 19 Aug 2013 11:33:30 -0500
From: mark.nel...@inktank.com
To: dachun...@outlook.com
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor write/random read/random write performance
   
On 08/19/2013 08:59 AM, Da Chun Ng wrote:
 Thanks very much! Mark.
 Yes, I put the data and journal on the same disk, no SSD in my 
  environment.
 My controllers are general SATA II.
   
Ok, so in this case the lack of WB cache on the controller and no SSDs
for journals is probably having an effect.
   

 Some more questions below in blue.

 
  
 Date: Mon, 19 Aug 2013 07:48:23 -0500
 From: mark.nel...@inktank.com
 To: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Poor write/random read/random write 
  performance

 On 08/19/2013 06:28 AM, Da Chun Ng wrote:

 I have a 3 nodes, 15 osds ceph cluster setup:
 * 15 7200 RPM SATA disks, 5 for each node.
 * 10G network
 * Intel(R) Xeon(R) CPU E5-2620(6 cores) 2.00GHz, for each node.
 * 64G Ram for each node.

 I deployed the cluster with ceph-deploy, and created a new data pool
 for cephfs.
 Both the data and metadata pools are set with replica size 3.
 Then mounted the cephfs on one of the three nodes, and tested the
 performance with fio.

 The sequential read performance looks good:
 fio -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=16K
 -size=1G -numjobs=16 -group_reporting -name=mytest -runtime 60
 read : io=10630MB, bw=181389KB/s, iops=11336 , runt= 60012msec


 Sounds like readahead and or caching is helping out a lot here. 
  Btw, you
 might want to make sure this is actually coming from the disks with
 iostat or collectl or something.

  I ran sync && echo 3 | tee /proc/sys/vm/drop_caches on all the nodes
 before every test. I used collectl to watch every disk IO, the numbers
 should match. I think readahead is helping here.
   
Ok, good! I suspect that readahead is indeed helping.
   


 But the sequential write/random read/random write performance is
 very poor:
 fio -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16K
 -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
 write: io=397280KB, bw=6618.2KB/s, iops=413 , runt= 60029msec


 One thing to keep in mind is that unless you have SSDs in this system,
 you will be doing 2 writes for every client write to the spinning 
  disks
 (since data and journals will both be on the same disk).

 So let's do the math:

 6618.2KB/s * 3 replication * 2 (journal + data writes) * 1024
 (KB-bytes) / 16384 (write size in bytes) / 15 drives = ~165 IOPS / 
  drive

 If there is no write coalescing going on, this isn't terrible. If 
  there
 is, this is terrible.

 How can I know if there is write coalescing going on?
   
look in collectl at the average IO sizes going to the disks. I bet they
will be 16KB. If you were to look further with blktrace and
seekwatcher, I bet you'd see lots of seeking between OSD data writes and
journal writes since there is no controller cache helping smooth things
out (and your journals are on the same drives).
   

 Have you tried buffered writes with the sync engine at the same IO 
  size?

 Do you mean as below?
  fio -direct=0 -iodepth 1 -thread -rw=write -ioengine=sync -bs=16K
 -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
   
Yeah, that'd work.
   

 fio -direct=1 -iodepth 1 -thread -rw=randread -ioengine

[ceph-users] Poor write/random read/random write performance

2013-08-19 Thread Da Chun Ng
I have a 3 nodes, 15 osds ceph cluster setup:
* 15 7200 RPM SATA disks, 5 for each node.
* 10G network
* Intel(R) Xeon(R) CPU E5-2620 (6 cores) 2.00GHz, for each node.
* 64G Ram for each node.

I deployed the cluster with ceph-deploy, and created a new data pool for cephfs. Both the data and metadata pools are set with replica size 3. Then mounted the cephfs on one of the three nodes, and tested the performance with fio.

The sequential read performance looks good:
fio -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=16K -size=1G -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=10630MB, bw=181389KB/s, iops=11336 , runt= 60012msec

But the sequential write/random read/random write performance is very poor:
fio -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=397280KB, bw=6618.2KB/s, iops=413 , runt= 60029msec
fio -direct=1 -iodepth 1 -thread -rw=randread -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=665664KB, bw=11087KB/s, iops=692 , runt= 60041msec
fio -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=361056KB, bw=6001.1KB/s, iops=375 , runt= 60157msec

I am mostly surprised by the seq write performance compared to the raw sata disk performance (it can get 4127 IOPS when mounted with ext4). My cephfs only gets 1/10 of the raw disk's performance.
How can I tune my cluster to improve the sequential write/random read/random 
write performance?




Re: [ceph-users] Poor write/random read/random write performance

2013-08-19 Thread Da Chun Ng
Sorry, forget to tell the OS and kernel version.
It's Centos 6.4 with kernel 3.10.6, fio 2.0.13.

From: dachun...@outlook.com
To: ceph-users@lists.ceph.com
Date: Mon, 19 Aug 2013 11:28:24 +
Subject: [ceph-users] Poor write/random read/random write performance




I have a 3 nodes, 15 osds ceph cluster setup:
* 15 7200 RPM SATA disks, 5 for each node.
* 10G network
* Intel(R) Xeon(R) CPU E5-2620 (6 cores) 2.00GHz, for each node.
* 64G Ram for each node.

I deployed the cluster with ceph-deploy, and created a new data pool for cephfs. Both the data and metadata pools are set with replica size 3. Then mounted the cephfs on one of the three nodes, and tested the performance with fio.

The sequential read performance looks good:
fio -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=16K -size=1G -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=10630MB, bw=181389KB/s, iops=11336 , runt= 60012msec

But the sequential write/random read/random write performance is very poor:
fio -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=397280KB, bw=6618.2KB/s, iops=413 , runt= 60029msec
fio -direct=1 -iodepth 1 -thread -rw=randread -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=665664KB, bw=11087KB/s, iops=692 , runt= 60041msec
fio -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=361056KB, bw=6001.1KB/s, iops=375 , runt= 60157msec

I am mostly surprised by the seq write performance compared to the raw sata disk performance (it can get 4127 IOPS when mounted with ext4). My cephfs only gets 1/10 of the raw disk's performance.

How can I tune my cluster to improve the sequential write/random read/random write performance?


  



Re: [ceph-users] Poor write/random read/random write performance

2013-08-19 Thread Da Chun Ng
Thanks very much! Mark.
Yes, I put the data and journal on the same disk, no SSD in my environment.
My controllers are general SATA II.
Some more questions below in blue.

Date: Mon, 19 Aug 2013 07:48:23 -0500
From: mark.nel...@inktank.com
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor write/random read/random write performance


  

  
  
On 08/19/2013 06:28 AM, Da Chun Ng wrote:

  I have a 3 nodes, 15 osds ceph cluster setup:
  * 15 7200 RPM SATA disks, 5 for each node.
  * 10G network
  * Intel(R) Xeon(R) CPU E5-2620 (6 cores) 2.00GHz, for each node.
  * 64G Ram for each node.

  I deployed the cluster with ceph-deploy, and created a new data pool for cephfs.
  Both the data and metadata pools are set with replica size 3.
  Then mounted the cephfs on one of the three nodes, and tested the performance with fio.

  The sequential read performance looks good:
  fio -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=16K -size=1G -numjobs=16 -group_reporting -name=mytest -runtime 60
  read : io=10630MB, bw=181389KB/s, iops=11336 , runt= 60012msec

Sounds like readahead and or caching is helping out a lot here. Btw, you might want to make sure this is actually coming from the disks with iostat or collectl or something.

I ran sync && echo 3 | tee /proc/sys/vm/drop_caches on all the nodes before every test. I used collectl to watch every disk IO, the numbers should match. I think readahead is helping here.

  But the sequential write/random read/random write performance is very poor:
  fio -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
  write: io=397280KB, bw=6618.2KB/s, iops=413 , runt= 60029msec

One thing to keep in mind is that unless you have SSDs in this system, you will be doing 2 writes for every client write to the spinning disks (since data and journals will both be on the same disk).

So let's do the math:

6618.2KB/s * 3 replication * 2 (journal + data writes) * 1024 (KB-bytes) / 16384 (write size in bytes) / 15 drives = ~165 IOPS / drive

If there is no write coalescing going on, this isn't terrible. If there is, this is terrible.
How can I know if there is write coalescing going on?

Have you tried buffered writes with the sync engine at the same IO size?
Do you mean as below?
fio -direct=0 -iodepth 1 -thread -rw=write -ioengine=sync -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60

  fio -direct=1 -iodepth 1 -thread -rw=randread -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
  read : io=665664KB, bw=11087KB/s, iops=692 , runt= 60041msec

In this case:

11087 * 1024 (KB-bytes) / 16384 / 15 = ~46 IOPS / drive.

Definitely not great! You might want to try fiddling with read ahead both on the CephFS client and on the block devices under the OSDs themselves.
Could you please tell me how to enable read ahead on the CephFS client?
For the block devices under the OSDs, the read ahead value is:
[root@ceph0 ~]# blockdev --getra /dev/sdi
256
How big is appropriate for it?

One thing I did notice back during bobtail is that increasing the number of osd op threads seemed to help small object read performance. It might be worth looking at too.

http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/#4kbradosread

Other than that, if you really want to dig into this, you can use tools like iostat, collectl, blktrace, and seekwatcher to try and get a feel for what the IO going to the OSDs looks like. That can help when diagnosing this sort of thing.

  fio -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
  write: io=361056KB, bw=6001.1KB/s, iops=375 , runt= 60157msec

6001.1KB/s * 3 replication * 2 (journal + data writes) * 1024 (KB-bytes) / 16384 (write size in bytes) / 15 drives = ~150 IOPS / drive

  I am mostly surprised by the seq write performance comparing to the raw sata disk performance (it can get 4127 IOPS when mounted with ext4). My cephfs only gets 1/10 performance of the raw disk
Re: [ceph-users] Poor write/random read/random write performance

2013-08-19 Thread Da Chun Ng
Thank you! Testing now.
How about pg num? I'm using the default size 64, as I tried with (100 * 
osd_num)/replica_size, but it decreased the performance surprisingly.

 Date: Mon, 19 Aug 2013 11:33:30 -0500
 From: mark.nel...@inktank.com
 To: dachun...@outlook.com
 CC: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Poor write/random read/random write performance
 
 On 08/19/2013 08:59 AM, Da Chun Ng wrote:
  Thanks very much! Mark.
  Yes, I put the data and journal on the same disk, no SSD in my environment.
  My controllers are general SATA II.
 
 Ok, so in this case the lack of WB cache on the controller and no SSDs
 for journals is probably having an effect.
 
  
  Some more questions below in blue.
  
  
  Date: Mon, 19 Aug 2013 07:48:23 -0500
  From: mark.nel...@inktank.com
  To: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] Poor write/random read/random write performance
  
  On 08/19/2013 06:28 AM, Da Chun Ng wrote:
  
  I have a 3 nodes, 15 osds ceph cluster setup:
  * 15 7200 RPM SATA disks, 5 for each node.
  * 10G network
  * Intel(R) Xeon(R) CPU E5-2620(6 cores) 2.00GHz, for each node.
  * 64G Ram for each node.
  
  I deployed the cluster with ceph-deploy, and created a new data pool
  for cephfs.
  Both the data and metadata pools are set with replica size 3.
  Then mounted the cephfs on one of the three nodes, and tested the
  performance with fio.
  
  The sequential read  performance looks good:
  fio -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=16K
  -size=1G -numjobs=16 -group_reporting -name=mytest -runtime 60
  read : io=10630MB, bw=181389KB/s, iops=11336 , runt= 60012msec
  
  
  Sounds like readahead and or caching is helping out a lot here. Btw, you 
  might want to make sure this is actually coming from the disks with 
  iostat or collectl or something.
  
  I ran sync && echo 3 | tee /proc/sys/vm/drop_caches on all the nodes
  before every test. I used collectl to watch every disk IO, the numbers 
  should match. I think readahead is helping here.
 
 Ok, good!  I suspect that readahead is indeed helping.
 
  
  
  But the sequential write/random read/random write performance is
  very poor:
  fio -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16K
  -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
  write: io=397280KB, bw=6618.2KB/s, iops=413 , runt= 60029msec
  
  
  One thing to keep in mind is that unless you have SSDs in this system, 
  you will be doing 2 writes for every client write to the spinning disks 
  (since data and journals will both be on the same disk).
  
  So let's do the math:
  
  6618.2KB/s * 3 replication * 2 (journal + data writes) * 1024 
  (KB-bytes) / 16384 (write size in bytes) / 15 drives = ~165 IOPS / drive
  
  If there is no write coalescing going on, this isn't terrible.  If there 
  is, this is terrible.
  
  How can I know if there is write coalescing going on?
 
 look in collectl at the average IO sizes going to the disks.  I bet they
 will be 16KB.  If you were to look further with blktrace and
 seekwatcher, I bet you'd see lots of seeking between OSD data writes and
 journal writes since there is no controller cache helping smooth things
 out (and your journals are on the same drives).
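A hedged sketch of how to check this in practice, with the device name as a placeholder: watch the average request size the data disks see while the fio write test runs. With sysstat's iostat, the avgrq-sz column is in 512-byte sectors, so values around 32 mean roughly 16KB requests reaching the disk (little or no coalescing), while much larger values mean writes are being merged:
iostat -x 1 /dev/sdi
collectl -sD gives the same kind of per-disk detail.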
 
  
  Have you tried buffered writes with the sync engine at the same IO size?
  
  Do you mean as below?
  fio -direct=0 -iodepth 1 -thread -rw=write -ioengine=sync -bs=16K
  -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
 
 Yeah, that'd work.
 
  
  fio -direct=1 -iodepth 1 -thread -rw=randread -ioengine=libaio
  -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
  read : io=665664KB, bw=11087KB/s, iops=692 , runt= 60041msec
  
  
  In this case:
  
  11087 * 1024 (KB-bytes) / 16384 / 15 = ~46 IOPS / drive.
  
  Definitely not great!  You might want to try fiddling with read ahead 
  both on the CephFS client and on the block devices under the OSDs 
  themselves.
  
  Could you please tell me how to enable read ahead on the CephFS client?
 
 It's one of the mount options:
 
 http://ceph.com/docs/master/man/8/mount.ceph/
 
  
  For the block devices under the OSDs, the read ahead value is:
  [root@ceph0 ~]# blockdev --getra /dev/sdi
  256
  How big is appropriate for it?
 
 To be honest I've seen different results depending on the hardware.  I'd
 try anywhere from 32kb to 2048kb.
 
  
  One thing I did notice back during bobtail is that increasing the number 
  of osd op threads seemed to help small object read performance.  It 
  might be worth looking at too.
  
  http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/#4kbradosread
  
  Other than that, if you really want to dig into this, you can use tools 
  like iostat, collectl, blktrace, and seekwatcher to try and get a feel 
  for what the IO

[ceph-users] All old pgs in stale after recreating all osds

2013-08-09 Thread Da Chun
On Centos 6.4, Ceph 0.61.7.
I had a ceph cluster of 9 osds. Today I destroyed all of the osds, and 
recreated 6 new ones.
Then I found that all the old pgs are stale.
[root@ceph0 ceph]# ceph -s
   health HEALTH_WARN 192 pgs stale; 192 pgs stuck inactive; 192 pgs stuck 
stale; 192 pgs stuck unclean
   monmap e1: 3 mons at 
{ceph0=172.18.11.60:6789/0,ceph1=172.18.11.61:6789/0,ceph2=172.18.11.62:6789/0},
 election epoch 24, quorum 0,1,2 ceph0,ceph1,ceph2
   osdmap e166: 6 osds: 6 up, 6 in
pgmap v837: 192 pgs: 192 stale; 9526 bytes data, 221 MB used, 5586 GB / 
5586 GB avail
   mdsmap e114: 0/0/1 up



[root@ceph0 ~]# ceph health detail
...
pg 2.3 is stuck stale for 10249.230667, current state stale, last acting [5]
...
[root@ceph0 ~]# ceph pg 2.3 query
i don't have pgid 2.3



How can I get all the pgs back or recreated?


Thanks!
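For what it's worth, a hedged sketch of one way out when the old data is not needed (the pool name and pg count are placeholders, and the exact delete syntax varies by release; newer ones require repeating the pool name plus a confirmation flag): since these stale PGs belong to pools whose copies lived only on the destroyed OSDs, deleting and recreating those pools removes the stale PGs and creates fresh ones on the new OSDs:
ceph osd pool delete data data --yes-i-really-really-mean-it
ceph osd pool create data 64
Be careful with the default data/metadata pools if an MDS is, or will be, in use.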



[ceph-users] How to set Object Size/Stripe Width/Stripe Count?

2013-08-08 Thread Da Chun
Hi list,
I saw the info about data striping in 
http://ceph.com/docs/master/architecture/#data-striping .
But couldn't find the way to set these values.


Could you please tell me how to do that or give me a link? Thanks!
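A hedged sketch for the rbd case (pool/image names and sizes are placeholders, and the flag names assume an rbd CLI new enough to support format 2 "fancy striping"; older releases may spell them differently): the object size is 2^order bytes, so --order 22 gives 4MB objects:
rbd create mypool/myimage --size 10240 --image-format 2 --order 22 --stripe-unit 65536 --stripe-count 4
For cephfs, layouts are set from the client side per directory or file rather than cluster-wide, and librados applications choose striping through the API.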


[ceph-users] How to start/stop ceph daemons separately?

2013-08-07 Thread Da Chun
On Ubuntu, we can start/stop ceph daemons separately as below:
start ceph-mon id=ceph0
stop ceph-mon id=ceph0


How to do this on Centos or rhel? Thanks!
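On CentOS/RHEL the sysvinit script takes a daemon type and id; a hedged sketch with placeholder ids:
service ceph start mon.ceph0
service ceph stop osd.2
/etc/init.d/ceph restart mds.ceph0
Adding -a (e.g. service ceph -a start) lets the script operate on remote hosts listed in ceph.conf as well, assuming passwordless ssh is set up.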


Re: [ceph-users] Which Kernel and QEMU/libvirt version doyourecommend on Ubuntu 12.04 and Centos?

2013-08-01 Thread Da Chun
Thanks! Neil.


If cephfs and krbd are not used, is the default kernel working well with only 
QEMU/kvm/librbd? AFAIK, librbd doesn't have dependency on the kernel version, 
right?
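For what it's worth, a quick hedged way to check whether an installed qemu build actually has rbd support compiled in (no kernel involvement either way): qemu-img prints the block formats it was built with, and rbd should appear in that list:
qemu-img --help | grep 'Supported formats'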


-- Original --
From:  Neil Levineneil.lev...@inktank.com;
Date:  Thu, Aug 1, 2013 11:13 AM
To:  Da Chunng...@qq.com; 
Cc:  ceph-usersceph-users@lists.ceph.com; 
Subject:  Re: [ceph-users] Which Kernel and QEMU/libvirt version doyourecommend 
on Ubuntu 12.04 and Centos?



Yes, default version should work.

Neil



On Wed, Jul 31, 2013 at 7:11 PM, Da Chun ng...@qq.com wrote:
 Thanks! Neil and Wido.


Neil, what about the libvirt version on CentOS 6.4? Just use the official
release?
 

-- Original --
From:  Neil Levineneil.lev...@inktank.com;
 Date:  Thu, Aug 1, 2013 05:53 AM
To:  Da Chunng...@qq.com; 
Cc:  ceph-usersceph-users@lists.ceph.com; 
 Subject:  Re: [ceph-users] Which Kernel and QEMU/libvirt version do 
yourecommend on Ubuntu 12.04 and Centos?



For CentOS 6.4, we have custom qemu packages available at 
http://ceph.com/packages/ceph-extras/rpm/centos6 which will provide RBD support.
 
You will need to install a newer kernel than the one which ships by default 
(2.6.32) to use the cephfs or krbd drivers. Any version above 3.x should be 
sufficient.


For Ubuntu 12.04, as per Wido's comments, use the Ubuntu Cloud Archive to get 
the latest version of all necessary packages.
 


N






On Wed, Jul 31, 2013 at 7:18 AM, Da Chun ng...@qq.com wrote:
 Hi List,


I want to deploy two ceph clusters on ubuntu 12.04 and centos 6.4 separately, 
and test cephfs, krbd, and librbd.
 

Which Kernel and QEMU/libvirt version do you recommend? Any specific patches 
which I should apply manually?

 
Thanks for your time!





Re: [ceph-users] Which Kernel and QEMU/libvirt version doyourecommend on Ubuntu 12.04 and Centos?

2013-08-01 Thread Da Chun
Neil,


What's the difference between your custom qemu packages and the official ones?


There are two kinds of packages in it:
qemu-kvm-0.12.1.2-2.355.el6.2.cuttlefish.async.x86_64.rpm
qemu-kvm-0.12.1.2-2.355.el6.2.cuttlefish.x86_64.rpm



What's the difference between them? Does the async version support aio flush?


-- Original --
From:  Neil Levineneil.lev...@inktank.com;
Date:  Thu, Aug 1, 2013 11:13 AM
To:  Da Chunng...@qq.com; 
Cc:  ceph-usersceph-users@lists.ceph.com; 
Subject:  Re: [ceph-users] Which Kernel and QEMU/libvirt version doyourecommend 
on Ubuntu 12.04 and Centos?



Yes, default version should work.

Neil



On Wed, Jul 31, 2013 at 7:11 PM, Da Chun ng...@qq.com wrote:
 Thanks! Neil and Wido.


Neil, what about the libvirt version on CentOS 6.4? Just use the official
release?
 

-- Original --
From:  Neil Levineneil.lev...@inktank.com;
 Date:  Thu, Aug 1, 2013 05:53 AM
To:  Da Chunng...@qq.com; 
Cc:  ceph-usersceph-users@lists.ceph.com; 
 Subject:  Re: [ceph-users] Which Kernel and QEMU/libvirt version do 
yourecommend on Ubuntu 12.04 and Centos?



For CentOS 6.4, we have custom qemu packages available at 
http://ceph.com/packages/ceph-extras/rpm/centos6 which will provide RBD support.
 
You will need to install a newer kernel than the one which ships by default 
(2.6.32) to use the cephfs or krbd drivers. Any version above 3.x should be 
sufficient.


For Ubuntu 12.04, as per Wido's comments, use the Ubuntu Cloud Archive to get 
the latest version of all necessary packages.
 


N






On Wed, Jul 31, 2013 at 7:18 AM, Da Chun ng...@qq.com wrote:
 Hi List,


I want to deploy two ceph clusters on ubuntu 12.04 and centos 6.4 separately, 
and test cephfs, krbd, and librbd.
 

Which Kernel and QEMU/libvirt version do you recommend? Any specific patches 
which I should apply manually?

 
Thanks for your time!





[ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?

2013-07-31 Thread Da Chun
Hi List,


I want to deploy two ceph clusters on ubuntu 12.04 and centos 6.4 separately, 
and test cephfs, krbd, and librbd.


Which Kernel and QEMU/libvirt version do you recommend? Any specific patches 
which I should apply manually?


Thanks for your time!


Re: [ceph-users] Which Kernel and QEMU/libvirt version do you recommendon Ubuntu 12.04 and Centos?

2013-07-31 Thread Da Chun
Sorry, forgot to mention the ceph version I want to use.


I want to use the latest stable cuttlefish release, 0.61.7 currently.


-- Original --
From:  Da Chunng...@qq.com;
Date:  Wed, Jul 31, 2013 10:18 PM
To:  ceph-usersceph-users@lists.ceph.com; 

Subject:  [ceph-users] Which Kernel and QEMU/libvirt version do you recommendon 
Ubuntu 12.04 and Centos?



Hi List,


I want to deploy two ceph clusters on ubuntu 12.04 and centos 6.4 separately, 
and test cephfs, krbd, and librbd.


Which Kernel and QEMU/libvirt version do you recommend? Any specific patches 
which I should apply manually?


Thanks for your time!


Re: [ceph-users] Which Kernel and QEMU/libvirt version do yourecommend on Ubuntu 12.04 and Centos?

2013-07-31 Thread Da Chun
Thanks! Neil and Wido.


Neil, what about the libvirt version on CentOS 6.4? Just use the official
release?


-- Original --
From:  Neil Levineneil.lev...@inktank.com;
Date:  Thu, Aug 1, 2013 05:53 AM
To:  Da Chunng...@qq.com; 
Cc:  ceph-usersceph-users@lists.ceph.com; 
Subject:  Re: [ceph-users] Which Kernel and QEMU/libvirt version do 
yourecommend on Ubuntu 12.04 and Centos?



For CentOS 6.4, we have custom qemu packages available at 
http://ceph.com/packages/ceph-extras/rpm/centos6 which will provide RBD support.
 
You will need to install a newer kernel than the one which ships by default 
(2.6.32) to use the cephfs or krbd drivers. Any version above 3.x should be 
sufficient.


For Ubuntu 12.04, as per Wido's comments, use the Ubuntu Cloud Archive to get 
the latest version of all necessary packages.
 


N






On Wed, Jul 31, 2013 at 7:18 AM, Da Chun ng...@qq.com wrote:
 Hi List,


I want to deploy two ceph clusters on ubuntu 12.04 and centos 6.4 separately, 
and test cephfs, krbd, and librbd.
 

Which Kernel and QEMU/libvirt version do you recommend? Any specific patches 
which I should apply manually?

 
Thanks for your time!





[ceph-users] ceph fio read test hangs

2013-07-17 Thread Da Chun
On Ubuntu 13.04, ceph 0.61.4.


I was running an fio read test as below, then it hung:
root@ceph-node2:/mnt# fio -filename=/dev/rbd1 -direct=1 -iodepth 1 -thread 
-rw=read -ioengine=psync -bs=4k -size=50G -numjobs=16 -group_reporting 
-name=mytest 
mytest: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=psync, iodepth=1
...
mytest: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=psync, iodepth=1
2.0.8
Starting 16 threads
^Cbs: 16 (f=16): [] [0.1% done] [0K/0K /s] [0 /0  iops] [eta 
02d:01h:34m:39s]   
fio: terminating on signal 2
^Cbs: 16 (f=16): [] [0.1% done] [0K/0K /s] [0 /0  iops] [eta 
02d:18h:36m:23s]
fio: terminating on signal 2
Jobs: 16 (f=16): [] [0.1% done] [0K/0K /s] [0 /0  iops] [eta 
04d:07h:40m:55s]



The top command shown that one cpu was waiting for disk IO, and the other was 
idle:
top - 20:28:30 up 1 day,  6:02,  3 users,  load average: 16.00, 13.91, 8.55
Tasks: 141 total,   1 running, 139 sleeping,   0 stopped,   1 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.3 us,  0.3 sy,  0.0 ni,  0.0 id, 99.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   4013924 total,   702112 used,  3311812 free, 3124 buffers
KiB Swap:  3903484 total,   184520 used,  3718964 free,74156 cached



root@ceph-node4:~# ceph -s
   health HEALTH_OK
   monmap e5: 3 mons at 
{ceph-node0=172.18.11.30:6789/0,ceph-node2=172.18.11.32:6789/0,ceph-node4=172.18.11.34:6789/0},
 election epoch 714, quorum 0,1,2 ceph-node0,ceph-node2,ceph-node4
   osdmap e4043: 11 osds: 11 up, 11 in
pgmap v92429: 1192 pgs: 1192 active+clean; 530 GB data, 1090 GB used, 9041 
GB / 10131 GB avail
   mdsmap e1: 0/0/1 up



Nothing error found in the ceph.log.


Anything else I can collect for investigation?
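A couple of hedged things worth collecting for a hang like this, assuming debugfs is mounted at the usual place: the kernel rbd client exposes its in-flight requests under debugfs, and the kernel log may show hung-task or libceph socket errors:
cat /sys/kernel/debug/ceph/*/osdc
dmesg | tail -50
An osdc entry that never drains points at a specific request/OSD to chase in that osd's log.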


Re: [ceph-users] Should the disk write cache be disabled?

2013-07-17 Thread Da Chun
Do you mean the write barrier?
So all ceph disk partitions are mounted with barrier=1?


-- Original --
From:  Gregory Farnumg...@inktank.com;
Date:  Wed, Jul 17, 2013 00:29 AM
To:  Da Chunng...@qq.com; 
Cc:  ceph-usersceph-users@lists.ceph.com; 
Subject:  Re: [ceph-users] Should the disk write cache be disabled?



Just old kernels, as they didn't correctly provide all the barriers
and other ordering constraints necessary for the write cache to be
used safely.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Jul 16, 2013 at 9:20 AM, Da Chun ng...@qq.com wrote:
 In this doc,
 http://ceph.com/docs/master/rados/configuration/filesystem-recommendations/

 It says,

 Ceph aims for data safety, which means that when the Ceph Client receives
 notice that data was written to a storage drive, that data was actually
 written to the storage drive. For old kernels (2.6.33), disable the write
 cache if the journal is on a raw drive. Newer kernels should work fine.

 Use hdparm to disable write caching on the hard disk:

 sudo hdparm -W 0 /dev/hda

 Does it mean the disk write cache should be always disabled for ceph, or
 just when using an old kernel (< 2.6.33)?

 Thanks for your time!


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Should the disk write cache be disabled?

2013-07-16 Thread Da Chun
In this doc,
http://ceph.com/docs/master/rados/configuration/filesystem-recommendations/


It says,

Ceph aims for data safety, which means that when the Ceph Client receives 
notice that data was written to a storage drive, that data was actually written 
to the storage drive. For old kernels (2.6.33), disable the write cache if the 
journal is on a raw drive. Newer kernels should work fine.

Use hdparm to disable write caching on the hard disk:
sudo hdparm -W 0 /dev/hda

Does it mean the disk write cache should be always disabled for ceph, or just 
when using an old kernel (< 2.6.33)?


Thanks for your time!


Re: [ceph-users] ceph iscsi questions

2013-06-22 Thread Da Chun
Kurt, do you have performance benchmark data for the tgt target?


I ran a simple benchmark for the LIO iSCSI target. The ceph cluster is set up with default settings.
The read performance is good. But the write performance is very poor from my 
point of view.


Performance of mapped kernel rbd:
root@ceph-observer:/mnt/fs2# echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync
3
root@ceph-observer:/mnt/fs2# dd bs=1M count=1024  if=/dev/zero of=test 
conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.0333 s, 107 MB/s
root@ceph-observer:/mnt/fs2# echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync
3
root@ceph-observer:/mnt/fs2# dd if=test  of=/dev/null   bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.0018 s, 107 MB/s



Performance of LIO iSCSI target, mapped kernel rbd:
root@ceph-observer:/mnt/fs3# echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync
3
root@ceph-observer:/mnt/fs3# dd bs=1M count=1024  if=/dev/zero of=test 
conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 21.3096 s, 50.4 MB/s
root@ceph-observer:/mnt/fs3# echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync
3
root@ceph-observer:/mnt/fs3# dd if=test  of=/dev/null   bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.70467 s, 102 MB/s







-- Original --
From:  Kurt Bauerkurt.ba...@univie.ac.at;
Date:  Tue, Jun 18, 2013 08:38 PM
To:  Da Chunng...@qq.com; 
Cc:  ceph-usersceph-users@lists.ceph.com; 
Subject:  Re: [ceph-users] ceph iscsi questions



 
 
 Da Chun wrote:

   Thanks for sharing! Kurt.

   Yes. I have read the article you mentioned. But I also read another one:
   http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices.
   It uses LIO, which is the current standard Linux kernel SCSI target.

 That has a major disadvantage, which is that you have to use the kernel rbd module, which is not feature equivalent to ceph userland code, at least in kernel versions which are shipped with recent distributions.

   There is another doc on the ceph site:
   http://ceph.com/w/index.php?title=ISCSI&redirect=no

 Quite outdated I think, last update nearly 3 years ago; I don't understand what the box in the middle should depict.

   I don't quite understand how the multi path works here. Are the two iSCSI targets on the same system or two different ones?
   Has anybody tried this already?

 Leen has illustrated that quite well.

   -- Original --
   From: Kurt Bauer kurt.ba...@univie.ac.at;
   Date: Tue, Jun 18, 2013 03:52 PM
   To: Da Chun ng...@qq.com;
   Cc: ceph-users ceph-users@lists.ceph.com;
   Subject: Re: [ceph-users] ceph iscsi questions

   Hi,

   Da Chun wrote:
     Hi List,
     I want to deploy a ceph cluster with latest cuttlefish, and export it with iscsi interface to my applications.
     Some questions here:
     1. Which Linux distro and release would you recommend? I used Ubuntu 13.04 for testing purpose before.
   For the ceph-cluster or the iSCSI-GW? We use Ubuntu 12.04 LTS for the cluster and the iSCSI-GW, but tested Debian wheezy as iSCSI-GW too. Both work flawless.
     2. Which iscsi target is better? LIO, SCST, or others?
   Have you read http://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/ ? That's what we do and it works without problems so far.
     3. The system for the iscsi target will be a single point of failure. How to eliminate it and make good use of ceph's nature of distribution?
   That's a question we asked ourselves too. In theory one can set up 2 iSCSI-GW and use multipath but what does that do to the cluster? Will smth. break if 2 iSCSI targets use the same rbd image in the cluster? Even if I use failover-mode only?

   Has someone already tried this and is willing to share their knowledge?

   Best regards,
   Kurt

     Thanks!


[ceph-users] Possible to bind one osd with a specific network adapter?

2013-06-21 Thread Da Chun
Hi List,
Each of my osd nodes has 5 Gb network adapters, and has many osds, one disk per osd. They are all connected with a Gb switch.
Currently I can get an average 100MB/s of read/write speed. To improve the 
throughput further, the network bandwidth will be the bottleneck, right?


I can't afford to replace all the adapters and switch with 10Gb ones. How can I 
improve the throughput based on current gears?


My first thought is to use bonding, as we have multiple adapters. But bonding has a performance cost and cannot simply multiply the throughput. And it has a dependency on the switch.


My second thought is to group the adapters and osds. For example, we have three adapters called A1, A2, A3, and 6 osds called O1, O2, ..., O6. Let O1 & O2 use A1 exclusively, O3 & O4 use A2 exclusively, and O5 & O6 use A3 exclusively. So they are separate groups; each group has its own disks and adapters, which are not shared. Only CPU & memory resources are shared between groups.


Is it possible to do this with current ceph implementation?


Thanks for your time and any ideas!
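For what it's worth, a hedged ceph.conf sketch of that grouping idea (addresses and ids are placeholders): each OSD can be pinned to addresses on the NIC it should use via the per-daemon public addr / cluster addr options, so OSD groups end up on separate adapters without bonding:
[osd.1]
    public addr = 192.168.1.60
    cluster addr = 192.168.2.60
[osd.3]
    public addr = 192.168.1.61
    cluster addr = 192.168.2.61
Each daemon then binds its public and cluster traffic to the interface holding its address.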


Re: [ceph-users] Possible to bind one osd with a specific networkadapter?

2013-06-21 Thread Da Chun
James,
Thank you!
No, I have not separated the public and cluster networks yet. They are on the same switch. As I don't have many nodes now, the switch won't be the bottleneck currently.


-- Original --
From:  James Harperjames.har...@bendigoit.com.au;
Date:  Sat, Jun 22, 2013 12:41 PM
To:  Da Chunng...@qq.com; ceph-usersceph-users@lists.ceph.com; 

Subject:  RE: [ceph-users] Possible to bind one osd with a specific 
networkadapter?



 
 Hi List,
 Each of my osd nodes has 5 network Gb adapters, and has many osds, one
 disk one osd. They are all connected with a Gb switch.
 Currently I can get an average 100MB/s of read/write speed. To improve the
 throughput further, the network bandwidth will be the bottleneck, right?

Do you already have separate networks for public and cluster?

 
 I can't afford to replace all the adapters and switch with 10Gb ones. How can 
 I
 improve the throughput based on current gears?
 
 My first thought is to use bonding as we have multiple adapters. But bonding
 has performance cost, surely cannot multiplex the throughput. And it has
 dependency on the switch.

LACP bonding should be okay. Each connection will only be 1gbit/second but if 
you have multiple clients and multiple connections you could see improved 
performance.

If you want to use plain round robin ordering at layer two, play with 
net/ipv4/tcp_reordering value to improve things. I do this and iperf gives me 
2gbit/second throughput, but with an increase in cpu of course.

 My second thought is to group the adapters and osds. For example, we have
 three adapters called A1, A2, A3, and 6 osds called O1, O2,..., O6. let O1  
 O2
 use A1 exclusively, O3  O4 use A2 exclusively, O5  O6 use A3 exclusively.
 So they are separated groups, each group has its own disks, adapters, which
 are not shared. Only CPU  memory resource is shared between groups.
 

I tested something similar. Each server has two disks and 2 adapters assigned 
to the cluster network. Each adapter is on a different subnet. As long as each 
osd can reach each ip address (because it has adapters on both networks) it 
should be fine and is probably better than bonding.

Actual multipath would be nice for the public network, but LACP should give you 
an aggregate increase even if individual connections are still limited to the 
adapter link speed.

James


[ceph-users] How to change the journal size at run time?

2013-06-20 Thread Da Chun
Hi List,
The default journal size is 1G, which I think is too small for my Gb network. I 
want to extend all the journal partitions to 2 or 4G. How can I do that? The 
osds were all created by commands like ceph-deploy osd create 
ceph-node0:/dev/sdb. The journal partition is on the same disk together with 
the corresponding data partition.
I notice there is an attribute osd journal size whose value is 1024. I guess this is why the ceph-deploy osd create command set the journal partition size to 1G.


I want to do this job using steps as below:
1. Change the osd journal size in the ceph.conf to 4G
2. Remove the osd
3. Readd the osd
4. Repeat 2 and 3 steps for all the osds.


This needs lots of manual work and is time consuming. Are there better ways to do that? Thanks!
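A hedged per-OSD sketch that avoids removing and re-adding each OSD (the flags are standard ceph-osd options; the actual enlarging of the journal partition or file still has to be done by hand, and the bigger osd journal size must already be in ceph.conf):
service ceph stop osd.0
ceph-osd -i 0 --flush-journal
# enlarge or recreate the journal partition/file for osd.0 here
ceph-osd -i 0 --mkjournal
service ceph start osd.0
It is still one pass per OSD, but it avoids the data rebalancing that a full remove/re-add would trigger.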


[ceph-users] Error in ceph.conf when creating cluster with IP addresses as monitors

2013-06-19 Thread Da Chun
ceph@ceph-node0:~/test$ ceph-deploy new 172.18.11.30 172.18.11.32 172.18.11.34
ceph@ceph-node0:~/test$ cat ceph.conf
[global]
fsid = caf39355-bd8f-450e-b026-6001607e62cf
mon initial members = 172, 172, 172
mon host = 172.18.11.30,172.18.11.32,172.18.11.34
auth supported = cephx
osd journal size = 1024
filestore xattr use omap = true



The IP addresses have been truncated. Is it by design not to use IP address as 
a name?
My ceph-deploy is the latest one: ceph-deploy_1.0-1_all.deb .


BTW, is there a version attribute in ceph-deploy for printing version info?


Re: [ceph-users] Repository Mirroring

2013-06-19 Thread Da Chun
+1.
I use apt-mirror to do this now. 


-- Original --
From:  John Nielsenli...@jnielsen.net;
Date:  Thu, Jun 20, 2013 00:21 AM
To:  Joe Rynerjry...@cait.org; 
Cc:  ceph-usersceph-users@lists.ceph.com; 
Subject:  Re: [ceph-users] Repository Mirroring



On Jun 18, 2013, at 12:08 PM, Joe Ryner jry...@cait.org wrote:

 I would like to make a local mirror or your yum repositories.  Do you support 
 any of the standard methods of syncing aka rsync?

+1. Our Ceph boxes are firewalled from the Internet at large and installing 
from a local mirror is faster and simpler than trying to go through a proxy. 
I'm currently using wget but that's not very friendly since the web server 
hosting the repo doesn't issue last-modified headers so I end up downloading 
everything every time.



Re: [ceph-users] ceph iscsi questions

2013-06-18 Thread Da Chun
Thanks for sharing! Kurt.


Yes. I have read the article you mentioned. But I also read another one: 
http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices.
  It uses LIO, which is the current standard Linux kernel SCSI target.


There is another doc in the ceph site: 
http://ceph.com/w/index.php?title=ISCSI&redirect=no
I don't quite understand how the multi path works here. Are the two ISCSI 
targets on the same system or two different ones?
Has anybody tried this already?


-- Original --
From:  Kurt Bauerkurt.ba...@univie.ac.at;
Date:  Tue, Jun 18, 2013 03:52 PM
To:  Da Chunng...@qq.com; 
Cc:  ceph-usersceph-users@lists.ceph.com; 
Subject:  Re: [ceph-users] ceph iscsi questions



 Hi,
 
 
 Da Chun wrote:
   Hi List,
   I want to deploy a ceph cluster with latest cuttlefish, and export it with iscsi interface to my applications.
   Some questions here:
   1. Which Linux distro and release would you recommend? I used Ubuntu 13.04 for testing purpose before.
 For the ceph-cluster or the iSCSI-GW? We use Ubuntu 12.04 LTS for the cluster and the iSCSI-GW, but tested Debian wheezy as iSCSI-GW too. Both work flawless.
   2. Which iscsi target is better? LIO, SCST, or others?
 Have you read http://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/ ? That's what we do and it works without problems so far.
   3. The system for the iscsi target will be a single point of failure. How to eliminate it and make good use of ceph's nature of distribution?
 That's a question we asked ourselves too. In theory one can set up 2 iSCSI-GW and use multipath but what does that do to the cluster? Will smth. break if 2 iSCSI targets use the same rbd image in the cluster? Even if I use failover-mode only?

 Has someone already tried this and is willing to share their knowledge?

 Best regards,
 Kurt

   Thanks!


Re: [ceph-users] How to remove /var/lib/ceph/osd/ceph-2?

2013-06-18 Thread Da Chun
Thanks! Craig.
umount works.


About the time skew, I saw in the log that the time difference should be less than 50ms. I set up one of my nodes as the time server, and the others sync their time with it. I don't know why the system time still changes frequently, especially after reboot. Maybe it's because all my nodes are VMware virtual machines and the softclock is not accurate enough.


-- Original --
From:  Craig Lewiscle...@centraldesktop.com;
Date:  Tue, Jun 18, 2013 05:34 AM
To:  ceph-usersceph-users@lists.ceph.com; 

Subject:  Re: [ceph-users] How to remove /var/lib/ceph/osd/ceph-2?



If you followed the standard setup, each OSD is its own disk + filesystem. /var/lib/ceph/osd/ceph-2 is in use, as the mount point for the OSD.2 filesystem. Double check by examining the output of the `mount` command.

I get the same error when I try to rename a directory that's used as a mount point.

Try `umount /var/lib/ceph/osd/ceph-2` instead of the mv and rm. The fuser command is telling you that the kernel has a filesystem mounted in that directory. Nothing else appears to be using it, so the umount should complete successfully.

Also, you should fix that time skew on mon.ceph-node5. The mailing list archives have several posts with good answers.
   
On 6/15/2013 2:14 AM, Da Chun wrote:

  Hi all,
  On Ubuntu 13.04 with ceph 0.61.3.
  I want to remove osd.2 from my cluster. The following steps were performed:
  root@ceph-node6:~# ceph osd out osd.2
  marked out osd.2.
  root@ceph-node6:~# ceph -w
     health HEALTH_WARN clock skew detected on mon.ceph-node5
     monmap e1: 3 mons at {ceph-node4=172.18.46.34:6789/0,ceph-node5=172.18.46.35:6789/0,ceph-node6=172.18.46.36:6789/0}, election epoch 124, quorum 0,1,2 ceph-node4,ceph-node5,ceph-node6
     osdmap e414: 6 osds: 5 up, 5 in
     pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
     mdsmap e102: 1/1/1 up {0=ceph-node4=up:active}

  2013-06-15 16:55:22.096059 mon.0 [INF] pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
  ^C
  root@ceph-node6:~# stop ceph-osd id=2
  ceph-osd stop/waiting
  root@ceph-node6:~# ceph osd crush remove osd.2
  removed item id 2 name 'osd.2' from crush map
  root@ceph-node6:~# ceph auth del osd.2
  updated
  root@ceph-node6:~# ceph osd rm 2
  removed osd.2
  root@ceph-node6:~# mv /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2.bak
  mv: cannot move ‘/var/lib/ceph/osd/ceph-2’ to ‘/var/lib/ceph/osd/ceph-2.bak’: Device or resource busy

  Everything was working OK until the last step to remove the osd.2 directory /var/lib/ceph/osd/ceph-2.
  root@ceph-node6:~# fuser -v /var/lib/ceph/osd/ceph-2
                       USER    PID ACCESS COMMAND
  /var/lib/ceph/osd/ceph-2:
                       root   kernel mount /var/lib/ceph/osd/ceph-2   // What does this mean?
  root@ceph-node6:~# lsof +D /var/lib/ceph/osd/ceph-2
  root@ceph-node6:~#

  I restarted the system, and found that the osd.2 daemon was still running:
  root@ceph-node6:~# ps aux | grep osd
  root  1264  1.4 12.3 550940 125732 ?   Ssl  16:41   0:20 /usr/bin/ceph-osd --cluster=ceph -i 2 -f
  root  2876  0.0  0.0   4440   628 ?    Ss   16:44   0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster=${cluster:-ceph} -i $id -f /bin/sh
  root  2877  4.9 18.2 613780 185676 ?   Sl   16:44   1:04 /usr/bin/ceph-osd --cluster=ceph -i 5 -f

  I have to take this workaround:
  root@ceph-node6:~# rm -rf /var/lib/ceph/osd/ceph-2
  rm: cannot remove ‘/var/lib/ceph/osd/ceph-2’: Device or resource busy
  root@ceph-node6:~# ls /var/lib/ceph/osd/ceph-2
  root@ceph-node6:~# shutdown -r now

  root@ceph-node6:~# ps aux | grep osd
  root  1416  0.0  0.0   4440   628 ?    Ss   17:10   0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster=${cluster:-ceph} -i $id -f /bin/sh
  root  1417  8.9  5.8 468052 59868 ?    Sl   17:10   0:02

Re: [ceph-users] Live Migrations with cephFS

2013-06-17 Thread Da Chun
OpenStack grizzly VM can be started on rbd(0.61.3) with no problem.
I didn't try live migration though.


-- Original --
From:  Wolfgang Hennerbichlerwolfgang.hennerbich...@risc-software.at;
Date:  Mon, Jun 17, 2013 02:00 PM
To:  ceph-usersceph-users@lists.ceph.com; 

Subject:  Re: [ceph-users] Live Migrations with cephFS





On 06/14/2013 08:00 PM, Ilja Maslov wrote:
 Hi,
 
 Is live migration supported with RBD and KVM/OpenStack?
 Always wanted to know but was afraid to ask :)

totally works in my productive setup. but we don't use openstack in this
installation, just KVM/RBD.

 Pardon brevity and formatting, replying from the phone.
 
 Cheers,
 Ilja
 
 
 Robert Sander r.san...@heinlein-support.de wrote:
 
 
 On 14.06.2013 12:55, Alvaro Izquierdo Jimeno wrote:
 
 By default, openstack uses NFS but… other options are available….can we
 use cephFS instead of NFS?
 
 Wouldn't you use qemu-rbd for your virtual guests in OpenStack?
 AFAIK CephFS is not needed for KVM/qemu virtual machines.
 
 Regards
 --
 Robert Sander
 Heinlein Support GmbH
 Schwedter Str. 8/9b, 10119 Berlin
 
 http://www.heinlein-support.de
 
 Tel: 030 / 405051-43
 Fax: 030 / 405051-19
 
 Mandatory information per §35a GmbHG:
 HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
 Managing Director: Peer Heinlein -- Registered office: Berlin
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 This email and any files transmitted with it are confidential and intended 
 solely for the use of the individual or entity to whom they are addressed. If 
 you are not the intended recipient, please note that any review, 
 dissemination, disclosure, alteration, printing, circulation, retention or 
 transmission of this e-mail and/or any file or attachment transmitted with 
 it, is prohibited and may be unlawful. If you have received this e-mail or 
 any file or attachment transmitted with it in error please notify 
 postmas...@openet.com. Although Openet has taken reasonable precautions to 
 ensure no viruses are present in this email, we cannot accept responsibility 
 for any loss or damage arising from the use of this email or attachments.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


-- 
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph iscsi questions

2013-06-17 Thread Da Chun
Hi List,

I want to deploy a ceph cluster with the latest Cuttlefish, and export it to 
my applications over an iSCSI interface.
Some questions here:
1. Which Linux distro and release would you recommend? I used Ubuntu 13.04 for 
testing purpose before.
2. Which iscsi target is better? LIO, SCST, or others?
3. The system hosting the iSCSI target will be a single point of failure. How 
can I eliminate it and still take advantage of Ceph's distributed nature?


Thanks!___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to remove /var/lib/ceph/osd/ceph-2?

2013-06-15 Thread Da Chun
Hi all,
On Ubuntu 13.04 with ceph 0.61.3.
I want to remove osd.2 from my cluster. The following steps were performed:
root@ceph-node6:~# ceph osd out osd.2
marked out osd.2.
root@ceph-node6:~# ceph -w
   health HEALTH_WARN clock skew detected on mon.ceph-node5
   monmap e1: 3 mons at 
{ceph-node4=172.18.46.34:6789/0,ceph-node5=172.18.46.35:6789/0,ceph-node6=172.18.46.36:6789/0},
 election epoch 124, quorum 0,1,2 ceph-node4,ceph-node5,ceph-node6
   osdmap e414: 6 osds: 5 up, 5 in
pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 
50360 MB / 74685 MB avail
   mdsmap e102: 1/1/1 up {0=ceph-node4=up:active}


2013-06-15 16:55:22.096059 mon.0 [INF] pgmap v10540: 456 pgs: 456 active+clean; 
12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
^C
root@ceph-node6:~# stop ceph-osd id=2
ceph-osd stop/waiting
root@ceph-node6:~# ceph osd crush remove osd.2
removed item id 2 name 'osd.2' from crush map
root@ceph-node6:~# ceph auth del osd.2
updated
root@ceph-node6:~# ceph osd rm 2
removed osd.2
root@ceph-node6:~# mv /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2.bak
mv: cannot move ‘/var/lib/ceph/osd/ceph-2’ to ‘/var/lib/ceph/osd/ceph-2.bak’: 
Device or resource busy



Everything was working OK until the last step to remove the osd.2 directory 
/var/lib/ceph/osd/ceph-2.
root@ceph-node6:~# fuser -v /var/lib/ceph/osd/ceph-2
 USERPID ACCESS COMMAND
/var/lib/ceph/osd/ceph-2:
 root kernel mount /var/lib/ceph/osd/ceph-2   
// What does this mean?
root@ceph-node6:~# lsof +D /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#
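
The fuser "kernel mount" line above means the kernel has a filesystem
mounted on that directory (the OSD's data partition), not that a process
has a file open there -- which is exactly why mv and rm report "Device or
resource busy". A quick check of that, as a minimal sketch using the path
from the session above:

import os

path = "/var/lib/ceph/osd/ceph-2"

# A mountpoint cannot be moved or removed while something is mounted on it;
# it has to be unmounted first.
if os.path.ismount(path):
    print("%s is a mountpoint; umount it before moving or removing it" % path)
else:
    print("%s is a plain directory" % path)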



I restarted the system, and found that the osd.2 daemon was still running:
root@ceph-node6:~# ps aux | grep osd
root  1264  1.4 12.3 550940 125732 ?   Ssl  16:41   0:20 
/usr/bin/ceph-osd --cluster=ceph -i 2 -f
root  2876  0.0  0.0   4440   628 ?Ss   16:44   0:00 /bin/sh -e -c 
/usr/bin/ceph-osd --cluster=${cluster:-ceph} -i $id -f /bin/sh
root  2877  4.9 18.2 613780 185676 ?   Sl   16:44   1:04 
/usr/bin/ceph-osd --cluster=ceph -i 5 -f



I have to take this workaround:
root@ceph-node6:~# rm -rf /var/lib/ceph/osd/ceph-2
rm: cannot remove ‘/var/lib/ceph/osd/ceph-2’: Device or resource busy
root@ceph-node6:~# ls /var/lib/ceph/osd/ceph-2
root@ceph-node6:~# shutdown -r now


root@ceph-node6:~# ps aux | grep osd
root  1416  0.0  0.0   4440   628 ?Ss   17:10   0:00 /bin/sh -e -c 
/usr/bin/ceph-osd --cluster=${cluster:-ceph} -i $id -f /bin/sh
root  1417  8.9  5.8 468052 59868 ?Sl   17:10   0:02 
/usr/bin/ceph-osd --cluster=ceph -i 5 -f

root@ceph-node6:~# rm -r /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#



Any idea? HELP!___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Failed to stop osd.

2013-06-14 Thread Da Chun
On Ubuntu 13.04 with ceph 0.61.3 .
I'm trying to remove one of the osds according to this guide:
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#stopping-the-osd


First took it out from the cluster:
root@ceph-node4:~# ceph osd out osd.0
root@ceph-node4:~# ceph -w/// wait until the recovery finished.


Then stop the daemon:
root@ceph-node4:~# /etc/init.d/ceph stop osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph 
defines )



The directory tree /var/lib/ceph is as below:
root@ceph-node4:~# tree -L 2 /var/lib/ceph/
/var/lib/ceph/
├── bootstrap-mds
│   └── ceph.keyring
├── bootstrap-osd
│   └── ceph.keyring
├── mds
│   └── ceph-ceph-node4
├── mon
│   └── ceph-ceph-node4
├── osd
│   ├── ceph-0
│   └── ceph-3
└── tmp



It seems there is some mismatch in the naming.___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Failed to stop osd.

2013-06-14 Thread Da Chun
Finally managed to stop it by:
stop ceph-osd id=0



-- Original --
From:  Da Chunng...@qq.com;
Date:  Fri, Jun 14, 2013 05:00 PM
To:  ceph-usersceph-users@lists.ceph.com; 

Subject:  [ceph-users] Failed to stop osd.



On Ubuntu 13.04 with ceph 0.61.3 .
I'm trying to remove one of the osds according to this guide:
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#stopping-the-osd


First took it out from the cluster:
root@ceph-node4:~# ceph osd out osd.0
root@ceph-node4:~# ceph -w/// wait until the recovery finished.


Then stop the daemon:
root@ceph-node4:~# /etc/init.d/ceph stop osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph 
defines )



The directory tree /var/lib/ceph is as below:
root@ceph-node4:~# tree -L 2 /var/lib/ceph/
/var/lib/ceph/
├── bootstrap-mds
│   └── ceph.keyring
├── bootstrap-osd
│   └── ceph.keyring
├── mds
│   └── ceph-ceph-node4
├── mon
│   └── ceph-ceph-node4
├── osd
│   ├── ceph-0
│   └── ceph-3
└── tmp



It seems there is some mismatch in the naming.___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy osd create hangs

2013-06-14 Thread Da Chun
On Ubuntu 13.04 with ceph 0.61.3 .
It hangs when creating a new osd using ceph-deploy.


ceph@ceph-node4:~/mycluster$ ceph-deploy disk zap ceph-node4:sdd
ceph@ceph-node4:~/mycluster$ ceph-deploy disk zap ceph-node4:sdb
ceph@ceph-node4:~/mycluster$ ceph-deploy osd create ceph-node4:sdb:sdd
^CTraceback (most recent call last):
  File /home/ceph/ceph-deploy/ceph-deploy, line 9, in module
load_entry_point('ceph-deploy==0.1', 'console_scripts', 'ceph-deploy')()
  File /home/ceph/ceph-deploy/ceph_deploy/cli.py, line 112, in main
return args.func(args)
  File /home/ceph/ceph-deploy/ceph_deploy/osd.py, line 425, in osd
prepare(args, cfg, activate_prepared_disk=True)
  File /home/ceph/ceph-deploy/ceph_deploy/osd.py, line 265, in prepare
dmcrypt_dir=args.dmcrypt_key_dir,
  File 
/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/proxy.py,
 line 255, in lambda
(conn.operator(type_, self, args, kwargs))
  File 
/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/connection.py,
 line 66, in operator
return self.send_request(type_, (object, args, kwargs))
  File 
/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py,
 line 315, in send_request
m = self.__waitForResponse(handler)
  File 
/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py,
 line 412, in __waitForResponse
self.__processing_condition.wait()
  File /usr/lib/python2.7/threading.py, line 339, in wait
waiter.acquire()




ps aux | grep ceph
ceph  4015  0.0  1.1 118412 11404 pts/1Sl+  20:51   0:00 
/home/ceph/ceph-deploy/virtualenv/bin/python /home/ceph/ceph-deploy/ceph-deploy 
osd create ceph-node4:sdb:sdd
root  4043  0.0  0.0      628 pts/1S+   20:51   0:00 /bin/sh 
/usr/sbin/ceph-disk-prepare -- /dev/sdb /dev/sdd
root  4049  0.1  0.9  43216  9876 pts/1S+   20:51   0:00 
/usr/bin/python /usr/sbin/ceph-disk prepare -- /dev/sdb /dev/sdd



Any idea? Thanks!___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy osd create hangs

2013-06-14 Thread Da Chun
OK. I've fixed it.


Previously I ran ceph-deploy osd create ceph-node4:sdb by mistake and 
terminated it with Ctrl-C, so the lock on 
/var/lib/ceph/tmp/ceph-disk.prepare.lock.lock was never released.
The next ceph-deploy osd create then hung waiting for that lock.


It's a user error, but not an easy one to track down.


To avoid this problem, maybe we can catch SIGINT in the ceph-disk command:
import signal
import sys

def signal_handler(signum, frame):
    # Release the prepare lock before exiting, so a later run does not hang
    # waiting for it.
    prepare_lock.release()
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)



Or at least, for easier problem determination, IMHO, ceph-deploy osd prepare 
should print a meaningful error message instead of hanging indefinitely.
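
Another option, instead of (or alongside) a signal handler, would be to make
the lock acquisition itself fail fast with a clear message. A rough sketch of
the idea using a plain flock -- the lock path is taken from the message above,
and the surrounding code is a hypothetical stand-in, not the real ceph-disk
implementation:

import errno
import fcntl
import sys

LOCK_PATH = "/var/lib/ceph/tmp/ceph-disk.prepare.lock"

def acquire_prepare_lock():
    # Non-blocking flock: if another prepare already holds the lock, exit
    # with an error instead of waiting forever.
    f = open(LOCK_PATH, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError as e:
        if e.errno in (errno.EACCES, errno.EAGAIN):
            sys.exit("another ceph-disk prepare seems to be running "
                     "(lock held on %s)" % LOCK_PATH)
        raise
    return f

lock = acquire_prepare_lock()
try:
    pass  # ... do the actual prepare work here ...
finally:
    # Runs even on Ctrl-C (KeyboardInterrupt), so the lock is always released.
    fcntl.flock(lock, fcntl.LOCK_UN)
    lock.close()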


-- Original --
From:  Da Chunng...@qq.com;
Date:  Fri, Jun 14, 2013 09:13 PM
To:  ceph-usersceph-users@lists.ceph.com; 

Subject:  [ceph-users] ceph-deploy osd create hangs



On Ubuntu 13.04 with ceph 0.61.3 .
It hangs when creating a new osd using ceph-deploy.


ceph@ceph-node4:~/mycluster$ ceph-deploy disk zap ceph-node4:sdd
ceph@ceph-node4:~/mycluster$ ceph-deploy disk zap ceph-node4:sdb
ceph@ceph-node4:~/mycluster$ ceph-deploy osd create ceph-node4:sdb:sdd
^CTraceback (most recent call last):
  File /home/ceph/ceph-deploy/ceph-deploy, line 9, in module
load_entry_point('ceph-deploy==0.1', 'console_scripts', 'ceph-deploy')()
  File /home/ceph/ceph-deploy/ceph_deploy/cli.py, line 112, in main
return args.func(args)
  File /home/ceph/ceph-deploy/ceph_deploy/osd.py, line 425, in osd
prepare(args, cfg, activate_prepared_disk=True)
  File /home/ceph/ceph-deploy/ceph_deploy/osd.py, line 265, in prepare
dmcrypt_dir=args.dmcrypt_key_dir,
  File 
/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/proxy.py,
 line 255, in lambda
(conn.operator(type_, self, args, kwargs))
  File 
/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/connection.py,
 line 66, in operator
return self.send_request(type_, (object, args, kwargs))
  File 
/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py,
 line 315, in send_request
m = self.__waitForResponse(handler)
  File 
/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py,
 line 412, in __waitForResponse
self.__processing_condition.wait()
  File /usr/lib/python2.7/threading.py, line 339, in wait
waiter.acquire()




ps aux | grep ceph
ceph  4015  0.0  1.1 118412 11404 pts/1Sl+  20:51   0:00 
/home/ceph/ceph-deploy/virtualenv/bin/python /home/ceph/ceph-deploy/ceph-deploy 
osd create ceph-node4:sdb:sdd
root  4043  0.0  0.0      628 pts/1S+   20:51   0:00 /bin/sh 
/usr/sbin/ceph-disk-prepare -- /dev/sdb /dev/sdd
root  4049  0.1  0.9  43216  9876 pts/1S+   20:51   0:00 
/usr/bin/python /usr/sbin/ceph-disk prepare -- /dev/sdb /dev/sdd



Any idea? Thanks!___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why does ceph need a filesystem (was Simulating DiskFailure)

2013-06-14 Thread Da Chun
I have the same question too. I know Ceph was based on a simple fs of its own 
years ago.
I'd like to hear some more details.


-- Original --
From:  James Harperjames.har...@bendigoit.com.au;
Date:  Sat, Jun 15, 2013 11:07 AM
To:  Gregory Farnumg...@inktank.com; Craig 
Lewiscle...@centraldesktop.com; 
Cc:  ceph-us...@ceph.comceph-us...@ceph.com; 
Subject:  [ceph-users] Why does ceph need a filesystem (was Simulating 
DiskFailure)



 
 Yeah. You've picked up on some warty bits of Ceph's error handling here for
 sure, but it's exacerbated by the fact that you're not simulating what you
 think. In a real disk error situation the filesystem would be returning EIO or
 something, but here it's returning ENOENT. Since the OSD is authoritative for
 that key space and the filesystem says there is no such object, presto! It
 doesn't exist.
 If you restart the OSD it does a scan of the PGs on-disk as well as what it
 should have, and can pick up on the data not being there and recover. But
 correctly handling data that has been (from the local FS' perspective)
 properly deleted under a running process would require huge and expensive
 contortions on the part of the daemon (in any distributed system that I can
 think of).
 -Greg
 

Why was the decision made for ceph to require an underlying filesystem, rather 
than direct access to disk (like drbd does)?

All of my recent disk failures have been unrecoverable read errors (pending 
sector in SMART stats), which are easy enough to repair in the short term just 
by rewriting with a known good copy of the data (assuming that there isn't some 
other underlying cause and this was just a power-off-at-the-wrong-moment 
error). Unfortunately because of the disconnect between ceph and the LBA this 
can't be done by ceph.

Just curious...

Thanks

James
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph mount: Only 240 GB , should be 60TB

2013-06-13 Thread Da Chun
Sage,


I have the same issue with ceph 0.61.3 on Ubuntu 13.04.


ceph@ceph-node4:~/mycluster$ df -h
Filesystem   Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu1304--64--vg-root   15G  1.5G   13G  11% /
none 4.0K 0  4.0K   0% /sys/fs/cgroup
udev 487M  4.0K  487M   1% /dev
tmpfs100M  284K  100M   1% /run
none 5.0M 0  5.0M   0% /run/lock
none 498M 0  498M   0% /run/shm
none 100M 0  100M   0% /run/user
/dev/sda1228M   34M  183M  16% /boot
/dev/sdc1 14G  4.4G  9.7G  32% 
/var/lib/ceph/osd/ceph-3
/dev/sdb19.0G  1.6G  7.5G  18% 
/var/lib/ceph/osd/ceph-0
172.18.46.34:6789:/  276M   94M  183M  34% /mnt/mycephfs # 
which should be about 70G.
ceph@ceph-node4:~/mycluster$ uname -a
Linux ceph-node4 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 
x86_64 x86_64 x86_64 GNU/Linux





-- Original --
From:  Sage Weils...@inktank.com;
Date:  Wed, Jun 12, 2013 11:45 PM
To:  Markus Goldberggoldb...@uni-hildesheim.de; 
Cc:  ceph-usersceph-users@lists.ceph.com; 
Subject:  Re: [ceph-users] ceph mount: Only 240 GB , should be 60TB



Hi Markus,

What version of the kernel are you using on the client?  There is an 
annoying compatibility issue with older glibc that makes representing 
large values for statfs(2) (df) difficult.  We switched this behavior to 
hopefully do things the better/more right way for the future, but it's 
possible you have an odd version or combination that gives goofy results.  

sage
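
For reference, the numbers df prints come straight from statvfs(3): the total
size is f_blocks scaled by f_frsize, so a client or libc that applies the
wrong block-size field ends up off by a large factor. A minimal sketch for
dumping the raw fields of a mount (the mountpoint path is just an example):

import os

mountpoint = "/mnt/mycephfs"  # example path; use whatever the client mounted

st = os.statvfs(mountpoint)
# df-style totals: block counts scaled by the fragment size (f_frsize).
total_gb = st.f_blocks * st.f_frsize / float(1 << 30)
avail_gb = st.f_bavail * st.f_frsize / float(1 << 30)
print("f_bsize=%d f_frsize=%d total=%.1fG avail=%.1fG"
      % (st.f_bsize, st.f_frsize, total_gb, avail_gb))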


On Wed, 12 Jun 2013, Markus Goldberg wrote:

 Hi,
 this is cuttlefish 0.63 on Ubuntu 13.04, underlying OSD-FS is btrfs, 3
 servers, each of them 20TB (Raid6-array)
 
 When I mount at the client (or at one of the servers), the mounted filesystem
 is only 240GB, but it should be 60TB.
 
 root@bd-0:~# cat /etc/ceph/ceph.conf
 [global]
 fsid = e0dbf70d-af59-42a5-b834-7ad739a7f89b
 mon_initial_members = bd-0, bd-1, bd-2
 mon_host = ###.###.###.20,###.###.###.21,###.###.###.22
 auth_supported = cephx
 public_network = ###.###.###.0/24
 cluster_network = 192.168.1.0/24
 osd_mkfs_type = btrfs
 osd_mkfs_options_btrfs = -n 32k -l 32k
 osd_mount_options_btrfs = rw,noatime,nodiratime,autodefrag
 osd_journal_size = 10240
 
 root@bd-0:~#
 
 df on one of the servers:
 root@bd-0:~# df -h
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/sda139G  4,5G   32G  13% /
 none4,0K 0  4,0K   0% /sys/fs/cgroup
 udev 16G   12K   16G   1% /dev
 tmpfs   3,2G  852K  3,2G   1% /run
 none5,0M  4,0K  5,0M   1% /run/lock
 none 16G 0   16G   0% /run/shm
 none100M 0  100M   0% /run/user
 /dev/sdc120T  6,6M   20T   1% /var/lib/ceph/osd/ceph-0
 root@bd-0:~#
 root@bd-0:~# ceph -s
health HEALTH_OK
monmap e1: 3 mons at
 {bd-0=###.###.###.20:6789/0,bd-1=###.###.###.21:6789/0,bd-2=###.###.###.22:6789/0},
 election epoch 66, quorum 0,1,2 bd-0,bd-1,bd-2
osdmap e109: 3 osds: 3 up, 3 in
 pgmap v848: 192 pgs: 192 active+clean; 23239 bytes data, 16020 KB used,
 61402 GB / 61408 GB avail
mdsmap e56: 1/1/1 up {0=bd-1=up:active}, 2 up:standby
 
 root@bd-0:~#
 
 
 at the client:
 root@bs4:~#
 root@bs4:~# mount -t ceph ###.###.###.20:6789:/ /mnt/myceph -v -o
 name=admin,secretfile=/etc/ceph/admin.secret
 parsing options: rw,name=admin,secretfile=/etc/ceph/admin.secret
 root@bs4:~# df -h
 Dateisystem   Größe Benutzt Verf. Verw% Eingehängt auf
 /dev/sda1   28G3,0G   24G   12% /
 none   4,0K   0  4,0K0% /sys/fs/cgroup
 udev   998M4,0K  998M1% /dev
 tmpfs  201M708K  200M1% /run
 none   5,0M   0  5,0M0% /run/lock
 none  1002M 84K 1002M1% /run/shm
 none   100M   0  100M0% /run/user
 ###.###.###.20:6789:/  240G 25M  240G1% /mnt/myceph
 root@bs4:~#
 root@bs4:~# cd /mnt/myceph
 root@bs4:/mnt/myceph# mkdir Test
 root@bs4:/mnt/myceph# cd Test
 root@bs4:/mnt/myceph/Test# touch testfile
 root@bs4:/mnt/myceph/Test# ls -la
 insgesamt 0
 drwxr-xr-x 1 root root 0 Jun 12  2013 .
 drwxr-xr-x 1 root root 0 Jun 12 10:17 ..
 -rw-r--r-- 1 root root 0 Jun 12 10:18 testfile
 root@bs4:/mnt/myceph/Test# pwd
 /mnt/myceph/Test
 root@bs4:/mnt/myceph/Test# df -h .
 Dateisystem   Größe Benutzt Verf. Verw% Eingehängt auf
 ###.###.###.20:6789:/  240G 25M  240G1% /mnt/myceph
 
 
 BTW /dev/sda on the servers are 256GB-SSDs
 
 
 Can anyone please help ?
 
 Thank you,  Markus
 
 -- 
 MfG,
   Markus Goldberg
 
 
 Markus Goldberg | Universität 

Re: [ceph-users] core dump: qemu-img info -f rbd

2013-06-07 Thread Da Chun
Yes, it works with -f raw.


“qemu-img convert” has the same problem:
qemu-img convert -f qcow2 -O rbd cirros-0.3.0-x86_64-disk.img 
rbd:vm_disks/test_disk2
core dump
qemu-img convert -f qcow2 -O raw cirros-0.3.0-x86_64-disk.img 
rbd:vm_disks/test_disk2
working


-- Original --
From:  Oliver Franckeoliver.fran...@filoo.de;
Date:  Thu, Jun 6, 2013 07:14 PM
To:  ceph-usersceph-users@lists.ceph.com; 

Subject:  Re: [ceph-users] core dump: qemu-img info -f rbd



Hi,

On 06/06/2013 08:12 AM, Jens Kristian Søgaard wrote:
 Hi,

 I got a core dump when executing: root@ceph-node1:~# qemu-img info -f 
 rbd rbd:vm_disks/box1_disk1

 Try leaving out -f rbd from the command - I have seen that make a 
 difference before.

... or try -f raw. The same goes for the -drive format=raw,file=... 
specification. The former format=rbd does not work anymore.

Basically the format _is_ raw ;)

Oliver.


-- 

Oliver Francke

filoo GmbH
Moltkestraße 25a
0 Gütersloh
HRB4355 AG Gütersloh

Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz

Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Failed to clone ceph

2013-06-07 Thread Da Chun
Failed to clone ceph. Do you have the same problem?


root@ceph-node7:~/workspace# git clone --recursive 
https://github.com/ceph/ceph.git
Cloning into 'ceph'...
remote: Counting objects: 192874, done.
remote: Compressing objects: 100% (41154/41154), done.
remote: Total 192874 (delta 155848), reused 186400 (delta 149917)
Receiving objects: 100% (192874/192874), 39.74 MiB | 8 KiB/s, done.
Resolving deltas: 100% (155848/155848), done.
Submodule 'ceph-object-corpus' (git://ceph.com/git/ceph-object-corpus.git) 
registered for path 'ceph-object-corpus'
Submodule 'src/libs3' (git://github.com/ceph/libs3.git) registered for path 
'src/libs3'
Cloning into 'ceph-object-corpus'...
remote: Counting objects: 6255, done.
remote: Compressing objects: 100% (3462/3462), done.
Receiving objects:  10% (626/6255), 72.00 KiB | 127 KiB/s
fatal: read error: Connection reset by peer
fatal: early EOF
fatal: recursion detected in die handler
Clone of 'git://ceph.com/git/ceph-object-corpus.git' into submodule path 
'ceph-object-corpus' failed



root@ceph-node7:~/workspace# git clone git://ceph.com/git/ceph-object-corpus.git
Cloning into 'ceph-object-corpus'...
remote: Counting objects: 6255, done.
remote: Compressing objects: 100% (3462/3462), done.
fatal: read error: Connection reset by peer
fatal: early EOF
fatal: recursion detected in die handler___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com