Re: Ceph availability test recovering question

2013-03-18 Thread Andrey Korolyov
Hello,

I'm experiencing the same long-lasting problem: during recovery ops, some
percentage of read I/O remains in flight for seconds, rendering the
upper-level filesystem on the qemu client very slow and almost
unusable. Different striping has almost no effect on the visible delays,
and even non-intensive reads are still very slow.

Here are some fio results for randread with small blocks, so unlike a
linear read it is not affected by readahead:

Intensive reads during recovery:
lat (msec) : 2=0.01%, 4=0.08%, 10=1.87%, 20=4.17%, 50=8.34%
lat (msec) : 100=13.93%, 250=2.77%, 500=1.19%, 750=25.13%, 1000=0.41%
lat (msec) : 2000=15.45%, >=2000=26.66%

same on healthy cluster:
lat (msec) : 20=0.33%, 50=9.17%, 100=23.35%, 250=25.47%, 750=6.53%
lat (msec) : 1000=0.42%, 2000=34.17%, >=2000=0.56%
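
For reference, numbers like these come from a small-block randread job roughly
along the following lines (the parameters here are illustrative, not my exact
command line; /dev/vdb stands in for the RBD-backed virtio disk in the guest):

fio --name=randread --ioengine=libaio --direct=1 --rw=randread --bs=4k \
    --iodepth=32 --runtime=120 --time_based --filename=/dev/vdb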


On Sun, Mar 17, 2013 at 8:18 AM,  kelvin_hu...@wiwynn.com wrote:
 Hi, all

 ...
 Now I want to observe the Ceph state when one storage server crashes, so I
 turn off the networking on one storage server.
 We expected read and write operations to resume quickly, or not be suspended
 at all, during Ceph recovery, but in our tests both reads and writes pause
 for about 20~30 seconds while Ceph is recovering.

 My questions are:
 1. Is the I/O pause normal while Ceph is recovering?
 2. Is the pause unavoidable during recovery?
 3. How can the I/O pause time be reduced?


 Thanks!!


Re: Ceph availability test recovering question

2013-03-18 Thread Wolfgang Hennerbichler


On 03/17/2013 05:18 AM, kelvin_hu...@wiwynn.com wrote:
 Hi, all

Hi,
 ...
 My questions are:
 1. Is the I/O pause normal while Ceph is recovering?

I have experienced the same issue. This works as designed, and is most
likely due to the heartbeat timeout: the "osd heartbeat grace" period
defaults to 20 seconds - see:
http://ceph.com/docs/master/rados/configuration/mon-osd-interaction/
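
For reference, these are the knobs I mean; a minimal ceph.conf sketch
(defaults quoted from memory, please check them against the page above):

[osd]
    ; how often an OSD pings its peers
    osd heartbeat interval = 6
    ; seconds without a reply before a peer is reported down to the monitors
    osd heartbeat grace = 20

[mon]
    ; seconds before a down OSD is additionally marked out
    mon osd down out interval = 300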

 2. Is the pause unavoidable during recovery?

You can always lower the grace period and the heartbeat interval, though I
don't know whether that is a wise idea: short networking interruptions might
then get your OSDs marked down (and eventually out) very quickly.

 3. How can the I/O pause time be reduced?

See the link above, or this one:
http://ceph.com/docs/master/rados/configuration/osd-config-ref/#monitor-osd-interaction
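
As a rough sketch only (the value is an example, not a recommendation),
lowering the grace period would look like this in ceph.conf, followed by an
OSD restart:

[osd]
    osd heartbeat grace = 10

If I remember the syntax correctly it can also be injected at runtime:

ceph osd tell \* injectargs '--osd-heartbeat-grace 10'

but keep the caveat above in mind: short network hiccups will then get OSDs
marked down much faster.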

 
 Thanks!!
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


-- 
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at


Ceph availability test recovering question

2013-03-16 Thread Kelvin_Huang
Hi, all

I have run into some problems after an availability test.

Setup:
Linux kernel: 3.2.0
OS: Ubuntu 12.04
Storage server: 11 HDDs (one OSD per HDD, 7200 rpm, 1 TB each) + 10GbE NIC
RAID card: LSI MegaRAID SAS 9260-4i; every HDD is a single-disk RAID0, Write
Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct
Number of storage servers: 2

Ceph version: 0.48.2
Replicas: 2
Monitors: 3
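
For reference, the replica count per pool can be set and verified with
something like the following (the default rbd pool is assumed to hold the
test image):

ceph osd pool set rbd size 2
ceph osd dump | grep 'rep size'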


We have the two storage servers as a cluster and used a Ceph client to create
a 1 TB RBD image for testing; the client also has a 10GbE NIC and runs Linux
kernel 3.2.0 on Ubuntu 12.04.
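
For anyone reproducing this with the kernel RBD client, creating and mapping
the 1 TB test image looks roughly like this (the image name is just an
example; --size is in MB):

rbd create test-image --size 1048576
rbd map test-image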

We use fio to generate the workload.

fio command:
[Sequential Read]
fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=read \
    --ioengine=libaio --group_reporting --direct=1 --eta=always \
    --ramp_time=10 --thinktime=10

[Sequential Write]
fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=write \
    --ioengine=libaio --group_reporting --direct=1 --eta=always \
    --ramp_time=10 --thinktime=10
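
The commands above omit the job name and target; a complete read invocation
against the mapped image would look roughly like this (the device path is an
example):

fio --name=seqread --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
    --rw=read --bs=65536 --iodepth=32 --numjobs=1 --runtime=120 \
    --ramp_time=10 --thinktime=10 --group_reporting --eta=always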


Now I want to observe the Ceph state when one storage server crashes, so I
turn off the networking on one storage server.
We expected read and write operations to resume quickly, or not be suspended
at all, during Ceph recovery, but in our tests both reads and writes pause
for about 20~30 seconds while Ceph is recovering.
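
For anyone reproducing this, the cluster state during the outage can be
followed with the usual status commands, for example:

ceph -w          # continuous cluster/recovery updates
ceph health
ceph osd tree    # shows which OSDs are marked down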

My questions are:
1. Is the I/O pause normal while Ceph is recovering?
2. Is the pause unavoidable during recovery?
3. How can the I/O pause time be reduced?


Thanks!!