Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-03 Thread Kenneth Waegeman



On 06/02/2015 07:08 PM, Nick Fisk wrote:

Hi Kenneth,

I suggested an idea which may help with this; it is currently being
developed:

https://github.com/ceph/ceph/pull/4792

In short, it introduces separate high and low thresholds with different
flushing priorities. Hopefully this will help with bursty workloads.


Thanks! Will this also increase the absolute flushing speed? I think the
problem is more the absolute speed: it is not my workload that is bursty,
but the throughput actually reaching the ceph cluster, because the cache
flushes more slowly than new data comes in.
I also see that my cold-storage disks aren't doing much (see the iostat
output in my other email), so is there a way to increase the flushing speed
by tuning the cache agent, e.g. for more parallelism?
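One agent-related option I did find is osd_agent_max_ops, which (as I
understand it) caps how many flush operations each OSD's tiering agent runs
in parallel. A rough sketch of what I would try, assuming that option applies
here (please correct me if it does not):

# sketch only: raise the tiering agent's flush concurrency at runtime
# (check first that the option exists in your release:
#   ceph daemon osd.0 config show | grep osd_agent)
ceph tell osd.* injectargs '--osd_agent_max_ops 8'
# to make it persistent, add to the [osd] section of ceph.conf:
#   osd agent max ops = 8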




Nick


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Kenneth Waegeman
Sent: 02 June 2015 17:54
To: ceph-users@lists.ceph.com
Subject: [ceph-users] bursty IO, ceph cache pool can not follow evictions

Hi,

we were rsync-streaming with 4 cephfs clients to a ceph cluster with a
cache layer upon an erasure coded pool.
This had been going on for some time without real problems.

Today we added 2 more streams, and very soon we saw some strange
behaviour:
- We are getting blocked requests on our cache pool OSDs
- our cache pool is often near or at its max ratio
- Our data streams have very bursty IO (streaming a few hundred MB for a
minute and then nothing)

Our OSDs are not overloaded (neither the EC nor the cache OSDs, checked with
iostat), yet it seems the cache pool cannot evict objects in time, and
requests get blocked until it has caught up, each time again.
If I raise the target_max_bytes limit, it starts streaming again until the
cache is full again.

cache parameters we have are these:
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8


What could the issue be here? I tried to find some information about the
'cache agent', but could only find some old references.

Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-03 Thread Nick Fisk
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Kenneth Waegeman
 Sent: 03 June 2015 10:51
 To: Nick Fisk; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] bursty IO, ceph cache pool can not follow
evictions
 
 
 
 On 06/02/2015 07:08 PM, Nick Fisk wrote:
  Hi Kenneth,
 
  I suggested an idea which may help with this; it is currently being
  developed:
 
  https://github.com/ceph/ceph/pull/4792
 
  In short, it introduces separate high and low thresholds with different
  flushing priorities. Hopefully this will help with bursty workloads.
 
 Thanks! Will this also increase the absolute flushing speed? I think the
 problem is more the absolute speed: it is not my workload that is bursty,
 but the throughput actually reaching the ceph cluster, because the cache
 flushes more slowly than new data comes in.
 I also see that my cold-storage disks aren't doing much (see the iostat
 output in my other email), so is there a way to increase the flushing
 speed by tuning the cache agent, e.g. for more parallelism?

To be honest with you, I'm not 100% sure. I see similar issues to yours,
where performance seems really poor compared to the actual amount of disk
activity. My best hunch is that it's probably to do with overwriting
existing blocks, which first have to be promoted to the cache tier before
being overwritten. That always blocks the operation, as it can't continue
until the object is in the cache tier.

A few things I have been toying with seem to help in my case, though I'm not
sure whether they will help in yours:
1. Making the object size smaller, i.e. for RBD dropping from 4MB objects to
1MB, which decreases the latency of promotion/demotion operations (a rough
sketch follows below).
2. Making the cache tier a lot bigger to reduce promotions/demotions.
3. Using something like flashcache in front of the RBD to hide the
promotion latency from the workload.
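As a rough sketch of point 1, assuming an RBD image (the pool and image names
below are made up): --order 20 gives 2^20 = 1MB objects instead of the default
4MB, and only applies to newly created images.

# sketch only: create an RBD image with 1MB objects
rbd create mypool/myimage --size 102400 --order 20

In your CephFS case the closest equivalent would be the directory layout, e.g.
something like 'setfattr -n ceph.dir.layout.object_size -v 1048576 <dir>' on a
fresh directory, but I haven't tested that myself.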

Nick

 
 
  Nick
 
  -Original Message-
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
  Of Kenneth Waegeman
  Sent: 02 June 2015 17:54
  To: ceph-users@lists.ceph.com
  Subject: [ceph-users] bursty IO, ceph cache pool can not follow
  evictions
 
  Hi,
 
   we were rsync-streaming with 4 cephfs clients to a ceph cluster with a
   cache layer upon an erasure coded pool.
   This had been going on for some time without real problems.
 
   Today we added 2 more streams, and very soon we saw some strange
   behaviour:
   - We are getting blocked requests on our cache pool OSDs
   - our cache pool is often near or at its max ratio
   - Our data streams have very bursty IO (streaming a few hundred MB for
   a minute and then nothing)
 
   Our OSDs are not overloaded (neither the EC nor the cache OSDs, checked
   with iostat), yet it seems the cache pool cannot evict objects in time,
   and requests get blocked until it has caught up, each time again.
   If I raise the target_max_bytes limit, it starts streaming again until
   the cache is full again.
 
   cache parameters we have are these:
   ceph osd pool set cache hit_set_type bloom
   ceph osd pool set cache hit_set_count 1
   ceph osd pool set cache hit_set_period 3600
   ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
   ceph osd pool set cache cache_target_dirty_ratio 0.4
   ceph osd pool set cache cache_target_full_ratio 0.8
 
 
   What could the issue be here? I tried to find some information about
   the 'cache agent', but could only find some old references.
 
  Thank you!
 
  Kenneth
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-03 Thread Kenneth Waegeman



On 06/02/2015 07:21 PM, Paul Evans wrote:

Kenneth,
   My guess is that you’re hitting the cache_target_full_ratio on an
individual OSD, which is easy to do since most of us tend to think of
the cache_target_full_ratio as an aggregate of the OSDs (which it is not
according to Greg Farnum).   This posting may shed more light on the
issue, if it is indeed what you are bumping up against.
https://www.mail-archive.com/ceph-users%40lists.ceph.com/msg20207.html

It does indeed look like this; the question then is why it is not flushing more.


   BTW: how are you determining that your OSDs are ‘not overloaded?’
  Are you judging that by iostat utilization, or by capacity consumed?
iostat is showing low utilisation on all disks; some disks are doing 
'nothing':



Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sdn        0.00    0.00  813.50  415.00   16.90   15.49    53.99     0.42   0.35    0.15    0.72   0.10  12.00
sdm        0.00    0.00  820.50  490.50   13.06   21.99    54.76     0.70   0.54    0.18    1.13   0.12  15.50
sdq        0.00    1.50   14.00   47.00    0.98    0.33    43.99     0.55   8.93   18.93    5.96   6.31  38.50
sdr        0.00    0.00    0.00    0.50    0.00    0.00    14.00     0.00   0.00    0.00    0.00   0.00   0.00
sdd        0.00    9.50    4.00   21.50    0.27    1.47   140.00     0.12   4.71    2.50    5.12   4.31  11.00
sda        0.00    8.50    2.50   14.50    0.26    0.71   116.91     0.08   4.41    4.00    4.48   4.71   8.00
sdh        0.00    6.00    2.00   15.00    0.25    1.10   162.59     0.07   3.82    7.50    3.33   3.53   6.00
sdf        0.00   17.50    3.00   25.00    0.32    1.01    97.48     0.23   8.21    5.00    8.60   8.21  23.00
sdi        0.00   11.00    1.00   31.50    0.07    2.23   144.60     0.14   4.46    0.00    4.60   3.85  12.50
sdo        0.00    0.00    0.00    1.00    0.00    0.00     8.00     0.00   0.00    0.00    0.00   0.00   0.00
sdk        0.00    0.00   22.50    0.00    1.58    0.00   143.82     0.13   5.78    5.78    0.00   4.00   9.00
sdg        0.00    2.50    0.00   30.00    0.00    3.35   228.52     0.14   4.50    0.00    4.50   1.33   4.00
sdc        0.00   12.50    1.50   23.50    0.01    1.36   111.68     0.17   6.80    0.00    7.23   6.20  15.50
sdj        0.00   18.50   27.50   30.50    2.28    1.65   138.82     0.43   7.33    7.82    6.89   5.86  34.00
sde        0.00    4.00    0.50   15.00    0.04    0.10    18.10     0.07   4.84   10.00    4.67   2.58   4.00
sdl        0.00   23.00    6.00   33.00    0.58    1.31    99.22     0.28   7.05   17.50    5.15   6.79  26.50
sdb        0.00    5.00    3.00    9.00    0.12    0.47   100.29     0.05   4.58    1.67    5.56   3.75   4.50


In my opinion there should be enough resources to do the flushing, and 
therefore the cache should not be filling up.



--
Paul

On Jun 2, 2015, at 9:53 AM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:

Hi,

we were rsync-streaming with 4 cephfs clients to a ceph cluster with a
cache layer upon an erasure coded pool.
This had been going on for some time without real problems.

Today we added 2 more streams, and very soon we saw some strange
behaviour:
- We are getting blocked requests on our cache pool OSDs
- our cache pool is often near or at its max ratio
- Our data streams have very bursty IO (streaming a few hundred MB for a
minute and then nothing)

Our OSDs are not overloaded (neither the EC nor the cache OSDs, checked
with iostat), yet it seems the cache pool cannot evict objects in time,
and requests get blocked until it has caught up, each time again.
If I raise the target_max_bytes limit, it starts streaming again until
it is full again.

cache parameters we have are these:
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8


What could the issue be here? I tried to find some information about
the 'cache agent', but could only find some old references.

Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-02 Thread Kenneth Waegeman

Hi,

we were rsync-streaming with 4 cephfs clients to a ceph cluster with a 
cache layer upon an erasure coded pool.

This had been going on for some time without real problems.

Today we added 2 more streams, and very soon we saw some strange behaviour:
- We are getting blocked requests on our cache pool OSDs
- our cache pool is often near or at its max ratio
- Our data streams have very bursty IO (streaming a few hundred MB for a 
minute and then nothing)


Our OSDs are not overloaded (neither the EC nor the cache OSDs, checked with 
iostat), yet it seems the cache pool cannot evict objects in time, and 
requests get blocked until it has caught up, each time again.
If I raise the target_max_bytes limit, it starts streaming again until it 
is full again.


cache parameters we have are these:
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8
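For reference, the thresholds these settings work out to:

# target_max_bytes = 14 * 75 GiB = 1050 GiB (~1.03 TiB)
# flushing should start around cache_target_dirty_ratio * target_max_bytes
#   = 0.4 * 1050 GiB = 420 GiB of dirty data
# eviction/full handling around cache_target_full_ratio * target_max_bytes
#   = 0.8 * 1050 GiB = 840 GiB used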


What could the issue be here? I tried to find some information about the 
'cache agent', but could only find some old references.


Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-02 Thread Nick Fisk
Hi Kenneth,

I suggested an idea which may help with this; it is currently being
developed:

https://github.com/ceph/ceph/pull/4792

In short, it introduces separate high and low thresholds with different
flushing priorities. Hopefully this will help with bursty workloads.
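If it goes in roughly as proposed, I would expect the pool settings to end up
looking something like the sketch below (the high-ratio option name is my
guess at this stage, so don't take it as final):

# sketch only: low threshold = lazy background flushing,
# high threshold = aggressive flushing (proposed option name)
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_dirty_high_ratio 0.6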

Nick

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Kenneth Waegeman
 Sent: 02 June 2015 17:54
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] bursty IO, ceph cache pool can not follow evictions
 
 Hi,
 
 we were rsync-streaming with 4 cephfs clients to a ceph cluster with a
 cache layer upon an erasure coded pool.
 This had been going on for some time without real problems.
 
 Today we added 2 more streams, and very soon we saw some strange
 behaviour:
 - We are getting blocked requests on our cache pool OSDs
 - our cache pool is often near or at its max ratio
 - Our data streams have very bursty IO (streaming a few hundred MB for a
 minute and then nothing)
 
 Our OSDs are not overloaded (neither the EC nor the cache OSDs, checked
 with iostat), yet it seems the cache pool cannot evict objects in time,
 and requests get blocked until it has caught up, each time again.
 If I raise the target_max_bytes limit, it starts streaming again until
 the cache is full again.
 
 cache parameters we have are these:
 ceph osd pool set cache hit_set_type bloom
 ceph osd pool set cache hit_set_count 1
 ceph osd pool set cache hit_set_period 3600
 ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
 ceph osd pool set cache cache_target_dirty_ratio 0.4
 ceph osd pool set cache cache_target_full_ratio 0.8
 
 
 What could the issue be here? I tried to find some information about the
 'cache agent', but could only find some old references.
 
 Thank you!
 
 Kenneth
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-02 Thread Paul Evans
Kenneth,
  My guess is that you’re hitting the cache_target_full_ratio on an individual 
OSD, which is easy to do since most of us tend to think of the 
cache_target_full_ratio as an aggregate of the OSDs (which it is not according 
to Greg Farnum).   This posting may shed more light on the issue, if it is 
indeed what you are bumping up against.  
https://www.mail-archive.com/ceph-users%40lists.ceph.com/msg20207.html
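A quick way to check for that (assuming your release already has 'ceph osd df', 
which I believe Hammer does) is to compare per-OSD utilisation in the cache tier 
against the pool-level aggregate:

# sketch: look for a single cache-tier OSD that is much fuller than its peers
ceph osd df          # per-OSD used/available and %USE
ceph df detail       # pool-level totals, for comparison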

  BTW: how are you determining that your OSDs are ‘not overloaded?’  Are you 
judging that by iostat utilization, or by capacity consumed?
--
Paul


On Jun 2, 2015, at 9:53 AM, Kenneth Waegeman 
kenneth.waege...@ugent.be wrote:

Hi,

we were rsync-streaming with 4 cephfs clients to a ceph cluster with a cache 
layer upon an erasure coded pool.
This had been going on for some time without real problems.

Today we added 2 more streams, and very soon we saw some strange behaviour:
- We are getting blocked requests on our cache pool OSDs
- our cache pool is often near or at its max ratio
- Our data streams have very bursty IO (streaming a few hundred MB for a minute 
and then nothing)

Our OSDs are not overloaded (neither the EC nor the cache OSDs, checked with 
iostat), yet it seems the cache pool cannot evict objects in time, and requests 
get blocked until it has caught up, each time again.
If I raise the target_max_bytes limit, it starts streaming again until it is 
full again.

cache parameters we have are these:
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8


What could the issue be here? I tried to find some information about the 'cache 
agent', but could only find some old references.

Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com