Re: [ceph-users] performance issue with jewel on ubuntu xenial (kernel)

2016-06-23 Thread Yoann Moulin
Le 23/06/2016 08:25, Sarni Sofiane a écrit :
> Hi Florian,
> 


> On 23.06.16 06:25, "ceph-users on behalf of Florian Haas" 
>  wrote:
> 
>> On Wed, Jun 22, 2016 at 10:56 AM, Yoann Moulin  wrote:
>>> Hello Florian,
>>>
 On Tue, Jun 21, 2016 at 3:11 PM, Yoann Moulin  wrote:
> Hello,
>
> I found a performance drop between kernel 3.13.0-88 (default kernel on 
> Ubuntu
> Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 
> 16.04)
>
> ceph version is Jewel (10.2.2).
> All tests have been done under Ubuntu 14.04

 Knowing that you also have an infernalis cluster on almost identical
 hardware, can you please let the list know whether you see the same
 behavior (severely reduced throughput on a 4.4 kernel, vs. 3.13) on
 that cluster as well?
>>>
>>> ceph version is infernalis (9.2.0)
>>>
>>> Ceph osd Benchmark:
>>>
>>> Kernel 3.13.0-88-generic : ceph tell osd.ID => average ~84MB/s
>>> Kernel 4.2.0-38-generic  : ceph tell osd.ID => average ~90MB/s
>>> Kernel 4.4.0-24-generic  : ceph tell osd.ID => average ~75MB/s
>>>
>>> The slowdown is not as severe as with Jewel, but it is still present.
>>
>> But this is not on precisely identical hardware, is it?
>
> All the benchmarks were run on strictly identical hardware setups per node.
> Clusters differ slightly in sizes (infernalis vs jewel) but nodes and OSDs 
> are identical.

One thing differs in the OSD configuration: on the Jewel cluster the journals
are on disk, while on the Infernalis cluster the journals are on SSD (S3500).

I can rerun my test on a Jewel cluster with journals on SSD if needed.
I can also run a test on an Infernalis cluster with journals on disk.
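
For reference, the numbers above come from the built-in OSD bench, run per OSD
roughly like this (osd.0 is just an example) and then averaged:

ceph tell osd.0 bench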

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stuck unclean since forever

2016-06-23 Thread 施柏安
In 'rule replicated_ruleset', change the chooseleaf step from "type host" to
"type osd":

step chooseleaf firstn 0 type osd
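
The usual way to apply that change is roughly (file names are just examples):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt: step chooseleaf firstn 0 type osd
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new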



2016-06-22 19:38 GMT+08:00 min fang :

> Thanks. Actually, I created a pool with more PGs and still hit this problem.
> Following is my crush map; can you please point out how to change the crush
> ruleset? Thanks.
>
> #begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable straw_calc_version 1
>
> # devices
> device 0 device0
> device 1 device1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host redpower-ceph-01 {
> id -2   # do not change unnecessarily
> # weight 3.000
> alg straw
> hash 0  # rjenkins1
> item osd.2 weight 1.000
> item osd.3 weight 1.000
> item osd.4 weight 1.000
> }
> root default {
> id -1   # do not change unnecessarily
> # weight 3.000
> alg straw
> hash 0  # rjenkins1
> item redpower-ceph-01 weight 3.000
> }
>
> # rules
> rule replicated_ruleset {
> ruleset 0
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
>
> # end crush map
>
>
> 2016-06-22 18:27 GMT+08:00 Burkhard Linke <
> burkhard.li...@computational.bio.uni-giessen.de>:
>
>> Hi,
>>
>> On 06/22/2016 12:10 PM, min fang wrote:
>>
>> Hi, I created a new ceph cluster and created a pool, but I see "stuck
>> unclean since forever" errors (as shown below). Can you help point out
>> the possible reasons for this? Thanks.
>>
>> ceph -s
>> cluster 602176c1-4937-45fc-a246-cc16f1066f65
>>  health HEALTH_WARN
>> 8 pgs degraded
>> 8 pgs stuck unclean
>> 8 pgs undersized
>> too few PGs per OSD (2 < min 30)
>>  monmap e1: 1 mons at {ceph-01=172.0.0.11:6789/0}
>> election epoch 14, quorum 0 ceph-01
>>  osdmap e89: 3 osds: 3 up, 3 in
>> flags
>>   pgmap v310: 8 pgs, 1 pools, 0 bytes data, 0 objects
>> 60112 MB used, 5527 GB / 5586 GB avail
>>8 active+undersized+degraded
>>
>>
>> *snipsnap*
>>
>> With three OSDs and a single host you need to change the crush ruleset
>> for the pool, since it tries to distribute the data across 3 different
>> _hosts_ by default.
>>
>> Regards,
>> Burkhard
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Best regards,

施柏安 Desmond Shih
技術研發部 Technical Development
 
迎棧科技股份有限公司
│ 886-975-857-982
│ desmond.s@inwinstack.com
│ 886-2-7738-2858 #7725
│ 新北市220板橋區遠東路3號5樓C室
Rm.C, 5F., No.3, Yuandong Rd.,
Banqiao Dist., New Taipei City 220, Taiwan (R.O.C)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache Tiering with Same Cache Pool

2016-06-23 Thread Lazuardi Nasution
Hi Christian,

So, it seems that at first I must set target_max_bytes to the Max. Available
size divided by the number of cache pools, with allowance for the worst-case
OSDs-down scenario, isn't it? And then after some while, I adjust
target_max_bytes per cache pool by monitoring "ceph df detail" output to
see which one should have more size and which one should have less size,
but the total still must not be more than the Max. Available size reduced
by the worst-case OSDs-down percentage.
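
Something like this is what I have in mind (pool names and the size are only
examples):

ceph osd pool set cache-pool-1 target_max_bytes 1099511627776   # 1 TB
ceph osd pool set cache-pool-2 target_max_bytes 1099511627776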

By the way, since there is no maximum age before an object is flushed (dirty)
or evicted (clean), would lowering hit_set_period be helpful?

Best regards,

On Thu, Jun 23, 2016 at 7:23 AM, Christian Balzer  wrote:

>
> Hello,
>
> On Wed, 22 Jun 2016 15:40:40 +0700 Lazuardi Nasution wrote:
>
> > Hi Christian,
> >
> > If I have several cache pool on the same SSD OSDs (by using same ruleset)
> > so those cache pool always show same Max. Available of "ceph df detail"
> > output,
>
> That's true for all pools that share the same backing storage.
>
> >what should I put on target_max_bytes of cache tiering
> > configuration for each cache pool? should it be same and use Max
> > Available size?
>
> Definitely not, you will want to at least subtract enough space from your
> available size to avoid having one failed OSD generating a full disk
> situation. Even more to cover a failed host scenario.
> Then you want to divide the rest by the number of pools you plan to put on
> there and set that as the target_max_bytes in the simplest case.
>
> >If diffrent, how can I know if such cache pool need more
> > size than other.
> >
> By looking at df detail again, the usage is per pool after all.
>
> But a cache pool will of course use all the space it has, so that's not a
> good way to determine your needs.
> Watching how fast they fill up may be more helpful.
>
> You should have decent idea before doing cache tiering about your needs,
> by monitoring the pools (and their storage) you want to cache, again
> with "df detail" (how many writes/reads?), "ceph -w", atop or iostat, etc.
>
> Christian
>
> > Best regards,
> >
> > Date: Mon, 20 Jun 2016 09:34:05 +0900
> > > From: Christian Balzer 
> > > To: ceph-users@lists.ceph.com
> > > Cc: Lazuardi Nasution 
> > > Subject: Re: [ceph-users] Cache Tiering with Same Cache Pool
> > > Message-ID: <20160620093405.732f5...@batzmaru.gol.ad.jp>
> > > Content-Type: text/plain; charset=US-ASCII
> > >
> > > On Mon, 20 Jun 2016 00:14:55 +0700 Lazuardi Nasution wrote:
> > >
> > > > Hi,
> > > >
> > > > Is it possible to do cache tiering for some storage pools with the
> > > > same cache pool?
> > >
> > > As mentioned several times on this ML, no.
> > > There is a strict 1:1 relationship between base and cache pools.
> > > You can of course (if your SSDs/NVMes are large and fast enough) put
> > > more than one cache pool on them.
> > >
> > > > What will happen if cache pool is broken or at least doesn't
> > > > meet quorum when storage pool is OK?
> > > >
> > > With a read-only cache pool nothing should happen, as all writes are
> > > going to the base pool.
> > >
> > > In any other mode (write-back, read-forward or read-proxy) your hottest
> > > objects are likely to be ONLY on the cache pool and never getting
> > > flushed to the base pool.
> > > So that means, if your cache pool fails, your cluster is essentially
> > > dead or at the very least has suffered massive data loss.
> > >
> > > Something to very much think about when doing cache tiering.
> > >
> > > Christian
> > > --
> > > Christian BalzerNetwork/Systems Engineer
> > > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > > http://www.gol.com/
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache Tiering with Same Cache Pool

2016-06-23 Thread Christian Balzer

Hello,

On Thu, 23 Jun 2016 14:28:30 +0700 Lazuardi Nasution wrote:

> Hi Christian,
> 
> So, it seems that at first I must set target_max_bytes to the Max. Available
> size divided by the number of cache pools, with allowance for the worst-case
> OSDs-down scenario, isn't it? 

Correct.

>And then after some while, I adjust
> target_max_bytes per cache pool by monitoring "ceph df detail" output to
> see which one should have more size and which one should have less size,
> but the total still must not be more than the Max. Available size reduced
> by the worst-case OSDs-down percentage.
> 
Again, if you have existing pools that you want to add cache pools to, you
should already have some idea of their I/O needs from the reads and writes
in df detail or the other tools I mentioned.
And thus how to size them respectively.

A cache pool that fills up the quickest might simply have seen a single large
copy; you want to look at IOPS first, then data volume.

> By the way, since there is no maximum age before an object is flushed (dirty)
> or evicted (clean), would lowering hit_set_period be helpful?
> 
I'm not sure what you're asking here, as hit_set_period only affects
promotions, not flushes or evictions. 

And you probably want to set minimum ages, depending on your usage patterns
and cache size.
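
For example (a rough sketch; the pool name and values are placeholders to be
tuned to your workload):

ceph osd pool set cache-pool cache_min_flush_age 600    # seconds
ceph osd pool set cache-pool cache_min_evict_age 1800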

Christian
> Best regards,
> 
> On Thu, Jun 23, 2016 at 7:23 AM, Christian Balzer  wrote:
> 
> >
> > Hello,
> >
> > On Wed, 22 Jun 2016 15:40:40 +0700 Lazuardi Nasution wrote:
> >
> > > Hi Christian,
> > >
> > > If I have several cache pool on the same SSD OSDs (by using same
> > > ruleset) so those cache pool always show same Max. Available of
> > > "ceph df detail" output,
> >
> > That's true for all pools that share the same backing storage.
> >
> > >what should I put on target_max_bytes of cache tiering
> > > configuration for each cache pool? should it be same and use Max
> > > Available size?
> >
> > Definitely not, you will want to at least subtract enough space from
> > your available size to avoid having one failed OSD generating a full
> > disk situation. Even more to cover a failed host scenario.
> > Then you want to divide the rest by the number of pools you plan to
> > put on there and set that as the target_max_bytes in the simplest case.
> >
> > >If diffrent, how can I know if such cache pool need more
> > > size than other.
> > >
> > By looking at df detail again, the usage is per pool after all.
> >
> > But a cache pool will of course use all the space it has, so that's
> > not a good way to determine your needs.
> > Watching how fast they fill up may be more helpful.
> >
> > You should have decent idea before doing cache tiering about your
> > needs, by monitoring the pools (and their storage) you want to cache,
> > again with "df detail" (how many writes/reads?), "ceph -w", atop or
> > iostat, etc.
> >
> > Christian
> >
> > > Best regards,
> > >
> > > Date: Mon, 20 Jun 2016 09:34:05 +0900
> > > > From: Christian Balzer 
> > > > To: ceph-users@lists.ceph.com
> > > > Cc: Lazuardi Nasution 
> > > > Subject: Re: [ceph-users] Cache Tiering with Same Cache Pool
> > > > Message-ID: <20160620093405.732f5...@batzmaru.gol.ad.jp>
> > > > Content-Type: text/plain; charset=US-ASCII
> > > >
> > > > On Mon, 20 Jun 2016 00:14:55 +0700 Lazuardi Nasution wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Is it possible to do cache tiering for some storage pools with
> > > > > the same cache pool?
> > > >
> > > > As mentioned several times on this ML, no.
> > > > There is a strict 1:1 relationship between base and cache pools.
> > > > You can of course (if your SSDs/NVMes are large and fast enough)
> > > > put more than one cache pool on them.
> > > >
> > > > > What will happen if cache pool is broken or at least doesn't
> > > > > meet quorum when storage pool is OK?
> > > > >
> > > > With a read-only cache pool nothing should happen, as all writes
> > > > are going to the base pool.
> > > >
> > > > In any other mode (write-back, read-forward or read-proxy) your
> > > > hottest objects are likely to be ONLY on the cache pool and never
> > > > getting flushed to the base pool.
> > > > So that means, if your cache pool fails, your cluster is
> > > > essentially dead or at the very least has suffered massive data
> > > > loss.
> > > >
> > > > Something to very much think about when doing cache tiering.
> > > >
> > > > Christian
> > > > --
> > > > Christian BalzerNetwork/Systems Engineer
> > > > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > > > http://www.gol.com/
> >


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph 10.1.1 rbd map fail

2016-06-23 Thread 王海涛
Hi, Brad:
This is the output of "ceph osd crush show-tunables -f json-pretty"
{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 0,
"straw_calc_version": 1,
"allowed_bucket_algs": 22,
"profile": "firefly",
"optimal_tunables": 0,
"legacy_tunables": 0,
"minimum_required_version": "firefly",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 0,
"require_feature_tunables5": 0,
"has_v5_rules": 0
}


The value of "require_feature_tunables3" is 1; I think it needs to be 0 to make
my rbd map succeed.
So I set it to 0 with the ceph osd crush tool, but it still doesn't work.
Then I checked the rbd image info:
rbd image 'myimage':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.5e3074b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags: 


It looks like some of the features are not supported by my rbd kernel module,
because when I get rid of the last 4 features and only keep the "layering"
feature, the image is mapped and works correctly.
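
(In case it helps others: to avoid this for new images, they can be created
with only the layering feature, or the default can be set in ceph.conf. A
sketch; names are examples and 1 is the layering feature bit as I understand
it:)

rbd create rbd/myimage --size 1024 --image-feature layering

[client]
rbd default features = 1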


Thanks for your answer! 


Kind Regards,
Haitao Wang


At 2016-06-23 09:51:02, "Brad Hubbard"  wrote:
>On Wed, Jun 22, 2016 at 3:20 PM, 王海涛  wrote:
>> I find this message in dmesg:
>> [83090.212918] libceph: mon0 192.168.159.128:6789 feature set mismatch, my
>> 4a042a42 < server's 2004a042a42, missing 200
>>
>> According to
>> "http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client";,
>> this could mean that I need to upgrade kernel client up to 3.15 or disable
>> tunable 3 features.
>> Our cluster is not convenient to upgrade.
>> Could you tell me how to disable tunable 3 features?
>
>Can you show the output of the following command please?
>
># ceph osd crush show-tunables -f json-pretty
>
>I believe you'll need to use "ceph osd crush tunables " to adjust this.
>
>>
>> Thanks!
>>
>> Kind Regards,
>> Haitao Wang
>>
>>
>> At 2016-06-22 12:33:42, "Brad Hubbard"  wrote:
>>>On Wed, Jun 22, 2016 at 1:35 PM, 王海涛  wrote:
 Hi All

 I'm using ceph-10.1.1 to map a rbd image ,but it dosen't work ,the error
 messages are:

 root@heaven:~#rbd map rbd/myimage --id admin
 2016-06-22 11:16:34.546623 7fc87ca53d80 -1 WARNING: the following
 dangerous
 and experimental features are enabled: bluestore,rocksdb
 2016-06-22 11:16:34.547166 7fc87ca53d80 -1 WARNING: the following
 dangerous
 and experimental features are enabled: bluestore,rocksdb
 2016-06-22 11:16:34.549018 7fc87ca53d80 -1 WARNING: the following
 dangerous
 and experimental features are enabled: bluestore,rocksdb
 rbd: sysfs write failed
 rbd: map failed: (5) Input/output error
>>>
>>>Anything in dmesg, or anywhere, about "feature set mismatch" ?
>>>
>>>http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client
>>>

 Could someone tell me what's wrong?
 Thanks!

 Kind Regards,
 Haitao Wang


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>>
>>>
>>>
>>>--
>>>Cheers,
>>>Brad
>
>
>
>-- 
>Cheers,
>Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW and Openstack meters

2016-06-23 Thread magicb...@hotmail.com

Hi

is there any way to make RadosGW account stats for something like
"radosgw.objects.(incoming|outgoing).bytes", similar to
Swift's meters storage.objects.(incoming|outgoing).bytes?



Thanks
J.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph 10.1.1 rbd map fail

2016-06-23 Thread Brad Hubbard
On Thu, Jun 23, 2016 at 6:38 PM, 王海涛  wrote:
> Hi, Brad:
> This is the output of "ceph osd crush show-tunables -f json-pretty"
> {
> "choose_local_tries": 0,
> "choose_local_fallback_tries": 0,
> "choose_total_tries": 50,
> "chooseleaf_descend_once": 1,
> "chooseleaf_vary_r": 1,
> "chooseleaf_stable": 0,
> "straw_calc_version": 1,
> "allowed_bucket_algs": 22,
> "profile": "firefly",
> "optimal_tunables": 0,
> "legacy_tunables": 0,
> "minimum_required_version": "firefly",
> "require_feature_tunables": 1,
> "require_feature_tunables2": 1,
> "has_v2_rules": 0,
> "require_feature_tunables3": 1,
> "has_v3_rules": 0,
> "has_v4_buckets": 0,
> "require_feature_tunables5": 0,
> "has_v5_rules": 0
> }
>
> The value of "require_feature_tunables3" is 1, I think It need to be 0 to
> make my rbd map success.
> So I set it to 0 by ceph osd crush tool, but it still dosn't work.
> Then I checked the rbd image info:
> rbd image 'myimage':
> size 1024 MB in 256 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.5e3074b0dc51
> format: 2
> features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
> flags:
>
> It looks like that some of the features are not supported by my rbd kernel
> module.
> Because when I get rid of the last 4 features, and only keep the "layering"
> feature,
> the image seems to be mapped and used rightly.
>
> Thanks for your answer!

yw

>
> Kind Regards,
> Haitao Wang
>
> At 2016-06-23 09:51:02, "Brad Hubbard"  wrote:
>>On Wed, Jun 22, 2016 at 3:20 PM, 王海涛  wrote:
>>> I find this message in dmesg:
>>> [83090.212918] libceph: mon0 192.168.159.128:6789 feature set mismatch,
>>> my
>>> 4a042a42 < server's 2004a042a42, missing 200
>>>
>>> According to
>>>
>>> "http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client";,
>>> this could mean that I need to upgrade kernel client up to 3.15 or
>>> disable
>>> tunable 3 features.
>>> Our cluster is not convenient to upgrade.
>>> Could you tell me how to disable tunable 3 features?
>>
>>Can you show the output of the following command please?
>>
>># ceph osd crush show-tunables -f json-pretty
>>
>>I believe you'll need to use "ceph osd crush tunables " to adjust this.
>>
>>>
>>> Thanks!
>>>
>>> Kind Regards,
>>> Haitao Wang
>>>
>>>
>>> At 2016-06-22 12:33:42, "Brad Hubbard"  wrote:
On Wed, Jun 22, 2016 at 1:35 PM, 王海涛  wrote:
> Hi All
>
> I'm using ceph-10.1.1 to map a rbd image ,but it dosen't work ,the
> error
> messages are:
>
> root@heaven:~#rbd map rbd/myimage --id admin
> 2016-06-22 11:16:34.546623 7fc87ca53d80 -1 WARNING: the following
> dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2016-06-22 11:16:34.547166 7fc87ca53d80 -1 WARNING: the following
> dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2016-06-22 11:16:34.549018 7fc87ca53d80 -1 WARNING: the following
> dangerous
> and experimental features are enabled: bluestore,rocksdb
> rbd: sysfs write failed
> rbd: map failed: (5) Input/output error

Anything in dmesg, or anywhere, about "feature set mismatch" ?

http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client

>
> Could someone tell me what's wrong?
> Thanks!
>
> Kind Regards,
> Haitao Wang
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Cheers,
Brad
>>
>>
>>
>>--
>>Cheers,
>>Brad



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW and Openstack meters

2016-06-23 Thread M Ranga Swami Reddy
It's already supported since the OpenStack Kilo version of Ceilometer. We
have added only 6 meters from radosgw.

url - 
https://blueprints.launchpad.net/ceilometer/+spec/ceph-ceilometer-integration
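
If I remember right, the rgw usage log also has to be enabled in ceph.conf for
the usage-based meters to be populated, along these lines (the section name
depends on your gateway instance):

[client.radosgw.gateway]
rgw enable usage log = true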

Thanks
Swami

On Thu, Jun 23, 2016 at 2:37 PM, magicb...@hotmail.com
 wrote:
> Hi
>
> is there any possibility to make RadosGW to account stats for something
> similar to "radosgw.objects.(incoming|outgoing).bytes" similar to Swift's
> meters storage.objects.(incoming|outgoing).bytes??
>
>
> Thanks
> J.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW and Openstack meters

2016-06-23 Thread c.y. lee
Hi,

Did you check Admin API of Rados gateway?

http://docs.ceph.com/docs/master/radosgw/adminops/#get-usage
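
The same usage data can also be pulled with radosgw-admin, for example (the
uid is a placeholder):

radosgw-admin usage show --uid=johndoe --show-log-entries=false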


On Thu, Jun 23, 2016 at 5:07 PM, magicb...@hotmail.com <
magicb...@hotmail.com> wrote:

> Hi
>
> is there any possibility to make RadosGW to account stats for something
> similar to "radosgw.objects.(incoming|outgoing).bytes" similar to Swift's
> meters storage.objects.(incoming|outgoing).bytes??
>
>
> Thanks
> J.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW and Openstack meters

2016-06-23 Thread magicb...@hotmail.com

Hi

yes, radosgw keeps stats, but those stats aren't pushed into the telemetry
service, I think.


On 23/06/16 11:55, c.y. lee wrote:

Hi,

Did you check Admin API of Rados gateway?

http://docs.ceph.com/docs/master/radosgw/adminops/#get-usage


On Thu, Jun 23, 2016 at 5:07 PM, magicb...@hotmail.com 
 > wrote:


Hi

is there any possibility to make RadosGW to account stats for
something similar to "radosgw.objects.(incoming|outgoing).bytes"
similar to Swift's meters storage.objects.(incoming|outgoing).bytes??


Thanks
J.
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW and Openstack meters

2016-06-23 Thread magicb...@hotmail.com

Hi

I'm running liberty and ceph hammer, and these are my available meters:
* ceph.storage.objects
* ceph.storage.objects.size
* ceph.storage.objects.containers
* ceph.storage.containers.objects
* ceph.storage.containers.objects.size
* ceph.storage.api.request

I'd like to have something like

"radosgw.objects.(incoming|outgoing).bytes", similar to Swift's meters.



On 23/06/16 11:50, M Ranga Swami Reddy wrote:

Its already supported from openstack Kilo version of Ceilometer. We
have added only 6 meters from radogw.

url - 
https://blueprints.launchpad.net/ceilometer/+spec/ceph-ceilometer-integration

Thanks
Swami

On Thu, Jun 23, 2016 at 2:37 PM, magicb...@hotmail.com
 wrote:

Hi

is there any possibility to make RadosGW to account stats for something
similar to "radosgw.objects.(incoming|outgoing).bytes" similar to Swift's
meters storage.objects.(incoming|outgoing).bytes??


Thanks
J.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW and Openstack meters

2016-06-23 Thread M Ranga Swami Reddy
Yes... use the rgw admin APIs for getting the meters.

On Thu, Jun 23, 2016 at 3:25 PM, c.y. lee  wrote:
> Hi,
>
> Did you check Admin API of Rados gateway?
>
> http://docs.ceph.com/docs/master/radosgw/adminops/#get-usage
>
>
> On Thu, Jun 23, 2016 at 5:07 PM, magicb...@hotmail.com
>  wrote:
>>
>> Hi
>>
>> is there any possibility to make RadosGW to account stats for something
>> similar to "radosgw.objects.(incoming|outgoing).bytes" similar to Swift's
>> meters storage.objects.(incoming|outgoing).bytes??
>>
>>
>> Thanks
>> J.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW and Openstack meters

2016-06-23 Thread M Ranga Swami Reddy
Those meters are pushed into Ceilometer... the patches have been in since the Kilo version.

On Thu, Jun 23, 2016 at 4:05 PM, magicb...@hotmail.com
 wrote:
> Hi
>
> yes, radosgw keeps stats but those stats aren't pushed into telemetry
> service I think..
>
>
> On 23/06/16 11:55, c.y. lee wrote:
>
> Hi,
>
> Did you check Admin API of Rados gateway?
>
> http://docs.ceph.com/docs/master/radosgw/adminops/#get-usage
>
>
> On Thu, Jun 23, 2016 at 5:07 PM, magicb...@hotmail.com
>  wrote:
>>
>> Hi
>>
>> is there any possibility to make RadosGW to account stats for something
>> similar to "radosgw.objects.(incoming|outgoing).bytes" similar to Swift's
>> meters storage.objects.(incoming|outgoing).bytes??
>>
>>
>> Thanks
>> J.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Issues creating Ceph cluster in Calamari UI

2016-06-23 Thread Venkata Manojawa Paritala
Hi,

I am having issues adding RedHat Ceph (10.2.1) nodes to Calamari 1.3-7.
Below are more details.

1. On RHEL 7.2 VMs, configured Ceph (10.2.1) cluster with 3 mons and 19
osds .

2. Configured Calamari 1.3-7 on one node. Installation was done through
ICE_SETUP with ISO Image. Diamond packages were installed manually on all
the nodes.

3. Now, on the Calamari node, using "ceph-deploy calamari connect osd1
osd2.." I could add all the nodes (OSD/Mon) to calamari.

4. In the Calamari UI, I can see all the nodes. Now, when I try to create a
Ceph cluster, the operation times out and I see the below message.

"New Calamari Installation

This appears to be the first time you have started Calamari and there are
no clusters currently configured.

9 Ceph servers are connected to Calamari, but no Ceph cluster has been
created yet. Please use ceph-deploy to create a cluster; please see the
Inktank Ceph Enterprise documentation for more details."

At this point of time Health of the cluster is ok (good).

Any idea what could be the issue? Appreciate your help at the earliest.

Thanks & Regards,
Manoj
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Regarding executing COSBench onto a specific pool

2016-06-23 Thread Venkata Manojawa Paritala
Gentle reminder on my question. It would be great if you could suggest any
workaround for achieving this.

Thanks & Regards,
Manoj

On Tue, Jun 21, 2016 at 5:25 PM, Venkata Manojawa Paritala <
manojaw...@vedams.com> wrote:

> Hi,
>
> In our Ceph cluster, we are currently seeing that COSBench writes IO to the
> default pools that are created while configuring the rados gateway. Can you
> please let me know if there is a way to direct IO (using COSBench) to a
> specific pool.
>
> Thanks & Regards,
> Manoj
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] about image's largest size

2016-06-23 Thread Adrien Gillard
A single image can be as large as you want, or at least as large as your
pool size.
But you want to take into consideration the maximum size allowed by the
filesystem on top of your volume and the maximum size supported by your OS
vendor, if any.

And even if it is supported, and even considering the resiliency of most Linux
filesystems, I do not think having a filesystem of hundreds of TB or even a
PB would be a good choice. Issues and errors just happen, and you do not
want to put all your eggs in the same basket.

Also keep in mind that Ceph is a distributed storage platform, hence
performance is mainly achieved by parallel accesses to the cluster. So you
want many files accessed simultaneously on many volumes to maximize
performance.
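
Creating a huge image itself is trivial, though; just as a sketch (pool/image
names are examples, and --size here is in MB):

rbd create backups/backup01 --size 104857600   # ~100 TB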

On Thu, Jun 23, 2016 at 3:28 AM, Ops Cloud  wrote:

>
> We want to run a backup server, which has huge storage as backend.
> If we use rbd client to mount a block storage from ceph, for a single
> image, how large can it be? xxx TB or PB?
>
> Thank you.
>
> --
> Ops Cloud
> o...@19cloud.net
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] image map failed

2016-06-23 Thread Ishmael Tsoaela
Hi All,

I have created an image but cannot map it. Does anybody know what the problem
could be?



sudo rbd map data/data_01

rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the
kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (6) No such device or address



cluster_master@nodeC:~$ dmesg |tail
[89572.831725] libceph: client4227 fsid 70cc6b75-9f83-4c67-a1c4-4fe846b4849e
[89572.832413] libceph: mon0 155.232.195.4:6789 session established
[89573.042375] libceph: client4229 fsid 70cc6b75-9f83-4c67-a1c4-4fe846b4849e
[89573.043046] libceph: mon0 155.232.195.4:6789 session established



command to create image:

rbd create data_01 --size 102400 --pool data


cluster_master@nodeC:~$ rbd ls data
data_01


cluster_master@nodeC:~$ rbd --image data_01 -p data info
rbd image 'data_01':
size 102400 MB in 25600 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.105f2ae8944a
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:


cluster_master@nodeC:~$ ceph status
cluster 70cc6b75-9f83-4c67-a1c4-4fe846b4849e
 health HEALTH_OK
 monmap e1: 1 mons at {nodeB=155.232.195.4:6789/0}
election epoch 3, quorum 0 nodeB
 osdmap e17: 2 osds: 2 up, 2 in
flags sortbitwise
  pgmap v160: 192 pgs, 2 pools, 6454 bytes data, 5 objects
10311 MB used, 1851 GB / 1861 GB avail
 192 active+clean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] image map failed

2016-06-23 Thread Jason Dillaman
On Thu, Jun 23, 2016 at 10:16 AM, Ishmael Tsoaela  wrote:
> cluster_master@nodeC:~$ rbd --image data_01 -p data info
> rbd image 'data_01':
> size 102400 MB in 25600 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.105f2ae8944a
> format: 2
> features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
> flags:


You need to disable the "exclusive-lock, object-map, fast-diff,
deep-flatten" features on the image.
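
Something along these lines should do it (disabling them in this order, since
some of the features depend on the others):

rbd feature disable data/data_01 deep-flatten
rbd feature disable data/data_01 fast-diff
rbd feature disable data/data_01 object-map
rbd feature disable data/data_01 exclusive-lock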

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] image map failed

2016-06-23 Thread Ishmael Tsoaela
it worked thanks:

cluster_master@nodeC:~$ sudo rbd map data/data_01
/dev/rbd0



On Thu, Jun 23, 2016 at 4:37 PM, Jason Dillaman  wrote:

> On Thu, Jun 23, 2016 at 10:16 AM, Ishmael Tsoaela 
> wrote:
> > cluster_master@nodeC:~$ rbd --image data_01 -p data info
> > rbd image 'data_01':
> > size 102400 MB in 25600 objects
> > order 22 (4096 kB objects)
> > block_name_prefix: rbd_data.105f2ae8944a
> > format: 2
> > features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
> > flags:
>
>
> You need to disable the "exclusive-lock, object-map, fast-diff,
> deep-flatten" features on the image.
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Tech Talks: Bluestore

2016-06-23 Thread Patrick McGarry
Hey cephers,

If you missed Tuesday’s Ceph Tech Talk by Sage on the new Bluestore
backend for Ceph, it is now available on Youtube:
https://youtu.be/kuacS4jw5pM

We love it when our community shares what they are doing, so if you
would like to give a Ceph Tech Talk in the future, please drop me a
line. Thanks.

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSDs down following ceph-deploy guide

2016-06-23 Thread Dimitris Bozelos
Hello,

Trying out Ceph for the first time, following the installation guide using
ceph-deploy. All goes well, "ceph -s" reports health as ok at the
beginning, but shortly after it shows all placement groups as inactive, and
the 2 osds are down and out.

I understand this could be for a variety of reasons. Quick question: I've
read in another mail that you have to manually mount the partitions on
reboot and that this is not mentioned in the guide. I am testing this on a
cloud provider and I wanted to try Ceph with 2 servers on the local
filesystem, i.e. I just created folders under /var/local/osd0 as in the
tutorial, without attaching a storage volume to the host and mounting it. Is
this possible, or does Ceph always require partitions to be used for OSDs?

That could be one reason for this failure.

Cheers
Dimitris
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rados error calling trunc on erasure coded pool ENOTSUP

2016-06-23 Thread Wyatt Rivers
Hello,
  I am using a script that calls the librados ioctx.trunc() method, and I
am getting errno ENOTSUP.

I can read/write to the pool, and I can call the trunc method on a
replicated pool.

I am using version 0.80.7

I am just wondering if this is intended, or is there maybe something wrong
with my setup?
I spent some time looking at the source but couldn't really find anything
that pointed to this error directly.
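
If it helps, the same call can probably be reproduced outside the script with
the rados CLI, assuming its truncate subcommand is present in 0.80.x (pool and
object names below are just examples):

rados -p rep_pool put testobj /etc/hosts
rados -p rep_pool truncate testobj 16    # succeeds on the replicated pool
rados -p ec_pool put testobj /etc/hosts
rados -p ec_pool truncate testobj 16     # where I'd expect the same ENOTSUP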

Thank you
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Strange behavior in Hammer

2016-06-23 Thread Rick Stehno
When doing FIO RBD benchmarking using 94.7 on Ubuntu 14.04 using 10 SSD/OSD
and with/and without journals on separate SSDs, I get an even distribution
of IO to the OSD and to the journals (if used).

If I drop the # of OSD's down to 8, the IO to the journals is skewed by
40%, meaning 1 journal is doing 40% more IO than the other journal. Same
test, same 32 iodepth, performance drops because 1 journal is 90% busy and
the other journal is 40% busy.

I did another set of tests putting the journals on a single PCIe flash
card. Using 10 ssd/osd's, the ceph ops is very consistent. If I drop the #
of OSD's down to 8, the ceph ops varies from 28,000 to 10,000 and
performance drops.

The ssd's and pcie card operate without any errors and the appropriate
device tuning has been applied. Ceph health is OK, no errors at the system
or ceph level.

Any ideas what could be causing this behavior?
Thanks

Rick Stehno
Sr. Database and Ceph Performance Architect  @ Seagate
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-23 Thread Warren Wang - ISD
vm.vfs_cache_pressure = 100

Go the other direction on that. You'll want to keep it low to help keep
inode/dentry info in memory. We use 10, and haven't had a problem.
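
For example, to apply and persist that (the file path is just where we happen
to drop it):

sysctl -w vm.vfs_cache_pressure=10
echo 'vm.vfs_cache_pressure = 10' > /etc/sysctl.d/90-ceph-osd.conf   # survives reboots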


Warren Wang




On 6/22/16, 9:41 PM, "Wade Holler"  wrote:

>Blairo,
>
>We'll speak in pre-replication numbers, replication for this pool is 3.
>
>23.3 Million Objects / OSD
>pg_num 2048
>16 OSDs / Server
>3 Servers
>660 GB RAM Total, 179 GB Used (free -t) / Server
>vm.swappiness = 1
>vm.vfs_cache_pressure = 100
>
>Workload is native librados with python.  ALL 4k objects.
>
>Best Regards,
>Wade
>
>
>On Wed, Jun 22, 2016 at 9:33 PM, Blair Bethwaite
> wrote:
>> Wade, good to know.
>>
>> For the record, what does this work out to roughly per OSD? And how
>> much RAM and how many PGs per OSD do you have?
>>
>> What's your workload? I wonder whether for certain workloads (e.g.
>> RBD) it's better to increase default object size somewhat before
>> pushing the split/merge up a lot...
>>
>> Cheers,
>>
>> On 23 June 2016 at 11:26, Wade Holler  wrote:
>>> Based on everyones suggestions; The first modification to 50 / 16
>>> enabled our config to get to ~645Mill objects before the behavior in
>>> question was observed (~330 was the previous ceiling).  Subsequent
>>> modification to 50 / 24 has enabled us to get to 1.1 Billion+
>>>
>>> Thank you all very much for your support and assistance.
>>>
>>> Best Regards,
>>> Wade
>>>
>>>
>>> On Mon, Jun 20, 2016 at 6:58 PM, Christian Balzer 
>>>wrote:

 Hello,

 On Mon, 20 Jun 2016 20:47:32 + Warren Wang - ISD wrote:

> Sorry, late to the party here. I agree, up the merge and split
> thresholds. We're as high as 50/12. I chimed in on an RH ticket here.
> One of those things you just have to find out as an operator since
>it's
> not well documented :(
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1219974
>
> We have over 200 million objects in this cluster, and it's still
>doing
> over 15000 write IOPS all day long with 302 spinning drives + SATA
>SSD
> journals. Having enough memory and dropping your vfs_cache_pressure
> should also help.
>
 Indeed.

 Since it was asked in that bug report and also my first suspicion, it
 would probably be good time to clarify that it isn't the splits that
cause
 the performance degradation, but the resulting inflation of dir
entries
 and exhaustion of SLAB and thus having to go to disk for things that
 normally would be in memory.

 Looking at Blair's graph from yesterday pretty much makes that clear,
a
 purely split caused degradation should have relented much quicker.


> Keep in mind that if you change the values, it won't take effect
> immediately. It only merges them back if the directory is under the
> calculated threshold and a write occurs (maybe a read, I forget).
>
 If it's a read a plain scrub might do the trick.

 Christian
> Warren
>
>
> From: ceph-users
> 
>mailto:ceph-users-boun...@lists.cep
>h.com>>
> on behalf of Wade Holler
> mailto:wade.hol...@gmail.com>> Date: Monday,
>June
> 20, 2016 at 2:48 PM To: Blair Bethwaite
> mailto:blair.bethwa...@gmail.com>>, Wido
>den
> Hollander mailto:w...@42on.com>> Cc: Ceph Development
> mailto:ceph-de...@vger.kernel.org>>,
> "ceph-users@lists.ceph.com"
> mailto:ceph-users@lists.ceph.com>>
>Subject:
> Re: [ceph-users] Dramatic performance drop at certain number of
>objects
> in pool
>
> Thanks everyone for your replies.  I sincerely appreciate it. We are
> testing with different pg_num and filestore_split_multiple settings.
> Early indications are  well not great. Regardless it is nice to
> understand the symptoms better so we try to design around it.
>
> Best Regards,
> Wade
>
>
> On Mon, Jun 20, 2016 at 2:32 AM Blair Bethwaite
> mailto:blair.bethwa...@gmail.com>> wrote:
>On
> 20 June 2016 at 09:21, Blair Bethwaite
> mailto:blair.bethwa...@gmail.com>> wrote:
> > slow request issues). If you watch your xfs stats you'll likely get
> > further confirmation. In my experience xs_dir_lookups balloons
>(which
> > means directory lookups are missing cache and going to disk).
>
> Murphy's a bitch. Today we upgraded a cluster to latest Hammer in
> preparation for Jewel/RHCS2. Turns out when we last hit this very
> problem we had only ephemerally set the new filestore merge/split
> values - oops. Here's what started happening when we upgraded and
> restarted a bunch of OSDs:
> 
>https://au-east.erc.monash.edu.au/swift/v1/public/grafana-ceph-xs_dir_
>lookup.png
>
> Seemed to cause lots of slow requests :-/. We corrected it about
> 12:30, then still took a while to settle.
>
> --
> Cheers,
> ~Blairo
>

Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-23 Thread Somnath Roy
Or even vm.vfs_cache_pressure = 0 if you have sufficient memory to *pin* 
inode/dentries in memory.
We have been using that for a long time now (with 128 TB node memory) and it
seems to help, especially for the random write workload, saving the xattr reads
in between.

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Warren 
Wang - ISD
Sent: Thursday, June 23, 2016 3:09 PM
To: Wade Holler; Blair Bethwaite
Cc: Ceph Development; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Dramatic performance drop at certain number of 
objects in pool

vm.vfs_cache_pressure = 100

Go the other direction on that. You'll want to keep it low to help keep
inode/dentry info in memory. We use 10, and haven't had a problem.


Warren Wang




On 6/22/16, 9:41 PM, "Wade Holler"  wrote:

>Blairo,
>
>We'll speak in pre-replication numbers, replication for this pool is 3.
>
>23.3 Million Objects / OSD
>pg_num 2048
>16 OSDs / Server
>3 Servers
>660 GB RAM Total, 179 GB Used (free -t) / Server vm.swappiness = 1
>vm.vfs_cache_pressure = 100
>
>Workload is native librados with python.  ALL 4k objects.
>
>Best Regards,
>Wade
>
>
>On Wed, Jun 22, 2016 at 9:33 PM, Blair Bethwaite
> wrote:
>> Wade, good to know.
>>
>> For the record, what does this work out to roughly per OSD? And how
>> much RAM and how many PGs per OSD do you have?
>>
>> What's your workload? I wonder whether for certain workloads (e.g.
>> RBD) it's better to increase default object size somewhat before
>> pushing the split/merge up a lot...
>>
>> Cheers,
>>
>> On 23 June 2016 at 11:26, Wade Holler  wrote:
>>> Based on everyones suggestions; The first modification to 50 / 16
>>> enabled our config to get to ~645Mill objects before the behavior in
>>> question was observed (~330 was the previous ceiling).  Subsequent
>>> modification to 50 / 24 has enabled us to get to 1.1 Billion+
>>>
>>> Thank you all very much for your support and assistance.
>>>
>>> Best Regards,
>>> Wade
>>>
>>>
>>> On Mon, Jun 20, 2016 at 6:58 PM, Christian Balzer 
>>>wrote:

 Hello,

 On Mon, 20 Jun 2016 20:47:32 + Warren Wang - ISD wrote:

> Sorry, late to the party here. I agree, up the merge and split
>thresholds. We're as high as 50/12. I chimed in on an RH ticket here.
> One of those things you just have to find out as an operator since
>it's  not well documented :(
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1219974
>
> We have over 200 million objects in this cluster, and it's still
>doing  over 15000 write IOPS all day long with 302 spinning drives
>+ SATA SSD  journals. Having enough memory and dropping your
>vfs_cache_pressure  should also help.
>
 Indeed.

 Since it was asked in that bug report and also my first suspicion,
it  would probably be good time to clarify that it isn't the splits
that cause  the performance degradation, but the resulting inflation
of dir entries  and exhaustion of SLAB and thus having to go to disk
for things that  normally would be in memory.

 Looking at Blair's graph from yesterday pretty much makes that
clear, a  purely split caused degradation should have relented much
quicker.


> Keep in mind that if you change the values, it won't take effect
> immediately. It only merges them back if the directory is under
> the calculated threshold and a write occurs (maybe a read, I forget).
>
 If it's a read a plain scrub might do the trick.

 Christian
> Warren
>
>
> From: ceph-users
>
>mailto:ceph-users-bounces@lists.
>cep
>h.com>>
> on behalf of Wade Holler
> mailto:wade.hol...@gmail.com>> Date:
>Monday, June  20, 2016 at 2:48 PM To: Blair Bethwaite
>mailto:blair.bethwa...@gmail.com>>, Wido
>den  Hollander mailto:w...@42on.com>> Cc: Ceph
>Development
>mailto:ceph-de...@vger.kernel.org>>,
> "ceph-users@lists.ceph.com"
> mailto:ceph-users@lists.ceph.com>>
>Subject:
> Re: [ceph-users] Dramatic performance drop at certain number of
>objects  in pool
>
> Thanks everyone for your replies.  I sincerely appreciate it. We
> are testing with different pg_num and filestore_split_multiple settings.
> Early indications are  well not great. Regardless it is nice
> to understand the symptoms better so we try to design around it.
>
> Best Regards,
> Wade
>
>
> On Mon, Jun 20, 2016 at 2:32 AM Blair Bethwaite
>mailto:blair.bethwa...@gmail.com>> wrote:
>On
> 20 June 2016 at 09:21, Blair Bethwaite
>mailto:blair.bethwa...@gmail.com>> wrote:
> > slow request issues). If you watch your xfs stats you'll likely
> > get further confirmation. In my experience xs_dir_lookups
> > balloons
>(which
> > means directory lookups are missing cache and going to disk).
>
> Murphy

Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-23 Thread Christian Balzer

Hello,

On Thu, 23 Jun 2016 22:24:59 + Somnath Roy wrote:

> Or even vm.vfs_cache_pressure = 0 if you have sufficient memory to *pin*
> inode/dentries in memory. We are using that for long now (with 128 TB
> node memory) and it seems helping specially for the random write
> workload and saving xattrs read in between.
>
128TB node memory, really?
Can I have some of those, too? ^o^
And here I was thinking that Wade's 660GB machines were on the excessive
side.

There's something to be said (and optimized) when your storage nodes have
the same or more RAM as your compute nodes...

As for Warren, well spotted. 
I personally use vm.vfs_cache_pressure = 1; this avoids the potential
fireworks if your memory is really needed elsewhere, while keeping things
in memory normally.

Christian

> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Warren Wang - ISD Sent: Thursday, June 23, 2016 3:09 PM
> To: Wade Holler; Blair Bethwaite
> Cc: Ceph Development; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Dramatic performance drop at certain number of
> objects in pool
> 
> vm.vfs_cache_pressure = 100
> 
> Go the other direction on that. You¹ll want to keep it low to help keep
> inode/dentry info in memory. We use 10, and haven¹t had a problem.
> 
> 
> Warren Wang
> 
> 
> 
> 
> On 6/22/16, 9:41 PM, "Wade Holler"  wrote:
> 
> >Blairo,
> >
> >We'll speak in pre-replication numbers, replication for this pool is 3.
> >
> >23.3 Million Objects / OSD
> >pg_num 2048
> >16 OSDs / Server
> >3 Servers
> >660 GB RAM Total, 179 GB Used (free -t) / Server vm.swappiness = 1
> >vm.vfs_cache_pressure = 100
> >
> >Workload is native librados with python.  ALL 4k objects.
> >
> >Best Regards,
> >Wade
> >
> >
> >On Wed, Jun 22, 2016 at 9:33 PM, Blair Bethwaite
> > wrote:
> >> Wade, good to know.
> >>
> >> For the record, what does this work out to roughly per OSD? And how
> >> much RAM and how many PGs per OSD do you have?
> >>
> >> What's your workload? I wonder whether for certain workloads (e.g.
> >> RBD) it's better to increase default object size somewhat before
> >> pushing the split/merge up a lot...
> >>
> >> Cheers,
> >>
> >> On 23 June 2016 at 11:26, Wade Holler  wrote:
> >>> Based on everyones suggestions; The first modification to 50 / 16
> >>> enabled our config to get to ~645Mill objects before the behavior in
> >>> question was observed (~330 was the previous ceiling).  Subsequent
> >>> modification to 50 / 24 has enabled us to get to 1.1 Billion+
> >>>
> >>> Thank you all very much for your support and assistance.
> >>>
> >>> Best Regards,
> >>> Wade
> >>>
> >>>
> >>> On Mon, Jun 20, 2016 at 6:58 PM, Christian Balzer 
> >>>wrote:
> 
>  Hello,
> 
>  On Mon, 20 Jun 2016 20:47:32 + Warren Wang - ISD wrote:
> 
> > Sorry, late to the party here. I agree, up the merge and split
> >thresholds. We're as high as 50/12. I chimed in on an RH ticket
> >here.
> > One of those things you just have to find out as an operator since
> >it's  not well documented :(
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1219974
> >
> > We have over 200 million objects in this cluster, and it's still
> >doing  over 15000 write IOPS all day long with 302 spinning drives
> >+ SATA SSD  journals. Having enough memory and dropping your
> >vfs_cache_pressure  should also help.
> >
>  Indeed.
> 
>  Since it was asked in that bug report and also my first suspicion,
> it  would probably be good time to clarify that it isn't the splits
> that cause  the performance degradation, but the resulting inflation
> of dir entries  and exhaustion of SLAB and thus having to go to disk
> for things that  normally would be in memory.
> 
>  Looking at Blair's graph from yesterday pretty much makes that
> clear, a  purely split caused degradation should have relented much
> quicker.
> 
> 
> > Keep in mind that if you change the values, it won't take effect
> > immediately. It only merges them back if the directory is under
> > the calculated threshold and a write occurs (maybe a read, I
> > forget).
> >
>  If it's a read a plain scrub might do the trick.
> 
>  Christian
> > Warren
> >
> >
> > From: ceph-users
> >
> >mailto:ceph-users-bounces@lists.
> >cep
> >h.com>>
> > on behalf of Wade Holler
> > mailto:wade.hol...@gmail.com>> Date:
> >Monday, June  20, 2016 at 2:48 PM To: Blair Bethwaite
> >mailto:blair.bethwa...@gmail.com>>, Wido
> >den  Hollander mailto:w...@42on.com>> Cc: Ceph
> >Development
> >mailto:ceph-de...@vger.kernel.org>>,
> > "ceph-users@lists.ceph.com"
> > mailto:ceph-users@lists.ceph.com>>
> >Subject:
> > Re: [ceph-users] Dramatic performance drop at certain number of
> >objects  in pool

Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-23 Thread Somnath Roy
Oops, typo, 128 GB :-)...

-Original Message-
From: Christian Balzer [mailto:ch...@gol.com]
Sent: Thursday, June 23, 2016 5:08 PM
To: ceph-users@lists.ceph.com
Cc: Somnath Roy; Warren Wang - ISD; Wade Holler; Blair Bethwaite; Ceph 
Development
Subject: Re: [ceph-users] Dramatic performance drop at certain number of 
objects in pool


Hello,

On Thu, 23 Jun 2016 22:24:59 + Somnath Roy wrote:

> Or even vm.vfs_cache_pressure = 0 if you have sufficient memory to
> *pin* inode/dentries in memory. We are using that for long now (with
> 128 TB node memory) and it seems helping specially for the random
> write workload and saving xattrs read in between.
>
128TB node memory, really?
Can I have some of those, too? ^o^
And here I was thinking that Wade's 660GB machines were on the excessive side.

There's something to be said (and optimized) when your storage nodes have the 
same or more RAM as your compute nodes...

As for Warren, well spotted.
I personally use vm.vfs_cache_pressure = 1, this avoids the potential fireworks 
if your memory is really needed elsewhere, while keeping things in memory 
normally.

Christian

> Thanks & Regards
> Somnath
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Warren Wang - ISD Sent: Thursday, June 23, 2016 3:09 PM
> To: Wade Holler; Blair Bethwaite
> Cc: Ceph Development; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Dramatic performance drop at certain number
> of objects in pool
>
> vm.vfs_cache_pressure = 100
>
> Go the other direction on that. You'll want to keep it low to help
> keep inode/dentry info in memory. We use 10, and haven't had a problem.
>
>
> Warren Wang
>
>
>
>
> On 6/22/16, 9:41 PM, "Wade Holler"  wrote:
>
> >Blairo,
> >
> >We'll speak in pre-replication numbers, replication for this pool is 3.
> >
> >23.3 Million Objects / OSD
> >pg_num 2048
> >16 OSDs / Server
> >3 Servers
> >660 GB RAM Total, 179 GB Used (free -t) / Server vm.swappiness = 1
> >vm.vfs_cache_pressure = 100
> >
> >Workload is native librados with python.  ALL 4k objects.
> >
> >Best Regards,
> >Wade
> >
> >
> >On Wed, Jun 22, 2016 at 9:33 PM, Blair Bethwaite
> > wrote:
> >> Wade, good to know.
> >>
> >> For the record, what does this work out to roughly per OSD? And how
> >> much RAM and how many PGs per OSD do you have?
> >>
> >> What's your workload? I wonder whether for certain workloads (e.g.
> >> RBD) it's better to increase default object size somewhat before
> >> pushing the split/merge up a lot...
> >>
> >> Cheers,
> >>
> >> On 23 June 2016 at 11:26, Wade Holler  wrote:
> >>> Based on everyones suggestions; The first modification to 50 / 16
> >>> enabled our config to get to ~645Mill objects before the behavior
> >>> in question was observed (~330 was the previous ceiling).
> >>> Subsequent modification to 50 / 24 has enabled us to get to 1.1
> >>> Billion+
> >>>
> >>> Thank you all very much for your support and assistance.
> >>>
> >>> Best Regards,
> >>> Wade
> >>>
> >>>
> >>> On Mon, Jun 20, 2016 at 6:58 PM, Christian Balzer 
> >>>wrote:
> 
>  Hello,
> 
>  On Mon, 20 Jun 2016 20:47:32 + Warren Wang - ISD wrote:
> 
> > Sorry, late to the party here. I agree, up the merge and split
> >thresholds. We're as high as 50/12. I chimed in on an RH ticket
> >here.
> > One of those things you just have to find out as an operator
> >since it's  not well documented :(
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1219974
> >
> > We have over 200 million objects in this cluster, and it's still
> >doing  over 15000 write IOPS all day long with 302 spinning
> >drives
> >+ SATA SSD  journals. Having enough memory and dropping your
> >vfs_cache_pressure  should also help.
> >
>  Indeed.
> 
>  Since it was asked in that bug report and also my first
> suspicion, it  would probably be good time to clarify that it
> isn't the splits that cause  the performance degradation, but the
> resulting inflation of dir entries  and exhaustion of SLAB and
> thus having to go to disk for things that  normally would be in memory.
> 
>  Looking at Blair's graph from yesterday pretty much makes that
> clear, a  purely split caused degradation should have relented
> much quicker.
> 
> 
> > Keep in mind that if you change the values, it won't take effect
> > immediately. It only merges them back if the directory is under
> > the calculated threshold and a write occurs (maybe a read, I
> > forget).
> >
>  If it's a read a plain scrub might do the trick.
> 
>  Christian
> > Warren
> >
> >
> > From: ceph-users
> >
> >mailto:ceph-users-bounces@lists.
> >cep
> >h.com>>
> > on behalf of Wade Holler
> > mailto:wade.hol...@gmail.com>> Date:
> >Monday, June  20, 2016 at 2:48 PM To: Blair Bethwaite
> >mailto:blair.bethwa...@gmail.com>>,

[ceph-users] ceph pg level IO sequence

2016-06-23 Thread min fang
Hi, as I understand it, at the PG level IOs are executed in a sequential way,
as in the following cases:

Case 1:
Write A, Write B, Write C to the same data area in a PG --> A committed,
then B committed, then C. The final data will be from write C; it is impossible
for mixed (A, B, C) data to end up in the data area.

Case 2:
Write A, Write B, Read C to the same data area in a PG --> Read C will return
the data from Write B, not Write A.

Are the above cases true?

thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph pg level IO sequence

2016-06-23 Thread Anand Bhat
Correct. This is guaranteed.

Regards,
Anand

On Fri, Jun 24, 2016 at 10:37 AM, min fang  wrote:

> Hi, as my understanding, in PG level, IOs are execute in a sequential way,
> such as the following cases:
>
> Case 1:
> Write A, Write B, Write C to the same data area in a PG --> A Committed,
> then B committed, then C.  The final data will from write C. Impossible
> that mixed (A, B,C) data is in the data area.
>
> Case 2:
> Write A, Write B, Read C to the same data area in a PG-> Read C will
> return the data from Write B, not Write A.
>
> Are the above cases true?
>
> thanks.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Never say never.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com