Hi Mark

Yes, enough PGs and no errors in the Apache logs. We identified a bottleneck on the bucket index, with huge IOPS on one OSD (the IOPS hit only 1 bucket, so its index lands on a single OSD).
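For reference, the shard count has to be set before the bucket is created, since existing buckets keep their old index layout. A minimal ceph.conf sketch, assuming Hammer's `rgw_override_bucket_index_max_shards` option; the `client.radosgw.gateway` section name is an assumption and may differ in your deployment:

```ini
[client.radosgw.gateway]
# Split each new bucket's index across 32 RADOS objects, so index
# updates spread over several OSDs instead of hammering a single one.
rgw override bucket index max shards = 32
```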
With bucket index sharding (32 shards) configured, write IOPS are now 5x better (after a bucket delete/create). But we don't yet reach Firefly performance; a Red Hat case is in progress. I will share the outcome with the community later.

Sent from my iPhone

> On 22 July 2015, at 08:20, Mark Nelson <mnel...@redhat.com> wrote:
>
> Ok,
>
> So good news that RADOS appears to be doing well. I'd say next is to follow some of the recommendations here:
>
> http://ceph.com/docs/master/radosgw/troubleshooting/
>
> If you examine the objecter_requests and perfcounters during your cosbench write test, it might help explain where the requests are backing up. Another thing to look for (as noted in the above URL) is HTTP errors in the apache logs (if relevant).
>
> Other general thoughts: When you upgraded to hammer, did you change the RGW configuration at all? Are you using civetweb now? Does the rgw.buckets pool have enough PGs?
>
> Mark
>
>> On 07/21/2015 08:17 PM, Florent MONTHEL wrote:
>> Hi Mark
>>
>> I get something like 600 write IOPS on the EC pool and 800 write IOPS on the replicated (size 3) pool with rados bench.
>>
>> With Radosgw I get 30/40 write IOPS with Cosbench (1 radosgw; the same with 2) and the servers are nearly idle:
>> - 0.005 core for the radosgw process
>> - 0.01 core per osd process
>>
>> I don't know if we can have .rgw* pool locking or something like that with Hammer (or if the situation is specific to me).
>>
>> On a 100% read profile, the Radosgw and Ceph servers work very well, with more than 6000 IOPS on one radosgw server:
>> - 7 cores for the radosgw process
>> - 1 core per osd process
>> - 0.5 core per Apache process
>>
>> Thanks
>>
>> Sent from my iPhone
>>
>>> On 14 July 2015, at 21:03, Mark Nelson <mnel...@redhat.com> wrote:
>>>
>>> Hi Florent,
>>>
>>> 10x degradation is definitely unusual! A couple of things to look at:
>>>
>>> Are 8K rados bench writes to the rgw.buckets pool slow?
>>> You can check with something like:
>>>
>>> rados -p rgw.buckets bench 30 write -t 256 -b 8192
>>>
>>> You may also want to try targeting a specific RGW server to make sure the RR-DNS setup isn't interfering (at least while debugging). It may also be worth creating a new replicated pool and trying writes to that pool as well, to see if you see much difference.
>>>
>>> Mark
>>>
>>>> On 07/14/2015 07:17 PM, Florent MONTHEL wrote:
>>>> Yes of course, thanks Mark
>>>>
>>>> Infrastructure: 5 servers with 10 SATA disks each (50 OSDs in all), 10Gb connected, EC 2+1 on the rgw.buckets pool, 2 radosgw in an RR-DNS-like setup installed on 2 of the cluster servers.
>>>> No SSD drives used.
>>>>
>>>> We're using Cosbench to send:
>>>> - 8k object size, 100% read, 256 workers: better results with Hammer
>>>> - 8k object size, 80% read / 20% write, 256 workers: real degradation between Firefly and Hammer (divided by something like 10)
>>>> - 8k object size, 100% write, 256 workers: real degradation between Firefly and Hammer (divided by something like 10)
>>>>
>>>> Thanks
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On 14 July 2015, at 19:57, Mark Nelson <mnel...@redhat.com> wrote:
>>>>>
>>>>>> On 07/14/2015 06:42 PM, Florent MONTHEL wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I've just upgraded the Ceph cluster from Firefly 0.80.8 (Redhat Ceph 1.2.3) to Hammer (Redhat Ceph 1.3). Usage: radosgw with Apache 2.4.19 in MPM prefork mode.
>>>>>> I'm experiencing huge write performance degradation just after the upgrade (Cosbench).
>>>>>>
>>>>>> Have you already run performance tests between Hammer and Firefly?
>>>>>>
>>>>>> No problem with read performance, which was amazing.
>>>>>
>>>>> Hi Florent,
>>>>>
>>>>> Can you talk a little bit about how your write tests are set up? How many concurrent IOs and what size? Also, do you see similar problems with rados bench?
>>>>>
>>>>> We have done some testing and haven't seen significant performance degradation, except when switching to civetweb, which appears to perform deletes more slowly than what we saw with apache+fcgi.
>>>>>
>>>>> Mark
>>>>>
>>>>>> Sent from my iPhone

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
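For anyone following along, the diagnostics suggested in the thread can be gathered roughly like this. This is a sketch only: the pool name `testpool`, the PG counts, and the admin socket path are assumptions that depend on your deployment, and the `ceph daemon` calls require the RGW admin socket to be enabled.

```shell
# Baseline RADOS write performance on a fresh replicated pool,
# to compare against the EC-backed rgw.buckets pool.
ceph osd pool create testpool 128 128 replicated
rados -p testpool bench 30 write -t 256 -b 8192

# While a cosbench write run is active, check where RGW requests
# are backing up (admin socket path varies by deployment).
ceph daemon /var/run/ceph/ceph-client.radosgw.gateway.asok objecter_requests
ceph daemon /var/run/ceph/ceph-client.radosgw.gateway.asok perf dump
```

These commands only observe a live cluster; run them side by side with the benchmark rather than after it, since objecter_requests shows in-flight operations.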