Hi Mark

Yes, enough PGs and no errors in the Apache logs. We identified a bottleneck on the bucket index, with huge IOPS on one OSD (the IOPS hit only 1 bucket, so its index lands on a single OSD).
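For reference, the shard count has to be set before the bucket is created, since existing buckets keep their old index layout. A minimal ceph.conf sketch, assuming Hammer's `rgw_override_bucket_index_max_shards` option; the `client.radosgw.gateway` section name is an assumption and may differ in your deployment:

```ini
[client.radosgw.gateway]
# Split each new bucket's index across 32 RADOS objects, so index
# updates spread over several OSDs instead of hammering a single one.
rgw override bucket index max shards = 32
```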
With bucket index sharding (32 shards) configured, write IOPS are now 5x better (after a bucket delete/create). But we don't yet reach Firefly performance; a Red Hat case is in progress. I will share the outcome with the community later.

Sent from my iPhone

> On 22 July 2015, at 08:20, Mark Nelson <mnel...@redhat.com> wrote:
>
> Ok,
>
> So good news that RADOS appears to be doing well. I'd say next is to follow some of the recommendations here:
>
> http://ceph.com/docs/master/radosgw/troubleshooting/
>
> If you examine the objecter_requests and perfcounters during your cosbench write test, it might help explain where the requests are backing up. Another thing to look for (as noted in the above URL) is HTTP errors in the apache logs (if relevant).
>
> Other general thoughts: When you upgraded to hammer, did you change the RGW configuration at all? Are you using civetweb now? Does the rgw.buckets pool have enough PGs?
>
> Mark
>
>> On 07/21/2015 08:17 PM, Florent MONTHEL wrote:
>> Hi Mark
>>
>> I get something like 600 write IOPS on the EC pool and 800 write IOPS on the replicated (size 3) pool with rados bench.
>>
>> With Radosgw I get 30/40 write IOPS with Cosbench (1 radosgw; the same with 2) and the servers are nearly idle:
>> - 0.005 core for the radosgw process
>> - 0.01 core per osd process
>>
>> I don't know if we can have .rgw* pool locking or something like that with Hammer (or if the situation is specific to me).
>>
>> On a 100% read profile, the Radosgw and Ceph servers work very well, with more than 6000 IOPS on one radosgw server:
>> - 7 cores for the radosgw process
>> - 1 core per osd process
>> - 0.5 core per Apache process
>>
>> Thanks
>>
>> Sent from my iPhone
>>
>>> On 14 July 2015, at 21:03, Mark Nelson <mnel...@redhat.com> wrote:
>>>
>>> Hi Florent,
>>>
>>> 10x degradation is definitely unusual! A couple of things to look at:
>>>
>>> Are 8K rados bench writes to the rgw.buckets pool slow?
>>> You can check with something like:
>>>
>>> rados -p rgw.buckets bench 30 write -t 256 -b 8192
>>>
>>> You may also want to try targeting a specific RGW server to make sure the RR-DNS setup isn't interfering (at least while debugging). It may also be worth creating a new replicated pool and trying writes to that pool as well, to see if you see much difference.
>>>
>>> Mark
>>>
>>>> On 07/14/2015 07:17 PM, Florent MONTHEL wrote:
>>>> Yes of course, thanks Mark
>>>>
>>>> Infrastructure: 5 servers with 10 SATA disks each (50 OSDs in all), 10Gb connected, EC 2+1 on the rgw.buckets pool, 2 radosgw in an RR-DNS-like setup installed on 2 of the cluster servers.
>>>> No SSD drives used.
>>>>
>>>> We're using Cosbench to send:
>>>> - 8k object size, 100% read, 256 workers: better results with Hammer
>>>> - 8k object size, 80% read / 20% write, 256 workers: real degradation between Firefly and Hammer (divided by something like 10)
>>>> - 8k object size, 100% write, 256 workers: real degradation between Firefly and Hammer (divided by something like 10)
>>>>
>>>> Thanks
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On 14 July 2015, at 19:57, Mark Nelson <mnel...@redhat.com> wrote:
>>>>>
>>>>>> On 07/14/2015 06:42 PM, Florent MONTHEL wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I've just upgraded the Ceph cluster from Firefly 0.80.8 (Redhat Ceph 1.2.3) to Hammer (Redhat Ceph 1.3). Usage: radosgw with Apache 2.4.19 in MPM prefork mode.
>>>>>> I'm experiencing huge write performance degradation just after the upgrade (Cosbench).
>>>>>>
>>>>>> Have you already run performance tests between Hammer and Firefly?
>>>>>>
>>>>>> No problem with read performance, which was amazing.
>>>>>
>>>>> Hi Florent,
>>>>>
>>>>> Can you talk a little bit about how your write tests are set up? How many concurrent IOs and what size? Also, do you see similar problems with rados bench?
>>>>>
>>>>> We have done some testing and haven't seen significant performance degradation, except when switching to civetweb, which appears to perform deletes more slowly than what we saw with apache+fcgi.
>>>>>
>>>>> Mark
>>>>>
>>>>>> Sent from my iPhone

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
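For anyone following along, the diagnostics suggested in the thread can be gathered roughly like this. This is a sketch only: the pool name `testpool`, the PG counts, and the admin socket path are assumptions that depend on your deployment, and the `ceph daemon` calls require the RGW admin socket to be enabled.

```shell
# Baseline RADOS write performance on a fresh replicated pool,
# to compare against the EC-backed rgw.buckets pool.
ceph osd pool create testpool 128 128 replicated
rados -p testpool bench 30 write -t 256 -b 8192

# While a cosbench write run is active, check where RGW requests
# are backing up (admin socket path varies by deployment).
ceph daemon /var/run/ceph/ceph-client.radosgw.gateway.asok objecter_requests
ceph daemon /var/run/ceph/ceph-client.radosgw.gateway.asok perf dump
```

These commands only observe a live cluster; run them side by side with the benchmark rather than after it, since objecter_requests shows in-flight operations.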