Re: [ceph-users] How to detect journal problems

Josef Johansson Tue, 08 Apr 2014 01:32:29 -0700

On 08/04/14 10:04, Christian Balzer wrote:
> Hello,
>
> On Tue, 08 Apr 2014 09:31:18 +0200 Josef Johansson wrote:
>
>> Hi all,
>>
>> I am currently benchmarking a standard setup with Intel DC S3700 disks
>> as journals and Hitachi 4TB-disks as data-drives, all on LACP 10GbE
>> network.
>>
> Unless that is the 400GB version of the DC S3700, you're already limiting
> yourself to 365MB/s throughput with the 200GB variant.
> That only matters if sequential write speed is that important to you and
> you think you'll ever get those 5 HDs to write at full speed with Ceph
> (unlikely).
>  
It's the 400GB version of the DC S3700, and yes, I'm aware that I need a
1:3 ratio to max out these disks, as they write sequential data at about
150MB/s.
Our thinking is that a 1:5 ratio covers the current demand, and we can
upgrade later if needed.
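
As a rough sketch of the ratio arithmetic above: the 365MB/s (200GB SSD) and
150MB/s (HDD) figures are from this thread, while the 460MB/s figure for the
400GB model is an assumption from memory of the vendor spec sheet and is not
stated here. The function names are made up for illustration.

```python
import math

def hdds_saturated_by_journal(ssd_write_mb_s: float, hdd_write_mb_s: float) -> int:
    """Number of HDDs one journal SSD can feed at full sequential speed."""
    return math.floor(ssd_write_mb_s / hdd_write_mb_s)

def per_osd_ceiling_mb_s(ssd_write_mb_s: float, osds_per_journal: int) -> float:
    """Sequential write ceiling per OSD when the journal SSD is shared evenly."""
    return ssd_write_mb_s / osds_per_journal

HDD_MB_S = 150        # from the thread
SSD_200GB_MB_S = 365  # from the thread
SSD_400GB_MB_S = 460  # ASSUMPTION: not stated in the thread

print(hdds_saturated_by_journal(SSD_200GB_MB_S, HDD_MB_S))  # 2
print(hdds_saturated_by_journal(SSD_400GB_MB_S, HDD_MB_S))  # 3
print(per_osd_ceiling_mb_s(SSD_400GB_MB_S, 5))              # 92.0
```

Under that assumption, a 1:5 ratio caps each OSD at roughly 92MB/s of
sequential writes, well below what the disk can do alone.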
>> My journals are 25GB each, and I have two journals per
>> machine, with 5 OSDs per journal and 5 machines in total. We currently
>> use the optimal tunables, and the Ceph version is the latest Dumpling.
>>
>> Benchmarking writes with rbd shows that there's no problem hitting the
>> upper limits of the 4TB-disks with sequential data, thus maxing out
>> 10GbE. At this point we see full utilization on the journals.
>>
>> But lowering the block size to 4k shows that the journals are utilized to
>> about 20%, and the 4TB-disks 100%. (rados bench -p <pool> -b 4096 -t 256
>> 100 write)
>>
> When you say utilization, I assume you're talking about iostat or
> atop output?
Yes, the utilization figures are from iostat.
> That's not a bug, that's comparing apples to oranges.
You mean comparing iostat-results with the ones from rados benchmark?
> The rados bench default is 4MB, which not only happens to be the default
> RBD object size but also generates a nice amount of bandwidth.
>
> At 4k writes your SSD is obviously bored, but the actual OSD needs to
> handle all those writes and becomes limited by the IOPS it can perform.
Yes, it's quite bored and just shuffles data.
Maybe I've been thinking about this the wrong way,
but shouldn't the journal buffer more, until the journal partition is full
or the flush_interval is reached?
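
To put a number on that question, here is a minimal sketch that treats the
journal as a plain bounded buffer: it fills at the client write rate and
drains at whatever rate the data disk sustains. This is a simplification and
not how the OSD actually schedules its syncs. The 25GB size is from this
thread; the 100MB/s and 40MB/s rates are hypothetical.

```python
def seconds_until_journal_full(journal_gb, ingest_mb_s, drain_mb_s):
    """Seconds until a bounded journal fills, or None if it never fills.

    Simplified model: the journal gains (ingest - drain) MB every second.
    Once it is full, sustained throughput is capped at the drain rate.
    """
    if ingest_mb_s <= drain_mb_s:
        return None  # the data disk keeps up, so the journal never fills
    return journal_gb * 1024 / (ingest_mb_s - drain_mb_s)

# Hypothetical rates: 100 MB/s arriving, 40 MB/s reaching the data disk.
print(seconds_until_journal_full(25, 100, 40))
# If the data disk keeps up, the journal size stops mattering.
print(seconds_until_journal_full(25, 1, 1))  # None
```

In this model a bigger journal only extends how long a burst can run ahead
of the data disk; it does not raise the sustained rate.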


Right now the rados benchmark gets about 1MB/s throughput. I don't
really know what to expect, but it seems quite slow.

sudo rados bench -p shared-1 -b 4096 300 write
 Maintaining 16 concurrent writes of 4096 bytes for up to 300 seconds or 0 objects
 Object prefix: benchmark_data_px1_1502
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16       203       187  0.730312  0.730469  0.030537  0.080467
     2      16       397       381  0.744003  0.757812  0.141118 0.0811331
     3      16       625       609  0.792841  0.890625  0.017979 0.0776631
     4      16       889       873  0.852415   1.03125   0.10221 0.0725933
     5      16      1122      1106  0.863941  0.910156  0.001871 0.0709095
     6      16      1437      1421  0.924995   1.23047  0.035859 0.0665901
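
For reading the table, a small sketch of how the columns relate. It assumes
"MB" here means 2^20 bytes, which is what makes the numbers line up; the
function names are made up for illustration.

```python
MIB = 1024 * 1024  # ASSUMPTION: the bench output counts MB as 2**20 bytes

def mb_per_s(ops: int, block_bytes: int, seconds: float) -> float:
    """Throughput implied by a number of finished ops of a given size."""
    return ops * block_bytes / MIB / seconds

def iops_from_latency(concurrency: int, avg_latency_s: float) -> float:
    """Little's law: ops in flight divided by average latency gives ops/s."""
    return concurrency / avg_latency_s

# Second 1: 187 ops finished at 4096 bytes each.
print(round(mb_per_s(187, 4096, 1), 6))          # 0.730469
# Seconds 0-6: 1421 ops finished in total.
print(round(mb_per_s(1421, 4096, 6), 3))         # 0.925
# 16 concurrent writes at 0.0665901 s average latency.
print(round(iops_from_latency(16, 0.0665901)))   # 240
```

So the roughly 1MB/s is about 240 write IOPS across the whole pool at
16 writes in flight, with each write taking around 67ms on average.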

Thanks for helping me out,
Josef
> Regards,
>
> Christian

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
