On 22.09.2017 at 23:56, Gregory Farnum wrote:
> On Fri, Sep 22, 2017 at 2:49 PM, Danny Al-Gaaf <danny.al-g...@bisect.de> 
> wrote:
>> On 22.09.2017 at 22:59, Gregory Farnum wrote:
>> [..]
>>> This is super cool! Is there anything written down that explains this
>>> for Ceph developers who aren't familiar with the workings of Dovecot?
>>> I've got some questions I see going through it, but they may be very
>>> dumb.
>>>
>>> *) Why are indexes going on CephFS? Is this just about wanting a local
>>> cache, or about the existing Dovecot implementations, or something
>>> else? Almost seems like you could just store the whole thing in a
>>> CephFS filesystem if that's safe. ;)
>>
>> This is, if everything works as expected, only an intermediate step.
>> One idea
>> (https://dalgaaf.github.io/CephMeetUpBerlin20170918-librmb/#/status-3)
>> is to use omap to store the index/metadata.
>>
>> We chose a step-by-step approach, and since we are currently not sure
>> whether omap would work performance-wise, we use CephFS (which also
>> requires no changes in Dovecot). Our current focus is on developing
>> the first version of librmb, but the code to use omap already exists.
>> It still needs integration, testing, and performance tuning to verify
>> that it meets our requirements.
>>
>>> *) It looks like each email is getting its own object in RADOS, and I
>>> assume those are small messages, which leads me to
>>
>> The mail distribution looks like this:
>> https://dalgaaf.github.io/CephMeetUpBerlin20170918-librmb/#/mailplatform-mails-dist
>>
>>
>> Yes, the majority of the mails are under 500 KB, but most objects are
>> around 50 KB. There are not that many very small objects.
> 
> Ah, that slide makes more sense with that context — I was paging
> through it in bed last night and thought it was about the number of
> emails per user or something weird.
> 
> So those mail objects are definitely bigger than I expected; interesting.
> 
>>
>>>   *) is it really cost-acceptable to not use EC pools on email data?
>>
>> We will use EC pools for the mail objects and replication for CephFS.
>>
>> But even without EC there would be a cost case compared to the current
>> system. We will save a large number of IOPS in the new platform since
>> the (NFS) POSIX layer is removed from the IO path (at least for the
>> mail objects). And we expect that with Ceph and commodity hardware we
>> can compete with a traditional enterprise NAS/NFS anyway.
>>
>>>   *) isn't per-object metadata overhead a big cost compared to the
>>> actual stored data?
>>
>> I assume not. The metadata/index is small compared to the size of the
>> mails (currently around 10% with NFS, I would say). In the classic
>> NFS-based Dovecot the sheer number of index/cache/metadata files is an
>> issue anyway: with 6.7 billion mails we have 1.2 billion
>> index/cache/metadata files
>> (https://dalgaaf.github.io/CephMeetUpBerlin20170918-librmb/#/mailplatform-mails-nums).
> 
> I was unclear; I meant the RADOS metadata cost of storing an object. I
> haven't quantified that in a while but it was big enough to make 4KB
> objects pretty expensive, which I was incorrectly assuming would be
> the case for most emails.
> EC pools have the same issue; if you want to erasure-code a 40KB
> object into 5+3 then you pay the metadata overhead for each 8KB
> (40KB/5) of data, but again that's more on the practical side of
> things than my initial assumptions placed it.
> 
> This is super cool!

Currently we assume/hope it will work out, but we will keep an eye on
this as soon as we enter the PoC phase with the new hardware and run
real load tests.

Danny



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
