librmb: Mail storage on RADOS with Dovecot

2017-09-22 Thread mj

Hi ceph-ers,

The email below was posted on the ceph mailing list yesterday by Wido den
Hollander. I guess this could be interesting for users here as well.


MJ

 Forwarded Message 
Subject: [ceph-users] librmb: Mail storage on RADOS with Dovecot
Date: Thu, 21 Sep 2017 10:40:03 +0200 (CEST)
From: Wido den Hollander 
To: ceph-us...@ceph.com

Hi,

A tracker issue has been out there for a while:
http://tracker.ceph.com/issues/12430

Storing e-mail in RADOS with Dovecot, the IMAP/POP3/LDA server with a
huge market share.


It took a while, but last year Deutsche Telekom took on the heavy work 
and started a project to develop librmb: LibRadosMailBox


Together with Deutsche Telekom and Tallence GmbH (DE) this project came 
to life.


First, the Github link:
https://github.com/ceph-dovecot/dovecot-ceph-plugin

I am not going to repeat everything which is on Github, but a short summary:

- CephFS is used for storing Mailbox Indexes
- E-Mails are stored directly as RADOS objects
- It's a Dovecot plugin
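To make the split concrete, here is a minimal sketch of the layout the summary describes: one immutable object per e-mail, with the mailbox index kept separately. A plain dict stands in for the RADOS pool and for the CephFS-hosted index; the `MailStore` class and its method names are illustrative, not the real librmb API.

```python
# Sketch of librmb's storage split: mails as objects, indexes elsewhere.
# Dicts stand in for a RADOS pool and for Dovecot indexes on CephFS.

class MailStore:
    def __init__(self):
        self.objects = {}   # stand-in RADOS pool: (user, oid) -> raw bytes
        self.index = {}     # stand-in index on CephFS: user -> [oid, ...]

    def save_mail(self, user, oid, raw_message):
        # One e-mail == one immutable object in the (simulated) pool.
        self.objects[(user, oid)] = raw_message
        # The index only records which objects belong to the mailbox.
        self.index.setdefault(user, []).append(oid)

    def fetch_mail(self, user, oid):
        return self.objects[(user, oid)]

store = MailStore()
store.save_mail("alice", "uid-0001", b"From: bob\r\n\r\nhello")
print(store.fetch_mail("alice", "uid-0001"))  # b'From: bob\r\n\r\nhello'
```

The point of the split is that mail bodies never pass through a POSIX filesystem layer, while the index stays on CephFS where Dovecot's lib-index can use it unchanged.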

We would like everybody to test librmb and report back issues on Github 
so that further development can be done.


It's not finalized yet, but all the help is welcome to make librmb the 
best solution for storing your e-mails on Ceph with Dovecot.


Danny Al-Gaaf has written a small blogpost about it and a presentation:

- https://dalgaaf.github.io/CephMeetUpBerlin20170918-librmb/
- http://blog.bisect.de/2017/09/ceph-meetup-berlin-followup-librmb.html

To get an idea of the scale: 4.7 PB of RAW storage over 1,200 OSDs is the 
final goal (last slide in the presentation). That will provide roughly 1.2 PB 
of usable storage capacity for storing e-mail. A lot of e-mail.


To see this project finally go into the Open Source world excites me a 
lot :-)


A very, very big thanks to Deutsche Telekom for funding this awesome 
project!


A big thanks as well to Tallence as they did an awesome job in 
developing librmb in such a short time.


Wido


Re: librmb: Mail storage on RADOS with Dovecot

2017-09-23 Thread Timo Sirainen
On 22 Sep 2017, at 14.18, mj  wrote:
> First, the Github link:
> https://github.com/ceph-dovecot/dovecot-ceph-plugin
> 
> I am not going to repeat everything which is on Github, but a short summary:
> 
> - CephFS is used for storing Mailbox Indexes
> - E-Mails are stored directly as RADOS objects
> - It's a Dovecot plugin
> 
> We would like everybody to test librmb and report back issues on Github so 
> that further development can be done.
> 
> It's not finalized yet, but all the help is welcome to make librmb the best 
> solution for storing your e-mails on Ceph with Dovecot.

It would have been nicer if RADOS support had been implemented as a lib-fs 
driver, and the fs-API had been used all over the place elsewhere. So 1) 
LibRadosMailBox wouldn't have been relying so much on RADOS specifically and 2) 
fs-rados could have been used for other purposes. There are already fs-dict and 
dict-fs drivers, so the RADOS dict driver may not have been necessary to 
implement if fs-rados was implemented instead (although I didn't check it 
closely enough to verify). (We've had fs-rados on our TODO list for a while 
also.)

BTW. We've also been planning on open sourcing some of the obox pieces, mainly 
fs-drivers (e.g. fs-s3). The obox format maybe too, but without the "metacache" 
piece. The current obox code is a bit too much married into the metacache 
though to make open sourcing it easy. (The metacache is about storing the 
Dovecot index files in object storage and efficiently caching them on local 
filesystem, which isn't planned to be open sourced in near future. That's 
pretty much the only difficult piece of the obox plugin, with Cassandra 
integration coming as a good second. I wish there had been a better/easier 
geo-distributed key-value database to use - tombstones are annoyingly 
troublesome.)

And using rmb-mailbox format, my main worries would be:
 * doesn't store index files (= message flags) - not necessarily a problem, as 
long as you don't want geo-replication
 * index corruption means rebuilding them, which means rescanning the list of 
mail files, which means rescanning the whole RADOS namespace, which practically 
means rescanning the RADOS pool. That is most likely a very, very slow 
operation, which you want to avoid unless it's absolutely necessary. Need to be 
very careful to avoid that happening, and in general to avoid losing mails in 
case of crashes or other bugs.
 * I think copying/moving mails physically copies the full data on disk
 * Each IMAP/POP3/LMTP/etc process connects to RADOS separately from the 
others - some connection pooling would likely help here
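The last point can be sketched quickly: instead of every IMAP/POP3/LMTP worker opening its own cluster connection, a small shared pool hands out already-open handles. The `Connection` and `ConnectionPool` classes below are stand-ins for illustration, not librados or Dovecot code.

```python
# Sketch of connection pooling: many workers, few cluster connections.
import queue

class Connection:
    _opened = 0
    def __init__(self):
        Connection._opened += 1   # count how many "real" connections exist

class ConnectionPool:
    def __init__(self, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(Connection())

    def acquire(self):
        return self._pool.get()   # blocks if all handles are in use

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(size=2)
for _ in range(10):               # ten short-lived "worker" requests...
    conn = pool.acquire()
    pool.release(conn)
print(Connection._opened)         # prints 2: only two connections were opened
```

In a real deployment the pool would live in a long-running proxy process, since Dovecot's per-client process model otherwise forces one connection per process.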


Re: librmb: Mail storage on RADOS with Dovecot

2017-09-24 Thread Danny Al-Gaaf
On 24.09.2017 at 02:43, Timo Sirainen wrote:
> On 22 Sep 2017, at 14.18, mj  wrote:
>> First, the Github link: 
>> https://github.com/ceph-dovecot/dovecot-ceph-plugin
>> 
>> I am not going to repeat everything which is on Github, but a short
>> summary:
>> 
>> - CephFS is used for storing Mailbox Indexes
>> - E-Mails are stored directly as RADOS objects
>> - It's a Dovecot plugin
>> 
>> We would like everybody to test librmb and report back issues on
>> Github so that further development can be done.
>> 
>> It's not finalized yet, but all the help is welcome to make librmb
>> the best solution for storing your e-mails on Ceph with Dovecot.
> 
> It would have been nicer if RADOS support had been implemented as a
> lib-fs driver, and the fs-API had been used all over the place
> elsewhere. So 1) LibRadosMailBox wouldn't have been relying so much
> on RADOS specifically and 2) fs-rados could have been used for other
> purposes. There are already fs-dict and dict-fs drivers, so the RADOS
> dict driver may not have been necessary to implement if fs-rados was
> implemented instead (although I didn't check it closely enough to
> verify). (We've had fs-rados on our TODO list for a while also.)

Please note: librmb is not Dovecot specific. The goal of this library is
to abstract email storage on Ceph independently of Dovecot, to allow
other mail systems to store emails in RADOS via one library as well. This
is also the reason why it's relying on RADOS.

[...]
> And using rmb-mailbox format, my main worries would be: 
> * doesn't store index files (= message flags) - not necessarily a problem, as
> long as you don't want geo-replication 

The index files are stored via Dovecot's lib-index on CephFS. This is
only an intermediate step. The goal is to store index data directly in
the RADOS/Ceph omap key-value store as well. Currently geo-replication
isn't an important topic for our PoC setup at Deutsche Telekom.
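Since omap is a per-object key/value store, "index data in omap" could mean keeping message flags as key/value pairs attached to the mail objects themselves instead of in index files on CephFS. The sketch below models omap as a dict; the helper names are hypothetical, not the librados omap API.

```python
# Sketch of per-object omap key/value storage for message flags.
# `omap` stands in for RADOS omap: object id -> {key: value}.

omap = {}

def set_flag(oid, flag, value=b"1"):
    # In RADOS this would be an omap_set on the mail object.
    omap.setdefault(oid, {})[flag] = value

def get_flags(oid):
    # In RADOS this would be an omap_get_keys on the mail object.
    return sorted(omap.get(oid, {}))

set_flag("u-alice/uid-0001", "\\Seen")
set_flag("u-alice/uid-0001", "\\Answered")
print(get_flags("u-alice/uid-0001"))  # ['\\Answered', '\\Seen']
```

The appeal is that flags then live next to the mail data and survive a CephFS index loss; the cost is that every flag change becomes a RADOS operation.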

> * index corruption means rebuilding them, which means rescanning the list of
> mail files, which
> means rescanning the whole RADOS namespace, which practically means
> rescanning the RADOS pool. That most likely is a very very slow
> operation, which you want to avoid unless it's absolutely necessary.
> Need to be very careful to avoid that happening, and in general to
> avoid losing mails in case of crashes or other bugs.

This could maybe be avoided, at least partially, by snapshots on CephFS
currently. But we will take a look at it during the PoC phase.

> * I think copying/moving mails physically copies the full data on disk
> * Each IMAP/POP3/LMTP/etc process connects to RADOS separately from each
> others - some connection pooling would likely help here

I'm not so deep into what Dovecot is currently doing. It's still under
heavy development and any comment and feedback is really welcome, as Wido
already pointed out.

Danny


Re: librmb: Mail storage on RADOS with Dovecot

2017-09-25 Thread Peter Mauritius
Hi Timo,

I am one of the authors of the software Wido announced in his mail. First, I'd 
like to say that Dovecot is a wonderful piece of software, and thank you for it. 
I would like to give some explanations regarding the design we chose.

From: Timo Sirainen <mailto:t...@iki.fi>
Reply-To: Dovecot Mailing List 
<mailto:dovecot@dovecot.org>
Date: 24 September 2017 at 02:43:44
To: Dovecot Mailing List <mailto:dovecot@dovecot.org>
Subject: Re: librmb: Mail storage on RADOS with Dovecot

> It would have been nicer if RADOS support had been implemented as a lib-fs
> driver, and the fs-API had been used all over the place elsewhere. So 1)
> LibRadosMailBox wouldn't have been relying so much on RADOS specifically and 2)
> fs-rados could have been used for other purposes. There are already fs-dict and
> dict-fs drivers, so the RADOS dict driver may not have been necessary to
> implement if fs-rados was implemented instead (although I didn't check it
> closely enough to verify). (We've had fs-rados on our TODO list for a while
> also.)

Actually I considered using the fs-api to build a RADOS driver. But I did not 
follow that path:

The dict-fs mapping is quite simplistic. For example, I would not be able to 
use RADOS read/write operations to batch requests or to model the dictionary 
transactions. Also, there is no async support if you hide the RADOS dictionary 
behind an fs-api module, which would make the use of dict-rados in the 
dict-proxy harder. Doing so would help a lot to lower the price you have to pay 
for the process model Dovecot is using.

Using an fs-rados module behind a storage module, let's say sdbox, would IMO 
not fit our goals. We planned to store mails as RADOS objects and their 
(immutable) metadata in RADOS omap K/V. We want to be able to access the 
objects without Dovecot. This is not possible if RADOS is hidden behind an 
fs-rados module. The format of the stored objects would be different and 
dependent on the storage module sitting in front of fs-rados.
Another reason is that at the fs level the operations are too decomposed. We 
would not have any transactional contexts etc., as we do with the 
dictionaries. This context information allows us to use the RADOS operations 
in an optimized way. The storage API is IMO the right level of abstraction, 
especially if we follow our long-term goal to eliminate the fs needs for index 
data too. I like the internal abstraction of sdbox/mdbox a lot. But for our 
purpose it should have been on mail and not file level.

But building an fs-rados should not be very hard.

> BTW. We've also been planning on open sourcing some of the obox pieces, mainly
> fs-drivers (e.g. fs-s3). The obox format maybe too, but without the "metacache"
> piece. The current obox code is a bit too much married into the metacache
> though to make open sourcing it easy. (The metacache is about storing the
> Dovecot index files in object storage and efficiently caching them on local
> filesystem, which isn't planned to be open sourced in near future. That's
> pretty much the only difficult piece of the obox plugin, with Cassandra
> integration coming as a good second. I wish there had been a better/easier
> geo-distributed key-value database to use - tombstones are annoyingly
> troublesome.)


That would be great.

> And using rmb-mailbox format, my main worries would be:
> * doesn't store index files (= message flags) - not necessarily a problem, as
> long as you don't want geo-replication

Your index management is awesome, highly optimized and not easily 
reimplemented. Very nice work. Unfortunately it is not using the fs-api and is 
therefore not capable of being located on non-fs storage. We believe that 
CephFS will be a good and stable solution for the time being. Of course it 
would be nicer to have a lib-index that allows us to plug in different backends.

> * index corruption means rebuilding them, which means rescanning list of mail
> files, which means rescanning the whole RADOS namespace, which practically
> means rescanning the RADOS pool. That most likely is a very very slow
> operation, which you want to avoid unless it's absolutely necessary. Need to be
> very careful to avoid that happening, and in general to avoid losing mails in
> case of crashes or other bugs.

Yes, disaster is a problem. We are trying to build as many rescue tools as 
possible, but in the end scanning mails is involved. All mails are stored 
within separate RADOS namespaces, each representing a different user. This 
will help us avoid scanning the whole pool. But this should not be a regular 
operation. You are right.
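The namespace argument can be illustrated with a small sketch: an index rebuild only has to enumerate one user's namespace, never the whole pool. A dict keyed by (namespace, object id) stands in for RADOS here; `rebuild_index` is a hypothetical helper, not a real librmb tool.

```python
# Sketch: per-user RADOS namespaces bound the cost of an index rebuild.
# `pool` stands in for a RADOS pool; the namespace is modeled as a key part.

pool = {
    ("alice", "uid-0001"): b"mail a1",
    ("alice", "uid-0002"): b"mail a2",
    ("bob",   "uid-0001"): b"mail b1",
}

def rebuild_index(namespace):
    # Enumerate only the objects inside one namespace (= one user),
    # instead of every object in the pool.
    return sorted(oid for (ns, oid) in pool if ns == namespace)

print(rebuild_index("alice"))  # ['uid-0001', 'uid-0002']
```

A rebuild after index loss then scales with one mailbox's size, not with the total number of mails stored in the cluster.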

> * I think copying/moving mails physically copies the full data on disk

We tried to optimize this. Moves within a user's mailboxes are done without 
copying the mails, by just changing the index data. Copies, when really 
necessary, are done by native RADOS commands (OSD to OSD) without transferring 
the data to the cli
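The move optimization described above can be sketched as follows: a move rewrites index entries only, while the mail object stays untouched; only a real copy would duplicate object data. Plain dicts stand in for RADOS and for the index; none of this is actual librmb code.

```python
# Sketch: mailbox moves as index-only updates, no object data copied.

objects = {"oid-1": b"raw mail"}                 # stand-in RADOS pool
index = {"INBOX": ["oid-1"], "Archive": []}      # stand-in mailbox index

def move(oid, src, dst):
    index[src].remove(oid)   # index-only change ...
    index[dst].append(oid)   # ... the object in `objects` is untouched

move("oid-1", "INBOX", "Archive")
print(index)         # {'INBOX': [], 'Archive': ['oid-1']}
print(len(objects))  # prints 1: no mail data was duplicated
```

A copy between users would still need to duplicate the object, which is where the server-side (OSD to OSD) copy mentioned above keeps the data out of the client path.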