On 5.12.2011, at 0.07, Lorens Kockum wrote:

> Timo Sirainen wrote:
>> And before designing it I'd need to look into how backup
>> software usually works. If anyone has any ideas about this,
>> I'd like to hear.
> 
> Simple or even moderately efficient backup programs like rsync
> copy all the files.

I'm mainly wondering if it's common for backup programs to support using a 
separate program to generate the backups. For example, there could be a 
"dovecot-backup" binary that just dumps all (or only the new-since-last-backup) 
of the users' mails to stdout for the backup program to use. Or perhaps in that 
case there wouldn't really be much of anything for the backup program to do 
except write it to tape.

>> Also backing up the attachment links could be problematic if
>> the backup system doesn't support hard links. Each attachment
>> always has at least 2 links, so if the backup doesn't realize
>> that, it at least doubles the space used by attachments.
> 
> rsync recognizes hard links with option -H, but at a very
> noticeable performance cost when dealing with millions of
> files. If the aa/bb/aabccddeeff-etc is unique across the whole
> mailstore, it would be easy to replace the hard link with a
> symlink, as you said:

SIS was designed to work with hard links. They couldn't be replaced with 
symlinks without a redesign (which would be less efficient in normal operation).
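
To put a number on the hard link problem for a given mail store, something like
the following could be used. This is only a minimal sketch in Python and not
part of Dovecot; the function name disk_usage is made up for this example. It
walks a tree and adds up file sizes twice: once per path, as a backup tool that
ignores hard links would, and once per (device, inode) pair, as a hard link
aware tool (for example rsync -H) would.

```python
import os
import stat

def disk_usage(root):
    """Return (naive_bytes, deduped_bytes) for regular files under root.

    naive_bytes counts every path separately, like a backup that does not
    recognize hard links. deduped_bytes counts each inode only once.
    """
    seen = set()      # (st_dev, st_ino) pairs already counted
    naive = 0
    deduped = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            naive += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                deduped += st.st_size
    return naive, deduped
```

The difference between the two numbers is the extra space a backup would use
if it stored every link as a separate copy.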

>> maybe not storing the attachments directly to backups, but add
>> symlinks to them so they can be used to figure out what to
>> restore. Or maybe the backing up wouldn't need a special tool,
>> but the restoring tool could just read through the dbox files
>> to see what attachments are also needed and write a list of
>> them somewhere so they can be taken from backups as well.
> 
> In the second way, you would have a separate hierarchy for
> multiple-recipient attachments, or would the attachment be
> "really" stored in the box of a recipient chosen at random?

I meant that SIS would work exactly like it works now, with hard links and 
everything, but on top of that it would also create symlinks to the used files 
simply to make it easier to find which files are used. The annoying thing about 
that is that in error situations the symlinks can get out of sync with 
reality.

> Just some random thoughts: professionally, I use
> Zimbra. Messages are stored in Maildir-equivalents. The time
> it takes to backup is a quite severe constraint on the backup
> technique. For example, compressing the backup files takes
> too long, so the zip files are not compressed. Instead, the
> individual mails are stored compressed on disk. Each backup
> zips up the mails in a few big backup files.

You mean you first create uncompressed zip files (why not just tar?) of all the 
mails on the filesystem, and the backup software then backs up those zip files?

> An improvement
> could be to sort mails into backup zip files so that once a
> zip file is made, it stays the same. After all, if a mail is not
> deleted a month after it is read, then it will probably stay
> in the same state forever, or at least until the user starts a
> keep-me-under-quota cleaning-up spree. During this time, backing
> up that big zip file can just be a check to see if it is already
> OK in the backup, which is much quicker. I have no idea if this
> could be applied to Dovecot, but who knows.


Dovecot's mdbox files already contain multiple messages in each file, so it 
should be a lot more efficient to do backups on those. And each message in an 
mdbox file can be compressed if the zlib plugin is enabled. So I think that 
sounds quite a lot like what you propose.
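
The "just check whether the big file is already OK in the backup" idea could be
sketched roughly like this. It is only an illustration in Python under stated
assumptions: the manifest dict and the function name files_to_back_up are
hypothetical, nothing here is a Dovecot API, and the check compares only size
and modification time (the same kind of quick check rsync does by default), so
it does not detect changes that preserve both.

```python
import os

def files_to_back_up(root, manifest):
    """Return relative paths under root that changed since the last run.

    manifest maps relative path -> (size, mtime_ns) as recorded by the
    previous backup run (hypothetical format). It is updated in place.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            st = os.stat(path)
            signature = (st.st_size, st.st_mtime_ns)
            if manifest.get(rel) != signature:
                # New or modified file: needs to be copied again.
                changed.append(rel)
                manifest[rel] = signature
    return changed
```

Files that are no longer written to cost only a stat() call per run, so large 
files full of old mail would be skipped cheaply.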
