Re: overview zlib efficiency? Summary and added note

2016-03-19 Thread Harald Leithner
The vmail directory only holds attachments that are smaller than 64k;
every larger attachment goes into the SIS store.


The SIS store has no compression, but it seems that attachments are
stored raw and not base64 encoded, so it saves 30%? on binary data.


I also wrote that 'du -l' may not be the correct way to count
de-duplication.


It seems that every attachment has at least 2 hardlinks in the SIS; I
missed that before I wrote the other mail. That also explains why
storage uses so much more space than the counted mail size ;-)


I think ignoring the hashes folder in the SIS would give better results:

find vmail_sis -type f -printf '%s %p\n' | grep -v hashes | awk '{s+=$1}END{printf("%.2fMB\n", s/1024/1024);}'


In my case this is:

142922.29MB (So forget 209G from my previous mail.)
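
An alternative to filtering out the hashes folder is to count every inode only once, so that extra hardlinks to the same attachment do not inflate the total. The following is only a sketch under assumptions: `link_sizes` is a made-up helper name, it needs GNU find for `-printf`, and it assumes the whole tree lives on one filesystem so that inode numbers are unique.

```shell
# Hypothetical helper: report the size of a hard-linked tree twice,
# once summed per path and once summed per unique inode.
# %i = inode number, %s = file size in bytes (GNU find).
link_sizes() {
    find "$1" -type f -printf '%i %s\n' | awk '
        { per_path += $2 }               # every directory entry counts
        !seen[$1]++ { per_inode += $2 }  # only the first entry of each inode counts
        END { printf "per path: %.2fMB\nper inode: %.2fMB\n", per_path/1024/1024, per_inode/1024/1024 }'
}
```

Calling `link_sizes vmail_sis` would then print both totals; the "per inode" figure is the one comparable to what is really stored.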

doveadm -f table fetch -A "size.physical" ALL | awk '{s+=$2}END{printf("%.2fMB\n", s/1024/1024);}'


195861.12MB

du -sh vmail

56G (it also seems that mdbox tricked me with spare file size)

Mails in mdbox storage (compressed), without index/logs:

find vmail -type f -printf '%s %p\n' | grep "/storage/m." | awk '{s+=$1}END{printf("%.2fMB\n", s/1024/1024);}'

4776.51MB

Index/logs:

find vmail -type f -printf '%s %p\n' | grep -v "/storage/m." | awk '{s+=$1}END{printf("%.2fMB\n", s/1024/1024);}'

224.40MB

So in the end I use 146.7 GB of storage + 224.4 MB of
index/logs/metadata/overhead for 191.27 GB of plain e-mails.
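
To put those measurements into one ratio, the raw MB figures printed above can be fed into a small calculation. This is only a sketch: `storage_share` is a made-up helper name, and it simply takes the four quoted figures at face value.

```shell
# Hypothetical helper: what share of the plain mail size is really on disk?
# Arguments (all in MB): SIS store, mdbox storage, index/logs, plain size.
storage_share() {
    awk -v s="$1" -v m="$2" -v i="$3" -v p="$4" 'BEGIN {
        used = s + m + i
        printf "on disk: %.2f MB of %.2f MB plain (%.1f%%)\n", used, p, 100 * used / p
    }'
}

# Figures copied from the measurements above.
storage_share 142922.29 4776.51 224.40 195861.12
# on disk: 147923.20 MB of 195861.12 MB plain (75.5%)
```

Most of the remaining size is the uncompressed SIS store, which is why the figure says little about zlib itself.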


I still can't tell you how much compression brings in, because SIS is
not compressed ;-)


So someone without SIS and mdbox will have to run this test.

bye




Re: overview zlib efficiency?

2016-03-19 Thread Sven Hartge
Teemu Huovila  wrote:
> On 15.03.2016 21:45, Sven Hartge wrote:
 
>> And gzip (or lz4 if implemented someday) (or even blosc:
> liblz4 has been supported since 2.2.11+ http://wiki2.dovecot.org/Plugins/Zlib

Hmm, yes. I don't know how I missed this when I looked at that page
last night. Must have been a caffeine underflow error.

S°

-- 
Sigmentation fault. Core dumped.


Re: overview zlib efficiency?

2016-03-18 Thread Anton Chevychalov



During migration from 1.x with maildir to 2.x with dbox I did the
following trick:



time dsync -R -u t...@example.com backup maildir:/var/spool/imap/tmp/Maildir


And got the following results:

|method|size|time     |
|orig  |366 |28s      |
|gz.6  |260 |5min     |
|bz2.6 |202 |5min     |
|xz.1  |211 |1min     |
|xz.2  |213 |1min50sec|
|xz.3  |201 |3min     |
|xz.6  |198 |5min     |
|xz.9  |198 |10min    |
|lz4   |281 |18s      |

The number after the dot (1-9) is the compression level. In the end I
chose lz4.
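
To compare the methods more directly, each size can be expressed as a percentage of the uncompressed run. This is a minimal sketch using only the numbers from the table above; `ratio_table` is a made-up helper name, and it assumes the first input line is the uncompressed baseline.

```shell
# Hypothetical helper: read "name size" lines, treat the first line as the
# baseline and print every size as a percentage of it.
ratio_table() {
    awk 'NR == 1 { base = $2 } { printf "%-6s %5.1f%%\n", $1, 100 * $2 / base }'
}

printf '%s\n' 'orig 366' 'gz.6 260' 'bz2.6 202' 'xz.1 211' 'xz.2 213' \
    'xz.3 201' 'xz.6 198' 'xz.9 198' 'lz4 281' | ratio_table
```

With these numbers lz4 lands at roughly 77% of the original size and gz.6 at roughly 71%, while lz4 was the only method faster than the uncompressed run.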

--
Anton Chevychalov



Re: overview zlib efficiency? Summary and added note

2016-03-16 Thread Haravikk
Not sure how you’re seeing such a high ratio; I tried the same commands on my 
system (thanks for these btw) and my savings from compression are around 5% =D

That said, I’m dealing with a much smaller volume (3 GB), and I’ve only
identified a half dozen or so attachments that don’t already have some kind of
compression. Most modern mail programs compress common types like images by
default, and many modern file formats have compression built in that can give
better results than zlib anyway.

My biggest savings are on mailing list messages (I filter these into their own
mailbox), since they tend to be longer than typical messages, especially with
auto-quoting. They also tend to be very busy mailboxes, though I don’t keep
them forever.

As an experiment I also tried moving my (uncompressed) messages to a 
compressing file-system (ZFS using lz4) but the savings were similarly small; I 
assume they were probably a bit better, but the extra overhead of the 
file-system eroded it since the savings are so small in my case. I think if 
you’re serious about compression then a compressing file-system is the way to 
go though, but in my case I’m on virtual hosting so there’s not much point in 
layering a ZFS volume on top of shared storage (since it’s ZFS based already 
for integrity/redundancy).

I just thought I’d mention my experience since people are quoting big savings 
that I haven’t seen; I wouldn’t consider my usage all that unusual, maybe some 
of you are receiving a lot more newsletter type traffic (these messages can be 
quite large), uncompressed document type files, or are less selective in which 
messages are retained forever? Just a caution that people looking at 
compression may not see the same savings depending upon their actual content.

Spam is another bad category for compression, I’ve found; at least in my case
the messages are usually very short and/or contain randomised junk to try to
confound filters. I’m pretty aggressive about clearing them, though: I discard
messages outright above a certain threshold, and use a script to expunge
messages with higher spam ratings faster, so that possible false positives
stick around longer and can be caught.



Re: overview zlib efficiency?

2016-03-16 Thread Teemu Huovila


On 15.03.2016 21:45, Sven Hartge wrote:
> Robert L Mathews  wrote:
> 
>> Also keep in mind that even if it does increase CPU usage, it reduces
>> disk usage. This is probably an excellent tradeoff for most people,
>> since most servers are limited by disk throughput/latency more than
>> CPU power.
> 
> IOPS are harder to scale (meaning: cost more to scale) than CPU power.
> 
> And gzip (or lz4 if implemented someday) (or even blosc:

liblz4 has been supported since 2.2.11+ http://wiki2.dovecot.org/Plugins/Zlib

> http://www.blosc.org/. They claim "Designed to transmit data to the
> processor cache faster than a memcpy() OS call.") is effectively free
> with today's CPUs.
> 
> Regards,
> Sven.
> 


Re: overview zlib efficiency? Summary and added note

2016-03-16 Thread Harald Leithner

Hi,

Use "doveadm" to get the real size of all messages:

doveadm -f table fetch -A "size.physical" ALL | awk '{s+=$2}END{printf("%.2fMB\n", s/1024/1024);}'

189247.67MB, i.e. about 185G
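
The awk part of that pipeline just adds up the second column and divides by 1024 twice. It can be tried without a Dovecot installation; the sketch below uses a made-up helper name (`sum_column_mb`) and made-up sample lines that are only assumed to have the same shape as the doveadm table output (header line, then user name and size).

```shell
# Hypothetical helper: sum one whitespace-separated column of byte counts
# and print the total in MB. A non-numeric header line simply counts as 0.
sum_column_mb() {
    awk -v col="$1" '{ s += $col } END { printf("%.2fMB\n", s/1024/1024) }'
}

# Made-up sample input, not real doveadm output.
printf '%s\n' 'username size.physical' 'alice 1048576' 'bob 524288' | sum_column_mb 2
# 1.50MB
```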

Use "du" to get the size on disk:

In my case, with deduplication:

/srv/stroage/# du -s -h *
53G vmail
75G vmail_sis

Without deduplication:

/srv/stroage/# du -s -h -l *
53G vmail
209G vmail_sis
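
The difference between the two du calls comes from hardlinks: plain du counts a file with several names once, while -l (--count-links) counts it once per name. A small throwaway demonstration, assuming a du that supports -l (GNU coreutils does) and using made-up file names:

```shell
# Build a scratch directory holding one 1 MiB file with two names.
demo=$(mktemp -d)
head -c 1048576 /dev/urandom > "$demo/attachment"
ln "$demo/attachment" "$demo/second-name"

once=$(du -sk "$demo" | awk '{print $1}')    # hardlinked data counted once
twice=$(du -skl "$demo" | awk '{print $1}')  # counted once per link
echo "du -s: ${once}K   du -s -l: ${twice}K"

rm -rf "$demo"
```

The second figure should come out at roughly double the first, which is the same effect that turns 75G into 209G above.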

FYI, SIS can't use the zlib plugin, so the 75G in my case are not
compressed (I don't have a filesystem that I trust and that has a
compression feature). Anyway, it has a 3:1 ratio in my case.


Maybe I'm interpreting the SIS wrongly and it can't be measured with
du -l (count links).


But if you don't have SIS, these values should point you in the right
direction.


bye

Harald






--
Harald Leithner

ITronic
Wiedner Hauptstraße 120/5.1, 1050 Wien, Austria
Tel: +43-1-545 0 604
Mobil: +43-699-123 78 4 78
Mail: leith...@itronic.at | itronic.at


Re: overview zlib efficiency? Summary and added note

2016-03-16 Thread Götz Reinicke - IT Koordinator
On 15.03.16 at 16:01, Götz Reinicke - IT Koordinator wrote:
> Hi,
> 
> may be someone has already done that: Do you have a script(?) tool which
> shows the efficiency of the mail compression if zlib is used?
> 
> Something that shows the uncompressed size vrs. the compressed.

Hi,

maybe my question was a bit misleading. But anyway thanks for your
feedback regarding your experiences and compression rates.

We already considered the tradeoff of less I/O for more CPU usage; CPU
power is no concern for us.

The mailboxes I checked also show a 40-60% compression rate.

But what I was looking for was a tool or way to see what volume would be
used if we were not using compression.

e.g. "du -hs --without-zlib"

Our management would like to see a graph one day which shows the volume
uncompressed and compressed ...

From my point of view, adding zlib with mdbox or maildir, as we
currently do, is a MUST if you have the CPU power :)

happy dovecoting - Götz
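
For the Maildir case, a comparison of that kind can be sketched with standard tools. This is not an existing du option, and it rests on assumptions: the helper name `compare_sizes` is made up, every message is assumed to be its own file, and compressed messages are assumed to be ordinary gzip files. It would not work for mdbox, where many messages share one storage file.

```shell
# Hypothetical helper: print "<bytes stored> <bytes uncompressed>" for a tree.
# Files that pass "gzip -t" are measured after decompression, all other
# files are counted with their stored size.
compare_sizes() {
    find "$1" -type f | while IFS= read -r f; do
        stored=$(wc -c < "$f")
        if gzip -t "$f" 2>/dev/null; then
            plain=$(gzip -dc "$f" | wc -c)
        else
            plain=$stored
        fi
        echo "$stored $plain"
    done | awk '{ s += $1; p += $2 } END { printf "%.0f %.0f\n", s, p }'
}
```

Running `compare_sizes` on a user's Maildir at regular intervals would give the two series needed for such a graph, at the cost of decompressing every message once per run.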







Re: overview zlib efficiency?

2016-03-15 Thread Andrew McGlashan


On 16/03/2016 9:07 AM, micah wrote:
> Andrew McGlashan  writes:
> 
>> On 16/03/2016 2:01 AM, Götz Reinicke - IT Koordinator wrote:
>>> Hi,
>>>
>>> may be someone has already done that: Do you have a script(?) tool which
>>> shows the efficiency of the mail compression if zlib is used?
>>>
>>> Something that shows the uncompressed size vrs. the compressed.
>>
>> Remember one thing; emails are stored in plain text, the same text that
>> they are normally transmitted b/w servers.
> 
> Emails are not stored in plaintext if you are using dbox/mdbox.

I think it is the best way to store them in pure form.

A.


Re: overview zlib efficiency?

2016-03-15 Thread Dirk Koopman

On 15/03/16 22:07, micah wrote:
> Andrew McGlashan  writes:
>
>> On 16/03/2016 2:01 AM, Götz Reinicke - IT Koordinator wrote:
>>> Hi,
>>>
>>> may be someone has already done that: Do you have a script(?) tool which
>>> shows the efficiency of the mail compression if zlib is used?
>>>
>>> Something that shows the uncompressed size vrs. the compressed.
>>
>> Remember one thing; emails are stored in plain text, the same text that
>> they are normally transmitted b/w servers.
>
> Emails are not stored in plaintext if you are using dbox/mdbox.


They are on my machine. The fact that (m)dbox puts some "binary" records 
around those emails doesn't stop the files being (human) readable and 
the text extractable.


Re: overview zlib efficiency?

2016-03-15 Thread micah
Andrew McGlashan  writes:

> On 16/03/2016 2:01 AM, Götz Reinicke - IT Koordinator wrote:
>> Hi,
>> 
>> may be someone has already done that: Do you have a script(?) tool which
>> shows the efficiency of the mail compression if zlib is used?
>> 
>> Something that shows the uncompressed size vrs. the compressed.
>
> Remember one thing; emails are stored in plain text, the same text that
> they are normally transmitted b/w servers.

Emails are not stored in plaintext if you are using dbox/mdbox.


Re: overview zlib efficiency?

2016-03-15 Thread Sven Hartge
Robert L Mathews  wrote:

> Also keep in mind that even if it does increase CPU usage, it reduces
> disk usage. This is probably an excellent tradeoff for most people,
> since most servers are limited by disk throughput/latency more than
> CPU power.

IOPS are harder to scale (meaning: cost more to scale) than CPU power.

And gzip (or lz4 if implemented someday) (or even blosc:
http://www.blosc.org/. They claim "Designed to transmit data to the
processor cache faster than a memcpy() OS call.") is effectively free
with today's CPUs.

Regards,
Sven.

-- 
Sigmentation fault. Core dumped.


Re: overview zlib efficiency?

2016-03-15 Thread Rick Romero

Quoting Rick Romero :

> Just thought I'd add, because it frustrated me and it's an amusing anecdote
> to this - The new 4k ashift doubled my disk usage of Maildir++ mail.
>
> So logically, if you're migrating from maildir++ to mdbox on 4k sector
> system, you may see a sizable decrease in disk usage without a compression
> change.
>
> Rick


With ZFS - Sorry - I dropped that in my edit.


Re: overview zlib efficiency?

2016-03-15 Thread Rick Romero

Quoting Robert L Mathews :

> On 3/15/16 10:13 AM, Sven Hartge wrote:
>
>> I don't have a script, but I can provide some numbers. I did a test with
>> a server for about 10.000 users and 2TB worth of mail, converting from
>> Maildir++ to mdbox with zlib (level = 6) and had a final size of 1TB, so
>> 2:1 reduction.
>
> These numbers roughly match my results. About 6 TB of mail compresses
> down to about 3 TB.
>
> Also keep in mind that even if it does increase CPU usage, it reduces
> disk usage. This is probably an excellent tradeoff for most people,
> since most servers are limited by disk throughput/latency more than CPU
> power.


Just thought I'd add, because it frustrated me and it's an amusing anecdote
to go with this: the new 4k ashift doubled the disk usage of my Maildir++
mail.

So logically, if you're migrating from Maildir++ to mdbox on a 4k sector
system, you may see a sizable decrease in disk usage without a compression
change.

Rick
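
The effect is easy to illustrate with a little arithmetic: with one file per message, every message is rounded up to whole blocks. The sketch below is a deliberately simplified model (real filesystems, ZFS included, allocate in more complicated ways); the helper name `allocated_bytes` and the three message sizes are made up.

```shell
# Hypothetical helper: read one file size in bytes per line and print the
# total space used when every file is rounded up to whole blocks.
allocated_bytes() {
    awk -v bs="$1" '{ total += int(($1 + bs - 1) / bs) * bs }
                    END { printf "%.0f\n", total }'
}

# Three made-up messages of 700, 3000 and 5000 bytes.
printf '%s\n' 700 3000 5000 | allocated_bytes 512    # 9216
printf '%s\n' 700 3000 5000 | allocated_bytes 4096   # 16384
```

Packing many messages into one larger file, as mdbox does, avoids most of that per-file rounding.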


Re: overview zlib efficiency?

2016-03-15 Thread Robert L Mathews
On 3/15/16 10:13 AM, Sven Hartge wrote:

> I don't have a script, but I can provide some numbers. I did a test with
> a server for about 10.000 users and 2TB worth of mail, converting from
> Maildir++ to mdbox with zlib (level = 6) and had a final size of 1TB, so
> 2:1 reduction.

These numbers roughly match my results. About 6 TB of mail compresses
down to about 3 TB.

The difference in ongoing CPU use for compression "on the fly" of newly
arriving mail, and decompression of mail from the disk, is unnoticeable
on my servers.

Also keep in mind that even if it does increase CPU usage, it reduces
disk usage. This is probably an excellent tradeoff for most people,
since most servers are limited by disk throughput/latency more than CPU
power.

-- 
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/


Re: overview zlib efficiency?

2016-03-15 Thread Andrew McGlashan


On 16/03/2016 2:01 AM, Götz Reinicke - IT Koordinator wrote:
> Hi,
> 
> may be someone has already done that: Do you have a script(?) tool which
> shows the efficiency of the mail compression if zlib is used?
> 
> Something that shows the uncompressed size vrs. the compressed.

Remember one thing; emails are stored in plain text, the same text that
they are normally transmitted b/w servers.

With that in mind, text compresses well, particularly when it contains
repeated and common elements like headers, so you should get a
significant reduction in size.

The exception to the storage benefit is when emails are smaller than
the file system block size (4k on ext4, perhaps). Many emails are
smaller than one block, and for those, compression is not much benefit
as it won't make a scrap of difference to storage. However, when you
have users that send attachments, sometimes very large ones, it will
save loads of storage on those emails.

Next, if you have a CPU bottleneck, then the extra overhead of
compression may also be an issue; but unless your server is working
hard, compression isn't likely to tax the CPU a great deal.

Cheers
Andrewm





Re: overview zlib efficiency?

2016-03-15 Thread Sven Hartge
Götz Reinicke - IT Koordinator  wrote:

> may be someone has already done that: Do you have a script(?) tool
> which shows the efficiency of the mail compression if zlib is used?

> Something that shows the uncompressed size vrs. the compressed.

I don't have a script, but I can provide some numbers. I did a test with
a server for about 10.000 users and 2TB worth of mail, converting from
Maildir++ to mdbox with zlib (level = 6) and had a final size of 1TB, so
2:1 reduction.

Regards,
Sven.

-- 
Sigmentation fault. Core dumped.


Re: overview zlib efficiency?

2016-03-15 Thread Leonardo Rodrigues

On 15/03/16 12:01, Götz Reinicke - IT Koordinator wrote:
> Hi,
>
> may be someone has already done that: Do you have a script(?) tool which
> shows the efficiency of the mail compression if zlib is used?
>
> Something that shows the uncompressed size vrs. the compressed.



    While I don't have the data you're looking for, I do have lots of
servers running with zlib enabled and, if someone writes the script, I
can run it on some servers and provide the results!




--


Sincerely,
Leonardo Rodrigues
Solutti Tecnologia
http://www.solutti.com.br

gertru...@solutti.com.br
My SPAMTRAP, do not email it


overview zlib efficiency?

2016-03-15 Thread Götz Reinicke - IT Koordinator
Hi,

Maybe someone has already done this: do you have a script or tool that
shows the efficiency of the mail compression when zlib is used?

Something that shows the uncompressed size vs. the compressed size.

Thanks for hints! /Götz



