Re: [Dovecot] RAID1+md concat+XFS as mailstorage

2012-06-29 Thread Wojciech Puchar
The executive summary is something like: when raid5 fails, because at that 
point you effectively do a raid scrub you tend to suddenly notice a bunch 
of other hidden problems which were lurking and your rebuild fails (this


And no RAID will protect you from every failure. You have to do backups.
EOT


Re: [Dovecot] RAID1+md concat+XFS as mailstorage

2012-06-29 Thread Wojciech Puchar

Has anyone tried or benchmarked ZFS, perhaps ZFS+NFS as backing store for


Yes, a long time ago. ZFS isn't useful for anything more than a toy. I/O 
performance is just bad.


[Dovecot] auth service: out of memory

2012-06-29 Thread Mailing List SVR

Hi,

I have some out of memory errors in my logs (file errors.txt attached)

I'm using dovecot 2.0.19. I can see some memory leak fixes in hg after 
the 2.0.19 release, but they seem related to the imap-login service.


I attached my config too; is something wrong there? Should I really 
increase the limit based on my settings?


Can these commits fix the reported leak?

http://hg.dovecot.org/dovecot-2.0/rev/6299dfb73732
http://hg.dovecot.org/dovecot-2.0/rev/67f1cef07427

Please note that the auth service is restarted when it reaches the limit, 
so there is no real issue,


please advise

thanks
Nicola


cat /var/log/mail.log | grep "Out of memory"
Jun 28 11:48:24 server1 dovecot: master: Error: service(auth): child 31301 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 11:50:18 server1 dovecot: auth: Fatal: pool_system_realloc(8192): Out of 
memory
Jun 28 11:50:18 server1 dovecot: master: Error: service(auth): child 10782 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 11:52:43 server1 dovecot: master: Error: service(auth): child 16854 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 11:54:01 server1 dovecot: auth: Fatal: block_alloc(4096): Out of memory
Jun 28 11:54:01 server1 dovecot: master: Error: service(auth): child 23378 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 11:55:09 server1 dovecot: auth: Fatal: pool_system_realloc(8192): Out of 
memory
Jun 28 11:55:09 server1 dovecot: master: Error: service(auth): child 28203 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 11:56:07 server1 dovecot: master: Error: service(auth): child 32570 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 11:57:01 server1 dovecot: auth: Fatal: block_alloc(4096): Out of memory
Jun 28 11:57:01 server1 dovecot: master: Error: service(auth): child 5136 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 11:57:57 server1 dovecot: master: Error: service(auth): child 9245 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 11:58:52 server1 dovecot: master: Error: service(auth): child 13779 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 11:59:49 server1 dovecot: master: Error: service(auth): child 18260 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 12:01:03 server1 dovecot: auth: Fatal: pool_system_realloc(8192): Out of 
memory
Jun 28 12:01:03 server1 dovecot: master: Error: service(auth): child 22181 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))
Jun 28 12:03:24 server1 dovecot: auth: Fatal: pool_system_malloc(3144): Out of 
memory
Jun 28 12:03:24 server1 dovecot: master: Error: service(auth): child 27253 
returned error 83 (Out of memory (service auth { vsz_limit=128 MB }, you may 
need to increase it))

# 2.0.19: /etc/dovecot/dovecot.conf
# OS: Linux 3.2.0-25-generic x86_64 Ubuntu 12.04 LTS ext4
auth_cache_size = 10 M
auth_mechanisms = plain login
auth_socket_path = /var/run/dovecot/auth-userdb
auth_worker_max_count = 128
base_dir = /var/run/dovecot/
default_process_limit = 200
default_vsz_limit = 128 M
disable_plaintext_auth = no
first_valid_gid = 2000
first_valid_uid = 2000
hostname = mail.example.com
last_valid_gid = 2000
last_valid_uid = 2000
listen = *
login_greeting = SVR ready.
mail_location = maildir:/srv/panel/mail/%d/%t/Maildir
mail_plugins =  quota trash autocreate
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character 
vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy 
include variables body enotify environment mailbox date ihave
passdb {
  args = /etc/dovecot/dovecot-sql.conf.ext
  driver = sql
}
plugin {
  autocreate = Trash
  autocreate2 = Junk
  autocreate3 = Drafts
  autocreate4 = Sent
  autosubscribe = Trash
  autosubscribe2 = Junk
  autosubscribe3 = Drafts
  autosubscribe4 = Sent
  quota = maildir:User quota
  quota_rule = *:storage=300MB
  quota_rule2 = Trash:ignore
  quota_warning = storage=95%% quota-warning 95 %u
  quota_warning2 = storage=80%% quota-warning 80 %u
  sieve = ~/.dovecot.sieve
  sieve_before = /etc/dovecot/sieve/move-spam.sieve
  sieve_dir = ~/sieve
  sieve_max_actions = 32
  sieve_max_redirects = 4
  sieve_max_script_size = 1M
  sieve_quota_max_scripts = 10
  sieve_quota_max_storage = 2M
  trash = /etc/dovecot/dovecot-trash.conf.ext
}
postmaster_address = postmas...@example.com
protocols = imap pop3 sieve
service auth-worker {
  user = $default_internal_user
}
service auth {
  

Re: [Dovecot] Removing specific entry in user/auth cache

2012-06-29 Thread Angel L. Mateo

On 29/06/12 07:32, Timo Sirainen wrote:

On 29.6.2012, at 5.18, Daniel Parthey wrote:


wouldn't it be better to use a syntax similar to other doveadm commands,
with labels for all arguments?

doveadm auth test -u user -p [pass]
doveadm auth cache flush -u [user]
doveadm auth cache stats

This will allow you to syntactically distinguish commands from arguments.
Otherwise you might run into the same kludgy syntax problem again, as soon
as the number of subcommands changes.


The problem was with the auth toplevel command not having subcommands. I don't think there 
are going to be any problems with subcommands. Also, there are many commands already that take 
the user without the -u parameter. Actually it's only the mail commands that take a -u 
parameter at all.

Another potential problem is the doveadm user command. I'm wondering if it might be a good idea to move it to 
a doveadm auth user or doveadm auth userdb command. There should also be a similar doveadm 
auth passdb command that does a passdb lookup without authentication.



	Another command that could be useful is one to remove a temporary user-server 
association in the director. For example, I had a downtime on one server, so 
users normally directed to that server are now being directed to another. 
Now I want a user to go back to his normal server (force it, I know it 
will go back after a timeout), but I don't want to flush all user 
connections to the backup server.


--
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información   _o)
y las Comunicaciones Aplicadas (ATICA)  / \\
http://www.um.es/atica_(___V
Tfo: 868887590
Fax: 86337




Re: [Dovecot] auth service: out of memory

2012-06-29 Thread Timo Sirainen
On 29.6.2012, at 9.35, Mailing List SVR wrote:

 I have some out of memory errors in my logs (file errors.txt attached)

How large is your auth process's VSZ when it starts up and has handled a couple 
of logins? It's possible that it's not leaking at all, you're just not giving 
enough memory for its normal operation. Some Linux distros nowadays build 
binaries that eat up a lot of VSZ immediately when they start up.



Re: [Dovecot] Removing specific entry in user/auth cache

2012-06-29 Thread Timo Sirainen
On 29.6.2012, at 10.13, Angel L. Mateo wrote:

   Other command it could be usefull is to remove a temporal user-server 
 association in director. For example, I had a downtime in one server, so 
 users normally directed to this server is now been directed to other. Now I 
 want a user to get back to his normal server (force it, I know we willl get 
 back after a timeout), but I don't want to flush all user connections to the 
 backup server.

There's already the doveadm director move command.
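
(For reference, a usage sketch only, not from the original mail, using the 
address form and the assigned backend seen earlier in this thread:)

doveadm director move user@um.es 155.54.211.164

(After the move, doveadm director status for that user should show the new 
host as the current one.)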



Re: [Dovecot] auth service: out of memory

2012-06-29 Thread Mailing List SVR

On 29/06/2012 09:19, Timo Sirainen wrote:

On 29.6.2012, at 9.35, Mailing List SVR wrote:


I have some out of memory errors in my logs (file errors.txt attached)

How large is your auth process's VSZ when it starts up and has handled a couple 
of logins? It's possible that it's not leaking at all, you're just not giving 
enough memory for its normal operation. Some Linux distros nowadays build 
binaries that eat up a lot of VSZ immediately when they start up.



ps aux reports this:

dovecot   7454  0.0  0.0  85980  3776 ?S09:36   0:00 
dovecot/auth


before restarting dovecot, the auth process had been running for about an 
hour; this is the output from ps aux:


dovecot  25002  0.0  0.0  86112  3780 ?S08:24   0:00 
dovecot/auth


thanks
Nicola





Re: [Dovecot] auth service: out of memory

2012-06-29 Thread Timo Sirainen
On 29.6.2012, at 10.39, Mailing List SVR wrote:

 On 29/06/2012 09:19, Timo Sirainen wrote:
 On 29.6.2012, at 9.35, Mailing List SVR wrote:
 
 I have some out of memory errors in my logs (file errors.txt attached)
 How large is your auth process's VSZ when it starts up and has handled a 
 couple of logins? It's possible that it's not leaking at all, you're just 
 not giving enough memory for its normal operation. Some Linux distros 
 nowadays build binaries that eat up a lot of VSZ immediately when they start 
 up.
 
 
 ps aux report this:
 
 dovecot   7454  0.0  0.0  85980  3776 ?S09:36   0:00 dovecot/auth
 
 before restarting dovecot the auth process was running since about 1 hour and 
 this is the output from ps aux
 
 dovecot  25002  0.0  0.0  86112  3780 ?S08:24   0:00 dovecot/auth

So you have 44 MB of VSZ available after startup. You also have 10 MB of auth 
cache, which could in reality take somewhat more than 10 MB. It doesn't leave a 
whole lot available for regular use. I'd increase the auth process's VSZ limit 
and see if it still crashes.
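
(Illustration only, not from Timo's mail: a minimal sketch of such an 
override in dovecot.conf; 256 M is just an assumed value, pick whatever 
headroom your auth process needs:)

service auth {
  # raises only the auth service's limit; other services keep default_vsz_limit
  vsz_limit = 256 M
}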

If you want to, you could also test with valgrind if there's a leak:

service auth {
  executable = /usr/bin/valgrind --leak-check=full -q /usr/libexec/dovecot/auth
}

You'd then need to restart the auth process to make valgrind output the leaks.

Re: [Dovecot] auth service: out of memory

2012-06-29 Thread Mailing List SVR

On 29/06/2012 09:45, Timo Sirainen wrote:

On 29.6.2012, at 10.39, Mailing List SVR wrote:


On 29/06/2012 09:19, Timo Sirainen wrote:

On 29.6.2012, at 9.35, Mailing List SVR wrote:


I have some out of memory errors in my logs (file errors.txt attached)

How large is your auth process's VSZ when it starts up and has handled a couple 
of logins? It's possible that it's not leaking at all, you're just not giving 
enough memory for its normal operation. Some Linux distros nowadays build 
binaries that eat up a lot of VSZ immediately when they start up.



ps aux report this:

dovecot   7454  0.0  0.0  85980  3776 ?S09:36   0:00 dovecot/auth

before restarting dovecot the auth process was running since about 1 hour and 
this is the output from ps aux

dovecot  25002  0.0  0.0  86112  3780 ?S08:24   0:00 dovecot/auth

So you have 44 MB of VSZ available after startup. You also have 10 MB of auth 
cache, which could in reality take somewhat more than 10 MB. It doesn't leave a 
whole lot available for regular use. I'd increase the auth process's VSZ limit 
and see if it still crashes.


I increased the limit to 192MB. Or should I set the limit to 256MB or 
more? I'll wait some days to see if it still crashes.




If you want to, you could also test with valgrind if there's a leak:

service auth {
   executable = /usr/bin/valgrind --leak-check=full -q /usr/libexec/dovecot/auth
}

You'd then need to restart the auth process to make valgrind output the leaks.


for now I prefer to avoid valgrind on a production server; if the crash 
persists with the new limit, I'll set up a test environment and run 
valgrind there,


thanks
Nicola



[Dovecot] Preferred LDAP Attribute for home/mail location

2012-06-29 Thread Edgar Fuß
Is there, among the dovecot community, any preferred LDAP schema and attribute 
to use for setting the home/mail storage location?

Some people seem to use the qmail schema, some a Jamm schema (whatever that 
is), and Markus Effinger has even created a dovecot schema 
(https://www.effinger.org/blog/2009/01/11/eigenes-ldap-schema-erstellen/). 
There may be more.
I could even create my own, given that we were assigned an official OID a 
decade ago anyway.

However, sometimes I prefer to use what most other people do.
I would effectively only need to store the name of the relevant NFS server.
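
(Not part of the question, just an illustration of the mechanics: whichever 
attribute ends up being used, it is mapped in dovecot-ldap.conf.ext via 
user_attrs and can then feed mail_location; the attribute names below are 
placeholders, not a recommendation:)

# dovecot-ldap.conf.ext (sketch)
user_attrs = mailHomeDirectory=home, uidNumber=uid, gidNumber=gid

# dovecot.conf (sketch)
mail_location = maildir:%h/Maildir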


[Dovecot] director directing to wrong server (sometimes)

2012-06-29 Thread Angel L. Mateo

Hello,

I have discovered a strange behaviour with director proxying...

	I have a user whose assigned server is 155.54.211.164. The problem is 
that I don't know why the director sent him to a different server yesterday, 
because my server was up all the time. Moreover, I'm using poolmon on the 
director servers to check the availability of the final servers, and it 
didn't report any problem with the server.


I have two load balanced director servers. Logs at these servers are:

* logs directing him to the correct backend server
Jun 28 08:38:18 myotis42 dovecot: auth: Debug: master in: 
PASS#0111#011user@um.es#011service=lmtp#011lip=155.54.211.185#011lport=24#011rip=155.54.212.168#011rport=52255
Jun 28 08:38:18 myotis42 dovecot: auth: Debug: 
static(user,155.54.212.168): lookup
Jun 28 08:38:18 myotis42 dovecot: auth: Debug: master out: 
PASS#0111#011user=user#011proxy#011proxy_timeout=150
Jun 28 08:38:18 myotis42 dovecot: lmtp(15889): Debug: auth input: 
user=user proxy proxy_timeout=150 host=155.54.211.164 proxy_refresh=450
Jun 28 08:39:59 myotis42 dovecot: auth: Debug: master in: 
PASS#01118#011user@um.es#011service=lmtp#011lip=155.54.211.185#011lport=24#011rip=155.54.212.166#011rport=40008
Jun 28 08:39:59 myotis42 dovecot: auth: Debug: 
static(user,155.54.212.166): lookup
Jun 28 08:39:59 myotis42 dovecot: auth: Debug: master out: 
PASS#01118#011user=user#011proxy#011proxy_timeout=150
Jun 28 08:39:59 myotis42 dovecot: lmtp(15361): Debug: auth input: 
user=user proxy proxy_timeout=150 host=155.54.211.164 proxy_refresh=450


* now, the other director server sends him to an incorrect backend server
Jun 28 09:01:12 myotis41 dovecot: auth: Debug: 
static(user,155.54.66.38): lookup
Jun 28 09:01:12 myotis41 dovecot: auth: Debug: 
static(user,155.54.66.38): Allowing any password
Jun 28 09:01:12 myotis41 dovecot: auth: Debug: client out: 
OK#01134556#011user=user#011proxy#011proxy_timeout=150#011pass=hidden
Jun 28 09:01:12 myotis41 dovecot: auth: Debug: 
static(user,155.54.66.38): lookup
Jun 28 09:01:12 myotis41 dovecot: auth: Debug: 
static(user,155.54.66.38): Allowing any password
Jun 28 09:01:12 myotis41 dovecot: auth: Debug: client out: 
OK#01152763#011user=user#011proxy#011proxy_timeout=150#011pass=hidden
Jun 28 09:01:12 myotis41 dovecot: imap-login: proxy(user): started 
proxying to 155.54.211.162:143: user=user, method=PLAIN, 
rip=155.54.66.38, lip=155.54.211.186
Jun 28 09:01:12 myotis41 dovecot: imap-login: proxy(user): started 
proxying to 155.54.211.162:143: user=user, method=PLAIN, 
rip=155.54.66.38, lip=155.54.211.186
Jun 28 09:01:13 myotis41 dovecot: auth: Debug: 
static(user,155.54.66.38): lookup
Jun 28 09:01:13 myotis41 dovecot: auth: Debug: 
static(user,155.54.66.38): Allowing any password


* Now, the first director sends him to the incorrect one too
Jun 28 09:33:50 myotis42 dovecot: auth: Debug: master in: 
PASS#01132#011user@um.es#011service=lmtp#011lip=155.54.211.185#011lport=24#011rip=155.54.212.168#011rport=46830
Jun 28 09:33:50 myotis42 dovecot: auth: Debug: 
static(user,155.54.212.168): lookup
Jun 28 09:33:50 myotis42 dovecot: auth: Debug: master out: 
PASS#01132#011user=user#011proxy#011proxy_timeout=150
Jun 28 09:33:50 myotis42 dovecot: lmtp(17284): Debug: auth input: 
user=user proxy proxy_timeout=150 host=155.54.211.162 proxy_refresh=450


	I haven't found any error log for the correct backend server between 
the correct redirection and the incorrect one. In fact, I have lots of 
logs of other users directed to it, and logs of the same director 
directing connections to the correct server.


--
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información   _o)
y las Comunicaciones Aplicadas (ATICA)  / \\
http://www.um.es/atica_(___V
Tfo: 868887590
Fax: 86337



Re: [Dovecot] RAID1+md concat+XFS as mailstorage

2012-06-29 Thread Charles Marcus

On 2012-06-28 4:35 PM, Ed W li...@wildgooses.com wrote:

On 28/06/2012 17:54, Charles Marcus wrote:

RAID10 also statistically has a much better chance of surviving a
multi drive failure than RAID5 or 6, because it will only die if two
drives in the same pair fail, and only then if the second one fails
before the hot spare is rebuilt.



Actually this turns out to be incorrect... Curious, but there you go!


Depends on what you mean exactly by 'incorrect'...

I'm fairly sure that you do not mean that my comment that 'having a hot 
spare is good' is incorrect, so that leaves my last comment above...


I'm far from an expert (Stan? Where are you? I'm looking forward to your 
comments here), but...



Search google for a recent very helpful expose on this. Basically RAID10
can sometimes tolerate multi-drive failure, but on average raid6 appears
less likely to trash your data, plus under some circumstances it better
survives recovering from a single failed disk in practice


'Sometimes'... '...under some circumstances...' - hey, it's all a 
crapshoot anyway, all you can do is try to make sure the dice aren't 
loaded against you.



The executive summary is something like: when raid5 fails, because at
that point you effectively do a raid scrub you tend to suddenly notice
a bunch of other hidden problems which were lurking and your rebuild
fails (this happened to me...). RAID1 has no better bad block detection
than assuming the non bad disk is perfect (so won't spot latent
unscrubbed errors), and again if you hit a bad block during the rebuild
you lose the whole of your mirrored pair.


Not true (at least not for real hardware based RAID controllers that I 
have ever worked with)... yes, it may revert to degraded mode, but you 
don't just 'lose' the RAID if the rebuild fails.


You can then run filesystem check tools on the system, hopefully 
find/fix the bad sectors, then rebuild the array. I have had to do 
this before myself, so I know that it is possible.


Also, modern enterprise SAS drives and RAID controllers do have hardware 
based algorithms to protect data integrity (much better than consumer 
grade drives at least).



So the vulnerability is not the first failed disk, but discovering
subsequent problems during the rebuild.


True, but this applies to every RAID mode (RAID6 included). Also, one 
big disadvantage of RAID5/6 is the rebuild times (they can sometimes take 
many hours, or even days, depending on drive sizes). It is the stress of 
the rebuild that often causes a second drive failure, thereby killing 
your RAID, and RAID10 rebuilds happen *much* faster than RAID5/6 
rebuilds (and are less stressful), so there is much less chance of 
losing another disk during a rebuild.



This certainly correlates with my (admittedly limited) experiences.
Disk array scrubbing on a regular basis seems like a mandatory
requirement (but how many people do..?) to have any chance of
actually repairing a failing raid1/5 array


Regular scrubbing is something I will give some thought to, but again, 
your remarks are not 100% accurate... RAID is not quite so fragile as 
you make it out to be.


--

Best regards,

Charles


Re: [Dovecot] RAID1+md concat+XFS as mailstorage

2012-06-29 Thread Charles Marcus
On 2012-06-29 2:19 AM, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl 
wrote:

Has anyone tried or benchmarked ZFS, perhaps ZFS+NFS as backing store for



yes. long time ago. ZFS isn't useful for anything more than a toy. I/O
performance is just bad.


Please stop with the FUD... 'long time ago'? No elaboration on what 
implementation/platform you 'played with'?


With a proper implementation, ZFS is an excellent, mature, reliable 
option for storage... maybe not quite the fastest/highest performing 
screaming speed demon, but enterprises are concerned with more than just 
raw performance - in fact, data integrity tops the list.


http://www.nexenta.com/corp/nexentastor

http://www.freenas.org/

Yes, the LINUX version has a long way to go (due to stupid licensing 
restrictions it must be rewritten from scratch to get into the kernel), 
but personally I'm chomping at the bit for BTRFS, which looks like it is 
coming closer to usability for production systems (just got a basic fsck 
tool which now just needs to be perfected).


--

Best regards,

Charles


Re: [Dovecot] RAID1+md concat+XFS as mailstorage

2012-06-29 Thread Dr Josef Karthauser
Kelsey Cummings wrote:
 On 06/28/12 05:56, Ed W wrote:
 So given the statistics show us that 2 disk failures are much more
 common than we expect, and that silent corruption is likely occurring
 within (larger) real world file stores,  there really aren't many battle
 tested options that can protect against this - really only RAID6 right
 now and that has significant limitations...
 
 Has anyone tried or benchmarked ZFS, perhaps ZFS+NFS as backing store 
 for spools?  Sorry if I've missed it and this has already come up. 
 We're using Netapp/NFS, and are likely to continue to do so but still 
 curious.

Hi Kelsey,

We're running ZFS here, and have just started using dovecot on it. No stats yet 
to report, but you might be interested in this edge case. One of our servers 
started behaving badly... the database would randomly crash and not restart due 
to corrupted indexes. It turned out that the memory had gone bad, and that it 
had been bad for a while. Disk blocks were getting corrupted on read, and some 
on write! Luckily, because we were on ZFS, which checksums all data, we were able 
to detect and repair most of the data (some 80 MB of bad blocks distributed 
evenly throughout the entire file system!) automatically, and also to know exactly 
which files were unrecoverable (in the end just two or three files!). Also, we 
have hourly snapshots of all the file systems, so we were able to recover older 
versions of those files with minimal loss.

I will never rely on a non-checksumming file system for production use again, 
for data that is expected to persist over time.

Joe

Re: [Dovecot] RAID1+md concat+XFS as mailstorage

2012-06-29 Thread Charles Marcus

On 2012-06-29 1:02 AM, Dr Josef Karthauser j...@tao.org.uk wrote:

I will never rely on a non-checksumming file system for production
use again, for data that is expected to persist over time.


Nice! I'm seriously considering buying a Nexenta Storage device if/when 
our storage needs require it... this just makes me want it more. :)


Out of curiosity, were you using proper ECC memory? Ie, why did the bad 
memory go undetected for so long?


--

Best regards,

Charles


Re: [Dovecot] RAID1+md concat+XFS as mailstorage

2012-06-29 Thread Ed W

On 29/06/2012 12:15, Charles Marcus wrote:

On 2012-06-28 4:35 PM, Ed W li...@wildgooses.com wrote:

On 28/06/2012 17:54, Charles Marcus wrote:

RAID10 also statistically has a much better chance of surviving a
multi drive failure than RAID5 or 6, because it will only die if two
drives in the same pair fail, and only then if the second one fails
before the hot spare is rebuilt.



Actually this turns out to be incorrect... Curious, but there you go!


Depends on what you mean exactly by 'incorrect'...


I'm sorry, this wasn't meant to be an attack on you. I thought I was 
pointing out what is now fairly obvious stuff, but it's only recently 
that the maths has been popularised by the common blogs on the 
interwebs.  Whilst I guess not everyone read the flurry of blog articles 
about this last year, I think it's due to be repeated with increasing 
frequency as we go forward:


The most recent article which prompted all of the above is, I think, this one:
http://queue.acm.org/detail.cfm?id=1670144
More here (BAARF = Battle Against Any Raid Five/Four):
http://www.miracleas.com/BAARF/

There are some badly phrased ZDnet articles as well if you google "raid 5 
stops working in 2009".


Intel have a whitepaper which says:

   Intelligent RAID 6 Theory Overview And Implementation

   RAID 5 systems are commonly deployed for data protection in most
   business environments. However, RAID 5 systems only tolerate a
   single drive failure, and the probability of encountering latent
   defects [i.e. UREs, among other problems] of drives approaches 100
   percent as disk capacity and array width increase.
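
(A rough back-of-envelope illustration of that claim, not from the 
whitepaper: assume a consumer-class URE rate of one unrecoverable read 
error per 10^14 bits. Rebuilding a four-drive RAID5 of 2 TB disks after one 
failure means reading the three surviving drives, roughly 6 TB = 4.8 x 10^13 
bits, so the expected number of UREs during the rebuild is about 
4.8 x 10^13 * 10^-14 = 0.48, i.e. close to a 40% chance (1 - e^-0.48 ~ 0.38) 
of hitting at least one latent defect. Bigger drives and wider arrays push 
that towards the 100 percent the whitepaper mentions.)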



The upshot is that:
- Drives often fail slowly rather than bang/dead.
- You will only scrub the array at some frequency F, which means that 
faults can develop since the last scrub (good on you if you actually 
remembered to set up an automatic regular scrub; see the sketch after 
this list).
- Once you decide to pull a disk for some reason to replace it, then 
with RAID1/5 (RAID1 is a kind of degenerate form of RAID5) you are 
exposed: if a *second* error is detected during the rebuild, then 
you are inconsistent and have no way to correctly rebuild your entire array.
- My experience is that linux-raid will stop the rebuild if a second 
error is detected during rebuild, but with some understanding it's 
possible to proceed (obviously understanding that data loss has 
therefore occurred).  However, some hardware controllers will kick out 
the whole array if a rebuild error is discovered, some will not; but 
given that the probability of a second error being discovered during rebuild 
is significantly non-zero, it's worth worrying over this and figuring 
out what you do if it happens...
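
(An aside, not from the original mail: with Linux md, the regular scrub 
mentioned in the list above can be kicked off by hand through sysfs, e.g. 
for /dev/md0:)

echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat        # watch the check progress

(Some distributions already ship a cron job that does this on a monthly 
schedule; if yours doesn't, it is worth adding one.)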



I'm fairly sure that you do not mean that my comment that 'having a 
hot spare is good' is incorrect,


Well, a hot spare seems like a good idea, but the point is that the 
situation will be that you have lost parity protection.  At that point 
you effectively run a disk scrub to rebuild the array.  The probability 
of discovering a second error somewhere on your remaining array is 
non-zero, and if that happens your array has lost data.  So it's not about 
how quickly you get the spare in, so much as the significant probability 
that you have two drives with errors, but only one drive of protection.


RAID6 increases this protection *quite substantially*, because if a 
second error is found on a stripe, then you still haven't lost data.  
However, a *third* error on a single stripe will lose data.


The bad news: estimates suggest that drive sizes will become large 
enough that RAID6 is insufficient to give a reasonable probability of 
successfully repairing a single failed disk in around 7+ years' time.  So 
at that point there becomes a significant probability that the single 
failed disk cannot be successfully replaced in a RAID6 array, because of 
the high probability of *two* additional defects being discovered on 
the same stripe of the remaining array.  Therefore many folks are 
requesting three-disk parity to be implemented (RAID7?).



'Sometimes'... '...under some circumstances...' - hey, it's all a 
crapshoot anyway, all you can do is try to make sure the dice aren't 
loaded against you.


And to be clear: with RAID5/RAID1 there is a very significant probability 
that, once your first disk has failed, in the process of replacing that 
disk you will discover an unrecoverable error on your remaining drive and 
hence lose some data...



Also, modern enterprise SAS drives and RAID controllers do have 
hardware based algorithms to protect data integrity (much better than 
consumer grade drives at least).


I can't categorically disagree, but I would check your claims carefully.  
My understanding is that there is minimal additional protection 
from enterprise stuff, and by that I'm thinking of quality gear that I 
can buy from the likes of newegg/ebuyer, not the custom SAN products 
from certain big-name providers.  It seems possible that the big-name 
SAN providers implement additional 
[Dovecot] doveadm purge -A via doveadm-proxy director fails after some users

2012-06-29 Thread Daniel Parthey

Hi,

we have configured userdb and passdb in the director and are trying to  
iterate over all users and pass the purge command via the doveadm proxy to  
port 19000 on the correct backend host.


A single purge -u usern...@example.org via doveadm-proxy works correctly,
but iterating over some users with -A fails.

Note: users/domains have been anonymized in output:



mail04:~# /usr/bin/doveadm -c  
/etc/dovecot-director/dovecot-director.conf -D purge -A 21
doveadm(root): Debug: Loading modules from directory:  
/usr/lib/dovecot/modules/doveadm
doveadm(root): Debug: Skipping module doveadm_acl_plugin, because  
dlopen() failed:  
/usr/lib/dovecot/modules/doveadm/lib10_doveadm_acl_plugin.so:  
undefined symbol: acl_user_module (this is usually intentional, so  
just ignore this message)
doveadm(root): Debug: Skipping module doveadm_expire_plugin, because  
dlopen() failed:  
/usr/lib/dovecot/modules/doveadm/lib10_doveadm_expire_plugin.so:  
undefined symbol: expire_set_lookup (this is usually intentional, so  
just ignore this message)
doveadm(root): Debug: Skipping module doveadm_quota_plugin, because  
dlopen() failed:  
/usr/lib/dovecot/modules/doveadm/lib10_doveadm_quota_plugin.so:  
undefined symbol: quota_user_module (this is usually intentional, so  
just ignore this message)
doveadm(root): Debug: Skipping module doveadm_zlib_plugin, because  
dlopen() failed:  
/usr/lib/dovecot/modules/doveadm/lib10_doveadm_zlib_plugin.so:  
undefined symbol: i_stream_create_deflate (this is usually  
intentional, so just ignore this message)
doveadm(root): Debug: Skipping module doveadm_fts_plugin, because  
dlopen() failed:  
/usr/lib/dovecot/modules/doveadm/lib20_doveadm_fts_plugin.so:  
undefined symbol: fts_list_backend (this is usually intentional, so  
just ignore this message)
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.192  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.192  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.190  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.190  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain2.example.org): Debug: auth input:  
user=use...@domain2.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.190  
proxy_refresh=86400
10 / 94doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.190  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.191  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.190  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.191  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.191  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.192  
proxy_refresh=86400
20 / 94doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  
proxy_refresh=86400
doveadm(use...@domain1.example.org): Debug: auth input:  
user=use...@domain1.example.org proxy host=10.129.3.193  

Re: [Dovecot] Preferred LDAP Attribute for home/mail location

2012-06-29 Thread Patrick Ben Koetter
* Edgar Fuß e...@math.uni-bonn.de:
 Is there, among the dovecot community, any preferred LDAP schema and
 attribute to use for setting the home/mail storage location?

There are many. Here's another one:

http://www.postfix-buch.com/download/postfix-book.schema.gz


-- 
state of mind ()

http://www.state-of-mind.de

Franziskanerstraße 15  Telefon +49 89 3090 4664
81669 München  Telefax +49 89 3090 4666

Amtsgericht MünchenPartnerschaftsregister PR 563



Re: [Dovecot] lmtp proxy timeout while waiting for reply to DATA reply

2012-06-29 Thread Daniel Parthey
Timo Sirainen wrote:
 On Sat, 2012-04-28 at 13:00 +0200, Daniel Parthey wrote:
 
  we are experiencing similar sporadic data timeout issues with dovecot 2.0.20
  as in http://dovecot.org/pipermail/dovecot/2011-June/059807.html
  at least once a week. Some mails get temporarily deferred in the
  postfix queue since dovecot director lmtp refuses them and the
  mails are delivered at a later time.
 
 What isn't in v2.0 is the larger rewrite of the LMTP proxying
 code in v2.1, which I hope fixes also this timeout problem.

The same problem persists after updating to 2.1.7, especially for distribution
lists which contain several target email addresses, which are then
pipelined by postfix through a single lmtp proxy connection:

Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.4.0 Remote server not 
answering (timeout while waiting for reply to DATA reply) (in reply to end of 
DATA command))
Jun 29 10:14:03 10.129.3.233 postfix/lmtp[29674]: 00318C090: 
to=use...@example.org, orig_to=emai...@example.org, 
relay=127.0.0.1[127.0.0.1]:20024, delay=31, delays=1/0.16/0.01/30, dsn=4.4.0, 
status=deferred (host 127.0.0.1[127.0.0.1] said: 451 

[Dovecot] Proxy config help please

2012-06-29 Thread Zac Israel
Hello, I am new to dovecot and I am initially trying to set up a basic
IMAP proxy with password forwarding. I can start the dovecot service,
connect and give it my password, and that is where it hangs.  My config
is:

root@imap-test:/etc/dovecot# doveconf -n
# 2.0.19: /etc/dovecot/dovecot.conf
# OS: Linux 3.2.0-24-generic x86_64 Ubuntu 12.04 LTS
auth_debug = yes
auth_verbose = yes
debug_log_path = syslog
first_valid_uid = 100
imap_capability = CAPABILITY IMAP4rev1 ACL BINARY CATENATE CHILDREN
CONDSTORE ENABLE ESEARCH ESORT I18NLEVEL=1 ID IDLE LIST-EXTENDED
LIST-STATUS LITERAL+ LOGIN-REFERRALS MULTIAPPEND NAMESPACE QRESYNC
QUOTA RIGHTS=ektx SASL-IR SEARCHRES SORT THREAD=ORDEREDSUBJECT UIDPLUS
UNSELECT WITHIN XLIST
last_valid_uid = 200
mail_debug = yes
mail_gid = 107
mail_uid = 107
passdb {
  args = proxy=proxy_always nopassword=y host=172.16.0.13 port=143
proxy_timeout=5 starttls=y ssl=any-cert
  driver = static
}
protocols = imap
service imap-login {
  inet_listener imap {
address = *
port = 143
  }
}
ssl = required
ssl_cert = /etc/courier/imapd.pem
ssl_key = /etc/courier/imapd.pem
verbose_ssl = yes

The system at 172.16.0.13 is a Zimbra proxy.  I can see in the logs
that it initially complains about my ssl cert, and if I remove
ssl=any-cert it fails because my cert is self-signed, so I know it is
talking to the proxy and doing STARTTLS, which is a requirement of
Zimbra.  Unfortunately I have not found a way to see the full exchange
between dovecot and my Zimbra proxy other than tcpdump, which just
shows a small packet exchange.  Please let me know if I can provide
any other information, and thanks in advance for any help.
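
(An aside, not from the original mail: one way to watch the backend's 
banner and STARTTLS negotiation by hand, assuming openssl is installed on 
the proxy host:)

openssl s_client -connect 172.16.0.13:143 -starttls imap

(That also prints the certificate chain the Zimbra side presents, which is 
handy for confirming why ssl=any-cert is needed with a self-signed cert.)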

-Zac


Re: [Dovecot] What does namespace inbox {... mean

2012-06-29 Thread Thomas Hochstein
Jonathan Ryshpan wrote:

 It appears from the wiki that the word following the namespace
 declarator (if this is the right word) should be either public,
 shared, or private, and describes a property of the namespace being
 declared. 

AFAIS the word following the keyword namespace is the name (of the
namespace). The type (public, shared or private) is declared with the
type setting inside the block.

 So what does:
 namespace inbox {...
 mean?

That is a definition of a namespace named inbox.
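
A minimal sketch of what such a block typically looks like (the setting
names are standard namespace settings; the values here are only an
example):

namespace inbox {
  type = private
  separator = /
  prefix =
  inbox = yes
}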

-thh


Re: [Dovecot] director directing to wrong server (sometimes)

2012-06-29 Thread Daniel Parthey
Hi Angel,

Angel L. Mateo wrote:
 I have a user, its assigned server is 155.54.211.164. The problem
 is that I don't know why director sent him yesterday to a different
 server, because my server was up all the time. Moreover, I'm using
 poolmon in director servers to check availability of final servers
 and it didn't report any problem with the server.

Which version of dovecot are you using?
doveconf -n of director and mailbox instance?

You should monitor the output of
  doveadm director status usern...@example.org
  doveadm director ring status
on each of the directors over time with a timestamp.
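
For example, a crude polling loop (user, interval and log path are just
placeholders):

while true; do
  date
  doveadm director status user@example.org
  doveadm director ring status
  sleep 300
done >> /var/log/dovecot-director-status.log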

This might shed some light on where the user is directed and why,
and ring status will tell which directors can see each other.
doveadm director move can also influence where a user is sent,
but this will be reflected in the Current: entry of director status;
there you can also find the time when the entry in the hash table
will expire.

Regards
Daniel
-- 
https://plus.google.com/103021802792276734820


Re: [Dovecot] RAID1+md concat+XFS as mailstorage

2012-06-29 Thread Stan Hoeppner
On 6/28/2012 7:15 AM, Ed W wrote:
 On 28/06/2012 13:01, Костырев Александр Алексеевич wrote:

 somewhere in maillist I've seen RAID1+md concat+XFS being promoted as
 mailstorage.
 Does anybody in here actually use this setup?

 I've decided to give it a try,
 but ended up with not being able to recover any data off survived
 pairs from linear array when _the_first of raid1 pairs got down.

The failure of the RAID1 pair was due to an intentional breakage test.
Your testing methodology was severely flawed.  The result is the correct
expected behavior of your test methodology.  Proper testing will yield a
different result.

One should not be surprised that something breaks when he intentionally
attempts to break it.

 This is the configuration endorsed by Stan Hoeppner.

Yes.  It works very well for metadata heavy workloads, i.e. maildir.

-- 
Stan