Re: choosing a file system

2009-01-18 Thread Andrew McNamara
Yeah, except Postfix encodes the inode of the queue files in its queue
IDs, so it gets very confused if you do this.  Same with restoring
queues from backups.

You should be able to get away with this if, when moving the queue to
another machine, you move the queued mail from hold, incoming, active and
deferred directories into the maildrop directory on the target instance.
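A minimal sketch of those moves (hypothetical `move_queue_to_maildrop` helper; a real migration should be done with Postfix stopped and must preserve queue-file ownership, which this ignores):

```python
import os
import shutil

def move_queue_to_maildrop(queue_dir):
    """Move queued messages from hold, incoming, active and deferred
    into maildrop, so the target Postfix instance re-queues them
    (and assigns fresh queue IDs, avoiding the inode problem above).

    Postfix hashes some queues into subdirectories, so walk recursively.
    """
    maildrop = os.path.join(queue_dir, "maildrop")
    for queue in ("hold", "incoming", "active", "deferred"):
        top = os.path.join(queue_dir, queue)
        if not os.path.isdir(top):
            continue
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                shutil.move(os.path.join(dirpath, name),
                            os.path.join(maildrop, name))
```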

This (somewhat old, but still correct, I think) message from Wietse
might shed more light on it:

Date: Thu, 12 Sep 2002 20:33:08 -0400 (EDT)
From: wie...@porcupine.org (Wietse Venema)
Subject: Re: postfix migration

 I want to migrate Postfix to another machine. What are the steps so
 that I won't lose mail in the process?

This is the safe procedure.

1) On the old machine, stop Postfix.

2) On the old machine, run as super-user:

postsuper -r ALL

   This moves all queue files to the maildrop queue.

3) On the old machine, back up /var/spool/postfix/maildrop

4) On the new machine, make sure Postfix works.

5) On the new machine, stop Postfix.

6) On the new machine, restore /var/spool/postfix/maildrop

7) On the new machine, start Postfix.

There are ways to skip the postsuper -r ALL step, and copy the
incoming + active + deferred + bounce + defer + flush + hold
directories to the new machine, but that would be safe only with
an empty queue on the new machine.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-03-04 Thread Andrew McNamara
it takes long enough to rebuild an array with large drives that the
chances of a second drive failing during the rebuild become noticeable.

Worse, the act of rebuilding can prompt a second, marginal disk to fail.
Presumably the mechanism is that the head runs through a patch of debris
in an otherwise rarely accessed area of the disk, or the increased load
causes thermal problems.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: LARGE single-system Cyrus installs?

2007-11-26 Thread Andrew McNamara
 Note that ext3 effectively does the same thing as ZFS on fsync() - because
 the journal layer is block based and does not know which block belongs
 to which file, the entire journal must be applied to the filesystem to
 achieve the expected fsync() semantics (at least, with data=ordered,
 it does).

Well, "does not know which block belongs to which file" sounds weird. :)

With data=ordered, the journal holds only metadata. If you fsync() a
file, ordered means that ext3 syncs the data blocks first (with no
overhead, just like any other filesystem, of course it knows what blocks
to write), then the journal.

Now, yes, the journal possibly contains metadata updates for other files
too, and the ordered semantics requires the data blocks of those files
to be synced as well, before the journal sync.

I'm not sure if an fsync() flushes the whole journal or just as much as
necessary (that is, up to the last update on the file you're
fsync()ing).

The ext3 journalling layer only knows about blocks. When using
data=ordered, only metadata *blocks* are tracked by the journalling layer.
The journalling layer does not know which data blocks correspond to
which metadata block, so everything is forced out.
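The application-side pattern whose cost is being discussed is just write-then-fsync; a minimal sketch (hypothetical `durable_write` helper):

```python
import os

def durable_write(path, data):
    """Write data and force it to stable storage before returning.

    On ext3 with data=ordered, the fsync() below cannot sync just this
    file: the journal is block-based, so the whole journal (and any data
    blocks ordered ahead of it) gets flushed.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)   # this is the call that forces the journal flush
    finally:
        os.close(fd)
```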

Most other journalling file systems operate within the filesystem
abstraction and journal atomic filesystem operations, which leaves them
better able to implement sane fsync() semantics.

data=writeback is what some (most) other journalled filesystems do.
Metadata updates are allowed to hit the disk _before_ data updates. So,
on fsync(), the FS writes all data blocks (still required by fsync()
semantics), then the journal (or part of it), but if metadata updates
for other files are included in the journal sync, there's no need to
write the corresponding data blocks. They'll be written later, and
they'll hit the disk _after_ the metadata changes.

This is possible because those other journals operate at the filesystem,
not block level.

If power fails in between, you can have a file whose size/time is
updated, but contents not. That's the problem with data=writeback, but
it should be noted that's pretty normal for other journalled
filesystems, too. It applies only to files that were not fsync()'ed.

And, in this case, you're no worse off than you would have been with a
traditional filesystem such as UFS.

I think that if you're running into performance problems, and your
system is doing a lot of fsync(), data=ordered is the worst option.

You're assuming fsync() behaviour changes with the other data=
options - have you looked into it? I'm wary because the ext3 guys have
a long history of simply not getting what fsync() is for, what it's
supposed to do, and why it's important. I recently asked Andrew Morton
whether fsync() behaviour changed with data= options, but he couldn't
remember, and I haven't had time to look into it myself.

data=journal is fsync()-friendly in one sense, it does write
*everything* out, but in one nice sequential (thus extremely fast) shot.
Later, data blocks will be written again to the right places. It doubles
the I/O bandwidth requirements, but if you have a lot of bandwidth, it
may be a win. We're talking sequential write bandwidth, which is hardly
a problem.

This is true, right up until the point you fill the journal... 8-)

data=writeback is fsync() friendly in the sense that it writes only the
data blocks of the fsync()'ed file plus (all) metadata. It's the lowest
overhead option.

If you have a heavy sustained write traffic _and_ lots of fsync()'s,
then data=writeback may be the only option.

I think some people are scared by data=writeback, but they don't realise
it's just what other journalled filesystems do. I'm not familiar with
ReiserFS, but I think it's metadata-only as well.

Certainly data journalling is the exception, rather than the rule. Off
the top of my head, I can't think of another mainstream filesystem that
does it (aside from the various log-structured filesystems such as WAFL
and Reiser4).

data=ordered is good, for general purpose systems. For any application
that uses fsync(), it's useless overhead.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: LARGE single-system Cyrus installs?

2007-11-26 Thread Andrew McNamara
 Certainly data journalling is the exception, rather than the rule.
 Off the top of my head, I can't think of another mainstream
 filesystem that does it (aside from the various log-structured
 filesystems such as WAFL and Reiser4).

AFAIK you get it with UFS + gjournal, dunno if that counts as
mainstream though :)

Gjournal sounds like it's block level, like ext3, so would suffer
the same sorts of shortcomings in this application.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: LARGE single-system Cyrus installs?

2007-11-19 Thread Andrew McNamara
 In production releases of ZFS fsync() essentially triggers sync() (fixed in 
 Solaris Next).  
[...]
Skiplist requires two fsync calls per transaction (single
untransactioned actions are also one transaction each), and it
also locks the entire file for the duration of said
transaction, so you can't have two writes happening at
once.  I haven't built Cyrus on our Solaris box, so I don't
know if it uses fcntl there; it certainly does on the Linux
systems, but it can fall back to flock if fcntl isn't
available.
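The fcntl-with-flock-fallback preference can be sketched like this (hypothetical helper, not Cyrus's actual code; Python's fcntl module exposes both lock styles):

```python
import fcntl

def lock_exclusive(fd, prefer_fcntl=True):
    """Take an exclusive lock on fd, preferring POSIX fcntl-style locks
    and falling back to BSD flock when fcntl locking isn't available,
    roughly the policy described above. Returns which style was used."""
    if prefer_fcntl:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX)  # fcntl-style (POSIX) lock
            return "fcntl"
        except OSError:
            pass                            # e.g. locking unsupported here
    fcntl.flock(fd, fcntl.LOCK_EX)          # BSD-style whole-file lock
    return "flock"
```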

Note that ext3 effectively does the same thing as ZFS on fsync() - because
the journal layer is block based and does not know which block belongs
to which file, the entire journal must be applied to the filesystem to
achieve the expected fsync() semantics (at least, with data=ordered,
it does).

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: mail coming without MX; how ?

2007-04-26 Thread Andrew McNamara
if you check the domain infoservices.in with dnsstuff.com you can
see no MX for that domain. But still mail is arriving at
[EMAIL PROTECTED]; we are using it for our official purposes, and
infoservices.in is our official site too.

I wonder how mail is still arriving without an MX? Could anyone kindly
explain?

Most Mail Transport Agents will fall back to the A record if no MX
records are found. This precedent was set by sendmail, and woe betide
any implementation that ignores precedent, but it would be foolish to
count on all MTAs behaving this way.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: mail coming without MX; how ?

2007-04-26 Thread Andrew McNamara
 Most Mail Transport Agents will fall back to the A record if no MX
 records are found. This precedent was set by sendmail, and woe betide
 any implementation that ignores precedent, but it would be foolish to
 count on all MTAs behaving this way.

With SMTP you _can_ count on this behaviour. Quoting RFC 2821:

  5. Address Resolution and Mail Handling
  [...]
  The lookup first attempts to locate an MX
  record associated with the name. [...] If
  no MX records are found, but an A RR is
  found, the A RR is treated as if it was
  associated with an implicit MX RR, with a
  preference of 0, pointing to that host.

RFC 2821 is relatively new (certainly newer than most MTAs), and while
the popular ones have made some effort to comply with it, many others
still struggle to comply with RFC 821.
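The implicit-MX rule quoted above reduces to a small piece of selection logic; a sketch with plain lists standing in for real DNS answers (hypothetical `mail_targets` helper, not a resolver):

```python
def mail_targets(mx_records, a_records):
    """Return delivery targets as (preference, host) pairs per RFC 2821
    section 5: use MX records when present; otherwise, if an A record
    exists, treat it as an implicit MX with preference 0.

    mx_records: list of (preference, host) tuples; a_records: list of
    host names that have A records. Inputs are assumptions here, not
    real DNS lookups.
    """
    if mx_records:
        return sorted(mx_records)         # lowest preference tried first
    if a_records:
        return [(0, host) for host in a_records]
    return []                             # no way to deliver
```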

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: improving concurrency/performance (fwd)

2005-11-09 Thread Andrew McNamara
 This guy is having a problem with cyrus-imap and ext3 - when multiple
 processes are attempting to write to the one filesystem (but not the one
 file), performance drops to next to nothing when only five processes are
 writing. An strace shows most of the time is being spent in fdatasync
 and fsync.

Actually, the thread just got off topic quickly -- I'm running this on
reiserfs, not ext3.  ...And I've got it mounted with data=writeback, too.
But thanks for the info, Andrew.

Sorry, my confusion. But it might be worth asking the reiserfs guys.

My experience has been that if you are fsync'ing files, then even modern
disks only manage around 10 fsync's per second (because not only does the
file data get written out, but typically so do the inode, the directory
entry, the free block table and maybe even all the directory entries up
to the root).

Journalling can help, because the committed data is written sequentially
to the journal, rather than being scattered all over the disk, but the
journalled operations still need to be applied to the filesystem sooner
or later.
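The rough rate is easy to measure; a sketch (hypothetical `fsync_rate` helper - results vary enormously with the disk's write cache, barriers and journalling mode, and on modern hardware may be far higher than 10/s):

```python
import os
import time

def fsync_rate(path, count=50):
    """Measure how many small write+fsync cycles per second a
    filesystem sustains, the operation skiplist does twice per
    transaction."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        for _ in range(count):
            os.write(fd, b"x" * 512)
            os.fsync(fd)          # force data (and metadata) to disk
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")
```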

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: improving concurrency/performance (fwd)

2005-11-08 Thread Andrew McNamara
I forwarded John's message to Andrew Morton, linux kernel maintainer, and
this is his reply (it was cc'ed to the list, but, not being a subscriber,
I presume it bounced):

--- Forwarded Message

Date: Tue, 08 Nov 2005 15:21:31 -0800
From: Andrew Morton [EMAIL PROTECTED]
To:  Andrew McNamara [EMAIL PROTECTED]
cc:  John Madden [EMAIL PROTECTED],
 info-cyrus@lists.andrew.cmu.edu
Subject: Re: improving concurrency/performance (fwd)

Andrew McNamara [EMAIL PROTECTED] wrote:

 This guy is having a problem with cyrus-imap and ext3 - when multiple
 processes are attempting to write to the one filesystem (but not the one
 file), performance drops to next to nothing when only five processes are
 writing. An strace shows most of the time is being spent in fdatasync
 and fsync.
 
 ...

Yes, on ext3, an fsync() syncs the entire filesystem.  It has to, because
all the metadata for each file is shared - it's just a string of
journallable blocks.  Similar story with the data, in ordered mode.

So effectively, fsync()ing five files one time each is performing 25 fsync()s.

One fix (which makes the application specific to ext3 in ordered-data or
journalled-data mode) is to perform a single fsync(), with the
understanding that this has the side-effect of fsyncing all the other
files.  That's an ugly solution and is rather hard to do if the workload
consists of five separate processes!

So I'd recommend mounting the filesystem with the `-o data=writeback'
mode.  This way, each fsync(fd) will sync fd's data only.  That's much
better than the default data-ordered mode, wherein a single fsync() will
sync all the other files' data too.

In data=writeback mode it is still the case that fsync(fd) will sync the
other files' metadata, but that's a single linear write to the journal and
the additional cost should be low.

Bottom line: please try data=writeback, let me know.


 --- Forwarded Message
 
 Date: Tue, 08 Nov 2005 09:25:54 -0500
 From: John Madden [EMAIL PROTECTED]
 To:  Jure Pečar [EMAIL PROTECTED]
 cc:  info-cyrus@lists.andrew.cmu.edu
 Subject: Re: improving concurrency/performance
 
  As expected, these are from locking operations. 0x8 is file descriptor,
  which, if I read lsof output correctly, points to config/socket/imap-0.lock
  (what would that be?) and 0x7 is F_SETLKW which reads as set lock or wait
  for it to be released in the manual page.
 
 Yup, that's exactly the sort of thing I was suspecting -- the performance I
 was seeing just didn't make sense.
 
 imap-0.lock is in /var/imap/socket for me.  I believe it's one of the lock
 files created when cyrus is started, so it wouldn't make any sense for
 imapd to ever be spinning on it.
 
 The delays I was seeing occurred when multiple imapd's were writing to the
 spool at the same time.  I do see a lot of this though:
 
 fcntl(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
 
 It looks like the lock to open a file in the target mailbox.  But again,
 very low actual throughput and still little or no iowait.  However, adding
 a -c to the strace, the top three syscalls are:
 
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------
  52.68    0.514720        1243       414           fdatasync
  29.87    0.291830         846       345           fsync
   4.19    0.040898          27      1519           fcntl
 
 Makes me wonder why the fsync's are taking so long since the disk is
 performing so well.  Anyone know if that's actually typical?
 
 -- 
 John Madden
 UNIX Systems Engineer
 Ivy Tech Community College of Indiana
 [EMAIL PROTECTED]
 
 
 --- End of Forwarded Message

--- End of Forwarded Message




Re: Cyrus, NFS and mail spools

2004-09-08 Thread Andrew McNamara
Ken Murchison wrote:
As far as I'm concerned, NFS still is not an option for Cyrus for all of 
the reasons that have been outlined in the past.  Cyrus 2.3 *might* work 
with NFS, but I'm not making any guarantees.

For what it's worth, we've been running Cyrus 2.1 in production on
NFS for about a year now. Approximately six Cyrus instances running
under Solaris share a high-availability NetApp filer, shifting about
1TB of mail per week without problems.

We had to make a few small modifications to Cyrus. I think these have
all been discussed on the list at some time - things like not holding
files open across rmdir calls. 

I would suggest the specific combination of NFS client and NFS server was
important - I doubt any other combination would have been as successful.

One important detail - we are using local locking (undocumented NFS
mount option llock). When network locking is enabled (default), the
Solaris NFS client disables all client-side caching of locked files,
which results in excessive I/O rates. Using llock allows client-side
caching of locked files, but makes it absolutely critical that only one
Cyrus instance accesses a given volume at any time, and we go to great
lengths to ensure this is the case.

I'm not sure we would make the same choice again, but when the project
was initiated SANs were not mature enough, and we had extensive experience
running the Solaris/NetApp combination in demanding applications
(among other things, a very busy multi-terabyte Oracle instance).

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


Re: v2.1.14 freeze from LIST %

2004-07-26 Thread Andrew McNamara
 Are you sure this is actually going across the wire?  This is a new one.

 Yes, just tested over a terminal connection:
 * OK IMAP4 Ready *** 0001dcc5
 0 LOGIN  
 0 OK You are so in
 1 LIST  %

This is not a cyrus imap server (we don't return 'You are so in' in
response to a login command).  Perhaps you should contact your IMAP server
vendor.

It's the Perdition IMAP proxy, if I remember correctly. After the
successful login response, he should be talking directly to whatever is
behind Perdition (presumably cyrus).

The original correspondent should try talking directly to his IMAP server,
if that's possible, to rule Perdition out of the equation.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


Re: Open file limits

2004-07-06 Thread Andrew McNamara
   Can anyone share experiences with running out of open files on Linux?
I am using a 2.4.26 kernel, and the system wide open file limit is 
rather large.  Do I need to set anything other than this?  The default 
limit of 1024 is in effect for both cyrus and root.

Off the top of my head, there are four main areas where you run into
trouble (only three relevant to Linux):

 1. total open fd's in the system
 2. user fd limits (ulimit - typically 1024)
 3. the select() call (typically 1024)
 4. old stdio implementations (256)

You probably know about the first two, but perhaps not the third and
fourth. The select() function usually has a limit of 1024 file
descriptors - this is because it uses an implementation-defined bitmap
to signal interest in, and the status of, each file descriptor. The
FD_SETSIZE constant (defined in sys/types.h) tells you the size of the
bitmap.
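The per-process limit is easy to inspect; a sketch (hypothetical `fd_limits` helper - FD_SETSIZE is a C compile-time constant and isn't exposed here, the per-process ulimit just often happens to be the same 1024):

```python
import resource
import select

def fd_limits():
    """Report the process's file-descriptor limits (area 2 above).

    The select() bitmap size (FD_SETSIZE, typically 1024) is fixed when
    libc is compiled; this shows only the soft/hard ulimit values."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {"soft": soft, "hard": hard,
            "select_available": hasattr(select, "select")}
```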

The fourth will bite you on what I'll rudely call legacy Unix systems,
e.g. Solaris. I haven't checked versions after Solaris 8, but the fd
field in the stdio structure was traditionally an unsigned char, and
in the bad old days apps would mess around inside this structure.
Presumably because they have customers with grungy old apps, Sun has
retained this historical anachronism.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


Re: Cyrus imapd dying on me

2004-01-26 Thread Andrew McNamara
(gdb) bt
#0  0x805fc02 in index_fetchreply ()
Cannot access memory at address 0xbfbfdbd0.

The stack is corrupt - while not conclusive, I'd consider a hardware
problem (power supply drooping, flaky RAM, overheating, etc). Have a look
at:

http://www.bitwizard.nl/sig11/

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


Migrating mailboxes between Cyrus instances?

2003-09-18 Thread Andrew McNamara
I've been googling around without much luck - I'm looking for suggestions
on migrating users between cyrus instances (for load balancing). I'm
thinking it should really be done at the IMAP level, rather than dicking
around inside the message store, but I'm worried there might be message
attributes that can't be readily copied via the IMAP protocol. Anyone
have any advice?

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


Re: flock vs fcntl

2003-07-27 Thread Andrew McNamara
Thanks for the info... I switched over to using flock() and I can confirm
that it is now being used instead of fcntl().  The problem is that I still
see the same problem as before, with over 16500 instances of the
following:

stat(/var/imap/mailboxes.db, 0x00011FFF9C98) = 0
flock(6, LOCK_UN)   = 0
flock(6, LOCK_SH)   = 0
fstat(6, 0x00011FFF9D38)= 0

At least it uses flock() now :-)  It is interesting to see that this only
occurs after all the mkdir/copy/unlink operations have been completed.  I
don't know what it is trying to do, but it is quite painful... It adds at
least another minute to the operation of the IMAP RENAME command after
everything has been renamed!

This sounds like the same problem I complained about on the list in
the thread "Very slow deletion of user mailboxes?", posted 9th
July. I haven't had a chance to investigate further.

http://asg.web.cmu.edu/archive/message.php?mailbox=archive.info-cyrus&msg=23597

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


Very slow deletion of user mailboxes?

2003-07-09 Thread Andrew McNamara
/test9772__foo_net_au) Err#17 EEXIST
18679:  close(9)= 0
18679:  munmap(0xFEFF, 203) = 0
18679:  close(10)   = 0
18679:  munmap(0xFED7, 16384)   = 0
18679:  close(11)   = 0
18679:  munmap(0xFED6, 16384)   = 0
18679:  close(13)   = 0
18679:  fcntl(5, F_SETLKW, 0xFFBEEA48)  = 0
18679:  fstat(5, 0xFFBEEB70)= 0
18679:  stat(/mail/vmb1/var/mailboxes.db, 0xFFBEEAE8) = 0
18679:  fcntl(5, F_SETLKW, 0xFFBEEB60)  = 0
18679:  unlink(/mail/vmb1/var/user/A/test9772__foo_net_au.seen) Err#2 ENOENT
18679:  fstat(8, 0xFFBEEBE8)= 0
18679:  time()  = 1057743572
18679:  getpid()= 18679 [7702]
18679:  putmsg(8, 0xFFBEE2A0, 0xFFBEE294, 0)= 0
18679:  open(/var/run/syslog_door, O_RDONLY)  = 9
18679:  door_info(9, 0xFFBEE1D8)= 0
18679:  getpid()= 18679 [7702]
18679:  door_call(9, 0xFFBEE1C0)= 0
18679:  close(9)= 0
18679:  unlink(/mail/vmb1/var/user/A/test9772__foo_net_au.sub) Err#2 ENOENT
18679:  open64(/mail/vmb1/var/quota/A/, O_RDONLY|O_NDELAY) = 9
18679:  fcntl(9, F_SETFD, 0x0001)   = 0
18679:  fstat64(9, 0xFFBEEBB8)  = 0
18679:  getdents64(9, 0x0014A288, 1048) = 960
18679:  getdents64(9, 0x0014A288, 1048) = 960
18679:  getdents64(9, 0x0014A288, 1048) = 960
18679:  getdents64(9, 0x0014A288, 1048) = 960
18679:  getdents64(9, 0x0014A288, 1048) = 928
18679:  unlink(/mail/vmb1/var/quota/A/user.test9772__foo_net_au) = 0
18679:  getdents64(9, 0x0014A288, 1048) = 1016
18679:  getdents64(9, 0x0014A288, 1048) = 976
18679:  getdents64(9, 0x0014A288, 1048) = 976
18679:  getdents64(9, 0x0014A288, 1048) = 968
18679:  getdents64(9, 0x0014A288, 1048) = 976
18679:  getdents64(9, 0x0014A288, 1048) = 992
18679:  getdents64(9, 0x0014A288, 1048) = 984
18679:  getdents64(9, 0x0014A288, 1048) = 960
18679:  getdents64(9, 0x0014A288, 1048) = 992
18679:  getdents64(9, 0x0014A288, 1048) = 984
18679:  getdents64(9, 0x0014A288, 1048) = 952
18679:  getdents64(9, 0x0014A288, 1048) = 968
18679:  getdents64(9, 0x0014A288, 1048) = 976
18679:  getdents64(9, 0x0014A288, 1048) = 984
18679:  getdents64(9, 0x0014A288, 1048) = 960
18679:  getdents64(9, 0x0014A288, 1048) = 968
18679:  getdents64(9, 0x0014A288, 1048) = 1008
18679:  getdents64(9, 0x0014A288, 1048) = 984
18679:  getdents64(9, 0x0014A288, 1048) = 952
18679:  getdents64(9, 0x0014A288, 1048) = 960
18679:  getdents64(9, 0x0014A288, 1048) = 992
18679:  getdents64(9, 0x0014A288, 1048) = 984
18679:  getdents64(9, 0x0014A288, 1048) = 1000
18679:  getdents64(9, 0x0014A288, 1048) = 968
18679:  getdents64(9, 0x0014A288, 1048) = 376
18679:  getdents64(9, 0x0014A288, 1048) = 0
18679:  close(9)= 0
18679:  fcntl(5, F_SETLKW, 0xFFBEE9C8)  = 0
18679:  fstat(5, 0xFFBEEAF0)= 0
18679:  stat(/mail/vmb1/var/mailboxes.db, 0xFFBEEA68) = 0
18679:  fcntl(5, F_SETLKW, 0xFFBEEAE0)  = 0
18679:  fcntl(5, F_SETLKW, 0xFFBEE9C8)  = 0
18679:  fstat(5, 0xFFBEEAF0)= 0
18679:  stat(/mail/vmb1/var/mailboxes.db, 0xFFBEEA68) = 0

[...last 4 lines repeated about 10k times, then...]

18679:  fcntl(5, F_SETLKW, 0xFFBEEAE0)  = 0
18679:  fcntl(5, F_SETLKW, 0xFFBEE9C8)  = 0
18679:  fstat(5, 0xFFBEEAF0)= 0
18679:  stat(/mail/vmb1/var/mailboxes.db, 0xFFBEEA68) = 0
18679:  fcntl(5, F_SETLKW, 0xFFBEEAE0)  = 0
18679:  open64(/mail/vmb1/var/sieve/t/test9772__foo_net_au, O_RDONLY|O_NDELAY) Err#2 ENOENT
18679:  poll(0xFFB8, 1, 0)  = 0
18679:  write(1,  2   O K   C o m p l e t.., 16)  = 16
18679:  time()  = 1057743623
18679:  poll(0xFFB8, 1, 180)(sleeping...)

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


Re: Very slow deletion of user mailboxes?

2003-07-09 Thread Andrew McNamara
We're having a problem where deletion of a user by the admin is taking
about 30 seconds. I've attached an abbreviated truss of the imapd while
this is taking place. It shows a long string of:

A more detailed truss:

fstat(5, 0xFFBEEAF0)= 0
stat(/mail/vmb1/var/mailboxes.db, 0xFFBEEA68) = 0
fcntl(5, F_SETLKW, 0xFFBEEAE0)  = 0
typ=F_UNLCK  whence=SEEK_SET start=0 len=0 sys=0  pid=68157445
fcntl(5, F_SETLKW, 0xFFBEE9C8)  = 0
typ=F_RDLCK  whence=SEEK_SET start=0 len=0 sys=33152 pid=0

All the locks have start=0, len=0.

mailboxes.db is a bit under 2MB, and contains around 10k entries (as
counted by ctl_mboxlist -d | wc -l).

I wonder if the fact that there are 10k entries in mailboxes.db, and
10k iterations through the above loop is a coincidence? It would suggest
something pathological has happened to our skiplist, and it's devolving
to a linear search?

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


Re: ctl_cyrusdb -r performance over NFS

2003-03-26 Thread Andrew McNamara
A SAN is just a different transport mechanism between the host and the 
drives -- the protocol is the same old SCSI that has been around for 
years.  That said, there is less interoperability than one would like at 
this point.

The basic problem with NFS is that either you violate standard Unix 
filesystem semantics or you have pitiful performance (by disabling 
client-side caching).  Add to that the idea that locking was an 
after-thought (the design goal of a stateless filesystem doesn't exactly 
fit with maintaining locks) and it is really a mess.

As long as you only have one machine writing to the data, you don't have 
to worry so much about broken filesystem semantics (which is why your 
Oracle instance works), but you still have lousy performance.

You assume I don't already know this... 8-)

BTW, I don't think it's a choice between violating Unix semantics OR
pitiful performance - the protocol is flawed in ways that give you
violated Unix semantics AND pitiful performance. In particular, lost,
out-of-order or replayed requests are not fully addressed by the
stateless design.

For what it's worth, we go to extraordinary lengths to ensure only one
host hits a given NFS volume at a time, we spend silly amounts of money to
keep the latency down and the bandwidth up, and we use the best quality
NFS implementations we can.

I'm not convinced that SAN (where storage is a euphemism for disk) is
really the answer to anything. Network-attached storage (where storage
is a euphemism for file server) is a far more convenient model. We just
need a better protocol. If only Plan 9 had gained a critical mass... 8-)

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


Core from lmtp, postfix 2.0.3 and Cyrus-SASL 2.1.12, Solaris 8

2003-02-20 Thread Andrew McNamara
I upgraded a previously working Postfix from cyrus-sasl-2.1.7 to
cyrus-sasl-2.1.12, and while SASL works fine for smtpd, the lmtp process
dumps core within the SASL libraries. The resulting core isn't really
useful because I'm not getting any symbols resolved from the SASL lib
(dynamic linking?):

#0  0xff36ff8c in ?? () from /opt/sasl/lib/libsasl2.so.2
#1  0xff36ff88 in ?? () from /opt/sasl/lib/libsasl2.so.2
#2  0xfef60fe0 in ?? () from /opt/sasl/plugins/liblogin.so.2
#3  0xff366438 in ?? () from /opt/sasl/lib/libsasl2.so.2
#4  0x1996c in lmtp_sasl_authenticate (state=0x7f9c0, why=0x84ac8)
at lmtp_sasl_glue.c:499
#5  0x19cb8 in lmtp_sasl_helo_login (state=0x7f9c0) at lmtp_sasl_proto.c:118
#6  0x1770c in lmtp_lhlo (state=0x7f9c0) at lmtp_proto.c:249
#7  0x16bf4 in deliver_message (request=0x82d08, unused_argv=0xffbefee0)
at lmtp.c:381
#8  0x16d28 in lmtp_service (client_stream=0x81c50, 
unused_service=0xffbeff74 tvmb1, argv=0xffbefee0) at lmtp.c:453
#9  0x19e9c in single_server_wakeup (fd=531536) at single_server.c:250
#10 0x1a00c in single_server_accept_local (unused_event=1, context=0x9 )
at single_server.c:292
#11 0x26ef4 in event_loop (delay=299008) at events.c:586
#12 0x1a8d0 in single_server_main (argc=7, argv=0xffbefec4, 
service=0x16cf0 lmtp_service) at single_server.c:639
#13 0x16f14 in main (argc=7, argv=0xffbefec4) at lmtp.c:542

Any pointers gratefully received (particularly suggestions on how to get
gdb to look at the libsasl2.so.2 symbols)?

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: Core from lmtp, postfix 2.0.3 and Cyrus-SASL 2.1.12, Solaris 8

2003-02-20 Thread Andrew McNamara
I upgraded a previously working postfix from cyrus-sasl-2.1.7 to
cyrus-sasl-2.1.12, and while SASL works fine for smtpd, the lmtp process
dumps core within the SASL libraries. The resulting core isn't really
useful because I'm not getting any symbols resolved from the sasl lib..

Sigh - SASL was picking up the plugins from the 2.1.7 build. Once the
install tree was cleaned and the 2.1.12 plugins put in the right place,
it's now working like a bought one.

The missing symbols in the sasl lib were being caused by a prehistoric
gdb build. A new build of gdb got three fatal internal errors while
starting, but gave me enough information to suspect the API had changed.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: Core from lmtp, postfix 2.0.3 and Cyrus-SASL 2.1.12, Solaris 8

2003-02-20 Thread Andrew McNamara
 The missing symbols in the sasl lib were being caused by a prehistoric
 gdb build. A new build of gdb got three fatal internal errors while
 starting, but gave me enough information to suspect the API had changed.

There shouldn't have been an API change from 2.1.7 to 2.1.12, what
functions had the change?

Note that I'm talking about the internal API between the plugins and the
sasl lib.  I didn't look closely at the code - when I began to suspect it
was my problem, I checked the date stamps on the plugins and went no
further in investigating the core.

I've rebuilt various bits, so the core is no longer valid, but from memory,
the last three frames were a call from libsasl2 into liblogin (plugin) and
a call back into libsasl2 by the plugin. GDB couldn't resolve the liblogin
symbols for some reason (maybe it couldn't find the object file); the
parameters made sense on their way in, but not on their way out.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: Cyrus IMAPd 2.1.10 Released

2002-11-17 Thread Andrew McNamara
I believe a set of plaintext documentation can be maintained with RCS,
CVS or SCCS without problems by a distributed dev team, while XSLT will
require proper usage by the manual authors, etc...

Yep. The reality is it's not us who chose the doc tools, but the people
who actually update the doco.

Volunteering your favourite documentation tools isn't nearly as valuable
as volunteering your time to help keep the documentation up to date... 8-)

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: Updating /seen from concurrent sessions

2002-11-15 Thread Andrew McNamara
I don't use OE but I experienced the same (or similar) problem with 
mozilla: since it uses many concurrent connections to the server, seen 
messages came back as unseen several times, and it was very annoying.
Switching to skiplist almost solves the problem (at least it did for 
me): since I switched to skiplist I had seen messages come back as 
unseen only 2 or 3 times.

I suspect there is a bug in the flat-file seen implementation. Each
process opens the seen file and holds this file descriptor open. Then one
process wants to update the file. It does this by writing a new file,
and renaming it into place. But all the other processes still have the
now unlinked and out of date copy open.
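This stale-descriptor behaviour is easy to reproduce outside Cyrus. Here is a minimal Python sketch (file names are made up; nothing here is actual Cyrus code):

```python
import os
import tempfile

d = tempfile.mkdtemp()
seen_path = os.path.join(d, "user.seen")
with open(seen_path, "w") as f:
    f.write("old state")

reader = open(seen_path)            # session A holds the file open

# Session B updates by writing a new file and renaming it into place
# (os.replace is an atomic rename(2) that portably overwrites).
tmp_path = seen_path + ".NEW"
with open(tmp_path, "w") as f:
    f.write("new state")
os.replace(tmp_path, seen_path)

stale = reader.read()               # A still reads the unlinked old inode
fresh = open(seen_path).read()      # only a fresh open sees the update
reader.close()
```

On POSIX the old inode survives until the last descriptor on it is closed, which is exactly why the long-lived sessions keep reading stale seen state.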

With skiplist, this problem no longer occurs (the skiplist database
makes changes made by other processes visible immediately). However,
another problem remains: updates are deferred for performance reasons. So
one session will update the seen list, and the other processes will not
see the change (unless they and the updating process execute certain
commands, such as NOOP).

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: Updating /seen from concurrent sessions

2002-11-14 Thread Andrew McNamara
Try using skiplist for the seen.db.
It doesn't really solve the problem but it masks it well enough.

From my understanding, changing to skiplist really shouldn't change
the visible behavior at all. But I've been wrong before.

I'll try to test it here and let you know. My reading of the code suggests
it shouldn't change the specific problem I'm seeing.

What's the general feeling on the skiplist implementation used in
conjunction with Sun and NetApp's NFS (we're locked in to using this
combination for various reasons)? Would you be more or less likely to
trust it over db3?

Another question - it looks to me like I have to recompile to switch
database types - is this true? The code looks like it would be flexible
enough to allow a run-time config option to chose the method with very
little modification?

It would be possible to flush the seen state more often; it's just a
question of how often and when should other imapds look for it. 

If the imapd already can cope with asynchronous events, I would flush the
state after a second or two of inactivity from the client. Failing that,
I would probably flush the state before replying to the client (yes,
this would hurt performance, although probably not much, particularly
if we skip the fsync()).

But this just fixes the OE problem - Cyrus would still have a problem
(as far as I can see): all the other copies accessing that mailbox
will still have their old seen files open (maybe using skiplist fixes
this). The flat-file seen implementation needs to check to see if the
file has been renamed under it (and do what?).

To be honest, the flat file seen implementation is way more complicated
than I would have thought was worthwhile. My preference would be to
not hold the file open, and simply re-write the whole file each time we
updated it, renaming the replacement into place (to make the operation
atomic - this is also the only synchronous operation). My experience has
been that unix is quite happy doing naive things like this while the
file remains small (say less than 10k).

I implemented a Postfix map that works this way - for lookups, it simply
does a linear read/search of the file. For update, it writes a new file,
and moves it into place. Generally this performed much better than
more complex schemes such as the Sleepycat DB's - particularly when you
consider memory footprint (this was on a machine with about 100k users,
handling 10's of messages per second).
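A sketch of that style of map, assuming a simple "key value" line format (the function names and file format are my invention for illustration, not the actual Postfix map code):

```python
import os
import tempfile

def map_lookup(path, key):
    # Lookups are a linear read/search of the whole (small) file.
    with open(path) as f:
        for line in f:
            k, _, v = line.rstrip("\n").partition(" ")
            if k == key:
                return v
    return None

def map_update(path, key, value):
    # Updates rewrite the whole file, then rename it into place; the
    # rename is the only synchronous (atomic) step, so readers always
    # see either the complete old file or the complete new one.
    entries = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                k, _, v = line.rstrip("\n").partition(" ")
                entries[k] = v
    entries[key] = value
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        for k, v in entries.items():
            f.write(f"{k} {v}\n")
    os.replace(tmp, path)           # atomic rename(2)

demo = os.path.join(tempfile.mkdtemp(), "preauth.map")
map_update(demo, "10.0.0.1", "ok")
found = map_lookup(demo, "10.0.0.1")
missing = map_lookup(demo, "10.0.0.2")
```

The whole file stays in the buffer cache between updates, so the linear scan is cheap while the dataset remains small.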

I've never actually seen this problem happen whenever I've fooled around
with OE so I've never looked at the code to figure out what to do.

I get the impression it's a specific OE usage pattern that triggers
it. I've had it described to me as "send a mail, click the send/check
button", which sounds common enough to me. 

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: Cyrus IMAPd 2.1.10 Released

2002-11-14 Thread Andrew McNamara
I feel that moving back to only plaintext is a step backwards.  I don't
know much about SGML myself, so I'm not sure I'd want to be stuck
maintaining that, but it sounds interesting enough (and it would be nice
to have general tools for keeping the documentation formatted, instead of
worrying when htmlstrip would next break).

You could do worse than look at the Python documentation. The production
doco is currently LaTeX with a bunch of custom macros. HTML, PDF, etc are
generated off the master LaTeX markup. There is a background project to
use SGML (I think), but it's not there yet.

Our company (not me personally) looked at doco tools a while back and came
to the conclusion that LaTeX was still the best choice out of a bad lot -
SGML was the next closest, although the tools were still rather immature.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: Updating /seen from concurrent sessions

2002-11-14 Thread Andrew McNamara
 of stat() under Solaris - so rather than keep a file open,
and stat it periodically to see if it's changed under you, you can close
and reopen the file (resulting in simpler code, but similar performance).

However, updates can be an order of magnitude more frequent if we're going 
to write for every flag change. Cyrus is written with the expectation that 
you will have thousands of simultaneous clients working on tens or hundreds 
of thousands of mailboxes.

And an excellent design goal that is... 8-)

I'm guessing, but I suspect OE updates the \Seen flag each time it
downloads a message, and presumably this occurs each time a user selects
a message. So you may only see an update every couple of seconds from
each client - obviously that adds up.

BTW, there may be paid consulting opportunities for people with
demonstrable advanced Cyrus hacking skills in this project. If anyone
is interested, let me know.

 I implemented a Postfix map that works this way - for lookups, it simply
 does a linear read/search of the file. For update, it writes a new file,
 and moves it into place. Generally this performed much better than
 more complex schemes such as the Sleepycat DB's - particularly when you
 consider memory footprint (this was on a machine with about 100k users,
 handling 10's of messages per second).

It doesn't scale when there are frequent updates. That's why we have the 
database abstraction, so we can choose the file format that does the job 
most effectively. cyrusdb_flat does exactly this, and it works ok when you 
don't need frequent updates. Seen state has frequent updates.

Actually, it scaled better than initially expected - this map type
was used specifically for tables that changed very frequently (the
pop-before-smtp pre-auth mechanism being a case in point). The only
synchronous operation was the rename(). The lookup read()'s would have
been pulling the data from the buffer cache, and sequential searches
beat more complex schemes every time when the dataset is small (less
than 100kB was the figure we found when comparing to things like libdb).
The saving in resident set size was critical too - the machine had 4G
of RAM, and no more could be fitted.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: Updating /seen from concurrent sessions

2002-11-14 Thread Andrew McNamara
A lot of problems also result when people try to run the application on 
more than one computer hitting the same NFS server. But one thing that drives 
us application writers mad is the idea that rename() can return failure but 
have actually happened; and if you're trying to write a reliable 
application, you don't want to rely on the fact that the chance of this is 
minimized, since you know it's going to happen and you're going to be sorry.

That's certainly the NFS flaw that comes to mind. I happen to agree with
you that it's not enough to simply minimise the chances of something
untoward happening. 

I would hope it would work with a single server with multiple processes. 
But I really haven't thought about all the possibilities with NFS. (The 
return error and succeed problem is just one that springs to mind, and 
I've never audited the code thinking about that.)

Okay. Your comments are valued.

Great, now I need to do bookkeeping to do this. Plus on most Unix 
filesystems, rename() is a more expensive operation than 1 fsync() and 
probably even 2 fsync()s. And how am I supposed to programmatically 
determine whether or not a given version is valid?

Mmm. It was a half-baked idea that came from the observation that the
flat-file \Seen code was doing rename()s anyway.

Linux ext2 has this metadata problem. ext3 and reiserfs are both supposed to 
force metadata to disk when fsync() is called, similar to softupdates 
on BSD, Veritas, or most other modern filesystems. I'm willing to bet that 
I've wasted more time than you have worrying about the semantics of fsync() 
on various Unix filesystems.

Quite possibly. I've certainly wasted enough time on them over the years.
It's hard to prove what a given O/S is doing is correct, even when you
have inside knowledge.

You need to do the stat() regardless if you want the latest data. By 
keeping the file open, you potentially amortize the cost of an open(), 
another fstat (find out the file descriptor of your open'd fd) and an 
mmap(). All of these have various different costs depending on your 
platform and your Unix.

Mmap is the killer - it often involves a lot of expensive setup within the
kernel. I'd tend to think that if you were using mmap() for read access to
the file, it probably should be modified in place, rather than renamed.
The flat-file \Seen implementation both mmap()s and rename()s, and this
looks to me like the source of its pain. But then you need some sort of
cheap synchronization scheme.

BTW, have you looked at Andrew Tridgell's Trivial Database? It uses mmaped
files and spin-locks to achieve good write performance, although I don't
think resilience in the face of crashes was a high priority. However the
architecture-dependent spin lock code may be handy if you ever decide
to follow this route.

You have one database and weren't fsync()ing the data. Cyrus has thousands 
of active databases and cares about the reliability of the data.

As it should.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Re: Updating /seen from concurrent sessions

2002-11-14 Thread Andrew McNamara
BTW, have you looked at Andrew Tridgell's Trivial Database? It uses mmaped
files and spin-locks to achieve good write performance, although I don't
think resilience in the face of crashes was a high priority. However the
architecture-dependent spin lock code may be handy if you ever decide
to follow this route.

I intended to include this URL:

http://sourceforge.net/projects/tdb/

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/



Updating /seen from concurrent sessions

2002-11-13 Thread Andrew McNamara
Outlook Express users are complaining that their message \Seen status is
lost. Snooping traffic, I see that OE is opening a second connection. In
duplicating OE's behaviour by hand, I'm finding that I'm more confused
than ever about Cyrus's behaviour. We're using Cyrus 2.1.9 on NetApp
mounted disk.

First up, a description of what OE is doing (in all its stupidity) - when
you read a message, it uses BODY.PEEK so as not to update the \Seen flag.
It then apparently closes that connection, opens another, and sets the
\Seen status with xxx UID STORE 49 +FLAGS.SILENT (\Seen). It then
forgets that connection, and opens a new connection and starts using
that for accessing the mailbox.

At the Cyrus end, this is useless - the Cyrus process that accepts the \Seen
update holds off writing the .seen file (presumably as a performance
optimisation). So the second session doesn't see it until OE eventually
closes the first connection. Even sending a NOOP on the \Seen-updating
thread would have been enough to trigger Cyrus into updating the .seen
file. Cyrus probably needs to update the .seen file if no other activity
occurs for a second or two.

But - there appears to be a second problem: even if OE had sent a NOOP
(or Cyrus decided to write the file), the second session doesn't see
the update - it's still holding open an old .seen file, now unlinked
(by the rename() that the \Seen-updating thread did). Cyrus needs to
periodically stat() the .seen file and compare its inode number to that
of the file it holds open - if they differ, it needs to reopen the file.
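The stat()-and-compare check is cheap to sketch (Python for brevity; the helper name is hypothetical, not Cyrus code):

```python
import os
import tempfile

def needs_reopen(fd, path):
    # True when the name now points at a different inode than the
    # descriptor we hold, i.e. someone rename()d a new file into place.
    try:
        return os.fstat(fd).st_ino != os.stat(path).st_ino
    except FileNotFoundError:
        return True

d = tempfile.mkdtemp()
path = os.path.join(d, "user.seen")
with open(path, "w") as f:
    f.write("v1")
fd = os.open(path, os.O_RDONLY)

before = needs_reopen(fd, path)     # same inode: no reopen needed

tmp = path + ".NEW"
with open(tmp, "w") as f:
    f.write("v2")
os.replace(tmp, path)               # atomic rename over the old name

after = needs_reopen(fd, path)      # inode changed: reopen required
os.close(fd)
```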

The RFC isn't entirely helpful on who's at fault here - section 5.5 talks
about multiple commands being allowed, provided ambiguity doesn't result.
In this case, provided OE waits for the STORE command to complete, I guess
it's within its rights to check the message status from another session.
It's largely irrelevant anyway - OE is out there, and getting it fixed
would be a hiding to nothing.

I realise this is an old known problem, but I've spent some time searching
list archives, and other sources looking for an answer. Any help anyone
can provide will be gratefully received.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/