Re: MESSAGE quota resource implementation

2011-09-02, Bron Gondwana
On Fri, Sep 02, 2011 at 09:37:24AM +1000, Rob Mueller wrote:
 
 Actually, really I'd like to create a new UNIQUEID - and store
 all the files in paths based on uniqueid rather than on folder
 name.  This would not only mean renames could be fast and
 atomic, but that delayed delete would be fast.  The downside is
 a more opaque filesystem layout.  Oh, another upside - file path
 limitations don't exist so much any more.
 
 While UNIQUEID is nice, the opaqueness is annoying. Personally, I
 liked the idea that we talked about a while back which I think was:
 
 $spooldir/a/user/aardvark/user.aardvark/
 $spooldir/a/user/aardvark/user.aardvark.drafts/
 $spooldir/a/user/aardvark/user.aardvark.trash/
 $spooldir/a/user/aardvark/user.aardvark.foo/
 $spooldir/a/user/aardvark/user.aardvark.foo.bar/
 $spooldir/a/user/aardvark/user.aardvark.abc.xyz/

I still have a patch somewhere that does that.

 So you end up with every folder for a user in one dir. This solves
 the current messy handling of sub-dirs (eg currently you have to
 create the intermediate dir /abc/ even though there's no entry in
 mailboxes.db for it), and makes renaming any folder very cheap still
 (because you can do it with a single dir rename, rather than having
 to move each message file), but you don't go completely opaque.
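The per-user layout above can be sketched as a path builder. This is a minimal sketch, not Cyrus code: `user_folder_path` is a hypothetical helper. It assumes the hash directory is the first letter of the user name (as in the example paths) and that '.' is the hierarchy separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: directory for a folder in the proposed layout.
 * Assumes the hash directory is the first letter of the user name and
 * '.' is the hierarchy separator, e.g.
 *   spooldir "/var/spool/imap", mailbox "user.aardvark.foo.bar"
 *   -> "/var/spool/imap/a/user/aardvark/user.aardvark.foo.bar/"
 * Returns 0 on success, -1 for a non-user mailbox or a too-small buffer.
 */
int user_folder_path(char *buf, size_t len,
                     const char *spooldir, const char *mboxname)
{
    static const char prefix[] = "user.";
    const size_t plen = sizeof(prefix) - 1;

    assert(buf != NULL && spooldir != NULL && mboxname != NULL);
    if (strncmp(mboxname, prefix, plen) != 0)
        return -1;                          /* not under user.* */

    const char *user = mboxname + plen;     /* "aardvark.foo.bar" */
    size_t ulen = strcspn(user, ".");       /* length of "aardvark" */
    if (ulen == 0)
        return -1;

    /* Every folder of the user is a sibling directory under
     * <hash>/user/<user>/, named after the full mailbox name. */
    int n = snprintf(buf, len, "%s/%c/user/%.*s/%s/",
                     spooldir, user[0], (int)ulen, user, mboxname);
    if (n < 0 || (size_t)n >= len)
        return -1;                          /* output truncated */
    return 0;
}
```

Renaming one folder only changes the last path component, so it is a single rename() of one directory. Subfolders are sibling directories, so each of those would still need its own rename.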

It does give you a 256 character folder tree limit on many operating
systems.

 Delete handling is still easier, just rename to
 DELETED.$oldname.$UNIQUEID or something like that, because it's
 cheap to rename anyway.

Sorry, make that 239, once you make space for timestamps and dots.
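Because each folder's full mailbox name becomes one directory name, the limit here is the per-component limit (NAME_MAX, commonly 255 bytes), and a DELETED. prefix plus a suffix eats into it. A minimal sketch, assuming the DELETED.$oldname.$UNIQUEID form quoted above; `deleted_dirname` is hypothetical, and the exact budget depends on the suffix actually used.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255   /* fallback; pathconf(dir, _PC_NAME_MAX) is authoritative */
#endif

/*
 * Hypothetical helper: build "DELETED.<oldname>.<uniqueid>" for a
 * delayed delete.  The result is a single directory name, so it has to
 * fit in NAME_MAX bytes; that is what caps folder names in this layout.
 * Returns 0 on success, -1 if the name is too long or buf too small.
 */
int deleted_dirname(char *buf, size_t len,
                    const char *oldname, const char *uniqueid)
{
    assert(buf != NULL && oldname != NULL && uniqueid != NULL);
    int n = snprintf(buf, len, "DELETED.%s.%s", oldname, uniqueid);
    if (n < 0 || (size_t)n >= len || n > NAME_MAX)
        return -1;
    return 0;
}
```

A caller would then rename() the folder directory to this name within the same user directory, so the delayed delete stays a single rename on one filesystem.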

 Of course it means re-organising folder layout for every
 installation out there, but maybe we need to bump major versions
 anyway, cyrus 3.0 here we come :)

tools/rehash - I have one of those that can handle this plus all the
other existing formats as well.  It's on a git branch somewhere.

Bron.


Re: MESSAGE quota resource implementation

2011-09-02, Greg Banks
On 01/09/11 22:22, Bron Gondwana wrote:
 On Thu, Sep 01, 2011 at 01:27:00PM +0200, Julien Coloos wrote:
 On 01/09/2011 03:03, Greg Banks wrote:

 
 But more generally, updating the quotaroot is atomic-safe,
 because the mailbox doesn't add/remove things from the quotaroot
 racily - but quota -f IS racy.  The only way around that would
 be a dual pass mark and collect thing, where you mark each
 mailbox as "we're re-calculating, so don't update the quota usage"
 in the first sweep, then come back and remove the mark as you
 read the value.  Tricky, but doable.  The downside is that a
 crash part way through can leave you in a broken state.  But
 that's true of everything.
 
 But I agree that when underflow is detected, logging a warning to
 syslog might help draw attention when the logs are analysed.
 
 when.  haha.
 
 (maybe at a few sites that care... but for the vast majority of
  sites, if you're depending on them reading syslog you've already
  lost.  Software that understands that and is robust in the face
  of errors is much nicer for the poor suckers on the receiving end
  of all this)

If the software was robust, underflow would not happen and we would not
need to test for it and handle it.  Thus the log messages are not
operational messages intended for the sysadmin, but warnings about
internal Cyrus problems intended for Cyrus developers, and syslog is a
suitable place for them.

How's about this for a strategy?

When a quota resource is first enabled (i.e. the limit is changed from
UNLIMITED to some finite value), the usage is stored as some special
value which I'll call INDETERMINATE.

Changing a mailbox's quotaroot to an existing quotaroot resets all of
that quotaroot's usages to INDETERMINATE.

Usage deltas applied to the INDETERMINATE value silently yield
INDETERMINATE back.

Usage deltas which would underflow a finite (i.e. not INDETERMINATE)
usage value are clamped with a warning about an internal consistency
problem.

A regularly run process such as cyr_expire finds quotaroots which have
one or more INDETERMINATE usages and runs quota -f on them.

In this model, the INDETERMINATE value acts like your dual pass mark.
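For a single resource, the rules above might look like the following. This is an illustration under assumptions: the INT64_MIN sentinel, `struct quota_sketch` and all function names are hypothetical, and the underflow warning goes to syslog() as suggested above.

```c
#include <assert.h>
#include <stdint.h>
#include <syslog.h>

typedef int64_t quota_t;

#define QUOTA_UNLIMITED     ((quota_t)-1)
#define QUOTA_INDETERMINATE INT64_MIN   /* hypothetical sentinel */

struct quota_sketch {
    quota_t limit;   /* QUOTA_UNLIMITED or a finite limit */
    quota_t usage;   /* finite usage, or QUOTA_INDETERMINATE */
};

/* Enabling a limit (UNLIMITED -> finite) makes the usage unknown. */
void quota_set_limit(struct quota_sketch *q, quota_t newlimit)
{
    if (q->limit == QUOTA_UNLIMITED && newlimit != QUOTA_UNLIMITED)
        q->usage = QUOTA_INDETERMINATE;
    q->limit = newlimit;
}

/* Moving a mailbox into an existing quotaroot also makes it unknown. */
void quota_mark_indeterminate(struct quota_sketch *q)
{
    q->usage = QUOTA_INDETERMINATE;
}

/* Deltas against INDETERMINATE stay INDETERMINATE; an underflow of a
 * finite usage is clamped to 0 and logged as an internal problem. */
quota_t quota_apply_delta(quota_t usage, quota_t delta)
{
    if (usage == QUOTA_INDETERMINATE)
        return QUOTA_INDETERMINATE;
    assert(usage >= 0);
    if (delta < 0 && usage + delta < 0) {
        syslog(LOG_WARNING,
               "quota: usage underflow (%lld %+lld), clamping to 0",
               (long long)usage, (long long)delta);
        return 0;
    }
    return usage + delta;
}

/* What a periodic job (e.g. cyr_expire) would test before quota -f. */
int quota_needs_recalc(const struct quota_sketch *q)
{
    return q->usage == QUOTA_INDETERMINATE;
}
```

A periodic job such as cyr_expire would call quota_needs_recalc() on each quotaroot and run quota -f on the ones that return true; until then, usage deltas are absorbed without error.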


 
 In the 1st hunk in cmd_append(), at this point in the code I
 believe totalsize = 0, so you could easily pass 0 for the new last
 argument as well.
 Yes at this point totalsize is still 0.
 The current code, which handles MULTIAPPEND, does a preliminary
 check that the mailbox is not over quota, then receives all
 messages, and finally checks that all messages can fit in the
 mailbox.
 Since we know for sure at least one message is coming, I thought it
 could be a good idea to check early that at least one more message
 would fit in the mailbox before receiving anything. Otherwise we are
 staging a new file only to trash it at the end (when over quota).

As far as I can see the only point of that first append_check() call is
to fail early in the case of a permission fail.

 So actually I wonder if the code could be updated to check (using the
 LITERAL size) that each message about to be received would fit in the
 mailbox, instead of checking only once at the end?

Literals, and binary, and urls...

I do think there's an argument to be made for moving the maintenance of the
totalsize parameter into the struct appendstate, and once you do that
it's possible to do quota/permission checks at finer grained points.
But why bother?  Why slow down the common case of not exceeding quota?
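For illustration only, this is roughly what a per-message check could look like once totalsize lives in the append state. `struct appendstate_sketch` and both functions are hypothetical, and the sketch ignores the distinction between literals, binary and URLs.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t quota_t;
#define QUOTA_UNLIMITED ((quota_t)-1)

/* Hypothetical subset of an append state that carries the running size
 * of the messages staged by this command. */
struct appendstate_sketch {
    quota_t limit;      /* storage limit of the quotaroot */
    quota_t used;       /* usage before this command started */
    quota_t totalsize;  /* bytes staged so far in this command */
};

/* Would one more message of `size` bytes still fit?  1 = yes. */
int append_would_fit(const struct appendstate_sketch *as, quota_t size)
{
    assert(size >= 0);
    if (as->limit == QUOTA_UNLIMITED)
        return 1;
    return as->used + as->totalsize + size <= as->limit;
}

/* Account for a message once its literal has been received. */
void append_add_staged(struct appendstate_sketch *as, quota_t size)
{
    assert(size >= 0);
    as->totalsize += size;
}
```

This only helps in the over-quota case; the common case still does the same work plus one comparison per message.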

 
 There are some bugs about this already.  In particular, the opposite
 case: checking that we actually want to append something before
 aborting a sieve delivery - because it may be discarded or redirected
 or even duplicate-suppressed anyway.  Something to keep in mind.
 

 
 I'm thinking that there are now three places in the code which take
 a mailbox* and fill out an array of quota diffs, interpreting the
 contents of the struct mailbox.  That should really be
 centralised.
 I'm not sure I see what you have in mind here.
 Are you talking about the places where the QUOTA_STORAGE and
 QUOTA_MESSAGE entries of the quota diff array are computed
 from the 'quota_mailbox_used' and 'exists' index fields of
 the struct mailbox?

Yes.


 One place please :)  Ideally I'd like to absorb more of the quota
 stuff into mailbox.c.  Greg and I have some debate about this - how
 much is too much for that file to be doing.  Probably it should be
 abstracted into a couple of layers of stuff - but I really do like
 the consistency of having just a couple of function calls:
 
 mailbox_append_index_record; and
 mailbox_update_index_record
 
 which do all the consistency checking and counter updating inside.
 Plus of course a mailbox_check_quota thing that takes a set of
 quota checks to do and sees if there will be space for the
 planned changes!
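A sketch of what centralising this might look like: one function derives the quota diffs from the mailbox, and one checks planned diffs against limits. The field names quota_mailbox_used and exists and the QUOTA_STORAGE/QUOTA_MESSAGE resources come from the thread; the structs, the signatures and `mailbox_quota_diffs`/`quota_check_diffs` are hypothetical, and units are simplified to plain counts.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t quota_t;

/* Resource indices named in the thread; this layout is assumed. */
enum quota_resource {
    QUOTA_STORAGE = 0,
    QUOTA_MESSAGE,
    QUOTA_NUMRESOURCES
};

/* Only the two index header fields this sketch needs. */
struct mailbox_sketch {
    quota_t  quota_mailbox_used;   /* bytes counted against quota */
    uint32_t exists;               /* number of messages */
};

/* The one place that turns a mailbox's index fields into per-resource
 * quota diffs.  sign = +1 to add the mailbox to a quotaroot, -1 to
 * remove it (e.g. on rename or delete). */
void mailbox_quota_diffs(const struct mailbox_sketch *mb, int sign,
                         quota_t diffs[QUOTA_NUMRESOURCES])
{
    assert(sign == 1 || sign == -1);
    diffs[QUOTA_STORAGE] = sign * mb->quota_mailbox_used;
    diffs[QUOTA_MESSAGE] = sign * (quota_t)mb->exists;
}

/* In the spirit of the mailbox_check_quota idea: would these planned
 * changes fit?  A negative limit means unlimited; shrinking always fits.
 * Returns 1 if every resource stays within its limit. */
int quota_check_diffs(const quota_t limits[QUOTA_NUMRESOURCES],
                      const quota_t usage[QUOTA_NUMRESOURCES],
                      const quota_t diffs[QUOTA_NUMRESOURCES])
{
    for (int r = 0; r < QUOTA_NUMRESOURCES; r++) {
        if (limits[r] < 0 || diffs[r] <= 0)
            continue;
        if (usage[r] + diffs[r] > limits[r])
            return 0;
    }
    return 1;
}
```

Callers such as rename, delete and quota -f would all go through mailbox_quota_diffs(), so the mapping from index fields to resources lives in one place.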

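After implementing the X-ANNOTATION-STORAGE quota, I'm almost completely
convinced I got it very wrong, and should have left all the quota
updating in mailbox_commit_quota().  I was trying hard to avoid adding a
field to the index header to track the storage used by all the
annotations for the mailbox and for messages in the mailbox; but I'm
really not happy with the results :(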

Re: MESSAGE quota resource implementation

2011-09-02, Bron Gondwana
On Fri, Sep 02, 2011 at 07:36:20PM +1000, Greg Banks wrote:
 If the software was robust, underflow would not happen and we would not
 need to test for it and handle it.  Thus the log messages are not
 operational messages intended for the sysadmin, but warnings about
 internal Cyrus problems intended for Cyrus developers, and syslog is a
 suitable place for them.

In theory, yes - assuming you can keep blackbox control over all the
filesystems that Cyrus is operating over, and the sysadmin never restores
from backup or otherwise screws with any of the underlying files.

 How's about this for a strategy?
 
 When a quota resource is first enabled, (i.e. the limit is changed from
 UNLIMITED to some finite value), the usage is stored as some special
 value which I'll call INDETERMINATE.

What about 'getquota'?  I don't support any solution which leaves getquota
returning bogus values or failing to respond.  That's just icky and
confusing.

I don't think you can avoid two passes, and I don't even think you can
avoid keeping two values during the recount if you really want to be good about it.

Anyway - moving a new folder into a quotaroot is NOT racy.  You just need
to read quota_mailbox_used on the mailbox, then lock the old quota root,
subtract N bytes and unlock it - then lock the NEW quota root, add N bytes
and unlock it again.  No problem.
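The move described above, as a sketch: only one quotaroot lock is held at a time, and each root sees a single atomic adjustment. `struct quotaroot_sketch` is a hypothetical in-memory stand-in that uses a pthread mutex where a real server would lock the quota record, and the caller is assumed to hold the mailbox lock so N cannot change in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef int64_t quota_t;

/* Hypothetical stand-in for a quotaroot record. */
struct quotaroot_sketch {
    pthread_mutex_t lock;
    quota_t used;
};

/* Apply one delta under that root's lock. */
static void quotaroot_add(struct quotaroot_sketch *root, quota_t delta)
{
    pthread_mutex_lock(&root->lock);
    root->used += delta;
    pthread_mutex_unlock(&root->lock);
}

/* Move a mailbox whose quota_mailbox_used is `n` between roots:
 * lock old, subtract, unlock; then lock new, add, unlock.  Either root
 * may be NULL (mailbox entering or leaving quota control). */
void move_between_roots(struct quotaroot_sketch *oldroot,
                        struct quotaroot_sketch *newroot, quota_t n)
{
    assert(n >= 0);
    if (oldroot)
        quotaroot_add(oldroot, -n);
    if (newroot)
        quotaroot_add(newroot, n);
}
```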

The only issue is updating WITHOUT changing the quotaroot, which is an
issue because a particular mailbox doesn't know whether it's already been
counted in the new quota value or not, and so whether it should be updating
the value.

There are pure ways to do this that guarantee consistency.  I think the best
way is probably some sort of A/B thing, where you label the quotaroot as A
or B in the mailbox - AND in the quota root.  So the initial state looks like
this:

ROOT: A
A: $usage
B: INVALID

When you want to run a quota -f you set 'B' to zero, and then run the update
logic over all mailboxes, updating B with the value as you go, and setting
the quotaroot in the mailbox to be in state 'B', so it also updates B.
Any mailbox in state 'B' will update both A AND B, because the root is still
in state A.  Mailboxes in state 'A' will only update 'A', because they match
the root.

When you have finished quota -f, both values are being updated simultaneously
by all mailboxes.  You also have two fields which you can compare, and a
guarantee that they were both atomically updated, so if they're not the same
then there was definitely corruption, not just a race condition.  So you can
report that.

Then you update the quota root to say 'ROOT: B', and 'A' invalid.  Or even
just A: zero.  If anything continues to update the wrong field then you
also have corruption (probably a mailbox outside the quotaroot pointing to
it, which is pretty silly).

That's a real, robust solution.  But it's pretty heavy engineering.
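A compact sketch of the A/B scheme for one resource, to make the state transitions concrete. All names are hypothetical, locking is left out (each function is assumed to run under the same locks a normal quota update takes), and INT64_MIN stands in for INVALID.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t quota_t;
#define QUOTA_INVALID INT64_MIN          /* "B: INVALID" */

enum qslot { SLOT_A = 0, SLOT_B = 1 };

struct abroot {
    enum qslot active;                   /* "ROOT: A" or "ROOT: B" */
    quota_t used[2];                     /* the A and B counters */
};

struct abmailbox {
    enum qslot slot;                     /* slot this mailbox is in */
    quota_t size;                        /* its own usage */
};

static enum qslot other(enum qslot s)
{
    return s == SLOT_A ? SLOT_B : SLOT_A;
}

/* Normal update.  A mailbox in the root's active slot updates only
 * that slot; one already moved to the other slot by a running quota -f
 * updates both.  Hitting an INVALID slot means corruption (-1). */
int ab_update(struct abroot *root, struct abmailbox *mb, quota_t delta)
{
    if (mb->slot != root->active && root->used[mb->slot] == QUOTA_INVALID)
        return -1;
    root->used[root->active] += delta;
    if (mb->slot != root->active)
        root->used[mb->slot] += delta;
    mb->size += delta;
    return 0;
}

/* quota -f, step 1: zero the spare slot. */
void ab_begin_recount(struct abroot *root)
{
    root->used[other(root->active)] = 0;
}

/* quota -f, step 2 (per mailbox): add the mailbox's current size to
 * the spare slot and switch the mailbox over to it. */
void ab_recount_mailbox(struct abroot *root, struct abmailbox *mb)
{
    enum qslot spare = other(root->active);
    if (mb->slot == spare)
        return;                          /* already counted */
    root->used[spare] += mb->size;
    mb->slot = spare;
}

/* quota -f, step 3: both counters were updated atomically, so any
 * difference is corruption rather than a race.  Switch the root to the
 * recounted slot and invalidate the old one.  0 = consistent. */
int ab_finish_recount(struct abroot *root)
{
    enum qslot spare = other(root->active);
    int consistent = root->used[spare] == root->used[root->active];
    root->used[root->active] = QUOTA_INVALID;
    root->active = spare;
    return consistent ? 0 : -1;
}
```

A mailbox visited early in the recount updates both counters, one not yet visited updates only the active one, and at the end the two counters should agree. A mailbox still pointing at the invalidated slot afterwards is reported as corruption rather than silently counted.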

 As far as I can see the only point of that first append_check() call is
 to fail early in the case of a permission fail.

It's nice to fail before the client starts uploading.  Failing any later
than that is kinda pointless, because you're not saving the client
bandwidth - which may matter.  The disk IO is unlikely to matter to the
server, since it's a rare case.

 convinced I got it very wrong, and should have left all the quota
 updating in mailbox_commit_quota().  I was trying hard to avoid adding a
 field to the index header to track the storage used by all the
 annotations for the mailbox and for messages in the mailbox; but I'm
 really not happy with the results :(

Well, we're not committed to keeping it that way - it's not as if it's
even in the wild except for some really early adopters of the master
branch, who deserve whatever pain they get (mostly, that's us - and we
know enough to be able to clean up any mess).

Bron.


Re: SASL lib 2.1.24

2011-09-02, Alexey Melnikov

Hi Torsten,
Sorry, I don't remember if I replied to that.

Torsten Schlabach wrote:
[...]

3. Pre-sort the list of currently 26 patches in the Debian package for SASL lib 2.1.23, identify those that are NOT Debian adjustments, and provide them for review (what format?)


diff is fine.


I will happily work with Debian developers to get 2.1.24 packaged so it may
eventually show up on Debian backports for 6.0 as well as Ubuntu Natty
backports and, hopefully, be included in Debian 7.0 / Ubuntu 11.10.

But this process can only start after the Cyrus project has released 2.1.24. Before
that the Debian / Ubuntu developers will not be interested in the subject. A new
upstream release is usually the trigger they need.

But prior to releasing 2.1.24 it may indeed make sense to pare down the list of
Debian patches. I guess they have been made for a reason.

Are there any other open bugs / features / patches which would need to make it 
into 2.1.24?

I've just committed most of what I wanted to see in the next release to 
the HEAD. There are 2-3 minor build related things outstanding.



I think there was some MacOS stuff ...


Yes.


Is anyone keeping a list?

Not very formally. I have my private wish/todo list. I can share it with 
people if there is interest.




Re: ANN: BROWSER-ID a new SASL Authentication mechanism under development

2011-09-02, Alexey Melnikov

Hi Austin,

Austin King wrote:


At Mozilla, we're experimenting with a new SASL plugin for BrowserID[1].

BrowserID is a decentralized identity system that makes it possible
for users to prove ownership of email addresses in a secure manner,
without requiring per-site passwords[2].


Is there a SASL-related spec for this, or at least an example of the 
SASL exchange?



I'm looking for feedback on implementing a SASL authentication mechanism.
I've got roughly the happy case working with pluginviewer and OpenLDAP.

Don protective eyewear and visit:
https://github.com/ozten/sasl-browserid

Any feedback is appreciated, but specifically:
* Code review / contributions
* Preferred distribution channel
* Licensing
* Enterprise or Academic Use Cases
* Next steps and Timing

Once this plugin is production quality, what is the best way to
distribute it? Should we try to get it upstream into Cyrus SASL,
downstream it into OS distributions, or just provide it for download
from a website?


My personal preferences are to try to get it into the upstream. The next 
step down is a patch in contrib. Separate download is of course always 
an option.


I will need to have a look at the build dependencies. Complicated 
dependencies are not a showstopper, but at least we should eliminate 
circular dependencies (if any).


Licensing - is there any preferred licensing for the code? This
partially depends on the target distribution channel. We want to
balance this decision with input from your community. plugins_common
is currently a dependency. We'll re-write that to get it out of the
repo (unless it's not an issue).


I think a CMU BSD-style license is best, as it makes your code
compatible with Cyrus SASL.


Use Cases - Is this plugin worth building? We're finding we need it
for our LDAP directories which are used from web applications.
Authentication using SASL seems more secure than using proxy
authentication. BrowserID is an awkward auth mechanism in that it
originates from JavaScript in web content. Are there other valid use
cases (webmail?) where this plugin could see some real world use?
Perhaps webmail...?

Next Steps - I see centrally registering auth mechanisms, RFCs for
mechanism communication, etc. are mentioned. Is this still common
practice?


Very much so. I can help you with this as well, as I've written some 
SASL-related RFCs.



Other feedback can come in as bugs [3], pull requests, etc.

thanks,
ozten

[1] https://browserid.org
[2] http://lloyd.io/how-browserid-works
[3] https://github.com/ozten/sasl-browserid/issues 


Best Regards,
Alexey

--
Internet Messaging Team Lead, http://www.isode.com
JID: same as my email address
twitter: aamelnikov



Re: ANN: BROWSER-ID a new SASL Authentication mechanism under development

2011-09-02, Austin King

On 09/02/2011 05:17 AM, Alexey Melnikov wrote:

Hi Austin,

Austin King wrote:


At Mozilla, we're experimenting with a new SASL plugin for BrowserID[1].

BrowserID is a decentralized identity system that makes it possible
for users to prove ownership of email addresses in a secure manner,
without requiring per-site passwords[2].


Is there a SASL-related spec for this, or at least an example of the 
SASL exchange?

I can definitely use your help!
https://github.com/ozten/sasl-browserid/blob/master/docs/sasl-browserid-design.md

I'll be documenting this better over time, and have just started talking
to our security team about an architecture review.



Once this plugin is production quality, what is the best way to
distribute it? Should we try to get it upstream into Cyrus SASL,
downstream it into OS distributions, or just provide it for download
from a website?


My personal preferences are to try to get it into the upstream. The 
next step down is a patch in contrib. Separate download is of course 
always an option.

Great, eventually having the source in the Cyrus SASL tree makes a lot of sense.


I will need to have a look at the build dependencies. Complicated 
dependencies are not a showstopper, but at least we should eliminate 
circular dependencies (if any).

The plugin depends on curl and yajl 2 [1] for the browserid.org
verification call.
The plugin also depends on mysql to maintain a session cache. This is
useful for web-oriented uses of the plugin.


I'm not sure there are any long-lived connection use cases, but if so 
they would not need a session, so mysql is optional.


The session backend could be generalized to be like auxprop (other 
backends besides mysql), but I'll only build out one backend in the 
short term.
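One way to generalise the session backend in the auxprop style is a table of function pointers that each backend fills in. This is a hypothetical interface, not part of Cyrus SASL or the sasl-browserid code; the toy in-memory backend exists only to show the shape, and a mysql backend would provide the same two functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical pluggable session-cache interface: each backend (mysql,
 * in-memory, ...) fills in the same table of function pointers. */
struct session_backend {
    const char *name;
    /* 1 and *email filled if `assertion` is cached and unexpired, else 0 */
    int (*lookup)(void *ctx, const char *assertion, char *email, size_t len);
    /* 0 on success, -1 on failure */
    int (*store)(void *ctx, const char *assertion, const char *email,
                 time_t expires);
};

/* Toy in-memory backend, for illustration only. */
#define MEM_SLOTS 16
struct mem_entry { char assertion[2048]; char email[128]; time_t expires; };
struct mem_ctx { struct mem_entry e[MEM_SLOTS]; int n; };

static int mem_lookup(void *ctx, const char *assertion,
                      char *email, size_t len)
{
    struct mem_ctx *m = ctx;
    time_t now = time(NULL);
    assert(m != NULL && assertion != NULL);
    for (int i = 0; i < m->n; i++) {
        if (strcmp(m->e[i].assertion, assertion) == 0 &&
            m->e[i].expires > now) {
            snprintf(email, len, "%s", m->e[i].email);
            return 1;
        }
    }
    return 0;
}

static int mem_store(void *ctx, const char *assertion, const char *email,
                     time_t expires)
{
    struct mem_ctx *m = ctx;
    assert(m != NULL && assertion != NULL && email != NULL);
    if (m->n >= MEM_SLOTS ||
        strlen(assertion) >= sizeof m->e[0].assertion ||
        strlen(email) >= sizeof m->e[0].email)
        return -1;
    struct mem_entry *e = &m->e[m->n++];
    strcpy(e->assertion, assertion);
    strcpy(e->email, email);
    e->expires = expires;
    return 0;
}

const struct session_backend mem_backend = { "memory", mem_lookup, mem_store };
```

The plugin would pick a backend by name at startup and only ever call through the table, so making mysql optional becomes a build-time choice of which backends to compile in.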


Next Steps - I see centrally registering auth mechanisms, RFCs for
mechanism communication, etc. are mentioned. Is this still common
practice?


Very much so. I can help you with this as well, as I've written some 
SASL-related RFCs.

Again, much appreciated. If you like IRC, we're in
ircs://irc.mozilla.org/#identity; ozten is my nick.

thanks,
Austin

[1] http://lloyd.github.com/yajl/