Re: MESSAGE quota resource implementation
On Fri, Sep 02, 2011 at 09:37:24AM +1000, Rob Mueller wrote:

Actually, really I'd like to create a new UNIQUEID - and store all the files in paths based on uniqueid rather than on folder name. This would not only mean renames could be fast and atomic, but that delayed delete would be fast. The downside is a more opaque filesystem layout. Oh, another upside - file path limitations don't exist so much any more.

While UNIQUEID is nice, the opaqueness is annoying. Personally, I liked the idea that we talked about a while back which I think was:

$spooldir/a/user/aardvark/user.aardvark/
$spooldir/a/user/aardvark/user.aardvark.drafts/
$spooldir/a/user/aardvark/user.aardvark.trash/
$spooldir/a/user/aardvark/user.aardvark.foo/
$spooldir/a/user/aardvark/user.aardvark.foo.bar/
$spooldir/a/user/aardvark/user.aardvark.abc.xyz/

I still have a patch somewhere that does that. So you end up with every folder for a user in one dir. This solves the current messy handling of sub-dirs (e.g. currently you have to create the intermediate dir /abc/ even though there's no entry in mailboxes.db for it), and makes renaming any folder very cheap still (because you can do it with a single dir rename, rather than having to move each message file), but you don't go completely opaque.

It does give you a 256-character folder tree limit on many operating systems. Delete handling is still easier: just rename to DELETED.$oldname.$UNIQUEID or something like that, because it's cheap to rename anyway.

Sorry, make that 239, once you make space for timestamps and dots.

Of course it means re-organising folder layout for every installation out there, but maybe we need to bump major versions anyway, cyrus 3.0 here we come :)

tools/rehash - I have one of those that can handle this plus all the other existing formats as well. It's on a git branch somewhere.

Bron.
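To make the proposed layout concrete, here is a rough C sketch (not code from any Cyrus branch) that maps a mailbox name onto its per-user directory and builds a delayed-delete name. It assumes the hash directory is the first letter of the user name and that every sub-folder keeps its full dotted name as one directory, as in the example paths above; user_folder_path() and deleted_folder_name() are made-up names for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "$spooldir/<letter>/user/<name>/<mboxname>/"
 * for a mailbox such as "user.aardvark.foo.bar".
 * Returns 0 on success, -1 if this is not a user mailbox or buf is too small. */
int user_folder_path(char *buf, size_t len, const char *spooldir,
                     const char *mboxname)
{
    const char *prefix = "user.";
    size_t plen = strlen(prefix);

    assert(spooldir && mboxname);
    if (strncmp(mboxname, prefix, plen) != 0)
        return -1;

    /* The user name runs from after "user." up to the next '.' (or the end). */
    const char *user = mboxname + plen;
    size_t ulen = strcspn(user, ".");
    if (ulen == 0)
        return -1;

    /* Assumption: the hash directory is the first letter of the user name.
     * All of a user's folders become siblings under one directory. */
    int n = snprintf(buf, len, "%s/%c/user/%.*s/%s/",
                     spooldir, user[0], (int)ulen, user, mboxname);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: the name a folder directory is renamed to on
 * delayed delete, following the "DELETED.$oldname.$UNIQUEID" idea. */
int deleted_folder_name(char *buf, size_t len, const char *mboxname,
                        const char *uniqueid)
{
    int n = snprintf(buf, len, "DELETED.%s.%s", mboxname, uniqueid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under this layout each folder's messages live in one directory, so renaming a folder or delaying its deletion is a single rename(2) of that directory rather than one move per message file.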
Re: MESSAGE quota resource implementation
On 01/09/11 22:22, Bron Gondwana wrote:
On Thu, Sep 01, 2011 at 01:27:00PM +0200, Julien Coloos wrote:
On 01/09/2011 03:03, Greg Banks wrote:

But more generally, the update to the quotaroot is atomic-safe, because the mailbox doesn't add/remove things from the quotaroot racily - but quota -f IS racy. The only way around that would be a dual-pass mark-and-collect thing, where you marked each mailbox as "we're re-calculating, so don't update the quota usage" in the first sweep, then came back and removed the mark as you read the value. Tricky, but doable. The downside is that a crash part way through can leave you in a broken state. But that's true of everything.

But I agree that in case of underflow detection, throwing a warning in syslog might help draw attention when logs are analysed.

when. haha. (maybe at a few sites that care... but for the vast majority of sites, if you're depending on them reading syslog you've already lost. Software that understands that and is robust in the face of errors is much nicer for the poor suckers on the receiving end of all this)

If the software was robust, underflow would not happen and we would not need to test for it and handle it. Thus the log messages are not operational messages intended for the sysadmin, but warnings about internal Cyrus problems intended for Cyrus developers, and syslog is a suitable place for them.

How's about this for a strategy? When a quota resource is first enabled (i.e. the limit is changed from UNLIMITED to some finite value), the usage is stored as some special value which I'll call INDETERMINATE. Changing a mailbox's quotaroot to an existing quotaroot resets all the quotaroot's usages to INDETERMINATE. Usage deltas applied to the INDETERMINATE value silently yield INDETERMINATE back. Usage deltas which would underflow a finite (i.e. not INDETERMINATE) usage value are clamped with a warning about an internal consistency problem.
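The INDETERMINATE rules can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not Cyrus code: encoding INDETERMINATE as INT64_MIN, the struct quota_usage type and all function names are hypothetical, the enum only mirrors the resource names used in this thread, and stderr stands in for syslog.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sentinel meaning "usage unknown until the next quota -f". */
#define QUOTA_INDETERMINATE INT64_MIN

/* Mirrors the resource names used in the thread; values are illustrative. */
enum { QUOTA_STORAGE, QUOTA_MESSAGE, QUOTA_NUMRESOURCES };

struct quota_usage {
    int64_t used[QUOTA_NUMRESOURCES];
};

/* Apply one usage delta under the proposed rules:
 *  - INDETERMINATE plus any delta stays INDETERMINATE;
 *  - a delta that would take a known usage below zero is clamped to 0
 *    and reported as an internal consistency problem. */
int64_t quota_apply_delta(int64_t usage, int64_t delta, const char *root)
{
    assert(root != NULL);
    if (usage == QUOTA_INDETERMINATE)
        return QUOTA_INDETERMINATE;
    if (delta < 0 && usage + delta < 0) {
        /* stderr stands in for syslog in this sketch */
        fprintf(stderr, "quota: usage underflow on %s (%lld%+lld), clamping to 0\n",
                root, (long long)usage, (long long)delta);
        return 0;
    }
    return usage + delta;
}

/* Enabling a limit (UNLIMITED -> finite) or moving a mailbox into an
 * existing quotaroot resets every usage of that root. */
void quota_reset_indeterminate(struct quota_usage *q)
{
    for (int i = 0; i < QUOTA_NUMRESOURCES; i++)
        q->used[i] = QUOTA_INDETERMINATE;
}

/* The test a periodic job (cyr_expire, in the proposal) would make
 * before running quota -f on a root. */
int quota_needs_recount(const struct quota_usage *q)
{
    for (int i = 0; i < QUOTA_NUMRESOURCES; i++)
        if (q->used[i] == QUOTA_INDETERMINATE)
            return 1;
    return 0;
}
```

Writing it out also makes the reply's objection visible: while any usage is INDETERMINATE, getquota has no real number to report for that resource.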
A regularly run process such as cyr_expire finds quotaroots which have one or more INDETERMINATE usages and runs quota -f on them. In this model, the INDETERMINATE value acts like your dual-pass mark.

In the 1st hunk in cmd_append(), at this point in the code I believe totalsize = 0, so you could easily pass 0 for the new last argument as well.

Yes, at this point totalsize is still 0. The current code, which handles MULTIAPPEND, does a preliminary check to see that the mailbox is not over quota, then receives all messages, and finally checks that all messages can fit in the mailbox. Since we know for sure at least one message is coming, I thought it could be a good idea to check early that at least one more message would fit in the mailbox before receiving anything. Otherwise we are staging a new file only to trash it at the end (when over quota).

As far as I can see the only point of that first append_check() call is to fail early in the case of a permission failure.

So actually I wonder if the code could be updated to check (LITERAL parameter) that each message about to be received would fit in the mailbox, instead of checking only once at the end?

Literals, and binary, and urls... I do think there's an argument to be made for moving the maintenance of the totalsize parameter into the struct appendstate, and once you do that it's possible to do quota/permission checks at finer-grained points. But why bother? Why slow down the common case of not exceeding quota? There are some bugs about this already. In particular the opposite case: checking that we actually want to append something before aborting a sieve delivery - because it may be discarded or redirected or even duplicate-suppressed anyway. Something to keep in mind.

I'm thinking that there are now three places in the code which take a mailbox* and fill out an array of quota diffs, interpreting the contents of the struct mailbox. That should really be centralised.

I'm not sure I see what you have in mind here.
Are you talking about the places where the QUOTA_STORAGE and QUOTA_MESSAGE entries of the quota diff array are computed relative to the 'quota_mailbox_used' and 'exists' index fields of the struct mailbox?

Yes. One place please :) Ideally I'd like to absorb more of the quota stuff into mailbox.c. Greg and I have some debate about this - how much is too much for that file to be doing. Probably it should be abstracted into a couple of layers of stuff - but I really do like the consistency of having just a couple of function calls, mailbox_append_index_record and mailbox_update_index_record, which do all the consistency checking and counter updating inside. Plus of course a mailbox_check_quota thing that takes a set of quota checks to do and sees if there will be space for the planned changes!

After implementing the X-ANNOTATION-STORAGE quota, I'm almost completely convinced I got it very wrong, and should have left all the quota updating in
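The "one place please" idea, one function that interprets the struct mailbox fields as quota usage plus one check that planned changes will fit, could look roughly like the C sketch below. Everything here is hypothetical: struct mailbox_counts stands in for the two index fields named in the thread, storage is counted in bytes, QUOTA_UNLIMITED is taken as -1, and quota_would_fit() is an illustrative name, not an existing Cyrus function.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative resource indices and "no limit" marker (assumptions). */
enum { QUOTA_STORAGE, QUOTA_MESSAGE, QUOTA_NUMRESOURCES };
#define QUOTA_UNLIMITED (-1)

/* Stand-in for the two struct mailbox index fields named in the thread. */
struct mailbox_counts {
    uint64_t quota_mailbox_used;  /* bytes counted against quota */
    uint32_t exists;              /* number of messages */
};

/* The single place that interprets mailbox fields as quota usage. */
void mailbox_quota_usage(const struct mailbox_counts *mb,
                         int64_t usage[QUOTA_NUMRESOURCES])
{
    assert(mb->quota_mailbox_used <= (uint64_t)INT64_MAX);
    usage[QUOTA_STORAGE] = (int64_t)mb->quota_mailbox_used;
    usage[QUOTA_MESSAGE] = (int64_t)mb->exists;
}

/* Per-resource difference between two snapshots of the same mailbox. */
void mailbox_quota_diff(const struct mailbox_counts *before,
                        const struct mailbox_counts *after,
                        int64_t diff[QUOTA_NUMRESOURCES])
{
    int64_t b[QUOTA_NUMRESOURCES], a[QUOTA_NUMRESOURCES];
    mailbox_quota_usage(before, b);
    mailbox_quota_usage(after, a);
    for (int i = 0; i < QUOTA_NUMRESOURCES; i++)
        diff[i] = a[i] - b[i];
}

/* Returns 1 if applying `delta` keeps every limited resource within its
 * limit, 0 otherwise. Decreases and unlimited resources always pass. */
int quota_would_fit(const int64_t used[QUOTA_NUMRESOURCES],
                    const int64_t limit[QUOTA_NUMRESOURCES],
                    const int64_t delta[QUOTA_NUMRESOURCES])
{
    for (int i = 0; i < QUOTA_NUMRESOURCES; i++) {
        if (limit[i] == QUOTA_UNLIMITED || delta[i] <= 0)
            continue;
        if (used[i] + delta[i] > limit[i])
            return 0;
    }
    return 1;
}
```

With something like this in place, the early MULTIAPPEND check discussed above would be a quota_would_fit() call with a delta of one message before any literal is read, and a per-literal check would add the literal's byte count once it is known.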
Re: MESSAGE quota resource implementation
On Fri, Sep 02, 2011 at 07:36:20PM +1000, Greg Banks wrote:

If the software was robust, underflow would not happen and we would not need to test for it and handle it. Thus the log messages are not operational messages intended for the sysadmin, but warnings about internal Cyrus problems intended for Cyrus developers, and syslog is a suitable place for them.

In theory, yes - assuming you can keep blackbox control over all the filesystems that Cyrus is operating over, and the sysadmin never restores from backup or otherwise screws with any of the underlying files.

How's about this for a strategy? When a quota resource is first enabled (i.e. the limit is changed from UNLIMITED to some finite value), the usage is stored as some special value which I'll call INDETERMINATE.

What about 'getquota'? I don't support any solution which leaves getquota returning bogus values or failing to respond. That's just icky and confusing. I don't think you can avoid two passes, and I don't even think you can avoid two values during the recalculation if you really want to be good about it.

Anyway - moving a new folder into a quotaroot is NOT racy. You just need to read quota_mailbox_used on the mailbox, then lock the old quota root, subtract N bytes and unlock it - then lock the NEW quota root, add N bytes and unlock it again. No problem. The only issue is updating WITHOUT changing the quotaroot, which is an issue because a particular mailbox doesn't know whether it has already been counted in the new quota value or not, and so whether it should be updating the value.

There are pure ways to do this that guarantee consistency. I think the best way is probably some sort of A/B thing, where you label the quotaroot as A or B in the mailbox - AND in the quota root.
So the initial state looks like this:

ROOT: A
A: $usage
B: INVALID

When you want to run a quota -f you set 'B' to zero, and then run the update logic over all mailboxes, updating B with the value as you go, and setting the quotaroot in the mailbox to be in state 'B', so it also updates B. Any mailbox in state 'B' will update both A AND B, because the root is still in state A. Mailboxes in state 'A' will only update 'A', because they match the root.

When you have finished quota -f, both values are being updated simultaneously by all mailboxes. You also have two fields which you can compare, and a guarantee that they were both atomically updated, so if they're not the same then there was definitely corruption, not just a race condition. So you can report that.

Then you update the quota root to say 'ROOT: B', and 'A' invalid. Or even just A: zero. If anything continues to update the wrong field then you also have corruption (probably a mailbox outside the quotaroot pointing to it, which is pretty silly).

That's a real, robust solution. But it's pretty heavy engineering.

As far as I can see the only point of that first append_check() call is to fail early in the case of a permission failure.

It's nice to fail before the client starts uploading. Failing any later than that is kinda pointless, because you're not saving the client bandwidth - which may matter. The disk IO is unlikely to matter to the server, since it's a rare case.

convinced I got it very wrong, and should have left all the quota updating in mailbox_commit_quota().
I was trying hard to avoid adding a field to the index header to track the storage used by all the annotations for the mailbox and for messages in the mailbox; but I'm really not happy with the results :(

Well, we're not committed to keeping it that way - it's not as if it's even in the wild except for some really early adopters of the master branch, who deserve whatever pain they get (mostly, that's us - and we know enough to be able to clean up any mess).

Bron.
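The A/B recount scheme described in this message can be modelled in C roughly as follows. It is a single-threaded sketch under stated assumptions: real code would hold the quota root lock around each call, QUOTA_INVALID as INT64_MIN and every type and function name are hypothetical, and the stale slot is zeroed on switch-over (the "A: zero" variant) so later stray updates are easy to spot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker for a slot that holds no meaningful value yet. */
#define QUOTA_INVALID INT64_MIN

enum qslot { SLOT_A = 0, SLOT_B = 1 };

/* Quota root with two usage slots and a label saying which one is live. */
struct quotaroot {
    enum qslot current;   /* "ROOT: A" or "ROOT: B" */
    int64_t usage[2];     /* indexed by enum qslot */
};

/* Per-mailbox state: which slot it is labelled with, and its own usage. */
struct mbox {
    enum qslot slot;
    int64_t used;
};

static enum qslot other_slot(enum qslot s)
{
    return s == SLOT_A ? SLOT_B : SLOT_A;
}

/* Normal update path. A mailbox labelled like the root updates only the
 * live slot; a mailbox already relabelled by a running recount updates both. */
void ab_apply_delta(struct quotaroot *r, const struct mbox *m, int64_t delta)
{
    assert(r->usage[r->current] != QUOTA_INVALID);
    r->usage[r->current] += delta;
    if (m->slot != r->current)
        r->usage[m->slot] += delta;
}

/* quota -f, step 1: zero the inactive slot. */
void ab_recount_begin(struct quotaroot *r)
{
    r->usage[other_slot(r->current)] = 0;
}

/* quota -f, step 2 (per mailbox): count it into the new slot and relabel
 * it, so its future deltas feed both slots. */
void ab_recount_mailbox(struct quotaroot *r, struct mbox *m)
{
    enum qslot next = other_slot(r->current);
    if (m->slot == next)
        return;            /* already counted in this pass */
    r->usage[next] += m->used;
    m->slot = next;
}

/* quota -f, step 3: switch the root to the new slot and zero the old one.
 * Returns 1 if both slots agreed; 0 means corruption, not a race. */
int ab_recount_finish(struct quotaroot *r)
{
    enum qslot next = other_slot(r->current);
    int consistent = (r->usage[next] == r->usage[r->current]);
    r->usage[r->current] = 0;   /* any later update here signals corruption */
    r->current = next;
    return consistent;
}
```

The sketch deliberately leaves out moving a mailbox between roots, which the message treats as a separate, non-racy subtract-then-add under each root's own lock.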
Re: SASL lib 2.1.24
Hi Torsten,

Sorry, I don't remember if I replied to that.

Torsten Schlabach wrote:
[...]
3. Pre-sort the list of the currently 26 patches in the Debian package for SASL lib 2.1.23, identify those that are NOT Debian adjustments, and provide them for review (what format?)

diff is fine.

I will happily work with Debian developers to get 2.1.24 packaged so it may eventually show up in Debian backports for 6.0 as well as Ubuntu Natty backports, and hopefully be included in Debian 7.0 / Ubuntu 11.10. But this process can only start after the Cyrus project has released 2.1.24. Before that the Debian / Ubuntu developers will not be interested in the subject. A new upstream release is usually the trigger they need. But prior to releasing 2.1.24 it may indeed make sense to narrow down the list of Debian patches.

I guess they have been made for a reason.

Are there any other open bugs / features / patches which would need to make it into 2.1.24?

I've just committed most of what I wanted to see in the next release to HEAD. There are 2-3 minor build-related things outstanding.

I think there was some MacOS stuff ...

Yes.

Is anyone keeping a list?

Not very formally. I have my private wish/todo list. I can share it with people if there is interest.
Re: ANN: BROWSER-ID a new SASL Authentication mechanism under development
Hi Austin,

Austin King wrote:
At Mozilla, we're experimenting with a new SASL plugin for BrowserID[1]. BrowserID is a decentralized identity system that makes it possible for users to prove ownership of email addresses in a secure manner, without requiring per-site passwords[2].

Is there a SASL-related spec for this, or at least an example of the SASL exchange?

I'm looking for feedback on implementing a SASL authentication mechanism. I've got roughly the happy case working with pluginviewer and OpenLDAP. Don protective eyewear and visit: https://github.com/ozten/sasl-browserid

Any feedback is appreciated, but specifically:
* Code review / contributions
* Preferred distribution channel
* Licensing
* Enterprise or Academic Use Cases
* Next steps and Timing

Once this plugin is production quality, what is the best way to distribute it? Should we try to get it upstream into Cyrus SASL, downstream it into OS distributions, or just provide it for download from a website?

My personal preference is to try to get it into the upstream. The next step down is a patch in contrib. Separate download is of course always an option. I will need to have a look at the build dependencies. Complicated dependencies are not a showstopper, but at least we should eliminate circular dependencies (if any).

Licensing - is there any preferred licensing for the code? This partially depends on the target distribution channel. We want to balance this decision with input from your community. plugins_common is currently a dependency. We'll re-write that to get it out of the repo (unless it's not an issue).

I think the CMU BSD-style license is best. That makes your code compatible with Cyrus SASL.

Use Cases - Is this plugin worth building? We're finding we need it for our LDAP directories which are used from web applications. Authentication using SASL seems more secure than using proxy authentication. BrowserID is an awkward auth mechanism in that it originates from JavaScript in web content.
Are there other valid use cases (webmail?) where this plugin could see some real-world use?

Perhaps webmail...?

Next Steps - I see centrally registering auth mechanisms, RFCs for mechanism communication, etc. are mentioned. Is this still common practice?

Very much so. I can help you with this as well, as I've written some SASL-related RFCs.

Other feedback can come in bugs [3], pull requests, etc.

thanks,
ozten

[1] https://browserid.org
[2] http://lloyd.io/how-browserid-works
[3] https://github.com/ozten/sasl-browserid/issues

Best Regards,
Alexey
--
Internet Messaging Team Lead, http://www.isode.com
JID: same as my email address
twitter: aamelnikov
Re: ANN: BROWSER-ID a new SASL Authentication mechanism under development
On 09/02/2011 05:17 AM, Alexey Melnikov wrote:
Hi Austin,

Austin King wrote:
At Mozilla, we're experimenting with a new SASL plugin for BrowserID[1]. BrowserID is a decentralized identity system that makes it possible for users to prove ownership of email addresses in a secure manner, without requiring per-site passwords[2].

Is there a SASL-related spec for this, or at least an example of the SASL exchange?

I can definitely use your help! https://github.com/ozten/sasl-browserid/blob/master/docs/sasl-browserid-design.md I'll be documenting this better over time and just started talking to our security team about an architecture review.

Once this plugin is production quality, what is the best way to distribute it? Should we try to get it upstream into Cyrus SASL, downstream it into OS distributions, or just provide it for download from a website?

My personal preference is to try to get it into the upstream. The next step down is a patch in contrib. Separate download is of course always an option.

Great, eventually having the source in the Cyrus SASL tree makes a lot of sense.

I will need to have a look at the build dependencies. Complicated dependencies are not a showstopper, but at least we should eliminate circular dependencies (if any).

The plugin depends on curl and yajl 2 [1] for the browserid.org verification call. The plugin also depends on mysql to maintain a session cache. This is useful for web-oriented uses of the plugin. I'm not sure there are any long-lived connection use cases, but if so they would not need a session, so mysql is optional. The session backend could be generalized to be like auxprop (other backends besides mysql), but I'll only build out one backend in the short term.

Next Steps - I see centrally registering auth mechanisms, RFCs for mechanism communication, etc. are mentioned. Is this still common practice?

Very much so. I can help you with this as well, as I've written some SASL-related RFCs.

Again, much appreciated.
If you like IRC, we're in ircs://irc.mozilla.org/#identity - ozten is my nick.

thanks,
Austin

[1] http://lloyd.github.com/yajl/