Extended attributes Linux interface
Hello,

There were previously discussions, started by Emmanuel, concerning extended attributes, including the various available APIs and which of them to support. At the time I read them I was catching up on a lot of mail, and I wrote down a small note about a potential security implication that crossed my mind if we used the Linux interface. Perhaps someone can confirm or refute it:

Strings, rather than IDs, are used to distinguish the class of an extended attribute, i.e. "system" etc. My question is then: must those be limited to ASCII, or can they contain arbitrary bytes, or UTF-8? If Unicode strings are possible, I think a string could look like "system" to an auditing administrator but actually be something else, unless all tools clearly showed the non-ASCII bytes in an escaped format. Of course, if the kernel wanted to match "system", such a string wouldn't match; but the fact that it may _appear_ correct to an admin may introduce a security issue if extended permissions were ever implemented on top of that system.

Perhaps this problem could also exist with the key names, in case they're part of permission descriptions?

Thanks,
--
Matt
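A minimal sketch of the concern, assuming attribute names are treated as arbitrary byte strings (that is exactly the open question above, not something confirmed here). The helper `escape_xattr_name` and the look-alike name are hypothetical, chosen only for illustration; the idea is that a tool which escapes every non-ASCII byte makes the difference visible to an administrator.

```python
def escape_xattr_name(name: bytes) -> str:
    """Show printable ASCII as-is and every other byte as \\xNN."""
    out = []
    for b in name:
        if 0x21 <= b <= 0x7E and b != 0x5C:   # printable, and not a backslash
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)


genuine = "system.posix_acl_access".encode("utf-8")
# Hypothetical look-alike: the first letter is CYRILLIC SMALL LETTER DZE
# (U+0455), which many fonts render like a Latin "s".
spoofed = "\u0455ystem.posix_acl_access".encode("utf-8")

print(genuine == spoofed)              # False: a byte-wise match fails
print(escape_xattr_name(genuine))      # system.posix_acl_access
print(escape_xattr_name(spoofed))      # \xd1\x95ystem.posix_acl_access
```

A byte-wise comparison in the kernel is not fooled; the risk described above lies only in how the name is displayed, which is why the escaping is done on the display side.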
Re: fs-independent quotas
Ignatios Souvatzis writes:
> On Wed, Oct 19, 2011 at 06:09:27PM +0000, David Holland wrote:
> > support to other filesystems (tempfs, perhaps v7fs) or even add other
> > filesystems that have or may have their own native quota handling
> > (zfs, Hammer, you name it).
>
> zfs - does it really have quota?

Yes, it does, as of zfs filesystem V4.

http://hub.opensolaris.org/bin/view/Community+Group+zfs/faq#HCanIsetquotasonZFSfilesystems3F
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 06:54:54PM +0200, Manuel Bouyer wrote:
> On Thu, Oct 20, 2011 at 04:39:21PM +0000, David Holland wrote:
> > > We're talking a few MB of ram here, isn't it ? the kernel can certainly
> > > allocate this without troubles (other subsystems do).
> >
> > The proplib'd and XMLified complete dump for 50,000 users will
> > probably make a blob of between 10 and 20 MB. (Note: this is an
> > estimate; I haven't checked the size by trying it. It might be larger.
> > I'd be surprised if it were much smaller.)
>
> I tested with a few 10s of users; my estimate is about 35MB for 50k users.
>
> > I don't see why it's desirable to manifest such large objects when
> > it's easily avoidable.
>
> We don't agree on "easily".

FYI: I just went around, and around, and around on this with the configuration framework for a proprietary kernel subsystem. If you just take the position that _any_ write to _any_ part of the data invalidates all cursors, it is not so bad. The user application has to be coded to deal with that, but it keeps the complexity out of the kernel.

Thor
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 09:41:25PM +0200, Manuel Bouyer wrote:
> > I don't see that you can do anything with an unmounted filesystem in
> > repquota. Unless the quota files for the filesystem are on a different
> > (and mounted) volume, it won't be able to read them, and it doesn't
> > have any code to mount the filesystem temporarily to do that.
>
> Hum, you're right, it seems I broke this. I'll have a look at fixing
> it, it's a bug.

I just remembered, there's another reason why repquota reads the quota file directly: I didn't implement the getall command for ffs-quota1. It has nothing to do with xml or filesystem-independent code :) it's just that I didn't know (and still don't know) how to properly read a whole file, especially when it's sparse, from the kernel. I could try a dqget for the 2^21-1 ids, but there's probably a better way to do that. I guess I could find the upper bound from the quota vnode's size. Maybe that would be enough ...

--
Manuel Bouyer
     NetBSD: 26 ans d'experience feront toujours la difference
--
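The "upper bound from the file size" idea can be sketched in userland terms as follows. This is only an illustration under stated assumptions: the quota file is modelled as a flat array of fixed-size records indexed by id, the 32-byte record layout (`DQBLK`) is assumed rather than taken from any header, and holes in the sparse file are assumed to read back as all-zero records. The function names are hypothetical.

```python
import io
import struct

# Assumed record layout, for illustration only: eight 32-bit fields per id.
DQBLK = struct.Struct("<8I")


def upper_bound_id(file_size):
    """Ids at or above this value cannot have a record in the file."""
    return file_size // DQBLK.size


def read_all(f):
    """Walk ids below the upper bound and keep the non-empty records."""
    size = f.seek(0, io.SEEK_END)
    f.seek(0)
    entries = {}
    for qid in range(upper_bound_id(size)):
        record = DQBLK.unpack(f.read(DQBLK.size))
        if any(record):          # an all-zero record is treated as a hole
            entries[qid] = record
    return entries


def make_quota_file(records):
    """Build an in-memory stand-in for a sparse quota file."""
    f = io.BytesIO()
    for qid, fields in records.items():
        f.seek(qid * DQBLK.size)  # writing past the end zero-fills the gap
        f.write(DQBLK.pack(*fields))
    return f


demo = make_quota_file({3: (100, 90, 10, 0, 0, 1, 0, 0),
                        1000: (500, 400, 42, 0, 0, 7, 0, 0)})
print(upper_bound_id(len(demo.getvalue())))   # 1001
print(sorted(read_all(demo)))                 # [3, 1000]
```

The loop still touches every slot below the bound, so it bounds the work by the file size rather than by the id space; it does not by itself skip the holes efficiently.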
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 05:35:16PM +0000, David Holland wrote:
> > I can't parse this, can you explain ? The tools needs to be aware of the
> > format to do something useful with the data, isn't it ?
>
> The tools can and should work with a filesystem-independent abstract
> schema. This should be independent of any filesystem's on-disk quota
> format, just as the structures are independent of any
> filesystem's on-disk directory layout.

The current proplib-based schema is independent of the on-disk format (as it's just another representation of the same set of data that you proposed).

> > that's plain wrong. If it's quota1 you can use the quota1 code in
> > sys/ufs/ufs (just as it would have done before quota2).
>
> No, it is not wrong. It cannot use the quota1 code in ufs; the whole
> premise of the proposed lfs renovation is to unhook lfs from ufs. The
> ufs code is a big blob, not a library of components; you can't just
> use parts of it, or at least not easily.
>
> I can copy the ufs quota1 structures and some of the ufs quota code,
> yes; but then I have struct lfs_dqblk, and I need to interface it to
> the rest of the system, and as things currently stand that forces me
> to clone all the ffs-quota1-specific quota code all over everywhere.

So, if I understand you properly, your lfs code won't use the quota1 on-disk format but some new format based on a lfs_dqblk structure. Then it's a brand new disk format; the right thing to do is to use the conversion functions from common/lib/libquota/ (as the ufs/quota1 and ufs/quota2 code already do) and convert from there to your on-disk format. You can't claim a data representation isn't filesystem-independent because it doesn't correspond to your on-disk representation. As it's filesystem-independent it has (by definition) to be converted to every on-disk representation.

> The lfs/ufs split would have been committed ages ago if the quota
> system hadn't gotten in the way.
> This is why, last spring, when you
> were designing quota2, I was asking you to fix things above the FS to
> be FS-independent. But you didn't; instead it got worse. I tried at
> the time to explain the situation and the premises, and why the quota
> system should be FS-independent at and above the VFS level, but I got
> ignored and then sucked away by real life.

Well, I don't remember the details of that time, but what I retained is that you didn't like xml. Now you're saying "I move lfs out of ufs and I can't use quota1 for lfs". Yes, of course, as quota1 is tightly coupled to ufs, and my project was not to make quota1 filesystem-independent - it was to add a new on-disk quota for ffs with some better properties. You can't blame me for not making quota1 (or even quota2) reusable outside of ufs when my goal was to get a new on-disk format for ffs. That's just not the same work.

Now, I don't think the current quota1 code is that much tied to ufs. If you want to use the same dqblk for your on-disk format (but then it's an on-disk format, you can't claim it's fs-independent), code can certainly be reorganised to make it reusable outside of ufs. But that's orthogonal to filesystem-independent format representation.

> Now I'm trying to fix it.
>
> > > Likewise, if I were to go add quota support to v7fs, or try to hook up
> > > whatever quota support zfs has, or commit Hammer and try to get
> > > whatever quota support *it* has working, or add ext2 quota support, or
> > > write a new fs with quota support, or whatever, I'd have to make still
> > > more copies of the logic to cope with all the different formats and
> > > layouts.
> >
> > Of course if you have new on-disk format you need to do some conversion,
> > whatever "filesystem independent" format you use.
> > But I think you could still reuse sys/ufs/ufs/quota2_subr.c to do the
> > conversion from plist to some binary representation.
>
> I could cut and paste it, maybe. That's not particularly desirable.
Now that I understand where you want to go, it's not the right thing to do. Use the code in common/lib/libquota and write conversion routines for your filesystem. You can call it a 'cut-n-paste' from quota2_subr.c, but as quota2_subr.c is about converting the filesystem-independent data to the quota2 on-disk format, and you use a different on-disk format, you can't blame it for not fitting your needs.

> > > This is not a good idea, not scalable, and not sensible, especially
> > > when a filesystem-independent (read "format-independent" if you like)
> > > interface is both perfectly possible and simpler.
> >
> > I strongly believe the plist representation is format-independent.
> > It has exactly the same informations as what you propose.
>
> Right now, I'm not sure if it is or not. I'm only sure that it's
> highly complicated

It's not more complicated than the table representation you proposed (besides being xml-based, but that's all we have now).

> (unnecessarily so) and underdocumented. Mea
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 07:59:08PM +0200, Ignatios Souvatzis wrote:
> > > How would this fit in, if at all?
> >
> > That's a good question. My first instinct is that like the other stuff
> > zfs does that it does in its own semantically-incompatible way, it
> > would require its own tools. But I guess the quota system could be
> > made to report the limits if the sub-filesystems are specifically
> > assigned to users somehow. Or something like that...
>
> The problem is that sub-file-system per user, or one per workgroup and
> one subsubfs per user, are only special cases of what you can do. It's
> really a filesystem, mounted at some point in the tree, and can be used
> to limit finer-grained than what you can express with user and group
> quota, although it can emulate their functionality.

That's sort of what I suspected. I dunno, it probably won't work.

--
David A. Holland
dholl...@netbsd.org
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 03:16:09PM +0000, David Holland wrote:
> On Thu, Oct 20, 2011 at 11:57:04AM +0200, Ignatios Souvatzis wrote:
> > > support to other filesystems (tempfs, perhaps v7fs) or even add other
> > > filesystems that have or may have their own native quota handling
> > > (zfs, Hammer, you name it).
> >
> > zfs - does it really have quota?
>
> I don't know... but if not, there are plenty of other fses.
>
> > All the demos I've seen talk about sub-filesystem limits; you create
> > per-user sub-filesystems if you want to emulate per-user quota.
> >
> > (Correct me if I'm wrong.)
> >
> > How would this fit in, if at all?
>
> That's a good question. My first instinct is that like the other stuff
> zfs does that it does in its own semantically-incompatible way, it
> would require its own tools. But I guess the quota system could be
> made to report the limits if the sub-filesystems are specifically
> assigned to users somehow. Or something like that...

The problem is that a sub-filesystem per user, or one per workgroup and one subsubfs per user, are only special cases of what you can do. It's really a filesystem, mounted at some point in the tree, and can be used to set finer-grained limits than what you can express with user and group quota, although it can emulate their functionality.

-is
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 12:56:17PM +0200, Manuel Bouyer wrote:
> > > > So, a few months back we got a new improved quota format for FFS.
> > > > Unfortunately, one of the side effects of this was to sprinkle
> > > > specific knowledge of the new format through all the userlevel quota
> > > > tools and quota support logic. To be fair, this was alongside the
> > > > existing specific knowledge of the old quota format; nonetheless, it's
> > > > messy and unscalable.
> > >
> > > of course there's been changes to the tools, as there's a new format.
> >
> > The tools ought to be format-independent.
>
> I can't parse this, can you explain ? The tools needs to be aware of the
> format to do something useful with the data, isn't it ?

The tools can and should work with a filesystem-independent abstract schema. This should be independent of any filesystem's on-disk quota format, just as the structures are independent of any filesystem's on-disk directory layout.

> > > You'll have to explain this. lfs is some variant of ffs, I see no
> > > reasons why it couldn't use the new format.
> >
> > It could use whatever format it wants. To the extent it currently
> > supports quotas, I think it's limited to the old-style quotas, that
> > is, quota1. But there's no way to plug it in without taking the
> > fs-dependent code currently in all the tools and access pathway and
> > making a third or perhaps a third and fourth copy of all the logic.
>
> that's plain wrong. If it's quota1 you can use the quota1 code in
> sys/ufs/ufs (just as it would have done before quota2).

No, it is not wrong. It cannot use the quota1 code in ufs; the whole premise of the proposed lfs renovation is to unhook lfs from ufs. The ufs code is a big blob, not a library of components; you can't just use parts of it, or at least not easily.
I can copy the ufs quota1 structures and some of the ufs quota code, yes; but then I have struct lfs_dqblk, and I need to interface it to the rest of the system, and as things currently stand that forces me to clone all the ffs-quota1-specific quota code all over everywhere.

The lfs/ufs split would have been committed ages ago if the quota system hadn't gotten in the way. This is why, last spring, when you were designing quota2, I was asking you to fix things above the FS to be FS-independent. But you didn't; instead it got worse. I tried at the time to explain the situation and the premises, and why the quota system should be FS-independent at and above the VFS level, but I got ignored and then sucked away by real life.

Now I'm trying to fix it.

> > Likewise, if I were to go add quota support to v7fs, or try to hook up
> > whatever quota support zfs has, or commit Hammer and try to get
> > whatever quota support *it* has working, or add ext2 quota support, or
> > write a new fs with quota support, or whatever, I'd have to make still
> > more copies of the logic to cope with all the different formats and
> > layouts.
>
> Of course if you have new on-disk format you need to do some conversion,
> whatever "filesystem independent" format you use.
> But I think you could still reuse sys/ufs/ufs/quota2_subr.c to do the
> conversion from plist to some binary representation.

I could cut and paste it, maybe. That's not particularly desirable.

> > This is not a good idea, not scalable, and not sensible, especially
> > when a filesystem-independent (read "format-independent" if you like)
> > interface is both perfectly possible and simpler.
>
> I strongly believe the plist representation is format-independent.
> It has exactly the same informations as what you propose.

Right now, I'm not sure if it is or not. I'm only sure that it's highly complicated (unnecessarily so) and underdocumented.
Meanwhile, you've also been arguing that the quota2 on-disk structures are format-independent, so forgive me if I take this all with a grain of salt.

> > > This is exactly the format described in quotactl(2).
> >
> > No, what's described in quotactl(2) is something about commands and
> > arguments... and while there is a substructure that looks something
> > like this, the fact remains that it's a *sub*structure
>
> Yes, but you still need a way to pass commands. You didn't talk about this.

No, because I had something like the old quotactl(2) in mind - an ordinary call passing a filesystem identifier, a command code, and an argument.

> > and the schema
> > is not tabular.
>
> I don't understand what you mean here. there's a set of values associated
> with an id, I can't see the difference with what you're proposing.

There's a complicated hierarchical structure of arrays and maps/dictionaries, as opposed to a single flat table with columns. Or, put another way, the schema I proposed is (I think) in third normal form, and yours isn't.

Another way to put it is that your schema requires proplib to manage it, with all the attendant complexity, whereas mine works perfectly
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 04:39:21PM +0000, David Holland wrote:
> > We're talking a few MB of ram here, isn't it ? the kernel can certainly
> > allocate this without troubles (other subsystems do).
>
> The proplib'd and XMLified complete dump for 50,000 users will
> probably make a blob of between 10 and 20 MB. (Note: this is an
> estimate; I haven't checked the size by trying it. It might be larger.
> I'd be surprised if it were much smaller.)

I tested with a few 10s of users; my estimate is about 35MB for 50k users.

> I don't see why it's desirable to manifest such large objects when
> it's easily avoidable.

We don't agree on "easily".

> > > There are two design truisms for database stuff that apply here:
> > > first, you always end up wanting cursors, and second, you always end
> > > up wanting bulk get (and not just single get) from those cursors. So
> > > it's usually a good idea to anticipate this and design it all in up
> > > front.
> >
> > Maybe ... I know that in the end I want the whole set of data and not
> > just a part of it.
>
> Yes, probably. The cursor API I've floated so far is not general
> enough to support much else. Although it could be made more general.
>
> > But if you believe it's needed this can easily be added to the
> > existing quotactl(2) (it would just be a new command).
>
> Yes, perhaps it could... but why? What's to be gained by using a
> baroque proplib encoding of what can otherwise be handled as an array
> of simple structs?

It's an easily machine-parsable text. That's probably the reason why it's used in other parts of the kernel too.

> I remember asking this question when you first proposed the proplib
> interface last spring, and never really got a clear answer.

I see it as being the common format used for non-performance-critical kernel/userland communication. It has been adopted by other kernel subsystems; there's prior art there.
> > > > > The reason to wrap the position in a cursor abstraction is to allow
> > > > > flexibility about how the position is represented.
> > > >
> > > > But then the cursor would still be stored in userland ?
> > >
> > > That's the idea, like reading a file with pread().
> > >
> > > I think the kernel should know, or at least be able to know, how many
> > > cursors are currently open; but I don't think there's any need to keep
> > > the cursor state itself in the kernel.
> >
> > So you want a quotaopen/quotaclose, with a file descriptor (or something
> > similar) ?
>
> The proposed API already has explicit open and close for cursors; what
> I'm saying is that this should be exposed to the kernel. (Open already
> has to be, to initialize the cursor position; close should be, so the
> filesystem can if necessary know if there are cursors open at any
> given time. Otherwise you can get into trouble; see for example nfsd
> and readdir.)

So you're close to having something like a file descriptor.

--
Manuel Bouyer
     NetBSD: 26 ans d'experience feront toujours la difference
--
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 06:00:28PM +0200, Manuel Bouyer wrote:
> > > It's certainly less trouble to send back to userland the whole set of
> > > data - especially if what userland wants is the whole set of data
> > > (I can't see what a partial read of quota would be useful for).
> >
> > No, no it really isn't. Suppose there are, say, 50,000 users, so to
> > send back the whole works you have to accumulate 100,000 quota entries
> > in a gigantic blob... a machine with 50,000 users will have enough RAM
> > for this but that doesn't mean that allocating a contiguous chunk of
> > kernel memory that large is easy or desirable. Far better to read it
> > out a couple hundred at a time.
>
> We're talking a few MB of ram here, isn't it ? the kernel can certainly
> allocate this without troubles (other subsystems do).

The proplib'd and XMLified complete dump for 50,000 users will probably make a blob of between 10 and 20 MB. (Note: this is an estimate; I haven't checked the size by trying it. It might be larger. I'd be surprised if it were much smaller.)

I don't see why it's desirable to manifest such large objects when it's easily avoidable.

> > There are two design truisms for database stuff that apply here:
> > first, you always end up wanting cursors, and second, you always end
> > up wanting bulk get (and not just single get) from those cursors. So
> > it's usually a good idea to anticipate this and design it all in up
> > front.
>
> Maybe ... I know that in the end I want the whole set of data and not
> just a part of it.

Yes, probably. The cursor API I've floated so far is not general enough to support much else. Although it could be made more general.

> But if you believe it's needed this can easily be added to the
> existing quotactl(2) (it would just be a new command).

Yes, perhaps it could... but why? What's to be gained by using a baroque proplib encoding of what can otherwise be handled as an array of simple structs?
I remember asking this question when you first proposed the proplib interface last spring, and never really got a clear answer.

> > > > The reason to wrap the position in a cursor abstraction is to allow
> > > > flexibility about how the position is represented.
> > >
> > > But then the cursor would still be stored in userland ?
> >
> > That's the idea, like reading a file with pread().
> >
> > I think the kernel should know, or at least be able to know, how many
> > cursors are currently open; but I don't think there's any need to keep
> > the cursor state itself in the kernel.
>
> So you want a quotaopen/quotaclose, with a file descriptor (or something
> similar) ?

The proposed API already has explicit open and close for cursors; what I'm saying is that this should be exposed to the kernel. (Open already has to be, to initialize the cursor position; close should be, so the filesystem can if necessary know if there are cursors open at any given time. Otherwise you can get into trouble; see for example nfsd and readdir.)

--
David A. Holland
dholl...@netbsd.org
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 03:47:26PM +0000, David Holland wrote:
> On Thu, Oct 20, 2011 at 05:23:14PM +0200, Manuel Bouyer wrote:
> > > That's way more complicated than necessary. Think of it as like
> > > VOP_READDIR - you get passed a position, you send back some number of
> > > items, and update the position.
> >
> > Depending on how the data are stored on disk, the notion of position
> > (which also implies some ordering) can be difficult to handle,
> > especially if the data we're reading can change between two calls,
> > causing the position to become invalid.
>
> ...yes, but this is just one of those things you have to cope with
> when doing filesystems. It's no different from readdir in that regard.
>
> > It's certainly less trouble to send back to userland the whole set of
> > data - especially if what userland wants is the whole set of data
> > (I can't see what a partial read of quota would be useful for).
>
> No, no it really isn't. Suppose there are, say, 50,000 users, so to
> send back the whole works you have to accumulate 100,000 quota entries
> in a gigantic blob... a machine with 50,000 users will have enough RAM
> for this but that doesn't mean that allocating a contiguous chunk of
> kernel memory that large is easy or desirable. Far better to read it
> out a couple hundred at a time.

We're talking a few MB of RAM here, aren't we? The kernel can certainly allocate this without trouble (other subsystems do).

> There are two design truisms for database stuff that apply here:
> first, you always end up wanting cursors, and second, you always end
> up wanting bulk get (and not just single get) from those cursors. So
> it's usually a good idea to anticipate this and design it all in up
> front.

Maybe ... I know that in the end I want the whole set of data and not just a part of it. But if you believe it's needed, this can easily be added to the existing quotactl(2) (it would just be a new command).
> > > The reason to wrap the position in a cursor abstraction is to allow
> > > flexibility about how the position is represented.
> >
> > But then the cursor would still be stored in userland ?
>
> That's the idea, like reading a file with pread().
>
> I think the kernel should know, or at least be able to know, how many
> cursors are currently open; but I don't think there's any need to keep
> the cursor state itself in the kernel.

So you want a quotaopen/quotaclose, with a file descriptor (or something similar)?

--
Manuel Bouyer
     NetBSD: 26 ans d'experience feront toujours la difference
--
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 05:23:14PM +0200, Manuel Bouyer wrote:
> > That's way more complicated than necessary. Think of it as like
> > VOP_READDIR - you get passed a position, you send back some number of
> > items, and update the position.
>
> Depending on how the data are stored on disk, the notion of position
> (which also implies some ordering) can be difficult to handle,
> especially if the data we're reading can change between two calls,
> causing the position to become invalid.

...yes, but this is just one of those things you have to cope with when doing filesystems. It's no different from readdir in that regard.

> It's certainly less trouble to send back to userland the whole set of
> data - especially if what userland wants is the whole set of data
> (I can't see what a partial read of quota would be useful for).

No, no it really isn't. Suppose there are, say, 50,000 users, so to send back the whole works you have to accumulate 100,000 quota entries in a gigantic blob... a machine with 50,000 users will have enough RAM for this but that doesn't mean that allocating a contiguous chunk of kernel memory that large is easy or desirable. Far better to read it out a couple hundred at a time.

There are two design truisms for database stuff that apply here: first, you always end up wanting cursors, and second, you always end up wanting bulk get (and not just single get) from those cursors. So it's usually a good idea to anticipate this and design it all in up front.

> > The reason to wrap the position in a cursor abstraction is to allow
> > flexibility about how the position is represented.
>
> But then the cursor would still be stored in userland ?

That's the idea, like reading a file with pread().

I think the kernel should know, or at least be able to know, how many cursors are currently open; but I don't think there's any need to keep the cursor state itself in the kernel.

--
David A. Holland
dholl...@netbsd.org
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 03:08:16PM +0000, David Holland wrote:
> That's way more complicated than necessary. Think of it as like
> VOP_READDIR - you get passed a position, you send back some number of
> items, and update the position.

Depending on how the data are stored on disk, the notion of position (which also implies some ordering) can be difficult to handle, especially if the data we're reading can change between two calls, causing the position to become invalid. It's certainly less trouble to send back to userland the whole set of data - especially if what userland wants is the whole set of data (I can't see what a partial read of quota would be useful for).

> If you want to take the trouble to guarantee strict transactional
> consistency, you can easily enough by checking generation numbers and
> failing with a particular errno if things have changed; but I don't
> think there's any real need for that level of strict consistency for
> quotas. Much less so than for readdir, at least, and we manage to cope
> with readdir the way it is.

I agree with this.

> The reason to wrap the position in a cursor abstraction is to allow
> flexibility about how the position is represented.

But then the cursor would still be stored in userland?

--
Manuel Bouyer
     NetBSD: 26 ans d'experience feront toujours la difference
--
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 11:57:04AM +0200, Ignatios Souvatzis wrote:
> > support to other filesystems (tempfs, perhaps v7fs) or even add other
> > filesystems that have or may have their own native quota handling
> > (zfs, Hammer, you name it).
>
> zfs - does it really have quota?

I don't know... but if not, there are plenty of other fses.

> All the demos I've seen talk about sub-filesystem limits; you create
> per-user sub-filesystems if you want to emulate per-user quota.
>
> (Correct me if I'm wrong.)
>
> How would this fit in, if at all?

That's a good question. My first instinct is that like the other stuff zfs does that it does in its own semantically-incompatible way, it would require its own tools. But I guess the quota system could be made to report the limits if the sub-filesystems are specifically assigned to users somehow. Or something like that...

--
David A. Holland
dholl...@netbsd.org
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 06:43:47AM +0200, Emmanuel Dreyfus wrote:
> > It seems to me that quotas are fundamentally a special-purpose
> > key/value store; that is, you look up quota information for a
> > particular thing (the key) and get back the quota settings and current
> > usage information (the value).
>
> If you are going to add a generic key/value store mechanism for all
> filesystems, you can consider fs-independent extended attributes as
> well.

I am not adding a generic key/value store mechanism. I am representing the quota data as a specific key/value store. A generic key/value store mechanism for all filesystems would be a very large, messy, and semantically nebulous project...

--
David A. Holland
dholl...@netbsd.org
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 04:53:53PM +0200, Manuel Bouyer wrote:
> > You don't need to track state in the kernel, you just need to keep a
> > generation ID. Have the caller pass a "starting index" and
> > "requested count" parameter, and have the kernel include "number of
> > matches", "total matches", and "generation ID" in the response. Let
> > the kernel limit the maximum number of matches to return per request
> > if you like. If the generation ID changes while the caller is
> > fetching records you simply restart the process.
>
> I still don't see how it's going to work in details. Either
> the kernel have to restart reading all quotas on each call to
> return requested range, or it has to cache the previous read
> of all quotas.
> Once the kernel has read all the quotas, it can as well return
> all the data to the caller, and let the caller deal with the
> iteration.

That's way more complicated than necessary. Think of it as like VOP_READDIR - you get passed a position, you send back some number of items, and update the position.

If you want to take the trouble to guarantee strict transactional consistency, you can easily enough by checking generation numbers and failing with a particular errno if things have changed; but I don't think there's any real need for that level of strict consistency for quotas. Much less so than for readdir, at least, and we manage to cope with readdir the way it is.

The reason to wrap the position in a cursor abstraction is to allow flexibility about how the position is represented.

--
David A. Holland
dholl...@netbsd.org
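The readdir-style scheme described above (the caller passes a position, gets back some number of items, and the position is updated) might look roughly like this in userland terms. It is a sketch only: `QuotaStore`, `getsome` and `dump_all` are hypothetical names, the position is assumed to be "the smallest id not yet returned", and the per-call sort stands in for whatever ordered lookup a real filesystem would provide.

```python
from bisect import bisect_left


class QuotaStore:
    """Hypothetical stand-in for a filesystem's quota table, keyed by id."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def getsome(self, position, maxitems):
        """Return up to maxitems (id, value) pairs with id >= position,
        together with the updated position."""
        ids = sorted(self.entries)
        start = bisect_left(ids, position)
        chunk = ids[start:start + maxitems]
        items = [(i, self.entries[i]) for i in chunk]
        newpos = chunk[-1] + 1 if chunk else position
        return items, newpos


def dump_all(store, batch=200):
    """Bulk get in batches; the cursor state lives with the caller,
    much like an offset passed to pread()."""
    position = 0
    result = []
    while True:
        items, position = store.getsome(position, batch)
        if not items:
            return result
        result.extend(items)


store = QuotaStore({uid: {"limit": 1000 + uid} for uid in range(0, 1000, 3)})
print(len(dump_all(store, batch=7)))   # 334
```

Because the position here is a key rather than an array index, entries added or removed between calls do not make it invalid; the caller may simply miss or see such entries, which is the same loose consistency readdir offers.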
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 10:48:16AM -0400, Jared McNeill wrote:
> Heyas Manuel --
>
> You don't need to track state in the kernel, you just need to keep a
> generation ID. Have the caller pass a "starting index" and
> "requested count" parameter, and have the kernel include "number of
> matches", "total matches", and "generation ID" in the response. Let
> the kernel limit the maximum number of matches to return per request
> if you like. If the generation ID changes while the caller is
> fetching records you simply restart the process.

I still don't see how it's going to work in detail. Either the kernel has to restart reading all quotas on each call to return the requested range, or it has to cache the previous read of all quotas. Once the kernel has read all the quotas, it may as well return all the data to the caller, and let the caller deal with the iteration.

--
Manuel Bouyer
     NetBSD: 26 ans d'experience feront toujours la difference
--
proposed additions to sys/conf/std
I propose adding pseudo-device drvctl and/or options BUFQ_PRIOCSCAN to src/sys/conf/std. The reasons I even bring this up: - Many kernels are missing drvctl and thus do not support disk wedges (this is arguably due to a flaw in the design of disk wedges, but that's another bikeshed). - BUFQ_PRIOCSCAN is superior to BUFQ_DISKSORT, and in fact BUFQ_DISKSORT is actually inferior to BUFQ_FCFS in terms of interactive disk I/O responsiveness. Many kernels default to BUFQ_DISKSORT because they do not explicitly add BUFQ_PRIOCSCAN. The ominous line '# "it's commonly used" is NOT a good reason to enable options here.' has me a bit apprehensive. However, pseudo-device cpuctl is there already. Some options are there for historical reasons, so this is sort of a slippery slope. Do we need a new config file for standard-but-optional options? Jonathan Kollasch
Re: fs-independent quotas
On Thu, 20 Oct 2011, Manuel Bouyer wrote: also it doesn't support cursors. This can easily be implemented in userland, without changes to the quotactl(2) interface. I've trouble seeing how this can be sanely implemented at the quotactl(2) level (I don't like the idea of the kernel keeping states about what a specific userland process is doing). Heyas Manuel -- You don't need to track state in the kernel, you just need to keep a generation ID. Have the caller pass a "starting index" and "requested count" parameter, and have the kernel include "number of matches", "total matches", and "generation ID" in the response. Let the kernel limit the maximum number of matches to return per request if you like. If the generation ID changes while the caller is fetching records you simply restart the process. Cheers, Jared
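The index/count/generation scheme in this message could look roughly like the sketch below. It is an illustration under assumptions, not the quotactl(2) interface: quota_fetch(), quota_fetch_all(), struct quota_reply and QUOTA_MAX_PER_CALL are names made up here, and a small in-memory table stands in for the kernel's quota data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; the fields are placeholders. */
struct quota_rec {
	uint32_t id;
	uint64_t hard, soft, usage;
};

/* Stand-in for the kernel's view of the quota data. */
struct quota_table {
	struct quota_rec recs[8];
	size_t nrecs;
	uint64_t generation;	/* bumped on every modification */
};

/* What each request returns alongside the records. */
struct quota_reply {
	size_t nmatches;	/* "number of matches" in this reply */
	size_t total;		/* "total matches" at the time of the call */
	uint64_t generation;	/* "generation ID" at the time of the call */
};

#define QUOTA_MAX_PER_CALL 2	/* illustrative kernel-side cap */

/* "Kernel" side: keeps no per-caller state between requests. */
static struct quota_reply
quota_fetch(const struct quota_table *t, size_t start, size_t count,
    struct quota_rec *out)
{
	struct quota_reply r = { 0, t->nrecs, t->generation };

	assert(out != NULL);
	if (count > QUOTA_MAX_PER_CALL)
		count = QUOTA_MAX_PER_CALL;
	while (r.nmatches < count && start + r.nmatches < t->nrecs) {
		out[r.nmatches] = t->recs[start + r.nmatches];
		r.nmatches++;
	}
	return r;
}

/* Caller side: fetch everything, starting over if the generation moves. */
static size_t
quota_fetch_all(const struct quota_table *t, struct quota_rec *out, size_t cap)
{
	size_t n = 0;
	uint64_t gen = 0;
	int started = 0;

	for (;;) {
		struct quota_reply r = quota_fetch(t, n, cap - n, out + n);

		if (!started) {
			gen = r.generation;
			started = 1;
		} else if (r.generation != gen) {
			n = 0;		/* data changed underneath us: restart */
			started = 0;
			continue;
		}
		n += r.nmatches;
		if (r.nmatches == 0 || n >= r.total)
			break;
	}
	return n;
}
```

Note that this sketch does not settle the concern raised elsewhere in the thread about how the kernel locates record number "start" cheaply; the array index here hides that cost.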
Re: fs-independent quotas
On Wed, Oct 19, 2011 at 10:20:23PM +, David Holland wrote: > On Wed, Oct 19, 2011 at 09:22:02PM +0200, Manuel Bouyer wrote: > > > So, a few months back we got a new improved quota format for FFS. > > > Unfortunately, one of the side effects of this was to sprinkle > > > specific knowledge of the new format through all the userlevel quota > > > tools and quota support logic. To be fair, this was alongside the > > > existing specific knowledge of the old quota format; nonetheless, it's > > > messy and unscalable. > > > > Of course there have been changes to the tools, as there's a new format. > > The tools ought to be format-independent. I can't parse this; can you explain? The tools need to be aware of the format to do something useful with the data, don't they? > > > > We may want to add more quota formats (e.g. the different and > > > incompatible new quota format FreeBSD added last year) or add quota > > > support to other filesystems (tempfs, perhaps v7fs) or even add other > > > filesystems that have or may have their own native quota handling > > > (zfs, Hammer, you name it). Also, my planned lfs-renovation is > > > currently hung up on the VFS-level quota interface, because I don't > > > want to rip out the existing maybe-partial support for quotas but > > > can't plug new code into the existing framework. > > > > You'll have to explain this. lfs is some variant of ffs, I see no reason > > why it couldn't use the new format. > > It could use whatever format it wants. To the extent it currently > supports quotas, I think it's limited to the old-style quotas, that > is, quota1. But there's no way to plug it in without taking the > fs-dependent code currently in all the tools and access pathway and > making a third or perhaps a third and fourth copy of all the logic. That's plain wrong. If it's quota1 you can use the quota1 code in sys/ufs/ufs (just as it would have done before quota2). 
> > Likewise, if I were to go add quota support to v7fs, or try to hook up > whatever quota support zfs has, or commit Hammer and try to get > whatever quota support *it* has working, or add ext2 quota support, or > write a new fs with quota support, or whatever, I'd have to make still > more copies of the logic to cope with all the different formats and > layouts. Of course, if you have a new on-disk format you need to do some conversion, whatever "filesystem-independent" format you use. But I think you could still reuse sys/ufs/ufs/quota2_subr.c to do the conversion from plist to some binary representation. > > This is not a good idea, not scalable, and not sensible, especially > when a filesystem-independent (read "format-independent" if you like) > interface is both perfectly possible and simpler. I strongly believe the plist representation is format-independent. It carries exactly the same information as what you propose. > > > in fact the new format is fs-independent. > > Yes, in the sense that one could add the format to other file systems; > but no, in the sense that other file systems already have their own > quota formats and we need to be able to interoperate. You have to do some conversion, at the same level as with what you propose. > > > But this is just what the current proplib format is! A set of tables > > with key/value pairs! > > That's great, that'll make the changes I need to make that much > easier. But it doesn't seem particularly familiar relative to the code > I've been working on. Or maybe you don't need to change it at all. > > > > the quota *type* > > > > > >- the quota value is: > > > the configured hard limit > > > the configured soft limit > > > the configured grace period > > > the current usage > > > the current grace expiry time (if any) > > > > This is exactly the format described in quotactl(2). > > No, what's described in quotactl(2) is something about commands and > arguments... 
and while there is a substructure that looks something > like this, the fact remains that it's a *sub*structure Yes, but you still need a way to pass commands. You didn't talk about this. > and the schema > is not tabular. I don't understand what you mean here. There's a set of values associated with an id; I can't see the difference from what you're proposing. > > > > The quota *class* is the thing the quota is imposed on; this is > > > currently either "user" or "group". There is no likely prospect of > > > additional quota classes appearing. > > > > I don't think we should limit ourselves to these classes. I could see > > per-host or per-hostgroup quotas for networked filesystems for example. > > I'm not limiting it to anything, but I'll believe in more quota > classes when I see them. Per-host quotas (even if they make sense, > which I question) aren't going to work very well with a 32-bit id, for > example. Right, that's where a plist is a win. > > Whereas, as I pointed out before, th
Re: fs-independent quotas
On Wed, Oct 19, 2011 at 06:09:27PM +, David Holland wrote: > support to other filesystems (tempfs, perhaps v7fs) or even add other > filesystems that have or may have their own native quota handling > (zfs, Hammer, you name it). zfs - does it really have quota? All the demos I've seen talk about sub-filesystem limits; you create per-user sub-filesystems if you want to emulate per-user quota. (Correct me if I'm wrong.) How would this fit in, if at all? -is
Re: dtrace ioctls
On Wed, Oct 19, 2011 at 10:22:08PM +, David Holland wrote: > On Wed, Oct 19, 2011 at 10:01:33PM +0100, David Laight wrote: > > > > Hmmm... the sun code is passing the structure by value > > Is it? The non-sun code appears to be calling an ioctl that's defined > to take a pointer to a pointer to a structure. Or maybe I'm totally > misreading ioccom.h? Maybe I was asleep David -- David Laight: da...@l8s.co.uk