Re: PUFFS ADVLOCK is too greedy
On Fri, Nov 11, 2011 at 11:11:35AM +0100, Emmanuel Dreyfus wrote:
> Here is where the extra ADVLOCK happens:
>   sys_exit -> exit1 -> fd_free -> fd_close -> VOP_ADVLOCK
> In a nutshell, NetBSD clears locks on all file descriptors when a process
> exits, including read-only descriptors, and FUSE filesystems seem to be
> uncomfortable with that. I am not sure where this should be fixed: should
> we avoid unlocking read-only files in fd_close()? That seems a good idea
> performance-wise.

I think that is required by POSIX. Any close() of a file by a process removes ALL locks held by that process (on that file), not just those acquired through that fd. Yes, that has nasty side effects, especially in threaded processes, or when library routines open/close files that might have locks on them in other parts of the process. Similarly, there isn't a file locking scheme that can be used between threads of a single process.

David

-- 
David Laight: da...@l8s.co.uk
Re: MAXNAMLEN vs NAME_MAX
On Sun, 13 Nov 2011 23:08:30 +0000, David Holland dholland-t...@netbsd.org wrote:
> I was recently talking to some people who'd been working with some
> (physicists, I think) doing data-intensive simulation of some kind, and
> that reminded me: for various reasons, many people who are doing serious
> data collection or simulation tend to encode vast amounts of metadata in
> the names of their data files. Arguably this is a bad way of doing
> things, but there are reasons for it and not so many clear
> alternatives... anyway, 256 character filenames often aren't enough in
> that context.

It's only my opinion, but they really should be using multiple files or a database for the metadata with, as necessary, a link to an actual file for the data. But I also tend to think the same of software relying on extended attributes, resource forks and the like (with the possible exception of a specialized facility for extended permissions :)

> (This sort of usage also often involves things like 50,000 files in one
> directory, so the columnizing behavior of ls is far from the top of the
> list of relevant issues.)

This reminds me, does anyone know about the current state of UFS_DIRHASH? I remember reading about some issues with it and ending up disabling it on my kernels, yet huge directories can occur in a number of scenarios (probably a more pressing issue than extending file names, actually)...

> > > The 255 limit was just because that's how many bytes a one-byte
> > > length field permitted, not because anyone thought names that long
> > > made sense.
> > But if you're going to increase it, why stop at 511? That number means
> > nothing - the next logical limit would be 65535, wouldn't it?
> Well... yes, but there are other considerations. As you noted, going
> past one physical sector is problematic; going past one filesystem block
> very problematic. Plus, as long as MMU pages remain 4K, allocating
> contiguous kernel virtual space for path buffers (since if NAME_MAX were
> raised to 64K, PATH_MAX would have to be at least that large) could
> start to be a problem.

I agree, especially with all the software that allocates path/file name buffers on the stack (but even on the heap it could be a general memory waste with 64KB, on top of the memory management performance issues).

-- 
Matt
Re: fs-independent quotas
On Sun, Nov 13, 2011 at 07:42:18PM -0500, Mouse wrote:
> > The arguments that ufs_quota_entry (or whatever its name is) will be
> > good enough for any future filesystem is just not true.
> You have asserted that.

I also explained why, I think.

> Proof by repeated assertion is...unconvincing.

I can explain again, but I don't like writing the same thing over and over again. I have more interesting things to do.

-- 
Manuel Bouyer bou...@antioche.eu.org
     NetBSD: 26 ans d'experience feront toujours la difference
--
Re: fs-independent quotas
On Sun, Nov 13, 2011 at 10:36:55PM +0100, Manuel Bouyer wrote:
> On Sat, Oct 29, 2011 at 05:14:30PM +0000, David Holland wrote:
> [...]
> > 3. Abolish the proplib-based transport encoding. Since it turns out
> > that the use of proplib for quotactl(2) is only to encode struct
> > ufs_quota_entry for transport across the user/kernel boundary,
> > converting it back on the other side, it seems to me that it's a
> > completely pointless complication and a poor use of proplib. It's also
> > messy, even compared to other proplib usage elsewhere. (Regarding
> > claims of easier/better compatibility, see below.)
> Ho no, not again ! The arguments that ufs_quota_entry (or whatever its
> name is) will be good enough for any future filesystem is just not true.
> We already have been in this argument.

Yes, and your hypothetical examples haven't come close to convincing me. And I agree there's no point thrashing back and forth any further; this is why I've asked core to decide.

> > 3. There's already been some discussion of the compat issues in this
> > thread. Basically it boils down to: if you send a program material
> > that it's not expecting to receive, it won't be able to cope with it
> > and will fail/die/crash. This is true whether the material is binary
> > or a proplib bundle or text or whatever else.
> With a binary it'll probably crash. With a text-based format it will
> notice the syntax error and return an error code. This is a big
> difference, especially for the kernel.

Neither is good enough if you're providing backwards compatibility; it has to *work*. This is the standard we're committed to, and I continue to think there's no particular advantage for proplib in this regard, particularly for this particular kind of data. (I don't think any semistructured or self-describing data model, including the perfect one I'd replace proplib with if I could wave a wand to do so, provides any particular advantage for procedure call compatibility. Sure, you can tag data bundles with version codes and such, but we can and do already do that by tagging the call itself, and we have lots of support architecture in place for doing it that way. The advantages appear when you're dealing with irregularly structured material, like when there are large numbers of optional fields or optional parameters and so forth.)

> > 4. If using split on the output of quota/repquota/whatnot isn't good
> > enough (in some specific places we may want a machine-readable-output
> > option) then the best way to access the quota system from Perl or
> > Python (or Ruby or Lua or ...) is with a set of bindings for the new
> > proposed libquota. This should be straightforward to set up once the
> > new libquota is in place. I think the current quotactl(8) should just
> Are you going to provide those bindings ? I'm interested in perl.

I don't do Perl. I might be persuaded to do Python bindings, but it would probably be more effective to enlist someone who already knows how to do this and won't therefore make newbie mistakes with the interpreter. Anyway, the hard part is making the library interface available; wrapping Perl or Python around it should be entirely trivial.

> > - As far as I can tell there is not and never has been support for
> > manipulating quotas on unmounted filesystems.
> There was in quota(1), repquota(8) and edquota(8), and I've been using
> it with netbsd-5. Just read the code in the netbsd-5 branch to see it.

I've looked. I don't know what you're seeing, but all I see is code for directly manipulating the quota files. There's no logic for mounting anything to reach them. So it'll work if the quota files are on / and the volume is for /home, regardless of whether /home is mounted... but only because it doesn't have to touch /home in this case. I don't think that feature is worth preserving either, since it's purely a side effect of having visible quota files. And I don't see the point; it's not like mounting the volume to run edquota will cause a catastrophe.
-- David A. Holland dholl...@netbsd.org
Re: MAXNAMLEN vs NAME_MAX
On Mon, Nov 14, 2011 at 04:03:09AM -0500, Matthew Mondor wrote:
> > I was recently talking to some people who'd been working with some
> > (physicists, I think) doing data-intensive simulation of some kind,
> > and that reminded me: for various reasons, many people who are doing
> > serious data collection or simulation tend to encode vast amounts of
> > metadata in the names of their data files. Arguably this is a bad way
> > of doing things, but there are reasons for it and not so many clear
> > alternatives... anyway, 256 character filenames often aren't enough in
> > that context.
> It's only my opinion, but they really should be using multiple files or
> a database for the metadata with, as necessary, a link to an actual file
> for the data.

Perhaps, but telling people they should be working a different way usually doesn't help. (Have you ever done any stuff like this? Even if you have only a few settings and only a couple hundred output files, there's still no decent way to arrange it but name the output files after the settings.)

> > (This sort of usage also often involves things like 50,000 files in
> > one directory, so the columnizing behavior of ls is far from the top
> > of the list of relevant issues.)
> This reminds me, does anyone know about the current state of
> UFS_DIRHASH? I remember reading about some issues with it and ending up
> disabling it on my kernels, yet huge directories can occur in a number
> of scenarios (probably a more pressing issue than extending file names,
> actually)...

I don't know. At best it's not really a complete solution, anyway...

> > Well... yes, but there are other considerations. As you noted, going
> > past one physical sector is problematic; going past one filesystem
> > block very problematic. Plus, as long as MMU pages remain 4K,
> > allocating contiguous kernel virtual space for path buffers (since if
> > NAME_MAX were raised to 64K, PATH_MAX would have to be at least that
> > large) could start to be a problem.
> I agree, especially with all the software that allocates path/file name
> buffers on the stack (but even on the heap it could be a general memory
> waste with 64KB, on top of the memory management performance issues).

Pathname buffers generally shouldn't be (and in NetBSD, aren't) on the stack regardless. Even at only 1K each, it's really easy to blow a 4K kernel stack with them. (In practice you can generally get away with one; but two, like you need for rename, link, symlink, etc., is too many.) Or I guess you don't mean in the kernel, do you...

-- 
David A. Holland dholl...@netbsd.org
Re: bumping ARG_MAX
On Sun, Nov 13, 2011 at 11:17:52PM +0000, David Holland wrote:
> pkgsrc has grown to the point where the following happens:
>
>    valkyrie% pwd
>    /usr/pkgsrc
>    valkyrie% grep foo */*/Makefile
>    /usr/bin/grep: Argument list too long.
>    Exit 1

Use: grep -r --include Makefile foo .

But don't forget the '.' - it should be the default with -r (or at least an error).

David

-- 
David Laight: da...@l8s.co.uk
Re: MAXNAMLEN vs NAME_MAX
On Mon, Nov 14, 2011 at 04:03:09AM -0500, Matthew Mondor wrote:
> On Sun, 13 Nov 2011 23:08:30 +0000, David Holland
> dholland-t...@netbsd.org wrote:
> > I was recently talking to some people who'd been working with some
> > (physicists, I think) doing data-intensive simulation of some kind,
> > and that reminded me: for various reasons, many people who are doing
> > serious data collection or simulation tend to encode vast amounts of
> > metadata in the names of their data files. Arguably this is a bad way
> > of doing things, but there are reasons for it and not so many clear
> > alternatives... anyway, 256 character filenames often aren't enough in
> > that context.
> It's only my opinion, but they really should be using multiple files or
> a database for the metadata with, as necessary, a link to an actual file
> for the data.

Or use '/' to separate the fields in their long filename :-) (But then they'll hit the 32k/64k limit on subdirectories ...)

Thinks... MD5 hash the user-specified filename and use that for the 'real' name. Add some special fudgery so that readdir() works. Then use some kind of overlay mount.

David

-- 
David Laight: da...@l8s.co.uk
Re: bumping ARG_MAX
> > valkyrie% grep foo */*/Makefile
> Use: grep -r --include Makefile foo .

That (a) will include Makefiles at other depths than two (which may not be a problem in the specific example of pkgsrc, but in general makes it non-equivalent), (b) is grep-specific, and (c) will walk the whole tree to full depth even if there aren't any more Makefiles for it to find. I think the point still stands.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: bumping ARG_MAX
On Mon, Nov 14, 2011 at 05:39:02PM +0000, David Laight wrote:
> > pkgsrc has grown to the point where the following happens:
> >
> >    valkyrie% pwd
> >    /usr/pkgsrc
> >    valkyrie% grep foo */*/Makefile
> >    /usr/bin/grep: Argument list too long.
> >    Exit 1
> Use: grep -r --include Makefile foo .
> But don't forget the '.' - should be the default with -r (or at least an
> error).

Or use:
   find . -name Makefile -print | xargs grep foo
Or use:
   grep foo [a-m]*/*/Makefile; grep foo [n-z]*/*/Makefile
or whatever. That's completely not the point...

-- 
David A. Holland dholl...@netbsd.org
Re: VOP_GETATTR: locking protocol change proposal
hi,

> The vnode locking requirement currently allows calling VOP_GETATTR() on
> an unlocked vnode. This is orthogonal to all other operations that read
> data or metadata and want at least a shared lock. It also asks for
> trouble, as the attributes may change while the operation is in
> progress. With the attached diff the locking protocol requests at least
> a shared lock, and all calls to VOP_GETATTR() outside of file systems
> respect it. The calls from file systems need review (the NFS server is
> suspicious at least). I will commit this diff around Oct 14 if no one
> objects.
>
> -- 
> Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)

postgresql assumes an instant lseek(SEEK_END) to get the size of their heap files:
http://rhaas.blogspot.com/2011/11/linux-lseek-scalability.html
As fsync etc. keep the vnode lock during I/O, this might cause a severe performance regression.

YAMAMOTO Takashi