Re: MAXNAMLEN vs NAME_MAX
On Sun, 13 Nov 2011 23:08:30 + David Holland <dholland-t...@netbsd.org> wrote:
> I was recently talking to some people who'd been working with some (physicists, I think) doing data-intensive simulation of some kind, and that reminded me: for various reasons, many people who are doing serious data collection or simulation tend to encode vast amounts of metadata in the names of their data files. Arguably this is a bad way of doing things, but there are reasons for it and not so many clear alternatives... anyway, 256 character filenames often aren't enough in that context.

It's only my opinion, but they really should be using multiple files or a database for the metadata with, as necessary, a link to an actual file for data. But I also tend to think the same of software relying on extended attributes, resource forks and the like (with the possible exception of a specialized facility for extended permissions :)

> (This sort of usage also often involves things like 50,000 files in one directory, so the columnizing behavior of ls is far from the top of the list of relevant issues.)

This reminds me: does anyone know about the current state of UFS_DIRHASH? I remember reading about some issues with it and ending up disabling it on my kernels, yet huge directories can occur in a number of scenarios (probably a more pressing issue than extending file names, actually)...

>> The 255 limit was just because that's how many bytes a one byte length field permitted, not because anyone thought names that long made sense. But if you're going to increase it, why stop at 511? That number means nothing - the next logical limit would be 65535 wouldn't it?
>
> Well... yes but there are other considerations. As you noted, going past one physical sector is problematic; going past one filesystem block very problematic. Plus, as long as MMU pages remain 4K, allocating contiguous kernel virtual space for path buffers (since if NAME_MAX were raised to 64K, PATH_MAX would have to be at least that large) could start to be a problem.

I agree, especially with all the software that allocates path/file name buffers on the stack (but even on the heap it could be a general memory waste with 64KB, other than the memory management performance issues).

-- 
Matt
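The practice described above, naming each output file after the simulation settings that produced it, is easier to weigh with a concrete case. Below is a minimal sketch, assuming made-up setting names and a hypothetical `name_from_settings()` helper: it builds a name such as `mass=1.5,temp=300.dat` and rejects anything longer than `NAME_MAX` bytes instead of silently truncating, which is exactly where the 255-byte limit bites.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255   /* assumed fallback; POSIX lets <limits.h> omit it */
#endif

/* Hypothetical simulation setting: one key=value pair. */
struct setting {
    const char *key;
    double value;
};

/*
 * Build "key1=v1,key2=v2,...<suffix>" into buf.  Returns 0 on success, or
 * -1 if the result does not fit in buf or is longer than NAME_MAX bytes
 * (the limit for a single path component).
 */
int name_from_settings(char *buf, size_t bufsz, const struct setting *s,
                       size_t n, const char *suffix)
{
    size_t len = 0, slen;

    if (bufsz == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + len, bufsz - len, "%s%s=%g",
                         i > 0 ? "," : "", s[i].key, s[i].value);
        if (w < 0 || (size_t)w >= bufsz - len)
            return -1;                 /* would be truncated */
        len += (size_t)w;
    }
    slen = strlen(suffix);
    if (slen >= bufsz - len)
        return -1;
    memcpy(buf + len, suffix, slen + 1);   /* copy suffix and its NUL */
    len += slen;
    return len > NAME_MAX ? -1 : 0;
}
```

Rejecting rather than truncating is the deliberate choice here: a truncated name could silently collide with the output of a different run.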
Re: MAXNAMLEN vs NAME_MAX
On Mon, Nov 14, 2011 at 04:03:09AM -0500, Matthew Mondor wrote:
>> I was recently talking to some people who'd been working with some (physicists, I think) doing data-intensive simulation of some kind, and that reminded me: for various reasons, many people who are doing serious data collection or simulation tend to encode vast amounts of metadata in the names of their data files. Arguably this is a bad way of doing things, but there are reasons for it and not so many clear alternatives... anyway, 256 character filenames often aren't enough in that context.
>
> It's only my opinion, but they really should be using multiple files or a database for the metadata with, as necessary, a link to an actual file for data.

Perhaps, but telling people they should be working a different way usually doesn't help. (Have you ever done any stuff like this? Even if you have only a few settings and only a couple hundred output files, there's still no decent way to arrange it but to name the output files after the settings.)

>> (This sort of usage also often involves things like 50,000 files in one directory, so the columnizing behavior of ls is far from the top of the list of relevant issues.)
>
> This reminds me, does anyone know about the current state of UFS_DIRHASH? I remember reading about some issues with it and ending up disabling it on my kernels, yet huge directories can occur in a number of scenarios (probably a more pressing issue than extending file names, actually)...

I don't know. At best it's not really a complete solution, anyway...

>> Well... yes but there are other considerations. As you noted, going past one physical sector is problematic; going past one filesystem block very problematic. Plus, as long as MMU pages remain 4K, allocating contiguous kernel virtual space for path buffers (since if NAME_MAX were raised to 64K, PATH_MAX would have to be at least that large) could start to be a problem.
>
> I agree, especially with all the software that allocates path/file name buffers on the stack (but even on the heap it could be a general memory waste with 64KB, other than the memory management performance issues).

Pathname buffers generally shouldn't be (and in NetBSD, aren't) on the stack regardless. Even at only 1K each, it's really easy to blow a 4k kernel stack with them. (In practice you can generally get away with one; but two, like you need for rename, link, symlink, etc., is too many.)

Or I guess you don't mean in the kernel, do you...

-- 
David A. Holland
dholl...@netbsd.org
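The stack-versus-heap point applies to userland code too: an operation like rename needs two pathname buffers, which is 2 × 1024 = 2048 bytes where PATH_MAX is 1024 (as on NetBSD), and more where it is larger. A minimal sketch, assuming hypothetical helpers `join_path_heap()` and `rename_in_dir()`, that keeps both buffers on the heap instead of declaring `char path[PATH_MAX]` locals:

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 1024  /* assumed fallback; POSIX lets <limits.h> omit it */
#endif

/*
 * Join dir and name into a freshly malloc'd PATH_MAX buffer instead of a
 * char[PATH_MAX] local, so callers needing several paths keep their stack
 * frames small.  Returns NULL if name contains '/', if out of memory, or
 * if the joined path would not fit.
 */
char *join_path_heap(const char *dir, const char *name)
{
    if (strchr(name, '/') != NULL)
        return NULL;               /* a single component must not contain '/' */
    char *p = malloc(PATH_MAX);
    if (p == NULL)
        return NULL;
    int n = snprintf(p, PATH_MAX, "%s/%s", dir, name);
    if (n < 0 || n >= PATH_MAX) {
        free(p);
        return NULL;
    }
    return p;
}

/* Hypothetical rename-in-directory helper: two path buffers, both on the heap. */
int rename_in_dir(const char *dir, const char *from, const char *to)
{
    char *src = join_path_heap(dir, from);
    char *dst = join_path_heap(dir, to);
    int rv = -1;

    if (src != NULL && dst != NULL)
        rv = rename(src, dst);
    free(src);                     /* free(NULL) is a no-op */
    free(dst);
    return rv;
}
```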
Re: MAXNAMLEN vs NAME_MAX
On Mon, Nov 14, 2011 at 04:03:09AM -0500, Matthew Mondor wrote:
> On Sun, 13 Nov 2011 23:08:30 + David Holland <dholland-t...@netbsd.org> wrote:
>> I was recently talking to some people who'd been working with some (physicists, I think) doing data-intensive simulation of some kind, and that reminded me: for various reasons, many people who are doing serious data collection or simulation tend to encode vast amounts of metadata in the names of their data files. Arguably this is a bad way of doing things, but there are reasons for it and not so many clear alternatives... anyway, 256 character filenames often aren't enough in that context.
>
> It's only my opinion, but they really should be using multiple files or a database for the metadata with, as necessary, a link to an actual file for data.

Or use '/' to separate the fields in their long filename :-)
(But then they'll hit the 32k/64k limit on subdirectories ...)

Thinks... MD5 hash the user-specified filename and use that for the 'real' name. Add some special fudgery so that readdir() works. Then use some kind of overlay mount.

David

-- 
David Laight: da...@l8s.co.uk
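The hashing idea above maps an arbitrarily long user-visible name onto a short, fixed-length on-disk name. A minimal sketch of just that mapping step follows. It swaps in 64-bit FNV-1a for MD5 only so the example needs no crypto library, and the helper name `hashed_name()` is hypothetical. The readdir() fudgery and overlay mount are not shown, and neither is collision handling, which a real scheme would need.

```c
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a (stand-in for MD5 in this sketch; not collision-resistant). */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;          /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        h ^= *p;
        h *= 0x100000001b3ULL;                   /* FNV 64-bit prime */
    }
    return h;
}

/*
 * Map an arbitrarily long user-specified name to a fixed 16-hex-digit
 * "real" name that always fits a 255-byte component limit.
 * out must hold at least 17 bytes.
 */
void hashed_name(const char *user_name, char out[17])
{
    snprintf(out, 17, "%016llx", (unsigned long long)fnv1a64(user_name));
}
```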
Re: MAXNAMLEN vs NAME_MAX
On Tue, Sep 27, 2011 at 03:48:29PM +0700, Robert Elz wrote:
> | But it is better long term to move forward and allow for longer
> | names.
>
> Why?

I was recently talking to some people who'd been working with some (physicists, I think) doing data-intensive simulation of some kind, and that reminded me: for various reasons, many people who are doing serious data collection or simulation tend to encode vast amounts of metadata in the names of their data files. Arguably this is a bad way of doing things, but there are reasons for it and not so many clear alternatives... anyway, 256 character filenames often aren't enough in that context.

(This sort of usage also often involves things like 50,000 files in one directory, so the columnizing behavior of ls is far from the top of the list of relevant issues.)

If we do end up going through and doing a full set of compat functions again, I think we should raise the limit to 1024... and maybe bump PATH_MAX to, say, 4096 too.

> The 255 limit was just because that's how many bytes a one byte length field permitted, not because anyone thought names that long made sense. But if you're going to increase it, why stop at 511? That number means nothing - the next logical limit would be 65535 wouldn't it?

Well... yes but there are other considerations. As you noted, going past one physical sector is problematic; going past one filesystem block very problematic. Plus, as long as MMU pages remain 4K, allocating contiguous kernel virtual space for path buffers (since if NAME_MAX were raised to 64K, PATH_MAX would have to be at least that large) could start to be a problem.

-- 
David A. Holland
dholl...@netbsd.org
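The arithmetic behind the one-byte length field and the sector argument can be checked with a small sketch. The record layout below is modeled loosely on an FFS-style directory entry (4-byte inode number, 2-byte record length, 1-byte type, 1-byte name length, then the NUL-terminated name padded to a 4-byte boundary). Treat the struct and the `dirent_reclen()` helper as illustrative assumptions, not the exact NetBSD definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative fixed header of an on-disk directory entry (8 bytes). */
struct disk_dirhdr {
    uint32_t d_ino;     /* inode number */
    uint16_t d_reclen;  /* length of this record */
    uint8_t  d_type;    /* file type */
    uint8_t  d_namlen;  /* name length: one byte, so at most UINT8_MAX */
};

/* Largest name length a one-byte d_namlen can describe. */
enum { MAX_NAMLEN_ONE_BYTE = UINT8_MAX };   /* 255 */

/* Record size: header plus name and its NUL, rounded up to 4 bytes. */
size_t dirent_reclen(size_t namlen)
{
    size_t n = sizeof(struct disk_dirhdr) + namlen + 1;
    return (n + 3) & ~(size_t)3;
}
```

Under these assumptions a 255-byte name gives a 264-byte record, which fits in a 512-byte sector, while a 511-byte name gives a 520-byte record that cannot, so a single-sector directory-entry write is no longer possible.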
Re: MAXNAMLEN vs NAME_MAX
> Certainly the original 14 byte limit was occasionally a nuisance (but even that was better than 8+3 which was typical), but longer than 255?

I've run into the 255 limit. On only a few occasions, but definitely more than zero. (About three times, I think.) In my case it is usually files named after URLs; I will typically put (for example) http://www3.telus.net/~bhilpert/tmp/touchTone1969.gif in a file called www3.telus.net%~bhilpert%tmp%touchTone1969.gif. I regularly see (though seldom want to fetch) URLs long enough to blow out a 255-character limit under this transformation.

I'm sure other people have their own uses for long pathname components, too, though I don't know of any offhand.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
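The URL-to-filename transformation described above can be sketched as: drop the scheme and its `://`, then replace each `/` with `%`. The function name `url_to_filename()` and the `COMPONENT_MAX` constant are hypothetical, and only the one example given in the message is reproduced; the sketch returns -1 when the result would exceed the 255-byte limit under discussion.

```c
#include <string.h>

#define COMPONENT_MAX 255  /* the 255-byte name limit under discussion */

/*
 * Turn "scheme://host/a/b" into "host%a%b".  Returns the resulting length,
 * or -1 if it exceeds COMPONENT_MAX or does not fit in out (outsz bytes).
 */
int url_to_filename(const char *url, char *out, size_t outsz)
{
    const char *p = strstr(url, "://");
    size_t i = 0;

    if (outsz == 0)
        return -1;
    p = (p != NULL) ? p + 3 : url;      /* skip "scheme://", if present */
    for (; *p != '\0'; p++) {
        if (i >= COMPONENT_MAX || i + 1 >= outsz)
            return -1;                  /* too long for one path component */
        out[i++] = (*p == '/') ? '%' : *p;
    }
    out[i] = '\0';
    return (int)i;
}
```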
Re: MAXNAMLEN vs NAME_MAX
On Sep 27, 3:48pm, k...@munnari.oz.au (Robert Elz) wrote:
-- Subject: Re: MAXNAMLEN vs NAME_MAX

| Date:        Mon, 26 Sep 2011 09:46:09 -0400
| From:        chris...@zoulas.com (Christos Zoulas)
| Message-ID:  20110926134609.8322a97...@rebar.astron.com
|
| | But it is better long term to move forward and allow for longer
| | names.
|
| Why?
|
| Certainly the original 14 byte limit was occasionally a nuisance (but
| even that was better than 8+3 which was typical), but longer than 255?
|
| Even using a utf-8 encoded filename, at 5 bytes/character, that's still
| a 51 character filename, which is longer than rational - names >= 40
| characters mean that on a standard 80 column display, ls can't even show
| 2 columns of names.

Heh.

| The 255 limit was just because that's how many bytes a one byte length
| field permitted, not because anyone thought names that long made sense.
| But if you're going to increase it, why stop at 511? That number
| means nothing - the next logical limit would be 65535 wouldn't it?

Eats too much space on the stack, I think.

| 511 is already too big to keep directory entries within one disc block
| when using small blocks, so you've already lost the idempotent directory
| entry write feature that was part of the original design (both v7fs and
| ffs) (not just file system block writes, but actual drive writing, you
| can't fix this just by making a larger minimum frag size, only by changing
| drive electronics and/or microcode).

I think we are already moving away from 512 byte sectors... Anyway, this is optional; most filesystems can only support up to 255-byte filenames, as others have pointed out.

There was also a mess about the use of MAXNAMLEN vs. NAME_MAX in the kernel, which I have now fixed in the following way:

1. Each filesystem defines its own FS_MAXNAMLEN and uses it to advertise its own limit.
2. MAXNAMLEN == NAME_MAX == 511 as the maximum limit supported.
3. KERNEL_NAME_MAX == 255 is the current internal limit of what the kernel will return to userland for all filesystems.

christos
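The three-level scheme Christos lists can be modeled in a few lines. Everything here is hypothetical: the `struct fs_limits` record, the sample values used with it, and the helpers `effective_name_max()` and `name_returnable()` only illustrate the stated rule (each filesystem advertises its own FS_MAXNAMLEN, 511 is the MAXNAMLEN/NAME_MAX ceiling, and KERNEL_NAME_MAX = 255 caps what userland sees). This is not NetBSD kernel code.

```c
#include <stddef.h>

/* Values as stated in the message; the macro names here are illustrative. */
#define MAXNAMLEN_CEILING 511  /* MAXNAMLEN == NAME_MAX: largest supported */
#define KERNEL_NAME_MAX   255  /* current cap on names returned to userland */

/* Hypothetical per-filesystem record: each fs advertises its FS_MAXNAMLEN. */
struct fs_limits {
    const char *fs_name;
    size_t fs_maxnamlen;
};

/* Limit a filesystem can actually expose to userland under this scheme. */
size_t effective_name_max(const struct fs_limits *fs)
{
    size_t lim = fs->fs_maxnamlen;
    if (lim > MAXNAMLEN_CEILING)
        lim = MAXNAMLEN_CEILING;
    if (lim > KERNEL_NAME_MAX)
        lim = KERNEL_NAME_MAX;
    return lim;
}

/* May a name of this length be handed back to userland from this fs? */
int name_returnable(const struct fs_limits *fs, size_t namlen)
{
    return namlen <= effective_name_max(fs);
}
```

A filesystem that advertises 511 is still capped at 255 until KERNEL_NAME_MAX is raised, while one with a smaller native limit keeps its own.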
RE: MAXNAMLEN vs NAME_MAX
| Even using a utf-8 encoded filename, at 5 bytes/character, that's
| still a 51 character filename, which is longer than rational - names >= 40
| characters mean that on a standard 80 column display, ls can't
| even show 2 columns of names.

FWIW: the basic multilingual plane of Unicode (the 2-byte subset) requires a max of 3 bytes per char in UTF-8. And even the additional planes that have been defined (of which only a small fraction is in use, and nearly all of that is for archaic scripts) take you only up to U+10FFFF as the highest code point, well below what can be encoded in 4 bytes. So realistically, 255 bytes max gives you over 80 characters max.

paul
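Paul's figures follow from the UTF-8 length rules: code points up to U+007F take 1 byte, up to U+07FF 2 bytes, up to U+FFFF (the rest of the BMP) 3 bytes, and up to U+10FFFF 4 bytes. The sketch below encodes those thresholds in a hypothetical `utf8_len()` helper, plus a trivial `chars_that_fit()` for the worst case of a 255-byte name. It does not reject the surrogate range U+D800..U+DFFF.

```c
#include <stdint.h>

/* Bytes needed to encode code point cp in UTF-8; 0 if cp is beyond Unicode.
   (Surrogates U+D800..U+DFFF are not rejected in this sketch.) */
int utf8_len(uint32_t cp)
{
    if (cp <= 0x7F)
        return 1;
    if (cp <= 0x7FF)
        return 2;
    if (cp <= 0xFFFF)
        return 3;   /* rest of the Basic Multilingual Plane */
    if (cp <= 0x10FFFF)
        return 4;   /* supplementary planes, up to the last code point */
    return 0;
}

/* Characters that fit in max_bytes when each one takes bytes_per_char bytes. */
int chars_that_fit(int max_bytes, int bytes_per_char)
{
    return max_bytes / bytes_per_char;
}
```

So a 255-byte name holds at least 85 BMP characters, and at least 63 even if every character lies outside the BMP. The 51-character figure assumes 5-byte sequences, which current UTF-8 never produces.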
Re: MAXNAMLEN vs NAME_MAX
On Sat, Sep 24, 2011 at 08:10:49PM -0400, Christos Zoulas wrote:
> Because it will create a horrible mess for anything that tries to allocate
> a struct dirent and use it. Imagine having an old library with new binaries
> or vice versa.

Is it possible to add a named 'pad' field?

David

-- 
David Laight: da...@l8s.co.uk
Re: MAXNAMLEN vs NAME_MAX
On Sep 26, 8:42am, da...@l8s.co.uk (David Laight) wrote:
-- Subject: Re: MAXNAMLEN vs NAME_MAX

| On Sat, Sep 24, 2011 at 08:10:49PM -0400, Christos Zoulas wrote:
|
| Because it will create a horrible mess for anything that tries to allocate
| a struct dirent and use it. Imagine having an old library with new binaries
| or vice versa.
|
| Is it possible to add a named 'pad' field?

We could... But it is better long term to move forward and allow for longer names.

christos
Re: MAXNAMLEN vs NAME_MAX
MAXNAMLEN = 511
NAME_MAX = 255
[...]
We want to make them consistent. Do you want to increase NAME_MAX, or decrease MAXNAMLEN?

My opinion is that [versioning userland] is not worth the trouble. The only programs that can fail are ones that do things like:

    char name[NAME_MAX];
    strcpy(name, d->d_name);

This sounds as though you are contemplating increasing NAME_MAX.

sizeof(d->d_name) does not change. It is just that d_namelen can be > 255 (NAME_MAX). Only programs that use NAME_MAX to store directory entries can fail.

Not quite. Such things can also find their way into code in subtler ways. For example, I've written code that knows it can store a directory entry length in an unsigned char (which amounts to assuming NAME_MAX <= UCHAR_MAX). I think all the recent examples of that I've written have been FFS-specific and therefore safe (if I'm reading things right, FFS uses a single octet to store directory entry length on disk), but I'm probably not the only one who's done such stuff.

My vote is to bump without versioning, what's yours?

I probably agree with you. But what's the motivation for increasing NAME_MAX rather than decreasing MAXNAMLEN?

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
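The failure mode quoted above copies `d_name` into a buffer sized from a compile-time constant that need not match what the directory actually returns. (`char name[NAME_MAX]` is also one byte short even when the constants agree, since NAME_MAX excludes the terminating NUL.) A safer pattern is to size the copy from the entry itself. Below is a minimal sketch using POSIX `opendir()`, `readdir()` and `strdup()`; the helper `first_entry_name()` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc'd copy of the first entry name in dir_path other than
 * "." and "..", or NULL.  The copy is sized by strdup() from the actual
 * d_name, so it does not depend on NAME_MAX or MAXNAMLEN matching the
 * filesystem's real limit.
 */
char *first_entry_name(const char *dir_path)
{
    DIR *d = opendir(dir_path);
    struct dirent *de;
    char *copy = NULL;

    if (d == NULL)
        return NULL;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        copy = strdup(de->d_name);   /* allocates strlen(d_name) + 1 bytes */
        break;
    }
    closedir(d);
    return copy;
}
```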
Re: MAXNAMLEN vs NAME_MAX
On Sep 25, 12:40am, jeanyves.mig...@free.fr (Jean-Yves Migeon) wrote:
-- Subject: Re: MAXNAMLEN vs NAME_MAX

| My vote is to bump without versioning, what's yours?
|
| Hmm, what do you want to do there? Increase NAME_MAX or decrease MAXNAMLEN?
|
| I would do the latter; ffs, ext2 and lfs all seem to use 255 for
| MAXNAMLEN. So, I cast my vote for bump without versioning.

If you decrease MAXNAMLEN you *must* version! Anyway, we came from there, and there is no reason to move backwards. The change proposed is to make NAME_MAX match MAXNAMLEN without bumping.

christos
Re: MAXNAMLEN vs NAME_MAX
On 25.09.2011 00:57, Christos Zoulas wrote:
> On Sep 25, 12:40am, jeanyves.mig...@free.fr (Jean-Yves Migeon) wrote:
> -- Subject: Re: MAXNAMLEN vs NAME_MAX
>
> | My vote is to bump without versioning, what's yours?
> |
> | Hmm, what do you want to do there? Increase NAME_MAX or decrease MAXNAMLEN?
> |
> | I would do the latter; ffs, ext2 and lfs all seem to use 255 for
> | MAXNAMLEN. So, I cast my vote for bump without versioning.
>
> If you decrease MAXNAMLEN you *must* version! Anyway we came from there,
> and there is no reason to move backwards. The change proposed is to make
> NAME_MAX match MAXNAMLEN without bumping.

Yup, I forgot about getdents(2) compat.

BTW, why would it be necessary to version? d_name is the last element of struct dirent; I can't see how d_name content could be bigger than 256 (including NULL) anyway, so only those that copy the d_name string with MAXNAMLEN size directly (instead of using _PC_NAME_MAX, NAME_MAX or strlen(3)) are in trouble, no?

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr
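`_PC_NAME_MAX`, mentioned above as an alternative to a compiled-in MAXNAMLEN, is queried at run time with POSIX `pathconf()`. A minimal sketch follows; the helper name `name_max_for()` and the caller-supplied fallback for filesystems that report no fixed limit are assumptions of this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/*
 * Ask the filesystem holding path for its longest permitted name
 * (excluding the terminating NUL).  pathconf() returns -1 with errno
 * unchanged when there is no fixed limit, and -1 with errno set on error.
 */
long name_max_for(const char *path, long fallback)
{
    errno = 0;
    long lim = pathconf(path, _PC_NAME_MAX);
    if (lim == -1)
        return errno == 0 ? fallback : -1;   /* indeterminate vs. error */
    return lim;
}
```

After checking for -1, a caller would allocate `name_max_for(dir, 255) + 1` bytes for one name, leaving room for the NUL.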
Re: MAXNAMLEN vs NAME_MAX
On Sep 25, 1:28am, jeanyves.mig...@free.fr (Jean-Yves Migeon) wrote:
-- Subject: Re: MAXNAMLEN vs NAME_MAX

| On 25.09.2011 00:57, Christos Zoulas wrote:
| > On Sep 25, 12:40am, jeanyves.mig...@free.fr (Jean-Yves Migeon) wrote:
| > -- Subject: Re: MAXNAMLEN vs NAME_MAX
| >
| > | My vote is to bump without versioning, what's yours?
| > |
| > | Hmm, what do you want to do there? Increase NAME_MAX or decrease MAXNAMLEN?
| > |
| > | I would do the latter; ffs, ext2 and lfs all seem to use 255 for
| > | MAXNAMLEN. So, I cast my vote for bump without versioning.
| >
| > If you decrease MAXNAMLEN you *must* version! Anyway we came from there,
| > and there is no reason to move backwards. The change proposed is to make
| > NAME_MAX match MAXNAMLEN without bumping.
|
| Yup, I forgot about getdents(2) compat.
|
| BTW, why would it be necessary to version? d_name is the last element of
| struct dirent; I can't see how d_name content could be bigger than 256
| (including NULL) anyway, so only those that copy d_name string with
| MAXNAMLEN size directly (instead of using _PC_NAME_MAX, NAME_MAX or
| strlen(3)) are in trouble, no?

Because it will create a horrible mess for anything that tries to allocate a struct dirent and use it. Imagine having an old library with new binaries or vice versa.

christos
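The objection above is about binary layout. If the size of `d_name`, and therefore `sizeof(struct dirent)`, changes, then a binary compiled against one layout and a library compiled against the other disagree about how much memory a `struct dirent` occupies. This matters for any caller-allocated entry, such as the buffer passed to `readdir_r()`. The two structs below are hypothetical stand-ins, not NetBSD's actual definitions; they only show the size difference such a buffer would have to absorb.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "old" layout: room for a 255-byte name plus NUL. */
struct dirent_old {
    uint64_t d_fileno;
    uint16_t d_reclen;
    uint16_t d_namlen;
    uint8_t  d_type;
    char     d_name[255 + 1];
};

/* Hypothetical "new" layout: room for a 511-byte name plus NUL. */
struct dirent_new {
    uint64_t d_fileno;
    uint16_t d_reclen;
    uint16_t d_namlen;
    uint8_t  d_type;
    char     d_name[511 + 1];
};

/*
 * Bytes a new-layout library could write past the end of an entry that an
 * old binary allocated with sizeof(struct dirent_old).
 */
size_t overrun_bytes(void)
{
    return sizeof(struct dirent_new) - sizeof(struct dirent_old);
}
```

Because `d_name` is the last member, the header fields line up in both layouts. This is why Jean-Yves's argument holds for code that only reads entries, while code that allocates the struct itself still sees the difference.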