Re: MAXNAMLEN vs NAME_MAX

2011-11-14 Thread Matthew Mondor
On Sun, 13 Nov 2011 23:08:30 +
David Holland dholland-t...@netbsd.org wrote:

 I was recently talking to some people who'd been working with some
 (physicists, I think) doing data-intensive simulation of some kind,
 and that reminded me: for various reasons, many people who are doing
 serious data collection or simulation tend to encode vast amounts of
 metadata in the names of their data files. Arguably this is a bad way
 of doing things, but there are reasons for it and not so many clear
 alternatives... anyway, 256 character filenames often aren't enough in
 that context.

It's only my opinion, but they really should be using multiple files or
a database for the metadata with as necessary a link to an actual
file for data.
But I also tend to think the same of software relying on extended
attributes, resource forks and the like (with the possible exception of
a specialized facility for extended permissions :)

 (This sort of usage also often involves things like 50,000 files in
 one directory, so the columnizing behavior of ls is far from the top
 of the list of relevant issues.)

This reminds me, does anyone know about the current state of
UFS_DIRHASH?  I remember reading about some issues with it and ending up
disabling it on my kernels, yet huge directories can occur in a number
of scenarios (probably a more pressing issue than extending file names,
actually)...

   The 255 limit was just because that's how many bytes a one byte length
   field permitted, not because anyone thought names that long made sense.
   But if you're going to increase it, why  stop at 511?  That number
   means nothing - the next logical limit would be 65535 wouldn't it?
 
 Well... yes but there are other considerations. As you noted, going
 past one physical sector is problematic; going past one filesystem
 block very problematic. Plus, as long as MMU pages remain 4K,
 allocating contiguous kernel virtual space for path buffers (since if
 NAME_MAX were raised to 64K, PATH_MAX would have to be at least that
 large) could start to be a problem.

I agree, especially with all the software that allocates path/file name
buffers on the stack (but even on the heap it could be a general memory
waste with 64KB, other than the memory management performance issues).
-- 
Matt


Re: MAXNAMLEN vs NAME_MAX

2011-11-14 Thread David Holland
On Mon, Nov 14, 2011 at 04:03:09AM -0500, Matthew Mondor wrote:
   I was recently talking to some people who'd been working with some
   (physicists, I think) doing data-intensive simulation of some kind,
   and that reminded me: for various reasons, many people who are doing
   serious data collection or simulation tend to encode vast amounts of
   metadata in the names of their data files. Arguably this is a bad way
   of doing things, but there are reasons for it and not so many clear
   alternatives... anyway, 256 character filenames often aren't enough in
   that context.
  
  It's only my opinion, but they really should be using multiple files or
  a database for the metadata with as necessary a link to an actual
  file for data.

Perhaps, but telling people they should be working a different way
usually doesn't help. (Have you ever done any stuff like this? Even if
you have only a few settings and only a couple hundred output files,
there's still no decent way to arrange it but name the output files
after the settings.)

   (This sort of usage also often involves things like 50,000 files in
   one directory, so the columnizing behavior of ls is far from the top
   of the list of relevant issues.)
  
  This reminds me, does anyone know about the current state of
  UFS_DIRHASH?  I remember reading about some issues with it and ending up
  disabling it on my kernels, yet huge directories can occur in a number
  of scenarios (probably a more pressing issue than extending file names,
  actually)...

I don't know. At best it's not really a complete solution, anyway...

   Well... yes but there are other considerations. As you noted, going
   past one physical sector is problematic; going past one filesystem
   block very problematic. Plus, as long as MMU pages remain 4K,
   allocating contiguous kernel virtual space for path buffers (since if
   NAME_MAX were raised to 64K, PATH_MAX would have to be at least that
   large) could start to be a problem.
  
  I agree, especially with all the software that allocates path/file name
  buffers on the stack (but even on the heap it could be a general memory
  waste with 64KB, other than the memory management performance issues).

Pathname buffers generally shouldn't be (and in NetBSD, aren't) on the
stack regardless. Even at only 1K each, it's really easy to blow a 4k
kernel stack with them. (In practice you can generally get away with
one; but two, like you need for rename, link, symlink, etc. is too
many.)

Or I guess you don't mean in the kernel, do you...

-- 
David A. Holland
dholl...@netbsd.org


Re: MAXNAMLEN vs NAME_MAX

2011-11-14 Thread David Laight
On Mon, Nov 14, 2011 at 04:03:09AM -0500, Matthew Mondor wrote:
 On Sun, 13 Nov 2011 23:08:30 +
 David Holland dholland-t...@netbsd.org wrote:
 
  I was recently talking to some people who'd been working with some
  (physicists, I think) doing data-intensive simulation of some kind,
  and that reminded me: for various reasons, many people who are doing
  serious data collection or simulation tend to encode vast amounts of
  metadata in the names of their data files. Arguably this is a bad way
  of doing things, but there are reasons for it and not so many clear
  alternatives... anyway, 256 character filenames often aren't enough in
  that context.
 
 It's only my opinion, but they really should be using multiple files or
 a database for the metadata with as necessary a link to an actual
 file for data.

Or use '/' to separate the fields in their long filename :-)
(But then they'll hit the 32k/64k limit on subdirectories ...)

Thinks...   MD5 hash the user-specified filename and use that for
the 'real' name. Add some special fudgery so that readdir() works.
Then use some kind of overlay mount.
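
A rough sketch of the "hash the user-specified name" step only, not the
readdir() fudgery or the overlay mount. MD5 is not in the C standard
library, so this swaps in 64-bit FNV-1a purely as a stand-in; the function
names fnv1a64() and hashed_name() are hypothetical, and a real scheme
would need a wider hash plus a table mapping hashes back to long names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a NUL-terminated string (stand-in for MD5). */
uint64_t
fnv1a64(const char *s)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);   /* FNV offset basis */

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= UINT64_C(0x100000001b3);            /* FNV prime */
    }
    return h;
}

/*
 * Write the fixed-length "real" name (16 hex digits) for an arbitrarily
 * long user-visible name.  Returns 0 on success, -1 if `out` is too small.
 */
int
hashed_name(const char *longname, char *out, size_t outsize)
{
    int n;

    n = snprintf(out, outsize, "%016" PRIx64, fnv1a64(longname));
    return (n == 16 && (size_t)n < outsize) ? 0 : -1;
}
```

The on-disk name is then always 16 bytes regardless of how much metadata
the user packed into the visible name, which is the point of the idea.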

David

-- 
David Laight: da...@l8s.co.uk


Re: MAXNAMLEN vs NAME_MAX

2011-11-13 Thread David Holland
On Tue, Sep 27, 2011 at 03:48:29PM +0700, Robert Elz wrote:
| But it is better long term to move forward and allow for longer
| names.
  
  Why?

I was recently talking to some people who'd been working with some
(physicists, I think) doing data-intensive simulation of some kind,
and that reminded me: for various reasons, many people who are doing
serious data collection or simulation tend to encode vast amounts of
metadata in the names of their data files. Arguably this is a bad way
of doing things, but there are reasons for it and not so many clear
alternatives... anyway, 256 character filenames often aren't enough in
that context.

(This sort of usage also often involves things like 50,000 files in
one directory, so the columnizing behavior of ls is far from the top
of the list of relevant issues.)

If we do end up going through and doing a full set of compat functions
again, I think we should raise the limit to 1024... and maybe bump
PATH_MAX to say 4096 too.

  The 255 limit was just because that's how many bytes a one byte length
  field permitted, not because anyone thought names that long made sense.
  But if you're going to increase it, why  stop at 511?  That number
  means nothing - the next logical limit would be 65535 wouldn't it?

Well... yes but there are other considerations. As you noted, going
past one physical sector is problematic; going past one filesystem
block very problematic. Plus, as long as MMU pages remain 4K,
allocating contiguous kernel virtual space for path buffers (since if
NAME_MAX were raised to 64K, PATH_MAX would have to be at least that
large) could start to be a problem.

-- 
David A. Holland
dholl...@netbsd.org


Re: MAXNAMLEN vs NAME_MAX

2011-09-27 Thread Mouse
 Certainly the original 14 byte limit was occasionally a nuisance (but
 even that was better than 8+3 which was typical), but longer than
 255?

I've run into the 255 limit.  On only a few occasions, but definitely
more than zero.  (About three times, I think.)

In my case it is usually files named after URLs; I will typically put
(for example) http://www3.telus.net/~bhilpert/tmp/touchTone1969.gif in
a file called www3.telus.net%~bhilpert%tmp%touchTone1969.gif.  I
regularly see (though seldom want to fetch) URLs long enough to blow
out a 255-character limit under this transformation.
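
A minimal sketch of that transformation, assuming it amounts to dropping
the "scheme://" prefix and replacing each '/' with '%'. The helper name
url_to_component() is hypothetical, and the fallback definition of
NAME_MAX as 255 is only there for systems whose <limits.h> does not
expose it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255    /* assumed fallback, matches the limit discussed */
#endif

/*
 * Turn a URL into a single pathname component.  Returns 0 on success,
 * -1 if the result would exceed NAME_MAX or the caller's buffer.
 */
int
url_to_component(const char *url, char *out, size_t outsize)
{
    const char *p;
    size_t len, i;

    assert(url != NULL && out != NULL);

    /* Skip "scheme://" if present. */
    p = strstr(url, "://");
    p = (p != NULL) ? p + 3 : url;

    len = strlen(p);
    if (len > NAME_MAX || len >= outsize)
        return -1;

    /* '/' cannot appear in a component, so substitute '%'. */
    for (i = 0; i < len; i++)
        out[i] = (p[i] == '/') ? '%' : p[i];
    out[len] = '\0';
    return 0;
}
```

With the example URL above this yields the file name shown; a URL whose
transformed form is longer than NAME_MAX is rejected rather than truncated.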

I'm sure other people have their own uses for long pathname components,
too, though I don't know of any offhand.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: MAXNAMLEN vs NAME_MAX

2011-09-27 Thread Christos Zoulas
On Sep 27,  3:48pm, k...@munnari.oz.au (Robert Elz) wrote:
-- Subject: Re: MAXNAMLEN vs NAME_MAX

| Date:Mon, 26 Sep 2011 09:46:09 -0400
| From:chris...@zoulas.com (Christos Zoulas)
| Message-ID:  20110926134609.8322a97...@rebar.astron.com
| 
|   | But it is better long term to move forward and allow for longer
|   | names.
| 
| Why?
| 
| Certainly the original 14 byte limit was occasionally a nuisance (but
| even that was better than 8+3 which was typical), but longer than 255?
| 
| Even using a utf-8 encoded filename, at 5 bytes/character, that's still
| a 51 character filename, which is longer than rational - names >= 40
| characters mean that on a standard 80 column display, ls can't even show
| 2 columns of names.

Heh.

| The 255 limit was just because that's how many bytes a one byte length
| field permitted, not because anyone thought names that long made sense.
| But if you're going to increase it, why  stop at 511?  That number
| means nothing - the next logical limit would be 65535 wouldn't it?

Eats too much space on the stack I think.

| 511 is already too big to keep directory entries within one disc block
| when using small blocks, so you've already lost the idempotent directory
| entry write feature that was part of the original design (both v7fs and
| ffs) (not just file system block writes, but actual drive writing, you
| can't fix this just by making a larger minimum frag size, only by changing
| drive electronics and/or microcode).

I think we are already moving away from 512 byte sectors...

Anyway this is optional; most filesystems can only support up to 255 byte
filenames, as others have pointed out. There was also a mess about the use
of MAXNAMLEN vs. NAME_MAX in the kernel, which I have now fixed in the
following way:

1. Each filesystem defines its own FS_MAXNAMLEN and uses it to advertise its
   own limit.
2. MAXNAMLEN == NAME_MAX == 511 as the maximum limit supported.
3. KERNEL_NAME_MAX == 255 is the current internal limit of what the kernel
   will return to userland for all filesystems.
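
Seen from userland, a per-filesystem limit like this is something a
program can ask for at run time instead of hard-coding NAME_MAX. A
minimal sketch using pathconf(2) with _PC_NAME_MAX; the wrapper names
fs_name_max() and name_fits() are hypothetical, and nothing here shows
how the kernel side is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Filename length limit, in bytes, of the filesystem holding `dir`.
 * pathconf() returns -1 both on error and when no fixed limit exists.
 */
long
fs_name_max(const char *dir)
{
    assert(dir != NULL);
    return pathconf(dir, _PC_NAME_MAX);
}

/* 1 if `name` fits in `dir`, 0 if it is too long, -1 if unknown. */
int
name_fits(const char *dir, const char *name)
{
    long limit = fs_name_max(dir);

    if (limit < 0)
        return -1;
    return strlen(name) <= (size_t)limit;
}
```

A program that sizes its buffers from this value keeps working whether
the filesystem advertises 255, 511, or something else.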

christos


RE: MAXNAMLEN vs NAME_MAX

2011-09-27 Thread Paul_Koning
| Even using a utf-8 encoded filename, at 5 bytes/character, that's
| still a 51 character filename, which is longer than rational - names
| >= 40 characters mean that on a standard 80 column display, ls can't
| even show 2 columns of names.

FWIW: the basic multilingual plane of Unicode (the 2-byte subset) requires a
max of 3 bytes per char in UTF-8.  And even the additional planes that have
been defined -- of which only a small fraction is in use, and nearly all of
that is for archaic scripts -- take you only up to 10FFFF as the highest code
point, well below what can be encoded in 4 bytes.  So realistically 255 bytes
max gives you over 80 characters max.
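
The arithmetic can be checked with a small sketch. It assumes UTF-8 as
restricted by RFC 3629 (code points up to U+10FFFF, surrogates excluded);
the function names utf8_len() and min_chars() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed to encode one code point in UTF-8, or 0 if the value is
 * not a valid Unicode scalar value.
 */
int
utf8_len(uint32_t cp)
{
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)       /* rest of the BMP, minus surrogates */
        return (cp >= 0xD800 && cp <= 0xDFFF) ? 0 : 3;
    if (cp <= 0x10FFFF)     /* supplementary planes */
        return 4;
    return 0;
}

/* Characters guaranteed to fit in `nbytes` at `bytes_per_char` each. */
int
min_chars(int nbytes, int bytes_per_char)
{
    assert(bytes_per_char > 0);
    return nbytes / bytes_per_char;
}
```

With a 255-byte limit this gives min_chars(255, 3) == 85 for names drawn
entirely from the BMP, and min_chars(255, 4) == 63 in the worst case
where every character lies outside it.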

paul



Re: MAXNAMLEN vs NAME_MAX

2011-09-26 Thread David Laight
On Sat, Sep 24, 2011 at 08:10:49PM -0400, Christos Zoulas wrote:
 
 Because it will create a horrible mess for anything that tries to allocate
 a struct dirent and use it. Imagine having an old library with new binaries
 or vice versa.

Is it possible to add a named 'pad' field?

David

-- 
David Laight: da...@l8s.co.uk


Re: MAXNAMLEN vs NAME_MAX

2011-09-26 Thread Christos Zoulas
On Sep 26,  8:42am, da...@l8s.co.uk (David Laight) wrote:
-- Subject: Re: MAXNAMLEN vs NAME_MAX

| On Sat, Sep 24, 2011 at 08:10:49PM -0400, Christos Zoulas wrote:
|  
|  Because it will create a horrible mess for anything that tries to allocate
|  a struct dirent and use it. Imagine having an old library with new binaries
|  or vice versa.
| 
| Is it possible to add a named 'pad' field?

We could... But it is better long term to move forward and allow for longer
names.

christos


Re: MAXNAMLEN vs NAME_MAX

2011-09-24 Thread Mouse
 MAXNAMLEN = 511
 NAME_MAX = 255

 [...]  We want to make them consistent.

Do you want to increase NAME_MAX, or decrease MAXNAMLEN?

 My opinion is that [versioning userland] is not worth the trouble.
 The only programs that can fail are ones that do things like:
  char name[NAME_MAX];
  strcpy(name, d->d_name);

This sounds as though you are contemplating increasing NAME_MAX.

 sizeof(d->d_name) does not change. It is just that d_namelen can be
 > 255 (NAME_MAX).  Only programs that use NAME_MAX to store directory
 entries can fail.

Not quite.  Such things can also find their way into code in subtler
ways.  For example, I've written code that knows it can store a
directory entry length in an unsigned char (which amounts to assuming
NAME_MAX <= UCHAR_MAX).  I think all the recent examples of that I've
written have been FFS-specific and therefore safe (if I'm reading
things right, FFS uses a single octet to store directory entry length
on disk), but I'm probably not the only one who's done such stuff.
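
A sketch of how userland code can avoid both assumptions: size the copy
from the name itself rather than from NAME_MAX, and check explicitly
before storing a length in an unsigned char. The helper names
copy_entry_name() and name_len_fits_uchar() are hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate a directory entry's name without relying on any
 * compile-time bound.  The caller frees the result; NULL means the
 * allocation failed.
 */
char *
copy_entry_name(const struct dirent *d)
{
    size_t len;
    char *name;

    assert(d != NULL);
    len = strlen(d->d_name);
    name = malloc(len + 1);
    if (name == NULL)
        return NULL;
    memcpy(name, d->d_name, len + 1);   /* includes the terminator */
    return name;
}

/* 1 if the name's length can be stored in an unsigned char, else 0. */
int
name_len_fits_uchar(const char *name)
{
    assert(name != NULL);
    return strlen(name) <= UCHAR_MAX;
}
```

Code written this way is unaffected by whether NAME_MAX is 255 or 511;
code that needs the one-octet length gets a check it can fail cleanly on.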

 My vote is to bump without versioning, what's yours?

I probably agree with you.  But what's the motivation for increasing
NAME_MAX rather than decreasing MAXNAMLEN?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: MAXNAMLEN vs NAME_MAX

2011-09-24 Thread Christos Zoulas
On Sep 25, 12:40am, jeanyves.mig...@free.fr (Jean-Yves Migeon) wrote:
-- Subject: Re: MAXNAMLEN vs NAME_MAX

|  My vote is to bump without versioning, what's yours?
| 
| Hmm, what do you want to do there? Increase NAME_MAX or decrease MAXNAMLEN?
| 
| I would do the latter; ffs, ext2 and lfs all seem to use 255 for 
| MAXNAMLEN. So, I cast my vote for bump without versioning.

If you decrease MAXNAMLEN you *must* version! Anyway we came from there,
and there is no reason to move backwards. The change proposed is to make
NAME_MAX match MAXNAMLEN without bumping.

christos


Re: MAXNAMLEN vs NAME_MAX

2011-09-24 Thread Jean-Yves Migeon

On 25.09.2011 00:57, Christos Zoulas wrote:

On Sep 25, 12:40am, jeanyves.mig...@free.fr (Jean-Yves Migeon) wrote:
-- Subject: Re: MAXNAMLEN vs NAME_MAX

|  My vote is to bump without versioning, what's yours?
|
| Hmm, what do you want to do there? Increase NAME_MAX or decrease MAXNAMLEN?
|
| I would do the latter; ffs, ext2 and lfs all seem to use 255 for
| MAXNAMLEN. So, I cast my vote for bump without versioning.

If you decrease MAXNAMLEN you *must* version! Anyway we came from there,
and there is no reason to move backwards. The change proposed is to make
NAME_MAX match MAXNAMLEN without bumping.


Yup, I forgot about getdents(2) compat.

BTW, why would it be necessary to version? d_name is the last element of 
struct dirent; I can't see how d_name content could be bigger than 256 
(including NULL) anyway, so only those that copy d_name string with 
MAXNAMLEN size directly (instead of using _PC_NAME_MAX, NAME_MAX or 
strlen(3)) are in trouble, no?


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: MAXNAMLEN vs NAME_MAX

2011-09-24 Thread Christos Zoulas
On Sep 25,  1:28am, jeanyves.mig...@free.fr (Jean-Yves Migeon) wrote:
-- Subject: Re: MAXNAMLEN vs NAME_MAX

| On 25.09.2011 00:57, Christos Zoulas wrote:
|  On Sep 25, 12:40am, jeanyves.mig...@free.fr (Jean-Yves Migeon) wrote:
|  -- Subject: Re: MAXNAMLEN vs NAME_MAX
| 
|  |  My vote is to bump without versioning, what's yours?
|  |
|  | Hmm, what do you want to do there? Increase NAME_MAX or decrease MAXNAMLEN?
|  |
|  | I would do the latter; ffs, ext2 and lfs all seem to use 255 for
|  | MAXNAMLEN. So, I cast my vote for bump without versioning.
| 
|  If you decrease MAXNAMLEN you *must* version! Anyway we came from there,
|  and there is no reason to move backwards. The change proposed is to make
|  NAME_MAX match MAXNAMLEN without bumping.
| 
| Yup, I forgot about getdents(2) compat.
| 
| BTW, why would it be necessary to version? d_name is the last element of 
| struct dirent; I can't see how d_name content could be bigger than 256 
| (including NULL) anyway, so only those that copy d_name string with 
| MAXNAMLEN size directly (instead of using _PC_NAME_MAX, NAME_MAX or 
| strlen(3)) are in trouble, no?

Because it will create a horrible mess for anything that tries to allocate
a struct dirent and use it. Imagine having an old library with new binaries
or vice versa.
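
To illustrate the allocation problem: sizeof(struct dirent) is fixed
when a program is compiled, so it goes stale if the library's idea of
MAXNAMLEN changes. One common way around that, sketched here under the
assumption that the caller obtains name_max at run time (for example from
pathconf(2)), is to compute the size instead. The names
dirent_size_for() and alloc_dirent() are hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Bytes needed for a struct dirent able to hold a name of up to
 * `name_max` bytes plus the terminator, never less than the
 * compile-time size of the structure.
 */
size_t
dirent_size_for(size_t name_max)
{
    size_t need;

    assert(name_max < SIZE_MAX - sizeof(struct dirent));
    need = offsetof(struct dirent, d_name) + name_max + 1;
    return need > sizeof(struct dirent) ? need : sizeof(struct dirent);
}

/* Zero-filled entry sized for `name_max`; the caller frees it. */
struct dirent *
alloc_dirent(size_t name_max)
{
    return calloc(1, dirent_size_for(name_max));
}
```

This only helps code that is written this way from the start; binaries
that already embed the old structure size are the case versioning exists
for.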

christos