Re: iunique() fails to return ino_t (after commit 866b04fccbf125cd)

2007-09-17 Thread Jeff Layton
On Mon, 17 Sep 2007 08:06:15 +0530
"Satyam Sharma" <[EMAIL PROTECTED]> wrote:

> On 9/17/07, Jeff Layton <[EMAIL PROTECTED]> wrote:
> > On Mon, 17 Sep 2007 00:58:54 +0530
> > "Satyam Sharma" <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Jeff,
> > >
> > > I think commit 866b04fccbf125cd39f2bdbcfeaa611d39a061a8 was wrong, and
> > > introduced a regression.
> > >
> > > The "relevant" changelog [*] of that patch says:
> > >
> > >
> > > >  on filesystems w/o permanent inode numbers, i_ino values can be larger
> > > >  than 32 bits, which can cause problems for some 32 bit userspace 
> > > > programs
> > > >  on a 64 bit kernel.  We can't do anything for filesystems that have
> > > >  actual >32-bit inode numbers, but on filesystems that generate i_ino
> > > >  values on the fly, we should try to have them fit in 32 bits.  We could
> > > >  trivially fix this by making the static counters in new_inode and 
> > > > iunique
> > > >  32 bits [...]
> > > >
> > > > [...]
> > > > When a 32-bit program that was not compiled with large file offsets 
> > > > does a
> > > > stat and gets a st_ino value back that won't fit in the 32 bit field, 
> > > > glibc
> > > > (correctly) generates an EOVERFLOW error.  We can't do anything about 
> > > > fs's
> > > > with larger permanent inode numbers, but when we generate them on the 
> > > > fly,
> > > > we ought to try and have them fit within a 32 bit field.
> > > >
> > > > This patch takes the first step toward this by making the static 
> > > > counters
> > > > in these two functions be 32 bits.
> > >
> > >
> > > 1. First and foremost, there was nothing wrong with the existing code that
> > >needed to be "fixed" at all, i.e. there was no "problem" to be solved 
> > > in
> > >the first place. As was said, glibc *correctly* returns EOVERFLOW when 
> > > a
> > >userspace application built *without* _FILE_OFFSET_BITS == 64 (i.e.
> > >ino_t != __ino64_t) tries to stat(2) a file whose serial number does 
> > > not
> > >fit in the "st_ino" member of struct stat. This behaviour is (1) 
> > > correct,
> > >(2) explicitly mandated by the Single UNIX Specification, and, (3) all
> > >userspace programs *must* be prepared to deal with it. [ Should 
> > > probably
> > >mention here that other implementations, such as Solaris, do conform 
> > > with
> > >SUS here. ]
> > >
> >
> > The ideal situation is that everyone would recompile their programs
> > with LFS defines. Unfortunately people have old userspace programs to
> > which they don't have sources, or that can't easily be recompiled this way.
> > These programs would occasionally break when run on 64 bit kernels
> > for the reasons I've described.
> 
> That's a bad excuse! Such userspace programs are buggy, period.
> Let's fix *them*. And, seriously, are you *really* talking of "supporting"
> userspace programs whose even *sources* are no longer available ?
> 
> The standard is clear and simple -- calls from userspace programs (that
> don't define LFS) to stat(2) on a file whose serial number is >2**32 must
> see EOVERFLOW. This is *not* a kernel problem that needs "fixing".
> 
> Moreover, changing a kernel function such as iunique() (which was expressly
> *written* to be used by filesystems to assign ino_t to inodes) to return only
> values <= 2**32 to satisfy such buggy programs is just ... bad.
> 
> BTW, you might've as well changed the type of "res" in iunique() in that
> patch to "unsigned int" too. What's the point of declaring it "ino_t" if we
> never assign it any value in the (ino_t - unsigned int) set in the first 
> place?
> 
> 
> > > 2. The patch has nothing to do with "32-bit userspace on 64-bit kernels" 
> > > or
> > >compatibility modes whatsoever. It is unrelated and tangential that 
> > > this
> > >behaviour is easy to reproduce on the above mentioned setup. Simply 
> > > put,
> > >the issue is that a userspace program built *without* LFS tried to
> > >stat(2) a file with serial number > 2**32. Needless to say, this issue
> > >must be solved in the userspace program itself (either by (1) defining
> > >LFS, or, (2) making it aware of EOVERFLOW), and not in the kernel.
> > >
> >
> > It most certainly does have something to do with 32 bit userspace on 64
> > bit kernels.
> 
> No, it doesn't ...
> 
> > On a 32 bit kernel, new_inode and iunique generate no
> > inode numbers >32 bits. On a 64 bit kernel, they do.
> 
> This is *unrelated*. It's completely immaterial how an inode managed to
> get a serial number > 2**32. Whether it used iunique() running on a 64-bit
> kernel, or it's own little implementation, or whatever. The *real* issue is 
> with
> the *userspace program* and not the kernel ... that's the whole point.
> 
> [ Also, you're assuming sizeof(long) == sizeof(int) for 32-bit kernels
>   here, but okay, probably that's true for all supported targets ... ]
> 
> 
> > This means that
> > programs that are built this way may eventually fail on a 64 bit kernel
> > 

Re: iunique() fails to return ino_t (after commit 866b04fccbf125cd)

2007-09-17 Thread Jeff Layton
On Mon, 17 Sep 2007 08:06:15 +0530
Satyam Sharma [EMAIL PROTECTED] wrote:

 On 9/17/07, Jeff Layton [EMAIL PROTECTED] wrote:
  On Mon, 17 Sep 2007 00:58:54 +0530
  Satyam Sharma [EMAIL PROTECTED] wrote:
 
   Hi Jeff,
  
   I think commit 866b04fccbf125cd39f2bdbcfeaa611d39a061a8 was wrong, and
   introduced a regression.
  
   The relevant changelog [*] of that patch says:
  
  
 on filesystems w/o permanent inode numbers, i_ino values can be larger
 than 32 bits, which can cause problems for some 32 bit userspace 
programs
 on a 64 bit kernel.  We can't do anything for filesystems that have
 actual 32-bit inode numbers, but on filesystems that generate i_ino
 values on the fly, we should try to have them fit in 32 bits.  We could
 trivially fix this by making the static counters in new_inode and 
iunique
 32 bits [...]
   
[...]
When a 32-bit program that was not compiled with large file offsets 
does a
stat and gets a st_ino value back that won't fit in the 32 bit field, 
glibc
(correctly) generates an EOVERFLOW error.  We can't do anything about 
fs's
with larger permanent inode numbers, but when we generate them on the 
fly,
we ought to try and have them fit within a 32 bit field.
   
This patch takes the first step toward this by making the static 
counters
in these two functions be 32 bits.
  
  
   1. First and foremost, there was nothing wrong with the existing code that
  needed to be fixed at all, i.e. there was no problem to be solved 
   in
  the first place. As was said, glibc *correctly* returns EOVERFLOW when 
   a
  userspace application built *without* _FILE_OFFSET_BITS == 64 (i.e.
  ino_t != __ino64_t) tries to stat(2) a file whose serial number does 
   not
  fit in the st_ino member of struct stat. This behaviour is (1) 
   correct,
  (2) explicitly mandated by the Single UNIX Specification, and, (3) all
  userspace programs *must* be prepared to deal with it. [ Should 
   probably
  mention here that other implementations, such as Solaris, do conform 
   with
  SUS here. ]
  
 
  The ideal situation is that everyone would recompile their programs
  with LFS defines. Unfortunately people have old userspace programs to
  which they don't have sources, or that can't easily be recompiled this way.
  These programs would occasionally break when run on 64 bit kernels
  for the reasons I've described.
 
 That's a bad excuse! Such userspace programs are buggy, period.
 Let's fix *them*. And, seriously, are you *really* talking of supporting
 userspace programs whose even *sources* are no longer available ?
 
 The standard is clear and simple -- calls from userspace programs (that
 don't define LFS) to stat(2) on a file whose serial number is 2**32 must
 see EOVERFLOW. This is *not* a kernel problem that needs fixing.
 
 Moreover, changing a kernel function such as iunique() (which was expressly
 *written* to be used by filesystems to assign ino_t to inodes) to return only
 values = 2**32 to satisfy such buggy programs is just ... bad.
 
 BTW, you might've as well changed the type of res in iunique() in that
 patch to unsigned int too. What's the point of declaring it ino_t if we
 never assign it any value in the (ino_t - unsigned int) set in the first 
 place?
 
 
   2. The patch has nothing to do with 32-bit userspace on 64-bit kernels 
   or
  compatibility modes whatsoever. It is unrelated and tangential that 
   this
  behaviour is easy to reproduce on the above mentioned setup. Simply 
   put,
  the issue is that a userspace program built *without* LFS tried to
  stat(2) a file with serial number  2**32. Needless to say, this issue
  must be solved in the userspace program itself (either by (1) defining
  LFS, or, (2) making it aware of EOVERFLOW), and not in the kernel.
  
 
  It most certainly does have something to do with 32 bit userspace on 64
  bit kernels.
 
 No, it doesn't ...
 
  On a 32 bit kernel, new_inode and iunique generate no
  inode numbers 32 bits. On a 64 bit kernel, they do.
 
 This is *unrelated*. It's completely immaterial how an inode managed to
 get a serial number  2**32. Whether it used iunique() running on a 64-bit
 kernel, or it's own little implementation, or whatever. The *real* issue is 
 with
 the *userspace program* and not the kernel ... that's the whole point.
 
 [ Also, you're assuming sizeof(long) == sizeof(int) for 32-bit kernels
   here, but okay, probably that's true for all supported targets ... ]
 
 
  This means that
  programs that are built this way may eventually fail on a 64 bit kernel
  when the inode counter grows large enough. Those programs will work
  indefinitely on a 32 bit kernel.
 
 See below, for why such programs are buggy ...
 
 
   3. Solving a problem at a place where it does not exist naturally leads to
  other breakage. After 866b04fccbf125cd, iunique() no longer returns an

Re: iunique() fails to return ino_t (after commit 866b04fccbf125cd)

2007-09-16 Thread Satyam Sharma
On 9/17/07, Jeff Layton <[EMAIL PROTECTED]> wrote:
> On Mon, 17 Sep 2007 00:58:54 +0530
> "Satyam Sharma" <[EMAIL PROTECTED]> wrote:
>
> > Hi Jeff,
> >
> > I think commit 866b04fccbf125cd39f2bdbcfeaa611d39a061a8 was wrong, and
> > introduced a regression.
> >
> > The "relevant" changelog [*] of that patch says:
> >
> >
> > >  on filesystems w/o permanent inode numbers, i_ino values can be larger
> > >  than 32 bits, which can cause problems for some 32 bit userspace programs
> > >  on a 64 bit kernel.  We can't do anything for filesystems that have
> > >  actual >32-bit inode numbers, but on filesystems that generate i_ino
> > >  values on the fly, we should try to have them fit in 32 bits.  We could
> > >  trivially fix this by making the static counters in new_inode and iunique
> > >  32 bits [...]
> > >
> > > [...]
> > > When a 32-bit program that was not compiled with large file offsets does a
> > > stat and gets a st_ino value back that won't fit in the 32 bit field, 
> > > glibc
> > > (correctly) generates an EOVERFLOW error.  We can't do anything about fs's
> > > with larger permanent inode numbers, but when we generate them on the fly,
> > > we ought to try and have them fit within a 32 bit field.
> > >
> > > This patch takes the first step toward this by making the static counters
> > > in these two functions be 32 bits.
> >
> >
> > 1. First and foremost, there was nothing wrong with the existing code that
> >needed to be "fixed" at all, i.e. there was no "problem" to be solved in
> >the first place. As was said, glibc *correctly* returns EOVERFLOW when a
> >userspace application built *without* _FILE_OFFSET_BITS == 64 (i.e.
> >ino_t != __ino64_t) tries to stat(2) a file whose serial number does not
> >fit in the "st_ino" member of struct stat. This behaviour is (1) correct,
> >(2) explicitly mandated by the Single UNIX Specification, and, (3) all
> >userspace programs *must* be prepared to deal with it. [ Should probably
> >mention here that other implementations, such as Solaris, do conform with
> >SUS here. ]
> >
>
> The ideal situation is that everyone would recompile their programs
> with LFS defines. Unfortunately people have old userspace programs to
> which they don't have sources, or that can't easily be recompiled this way.
> These programs would occasionally break when run on 64 bit kernels
> for the reasons I've described.

That's a bad excuse! Such userspace programs are buggy, period.
Let's fix *them*. And, seriously, are you *really* talking of "supporting"
userspace programs whose even *sources* are no longer available ?

The standard is clear and simple -- calls from userspace programs (that
don't define LFS) to stat(2) on a file whose serial number is >2**32 must
see EOVERFLOW. This is *not* a kernel problem that needs "fixing".

Moreover, changing a kernel function such as iunique() (which was expressly
*written* to be used by filesystems to assign ino_t to inodes) to return only
values <= 2**32 to satisfy such buggy programs is just ... bad.

BTW, you might've as well changed the type of "res" in iunique() in that
patch to "unsigned int" too. What's the point of declaring it "ino_t" if we
never assign it any value in the (ino_t - unsigned int) set in the first place?


> > 2. The patch has nothing to do with "32-bit userspace on 64-bit kernels" or
> >compatibility modes whatsoever. It is unrelated and tangential that this
> >behaviour is easy to reproduce on the above mentioned setup. Simply put,
> >the issue is that a userspace program built *without* LFS tried to
> >stat(2) a file with serial number > 2**32. Needless to say, this issue
> >must be solved in the userspace program itself (either by (1) defining
> >LFS, or, (2) making it aware of EOVERFLOW), and not in the kernel.
> >
>
> It most certainly does have something to do with 32 bit userspace on 64
> bit kernels.

No, it doesn't ...

> On a 32 bit kernel, new_inode and iunique generate no
> inode numbers >32 bits. On a 64 bit kernel, they do.

This is *unrelated*. It's completely immaterial how an inode managed to
get a serial number > 2**32. Whether it used iunique() running on a 64-bit
kernel, or it's own little implementation, or whatever. The *real* issue is with
the *userspace program* and not the kernel ... that's the whole point.

[ Also, you're assuming sizeof(long) == sizeof(int) for 32-bit kernels
  here, but okay, probably that's true for all supported targets ... ]


> This means that
> programs that are built this way may eventually fail on a 64 bit kernel
> when the inode counter grows large enough. Those programs will work
> indefinitely on a 32 bit kernel.

See below, for why such programs are buggy ...


> > 3. Solving a problem at a place where it does not exist naturally leads to
> >other breakage. After 866b04fccbf125cd, iunique() no longer returns an
> >ino_t, but only values <= 2**32, which limits the number of inodes on
> >

Re: iunique() fails to return ino_t (after commit 866b04fccbf125cd)

2007-09-16 Thread Jeff Layton
On Mon, 17 Sep 2007 00:58:54 +0530
"Satyam Sharma" <[EMAIL PROTECTED]> wrote:

> Hi Jeff,
> 
> I think commit 866b04fccbf125cd39f2bdbcfeaa611d39a061a8 was wrong, and
> introduced a regression.
> 
> The "relevant" changelog [*] of that patch says:
> 
> 
> >  on filesystems w/o permanent inode numbers, i_ino values can be larger
> >  than 32 bits, which can cause problems for some 32 bit userspace programs
> >  on a 64 bit kernel.  We can't do anything for filesystems that have
> >  actual >32-bit inode numbers, but on filesystems that generate i_ino
> >  values on the fly, we should try to have them fit in 32 bits.  We could
> >  trivially fix this by making the static counters in new_inode and iunique
> >  32 bits [...]
> >
> > [...]
> > When a 32-bit program that was not compiled with large file offsets does a
> > stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
> > (correctly) generates an EOVERFLOW error.  We can't do anything about fs's
> > with larger permanent inode numbers, but when we generate them on the fly,
> > we ought to try and have them fit within a 32 bit field.
> >
> > This patch takes the first step toward this by making the static counters
> > in these two functions be 32 bits.
> 
> 
> 1. First and foremost, there was nothing wrong with the existing code that
>needed to be "fixed" at all, i.e. there was no "problem" to be solved in
>the first place. As was said, glibc *correctly* returns EOVERFLOW when a
>userspace application built *without* _FILE_OFFSET_BITS == 64 (i.e.
>ino_t != __ino64_t) tries to stat(2) a file whose serial number does not
>fit in the "st_ino" member of struct stat. This behaviour is (1) correct,
>(2) explicitly mandated by the Single UNIX Specification, and, (3) all
>userspace programs *must* be prepared to deal with it. [ Should probably
>mention here that other implementations, such as Solaris, do conform with
>SUS here. ]
> 

The ideal situation is that everyone would recompile their programs
with LFS defines. Unfortunately people have old userspace programs to
which they don't have sources, or that can't easily be recompiled this
way. These programs would occasionally break when run on 64 bit kernels
for the reasons I've described.

> 2. The patch has nothing to do with "32-bit userspace on 64-bit kernels" or
>compatibility modes whatsoever. It is unrelated and tangential that this
>behaviour is easy to reproduce on the above mentioned setup. Simply put,
>the issue is that a userspace program built *without* LFS tried to
>stat(2) a file with serial number > 2**32. Needless to say, this issue
>must be solved in the userspace program itself (either by (1) defining
>LFS, or, (2) making it aware of EOVERFLOW), and not in the kernel.
> 

It most certainly does have something to do with 32 bit userspace on 64
bit kernels. On a 32 bit kernel, new_inode and iunique generate no
inode numbers >32 bits. On a 64 bit kernel, they do. This means that
programs that are built this way may eventually fail on a 64 bit kernel
when the inode counter grows large enough. Those programs will work
indefinitely on a 32 bit kernel.

> 3. Solving a problem at a place where it does not exist naturally leads to
>other breakage. After 866b04fccbf125cd, iunique() no longer returns an
>ino_t, but only values <= 2**32, which limits the number of inodes on
>all filesystems that use that function. Needless to say, this is a
>*regression* w.r.t. previous kernels before that commit went in.
> 

Why is this a problem? Filesystems that truly need that many more inodes
are certainly able to generate one using a different scheme. Typically,
the inode numbers generated by iunique and new_inode are only used by
filesystems that have no permanent inode numbers of their own. In many
cases, inode number uniqueness isn't even an issue as evidenced by the
number of filesystems that simply use the number assigned by new_inode. 

This patch seems like a reasonable compromise to me. It allows us to
keep these inode numbers below 32 bits on filesystems that don't care
too much about what inode number they're using.

> 4. Last but not the least, the sample testcase program that was discussed
>on LKML last year to show this "problem" was buggy and wrong. A program
>built without LFS will also suffer EOVERFLOW when stat(2)'ing a file
>due to other reasons, such as filesize not fitting inside the "st_size"
>member. Do we propose to "fix" that "problem" in the kernel too ?
>Of course not!
> 

Right, but that situation is the same regardless of whether you run it
on a 32 or 64 bit kernel. The issue of inode numbers generated by
new_inode and iunique crossing the 32-bit boundary, however, is not.

Can you elaborate why this testcase was buggy and wrong? It seems to me
that it correctly demonstrated the issue. Just because there are other
reasons that a program might get an EOVERFLOW doesn't mean 

Re: iunique() fails to return ino_t (after commit 866b04fccbf125cd)

2007-09-16 Thread Andrew Morton
On Mon, 17 Sep 2007 00:58:54 +0530 "Satyam Sharma" <[EMAIL PROTECTED]> wrote:

> [*] BTW, the changelog/patch description of this commit demonstrates
> why it is a Bad Thing (tm) to have lengthy [PATCH 0/x] kind of mails
> (containing important technical details) preceding a patchset.
> 
> I can only guess as to what happened, but reading the archives of the
> original submission of this patchset on LKML, I think Andrew had to
> append the contents of the [0/3] mail to the git commit of the [1/3]
> patch (so as to not lose all those details and ensure that they got saved
> in the git history), with the result that most of the changelog of commit
> 866b04fccbf1 has nothing to do with that particular patch at all, but
> instead with other commits, that do not even touch that same file (!)
> 
> So guys, please keep "[0/x]" mails short and only as a non-technical
> introduction of the patchset. All relevant discussion must come in the
> other mails that contain the *real* patches.

yup.

Actually, when I do the copying of the [0/n] text into [1/n]'s changelog
I could add text to the changelogs of [2/n] ... [n/n] mentioning that
more details are in the preceding changelog.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


iunique() fails to return ino_t (after commit 866b04fccbf125cd)

2007-09-16 Thread Satyam Sharma
Hi Jeff,

I think commit 866b04fccbf125cd39f2bdbcfeaa611d39a061a8 was wrong, and
introduced a regression.

The "relevant" changelog [*] of that patch says:


>  on filesystems w/o permanent inode numbers, i_ino values can be larger
>  than 32 bits, which can cause problems for some 32 bit userspace programs
>  on a 64 bit kernel.  We can't do anything for filesystems that have
>  actual >32-bit inode numbers, but on filesystems that generate i_ino
>  values on the fly, we should try to have them fit in 32 bits.  We could
>  trivially fix this by making the static counters in new_inode and iunique
>  32 bits [...]
>
> [...]
> When a 32-bit program that was not compiled with large file offsets does a
> stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
> (correctly) generates an EOVERFLOW error.  We can't do anything about fs's
> with larger permanent inode numbers, but when we generate them on the fly,
> we ought to try and have them fit within a 32 bit field.
>
> This patch takes the first step toward this by making the static counters
> in these two functions be 32 bits.


1. First and foremost, there was nothing wrong with the existing code that
   needed to be "fixed" at all, i.e. there was no "problem" to be solved in
   the first place. As was said, glibc *correctly* returns EOVERFLOW when a
   userspace application built *without* _FILE_OFFSET_BITS == 64 (i.e.
   ino_t != __ino64_t) tries to stat(2) a file whose serial number does not
   fit in the "st_ino" member of struct stat. This behaviour is (1) correct,
   (2) explicitly mandated by the Single UNIX Specification, and, (3) all
   userspace programs *must* be prepared to deal with it. [ Should probably
   mention here that other implementations, such as Solaris, do conform with
   SUS here. ]

2. The patch has nothing to do with "32-bit userspace on 64-bit kernels" or
   compatibility modes whatsoever. It is unrelated and tangential that this
   behaviour is easy to reproduce on the above mentioned setup. Simply put,
   the issue is that a userspace program built *without* LFS tried to
   stat(2) a file with serial number > 2**32. Needless to say, this issue
   must be solved in the userspace program itself (either by (1) defining
   LFS, or, (2) making it aware of EOVERFLOW), and not in the kernel.

3. Solving a problem at a place where it does not exist naturally leads to
   other breakage. After 866b04fccbf125cd, iunique() no longer returns an
   ino_t, but only values <= 2**32, which limits the number of inodes on
   all filesystems that use that function. Needless to say, this is a
   *regression* w.r.t. previous kernels before that commit went in.

4. Last but not the least, the sample testcase program that was discussed
   on LKML last year to show this "problem" was buggy and wrong. A program
   built without LFS will also suffer EOVERFLOW when stat(2)'ing a file
   due to other reasons, such as filesize not fitting inside the "st_size"
   member. Do we propose to "fix" that "problem" in the kernel too ?
   Of course not!


IMHO it's bad to change the kernel's behaviour to avoid buggy userspace
programs from seeing standard-mandated errors being returned from stat(2).

So please reconsider that patch -- IMHO it clearly wasn't correct.


Satyam


[*] BTW, the changelog/patch description of this commit demonstrates
why it is a Bad Thing (tm) to have lengthy [PATCH 0/x] kind of mails
(containing important technical details) preceding a patchset.

I can only guess as to what happened, but reading the archives of the
original submission of this patchset on LKML, I think Andrew had to
append the contents of the [0/3] mail to the git commit of the [1/3]
patch (so as to not lose all those details and ensure that they got saved
in the git history), with the result that most of the changelog of commit
866b04fccbf1 has nothing to do with that particular patch at all, but
instead with other commits, that do not even touch that same file (!)

So guys, please keep "[0/x]" mails short and only as a non-technical
introduction of the patchset. All relevant discussion must come in the
other mails that contain the *real* patches.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


iunique() fails to return ino_t (after commit 866b04fccbf125cd)

2007-09-16 Thread Satyam Sharma
Hi Jeff,

I think commit 866b04fccbf125cd39f2bdbcfeaa611d39a061a8 was wrong, and
introduced a regression.

The relevant changelog [*] of that patch says:


  on filesystems w/o permanent inode numbers, i_ino values can be larger
  than 32 bits, which can cause problems for some 32 bit userspace programs
  on a 64 bit kernel.  We can't do anything for filesystems that have
  actual 32-bit inode numbers, but on filesystems that generate i_ino
  values on the fly, we should try to have them fit in 32 bits.  We could
  trivially fix this by making the static counters in new_inode and iunique
  32 bits [...]

 [...]
 When a 32-bit program that was not compiled with large file offsets does a
 stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
 (correctly) generates an EOVERFLOW error.  We can't do anything about fs's
 with larger permanent inode numbers, but when we generate them on the fly,
 we ought to try and have them fit within a 32 bit field.

 This patch takes the first step toward this by making the static counters
 in these two functions be 32 bits.


1. First and foremost, there was nothing wrong with the existing code that
   needed to be fixed at all, i.e. there was no problem to be solved in
   the first place. As was said, glibc *correctly* returns EOVERFLOW when a
   userspace application built *without* _FILE_OFFSET_BITS == 64 (i.e.
   ino_t != __ino64_t) tries to stat(2) a file whose serial number does not
   fit in the st_ino member of struct stat. This behaviour is (1) correct,
   (2) explicitly mandated by the Single UNIX Specification, and, (3) all
   userspace programs *must* be prepared to deal with it. [ Should probably
   mention here that other implementations, such as Solaris, do conform with
   SUS here. ]

2. The patch has nothing to do with 32-bit userspace on 64-bit kernels or
   compatibility modes whatsoever. It is unrelated and tangential that this
   behaviour is easy to reproduce on the above mentioned setup. Simply put,
   the issue is that a userspace program built *without* LFS tried to
   stat(2) a file with serial number  2**32. Needless to say, this issue
   must be solved in the userspace program itself (either by (1) defining
   LFS, or, (2) making it aware of EOVERFLOW), and not in the kernel.

3. Solving a problem at a place where it does not exist naturally leads to
   other breakage. After 866b04fccbf125cd, iunique() no longer returns an
   ino_t, but only values = 2**32, which limits the number of inodes on
   all filesystems that use that function. Needless to say, this is a
   *regression* w.r.t. previous kernels before that commit went in.

4. Last but not the least, the sample testcase program that was discussed
   on LKML last year to show this problem was buggy and wrong. A program
   built without LFS will also suffer EOVERFLOW when stat(2)'ing a file
   due to other reasons, such as filesize not fitting inside the st_size
   member. Do we propose to fix that problem in the kernel too ?
   Of course not!


IMHO it's bad to change the kernel's behaviour to avoid buggy userspace
programs from seeing standard-mandated errors being returned from stat(2).

So please reconsider that patch -- IMHO it clearly wasn't correct.


Satyam


[*] BTW, the changelog/patch description of this commit demonstrates
why it is a Bad Thing (tm) to have lengthy [PATCH 0/x] kind of mails
(containing important technical details) preceding a patchset.

I can only guess as to what happened, but reading the archives of the
original submission of this patchset on LKML, I think Andrew had to
append the contents of the [0/3] mail to the git commit of the [1/3]
patch (so as to not lose all those details and ensure that they got saved
in the git history), with the result that most of the changelog of commit
866b04fccbf1 has nothing to do with that particular patch at all, but
instead with other commits, that do not even touch that same file (!)

So guys, please keep [0/x] mails short and only as a non-technical
introduction of the patchset. All relevant discussion must come in the
other mails that contain the *real* patches.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: iunique() fails to return ino_t (after commit 866b04fccbf125cd)

2007-09-16 Thread Andrew Morton
On Mon, 17 Sep 2007 00:58:54 +0530 Satyam Sharma [EMAIL PROTECTED] wrote:

 [*] BTW, the changelog/patch description of this commit demonstrates
 why it is a Bad Thing (tm) to have lengthy [PATCH 0/x] kind of mails
 (containing important technical details) preceding a patchset.
 
 I can only guess as to what happened, but reading the archives of the
 original submission of this patchset on LKML, I think Andrew had to
 append the contents of the [0/3] mail to the git commit of the [1/3]
 patch (so as to not lose all those details and ensure that they got saved
 in the git history), with the result that most of the changelog of commit
 866b04fccbf1 has nothing to do with that particular patch at all, but
 instead with other commits, that do not even touch that same file (!)
 
 So guys, please keep [0/x] mails short and only as a non-technical
 introduction of the patchset. All relevant discussion must come in the
 other mails that contain the *real* patches.

yup.

Actually, when I do the copying of the [0/n] text into [1/n]'s changelog
I could add text to the changelogs of [2/n] ... [n/n] mentioning that
more details are in the preceding changelog.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: iunique() fails to return ino_t (after commit 866b04fccbf125cd)

2007-09-16 Thread Jeff Layton
On Mon, 17 Sep 2007 00:58:54 +0530
Satyam Sharma [EMAIL PROTECTED] wrote:

 Hi Jeff,
 
 I think commit 866b04fccbf125cd39f2bdbcfeaa611d39a061a8 was wrong, and
 introduced a regression.
 
 The relevant changelog [*] of that patch says:
 
 
   on filesystems w/o permanent inode numbers, i_ino values can be larger
   than 32 bits, which can cause problems for some 32 bit userspace programs
   on a 64 bit kernel.  We can't do anything for filesystems that have
   actual 32-bit inode numbers, but on filesystems that generate i_ino
   values on the fly, we should try to have them fit in 32 bits.  We could
   trivially fix this by making the static counters in new_inode and iunique
   32 bits [...]
 
  [...]
  When a 32-bit program that was not compiled with large file offsets does a
  stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
  (correctly) generates an EOVERFLOW error.  We can't do anything about fs's
  with larger permanent inode numbers, but when we generate them on the fly,
  we ought to try and have them fit within a 32 bit field.
 
  This patch takes the first step toward this by making the static counters
  in these two functions be 32 bits.
 
 
 1. First and foremost, there was nothing wrong with the existing code that
needed to be fixed at all, i.e. there was no problem to be solved in
the first place. As was said, glibc *correctly* returns EOVERFLOW when a
userspace application built *without* _FILE_OFFSET_BITS == 64 (i.e.
ino_t != __ino64_t) tries to stat(2) a file whose serial number does not
fit in the st_ino member of struct stat. This behaviour is (1) correct,
(2) explicitly mandated by the Single UNIX Specification, and, (3) all
userspace programs *must* be prepared to deal with it. [ Should probably
mention here that other implementations, such as Solaris, do conform with
SUS here. ]
 

The ideal situation is that everyone would recompile their programs
with LFS defines. Unfortunately people have old userspace programs to
which they don't have sources, or that can't easily be recompiled this
way. These programs would occasionally break when run on 64 bit kernels
for the reasons I've described.

 2. The patch has nothing to do with 32-bit userspace on 64-bit kernels or
compatibility modes whatsoever. It is unrelated and tangential that this
behaviour is easy to reproduce on the above mentioned setup. Simply put,
the issue is that a userspace program built *without* LFS tried to
stat(2) a file with serial number  2**32. Needless to say, this issue
must be solved in the userspace program itself (either by (1) defining
LFS, or, (2) making it aware of EOVERFLOW), and not in the kernel.
 

It most certainly does have something to do with 32 bit userspace on 64
bit kernels. On a 32 bit kernel, new_inode and iunique generate no
inode numbers 32 bits. On a 64 bit kernel, they do. This means that
programs that are built this way may eventually fail on a 64 bit kernel
when the inode counter grows large enough. Those programs will work
indefinitely on a 32 bit kernel.

 3. Solving a problem at a place where it does not exist naturally leads to
other breakage. After 866b04fccbf125cd, iunique() no longer returns an
ino_t, but only values = 2**32, which limits the number of inodes on
all filesystems that use that function. Needless to say, this is a
*regression* w.r.t. previous kernels before that commit went in.
 

Why is this a problem? Filesystems that truly need that many more inodes
are certainly able to generate one using a different scheme. Typically,
the inode numbers generated by iunique and new_inode are only used by
filesystems that have no permanent inode numbers of their own. In many
cases, inode number uniqueness isn't even an issue as evidenced by the
number of filesystems that simply use the number assigned by new_inode. 

This patch seems like a reasonable compromise to me. It allows us to
keep these inode numbers below 32 bits on filesystems that don't care
too much about what inode number they're using.

 4. Last but not the least, the sample testcase program that was discussed
on LKML last year to show this problem was buggy and wrong. A program
built without LFS will also suffer EOVERFLOW when stat(2)'ing a file
due to other reasons, such as filesize not fitting inside the st_size
member. Do we propose to fix that problem in the kernel too ?
Of course not!
 

Right, but that situation is the same regardless of whether you run it
on a 32 or 64 bit kernel. The issue of inode numbers generated by
new_inode and iunique crossing the 32-bit boundary, however, is not.

Can you elaborate why this testcase was buggy and wrong? It seems to me
that it correctly demonstrated the issue. Just because there are other
reasons that a program might get an EOVERFLOW doesn't mean that that one
is invalid.

 
 IMHO it's bad to change the kernel's behaviour to avoid buggy userspace
 

Re: iunique() fails to return ino_t (after commit 866b04fccbf125cd)

2007-09-16 Thread Satyam Sharma
On 9/17/07, Jeff Layton [EMAIL PROTECTED] wrote:
 On Mon, 17 Sep 2007 00:58:54 +0530
 Satyam Sharma [EMAIL PROTECTED] wrote:

  Hi Jeff,
 
  I think commit 866b04fccbf125cd39f2bdbcfeaa611d39a061a8 was wrong, and
  introduced a regression.
 
  The relevant changelog [*] of that patch says:
 
 
on filesystems w/o permanent inode numbers, i_ino values can be larger
than 32 bits, which can cause problems for some 32 bit userspace programs
on a 64 bit kernel.  We can't do anything for filesystems that have
actual 32-bit inode numbers, but on filesystems that generate i_ino
values on the fly, we should try to have them fit in 32 bits.  We could
trivially fix this by making the static counters in new_inode and iunique
32 bits [...]
  
   [...]
   When a 32-bit program that was not compiled with large file offsets does a
   stat and gets a st_ino value back that won't fit in the 32 bit field, 
   glibc
   (correctly) generates an EOVERFLOW error.  We can't do anything about fs's
   with larger permanent inode numbers, but when we generate them on the fly,
   we ought to try and have them fit within a 32 bit field.
  
   This patch takes the first step toward this by making the static counters
   in these two functions be 32 bits.
 
 
  1. First and foremost, there was nothing wrong with the existing code that
 needed to be fixed at all, i.e. there was no problem to be solved in
 the first place. As was said, glibc *correctly* returns EOVERFLOW when a
 userspace application built *without* _FILE_OFFSET_BITS == 64 (i.e.
 ino_t != __ino64_t) tries to stat(2) a file whose serial number does not
 fit in the st_ino member of struct stat. This behaviour is (1) correct,
 (2) explicitly mandated by the Single UNIX Specification, and, (3) all
 userspace programs *must* be prepared to deal with it. [ Should probably
 mention here that other implementations, such as Solaris, do conform with
 SUS here. ]
 

 The ideal situation is that everyone would recompile their programs
 with LFS defines. Unfortunately people have old userspace programs to
 which they don't have sources, or that can't easily be recompiled this way.
 These programs would occasionally break when run on 64 bit kernels
 for the reasons I've described.

That's a bad excuse! Such userspace programs are buggy, period.
Let's fix *them*. And, seriously, are you *really* talking of supporting
userspace programs whose even *sources* are no longer available ?

The standard is clear and simple -- calls from userspace programs (that
don't define LFS) to stat(2) on a file whose serial number is 2**32 must
see EOVERFLOW. This is *not* a kernel problem that needs fixing.

Moreover, changing a kernel function such as iunique() (which was expressly
*written* to be used by filesystems to assign ino_t to inodes) to return only
values = 2**32 to satisfy such buggy programs is just ... bad.

BTW, you might've as well changed the type of res in iunique() in that
patch to unsigned int too. What's the point of declaring it ino_t if we
never assign it any value in the (ino_t - unsigned int) set in the first place?


  2. The patch has nothing to do with 32-bit userspace on 64-bit kernels or
 compatibility modes whatsoever. It is unrelated and tangential that this
 behaviour is easy to reproduce on the above mentioned setup. Simply put,
 the issue is that a userspace program built *without* LFS tried to
 stat(2) a file with serial number  2**32. Needless to say, this issue
 must be solved in the userspace program itself (either by (1) defining
 LFS, or, (2) making it aware of EOVERFLOW), and not in the kernel.
 

 It most certainly does have something to do with 32 bit userspace on 64
 bit kernels.

No, it doesn't ...

 On a 32 bit kernel, new_inode and iunique generate no
 inode numbers 32 bits. On a 64 bit kernel, they do.

This is *unrelated*. It's completely immaterial how an inode managed to
get a serial number  2**32. Whether it used iunique() running on a 64-bit
kernel, or it's own little implementation, or whatever. The *real* issue is with
the *userspace program* and not the kernel ... that's the whole point.

[ Also, you're assuming sizeof(long) == sizeof(int) for 32-bit kernels
  here, but okay, probably that's true for all supported targets ... ]


 This means that
 programs that are built this way may eventually fail on a 64 bit kernel
 when the inode counter grows large enough. Those programs will work
 indefinitely on a 32 bit kernel.

See below, for why such programs are buggy ...


  3. Solving a problem at a place where it does not exist naturally leads to
 other breakage. After 866b04fccbf125cd, iunique() no longer returns an
 ino_t, but only values = 2**32, which limits the number of inodes on
 all filesystems that use that function. Needless to say, this is a
 *regression* w.r.t. previous kernels before that commit went in.
 

 Why is this a problem?