Read/write counts

2007-06-04 Thread David H. Lynch Jr.

I have a file system that has really odd blocking.

All files have a variable-length header (basically a directory
entry) at their start.
Most, but not all, sectors have a small fixed-length signature as
well as some link data at their start.

The net result is that the implementation would be simpler if I could
just read/write the amount of data
that can be done with the least amount of work, even if that is less
than was requested.

If I receive a request to read 512 bytes, and I return that I have
read 486, is the OS, libc, or something else
going to treat that as an error, or will they come back for the
rest in a subsequent call?

I thought I recalled that read()/write() returning a count less than
requested is not an error.
   
   
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Read/write counts

2007-06-04 Thread Andreas Dilger
On Jun 04, 2007  06:20 -0400, David H. Lynch Jr. wrote:
 The net result is that the implementation would be simpler if I could
 just read/write the amount of data that can be done with the least
 amount of work, even if that is less than was requested.
 
 If I receive a request to read 512 bytes, and I return that I have read
 486, is the OS, libc, or something else going to treat that as an
 error, or will they come back for the rest in a subsequent call?
 
 I thought I recalled that read()/write() returning a count less than
 requested is not an error.

It is not strictly an error to read/write less than the requested amount,
but you will find that a lot of applications don't handle this correctly.
They will assume that if the amount read/written is != amount requested
that this is an error.  Of course the opposite is also true - some
applications assume that the amount requested == amount read/written and
don't even check whether that is actually the case or not.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: Read/write counts

2007-06-04 Thread Bryan Henderson
 It is not strictly an error to read/write less than the requested amount,
 but you will find that a lot of applications don't handle this correctly.

I'd give it a slightly different nuance.  It's not an error, and it's a
reasonable thing to do, but there is value in not doing it.  POSIX and its 
predecessors back to the beginning of Unix say read()/write() don't have 
to transfer the full count (they must transfer at least one byte).  The 
main reason for this choice is that it may require more resources (e.g.  a 
memory buffer) than the system can allocate to do the whole request at 
once.

Programs that assume a full transfer are fairly common, but are 
universally regarded as either broken or just lazy, and when it does cause 
a problem, it is far more common to fix the application than the kernel.
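
For readers who want to see what "handling a partial transfer" looks like on the application side, here is a minimal sketch in C. The helper name read_full is hypothetical (it is not a libc function), and the policy of retrying on EINTR is an assumption about what the caller wants, not something the thread prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling read() until `count` bytes have
 * arrived, end of file is reached, or a real error occurs.
 * Returns the number of bytes read (less than `count` only at EOF),
 * or -1 with errno set. On error, bytes already read are not reported.
 */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL || count == 0);

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n > 0) {
            done += (size_t)n;   /* short read: ask for the rest */
        } else if (n == 0) {
            break;               /* end of file */
        } else if (errno != EINTR) {
            return -1;           /* real error */
        }
        /* n == -1 with EINTR: nothing was transferred, so retry */
    }
    return (ssize_t)done;
}
```

A caller that needs exactly 512 bytes would call read_full(fd, buf, 512) and treat any other return value as EOF or an error. The same loop shape works for write(), with the difference that a return of 0 is not an end-of-file indication there.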

Most application programs access files via libc's fread/fwrite, which 
don't have partial transfers.  GNU libc does handle partial (kernel) reads 
and writes correctly.  I'd be surprised if someone can name a major 
application that doesn't.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems



Re: Read/write counts

2007-06-04 Thread Matthew Wilcox
On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote:
 Programs that assume a full transfer are fairly common, but are 
 universally regarded as either broken or just lazy, and when it does cause 
 a problem, it is far more common to fix the application than the kernel.

Linus has explicitly forbidden short reads from being returned.  The
original poster may get away with it for a specialised case, but for
example, signals may not cause a return to userspace with a short read
for exactly this reason.


Re: Read/write counts

2007-06-04 Thread Theodore Tso
On Mon, Jun 04, 2007 at 11:02:23AM -0600, Matthew Wilcox wrote:
 On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote:
  Programs that assume a full transfer are fairly common, but are 
  universally regarded as either broken or just lazy, and when it does cause 
  a problem, it is far more common to fix the application than the kernel.
 
 Linus has explicitly forbidden short reads from being returned.  The
 original poster may get away with it for a specialised case, but for
 example, signals may not cause a return to userspace with a short read
 for exactly this reason.

Hmm, I'm not sure I would go that far.  Per the POSIX specification,
we support the optional BSD-style restartable system calls for signals
which will avoid short reads; but this is only true if SA_RESTART is
passed to sigaction().  Without SA_RESTART, we will indeed return
short reads, as required by POSIX.
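
As an illustration of the SA_RESTART choice described above, the following C sketch installs a handler with or without the flag. The names on_signal and install_handler are hypothetical and exist only for this example. As far as I understand the semantics, with the flag set a read() interrupted before any data has been transferred is restarted by the kernel; without it the call fails with EINTR, or returns a short count if some data had already been transferred.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal;

/* Hypothetical handler: only records that a signal arrived. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/*
 * Hypothetical helper: install on_signal for `signo`.
 * restart != 0 requests BSD-style restarting of interrupted system
 * calls (SA_RESTART); restart == 0 lets them return early instead.
 * Returns 0 on success, -1 on failure, like sigaction() itself.
 */
int install_handler(int signo, int restart)
{
    struct sigaction sa;

    assert(signo > 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(signo, &sa, NULL);
}
```

A program that wants its blocking reads to survive SIGUSR1 would call install_handler(SIGUSR1, 1); one that uses the signal to break out of a blocking read would pass 0.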

I don't think Linus has said that short reads are always evil; I
certainly can't remember him ever making that statement.  Do you have
a pointer to a LKML message where he's said that?

- Ted


Re: Read/write counts

2007-06-04 Thread Roman Zippel
Hi,

On Mon, 4 Jun 2007, Theodore Tso wrote:

 Hmm, I'm not sure I would go that far.  Per the POSIX specification,
 we support the optional BSD-style restartable system calls for signals
 which will avoid short reads; but this is only true if SA_RESTART is
 passed to sigaction().  Without SA_RESTART, we will indeed return
 short reads, as required by POSIX.
 
 I don't think Linus has said that short reads are always evil; I
 certainly can't remember him ever making that statement.  Do you have
 a pointer to a LKML message where he's said that?

That's the last discussion about signals and I/O I can remember:
http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

bye, Roman


Re: Read/write counts

2007-06-04 Thread Joel Becker
On Mon, Jun 04, 2007 at 08:57:16PM +0200, Roman Zippel wrote:
 On Mon, 4 Jun 2007, Theodore Tso wrote:
 
  Hmm, I'm not sure I would go that far.  Per the POSIX specification,
  we support the optional BSD-style restartable system calls for signals
  which will avoid short reads; but this is only true if SA_RESTART is
  passed to sigaction().  Without SA_RESTART, we will indeed return
  short reads, as required by POSIX.
  
  I don't think Linus has said that short reads are always evil; I
  certainly can't remember him ever making that statement.  Do you have
  a pointer to a LKML message where he's said that?
 
 That's the last discussion about signals and I/O I can remember:
 http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

He said 'disk read', not 'read(2)'.  I'd expect he means certain
things like stat(2) and readdir(2) when they have to go to disk.
read(2) explicitly lists EINTR as a valid result, and often folks use
signals to interrupt read(2).  The world certainly writes programs
to expect short read(2).
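
One common way signals are used to interrupt read(2) is as a crude timeout. The sketch below is only an illustration under stated assumptions: read_with_timeout and on_alarm are hypothetical names, the process is assumed to be single-threaded, and SIGALRM is assumed not to be used for anything else. It also has a known race: if the timer fires before read() is entered, nothing interrupts the call.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handler: delivery alone is what interrupts read(). */
static void on_alarm(int signo)
{
    (void)signo;
}

/*
 * Hypothetical helper: read() that gives up after `timeout_ms`.
 * Returns what read() returned; on timeout that is -1 with
 * errno == EINTR, because the handler is installed without SA_RESTART.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t count, long timeout_ms)
{
    struct sigaction sa, old_sa;
    struct itimerval timer, off;
    ssize_t n;
    int saved_errno;

    assert(timeout_ms > 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                      /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    memset(&timer, 0, sizeof timer);      /* zero interval: one shot */
    timer.it_value.tv_sec = timeout_ms / 1000;
    timer.it_value.tv_usec = (timeout_ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &timer, NULL);

    n = read(fd, buf, count);
    saved_errno = errno;                  /* cleanup must not clobber it */

    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);   /* disarm the timer */
    sigaction(SIGALRM, &old_sa, NULL);    /* restore the old handler */

    errno = saved_errno;
    return n;
}
```

Programs that need a robust timeout usually reach for poll() or select() instead; the point here is only that a read() returning early because of a signal is ordinary, expected behaviour.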

Joel

-- 

Gone to plant a weeping willow
 On the bank's green edge it will roll, roll, roll.
 Sing a lullaby beside the waters.
 Lovers come and go, the river roll, roll, rolls.

Joel Becker
Principal Software Developer
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127


Re: Read/write counts

2007-06-04 Thread Theodore Tso
On Mon, Jun 04, 2007 at 08:57:16PM +0200, Roman Zippel wrote:
 That's the last discussion about signals and I/O I can remember:
 http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

Well, I think Linus was saying that we have to do both (where the
signal interrupts and where it doesn't), and I agree with that:

  There are enough reasons to discourage people from using uninterruptible
  sleep (this f*cking application won't die when the network goes down)
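  that I don't think this is an issue. We need to handle both cases, and
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  while we can expand on the two cases we have now, we can't remove them.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^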

Fortunately, although the -ERESTARTSYS framework is a little awkward
(and people can shoot arrows at me for creating it 15 years ago :-), we
do have a way of supporting both styles without _too_ much pain.

- Ted
