Subject: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

Re: [reiserfs-list] Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-31 Thread Hans Reiser


Daniel Phillips wrote:

> Graciously accepted.  Coming up with something sensible in a mere 6
> months would be a minor miracle. ;-)
> 
> - what happens if the user forgets to close the transaction?

then the user has branched into his own version, or at least that would be my
take on it.  Another possible method is to expire transactions by persons who
lack permission to keep them open indefinitely.  I suppose one could expire them
to abort, or expire them to commit, both being valid under some circumstances.  
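Hans's two expiry policies (a transaction left open by a user who lacks permission to keep it open indefinitely is either aborted or committed when its time runs out) can be sketched as a tiny state machine. This is purely illustrative: `struct xact`, `xact_expire()`, and the time-limit field are hypothetical and are not a ReiserFS interface.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical: what to do with a transaction whose time limit has passed. */
enum expiry_policy { EXPIRE_ABORT, EXPIRE_COMMIT };
enum xact_state { XACT_OPEN, XACT_COMMITTED, XACT_ABORTED };

struct xact {
    time_t opened;             /* when the transaction was opened */
    double max_open_secs;      /* limit for owners without "keep open" permission */
    bool may_stay_open;        /* true if the owner may hold it open indefinitely */
    enum expiry_policy policy; /* abort or commit on expiry */
    enum xact_state state;
};

/* Apply the expiry rule at time `now`; returns true if the state changed. */
bool xact_expire(struct xact *x, time_t now)
{
    if (x->state != XACT_OPEN || x->may_stay_open)
        return false;
    if (difftime(now, x->opened) < x->max_open_secs)
        return false;
    x->state = (x->policy == EXPIRE_COMMIT) ? XACT_COMMITTED : XACT_ABORTED;
    return true;
}
```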


> 
>    I plan to set a checkpoint there (because the transaction got
>    too big) and log the fact that it's open.
> 
> - issues of lock/transaction duration
> 
>    Once again relying on checkpoints, when the transaction gets
>    uncomfortably big for cache, set a checkpoint.  I haven't thought
>    about locks
> 
> - transaction batching
> 
>    1) Explicit transaction batch close 2) Cache gets past a certain
>    fullness.  In both cases, no new transactions are allowed to start
>    and as soon as all current ones are closed we close the batch.
> 
> - of levels of isolation
> - concurrent transactions modifying global fs metadata
>    and some but not all of those concurrent transactions receiving a
>    rollback
> 
>    First I was going to write 'huh?' here, then I realized you're
>    talking about real database ops, not just filesystem ops.  I had
>    in mind something more modest: transactions are 'mv', 'read/write'
>    (if the 'atomic read/write' is set), other filesystem operations I've
>    forgotten, and anything the user puts between open_xact and
>    close_xact.  You are raising the ante a little ;-)
> 
>    In my case (Tux2) I could do an efficient rollback to the beginning
>    of the batch (phase), then I would have had to have kept an
>    in-memory log of the transactions for selective replay.  With a
>    journal log you can obviously do the same thing, but perhaps more
>    efficiently if your journal design supports undo/redo.
> 
>    The above is a pure flight of fancy, we won't be seeing anything
>    so fancy as an API across filesystems.

It is just a matter of time, and we will.  I think that the major release AFTER
2.6 will see it.  First we have to get a prototype done in time for 2.6.

> 
> - permissions relating to keeping transactions open.
>    We can see this one in the light of a simple filesystem
>    transaction: what happens if we are in the middle of a mv and
>    someone changes the permissions?  Go with the starting or
>    ending permissions?
> 
> Well, the database side of this is really interesting, but to get
> something generic across filesystems, the scope pretty well has to be
> limited to journal-type transactions, don't you think?

don't know what a journal-type transaction is and how it differs from a database
transaction.

> 
> --
> Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-29 Thread Horst von Brand


Daniel Phillips <[EMAIL PROTECTED]> said:
> On Monday 28 May 2001 03:26, Horst von Brand wrote:
> > Daniel Phillips <[EMAIL PROTECTED]> said:
> > > On Sunday 27 May 2001 15:32, Edgar Toernig wrote:
> >
> > [...]
> >
> > > > you break UNIX fundamentals.  But I'm quite relieved now because
> > > > I'm pretty sure that something like that will never go into the
> > > > kernel.
> > >
> > > OK, I'll take that as "I couldn't find a piece of code that breaks,
> > > so it's on to the legal issues".
> >
> > It boggles my (perhaps underdeveloped) mind to have things that are
> > files _and_ directories at the same time.

> They are not, the device file and the directory are different objects 
> that have the same name.  In C, "foo" and "struct foo" can appear in 
> the same scope but they are different objects.  This must have seemed 
> to be a strange idea at first.  Here we have "foo" (a device) and 
> "directory foo" (the device's properties).

They have the exact same name, how is anybody going to distinguish them?
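Daniel's C analogy is accurate as far as the language goes: struct tags and ordinary identifiers live in separate name spaces, so both declarations below are legal in the same scope. The names are illustrative only.

```c
/* "struct foo" is a tag; "foo" is an ordinary identifier. C keeps tags and
 * ordinary identifiers in separate name spaces, so both may coexist. */
struct foo { int width; };

int foo = 3;                    /* an object named foo */

int foo_total(void)
{
    struct foo props = { 5 };   /* an object of type struct foo */
    return foo + props.width;   /* the compiler never confuses the two */
}
```

Note, though, that C tells the two apart by syntax: the `struct` keyword is always written. In the magicdev proposal the only comparable marker is `O_DIRECTORY` (or a trailing `/.`), which is exactly what Horst is asking about.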

> When I first saw Linus mention the idea I did a double-take, I thought 
> it was a strange idea and my first reaction was, it would break all 
> kinds of things.  But when I started examining cases I was unable to 
> find any real problems.  When I asked for code examples of breakage none of 
> the supplied examples survived scrutiny.  Then, when I looked through 
> SUS I didn't find any prohibition.

It isn't allowed either...

> > The last time this was
> > discussed was for handling forks (a la Mac et al) in files, and it
> > was shot down.
> 
> Do you have the subject line?  It might save us some time ;-)

Nope, sorry.

> I seem to recall that the fork idea died because it was thought to 
> require changes to userspace programs such as tar and find.  The 
> magicdev idea doesn't require such changes, none that I've seen so far.

tar(1) of /dev should blow up in exactly the same way, AFAICS...

Everybody just knows a device is a device, a file is a file, and a
directory is a directory. Standards notwithstanding, this is how things
work, and have worked for a _long_ time; with absolutely no warning that
the assumption might become wrong sometime or be wrong on some strange
beast (you didn't find anything in your search). I'd suspect nobody
bothered to cast this in stone because nobody even considered such a
twisted possibility.

Take it up with somebody on the standards committees, they (should) have
looked long and hard at the nooks and crannies in the standard, and so are
in a better position to comment than we here are.
-- 
Dr. Horst H. von Brand   mailto:[EMAIL PROTECTED]
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria  +56 32 654239
Casilla 110-V, Valparaiso, Chile            Fax:  +56 32 797513

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-27 Thread Marko Kreen

On Sun, May 27, 2001 at 10:45:17PM +0200, Daniel Phillips wrote:
> On Sunday 27 May 2001 15:32, Edgar Toernig wrote:
> > Daniel Phillips wrote:
> > > I'm not claiming there isn't breakage somewhere,
> >
> > you break UNIX fundamentals.  But I'm quite relieved now because I'm
> > pretty sure that something like that will never go into the kernel.
> 
> OK, I'll take that as "I couldn't find a piece of code that breaks, so 
> it's on to the legal issues".
> 
> SUS doesn't seem to have a lot to say about this.  The nearest thing to 
> a ruling I found was "The special filename dot refers to the directory 
> specified by its predecessor".  Which is not the same thing as:
> 
>open("foo", O_RDONLY) == open ("foo/.", O_RDONLY)
> 
> I don't know about POSIX (I don't have it: a pox on standards 
> organizations that don't make their standards freely available) but SUS 
> doesn't seem to forbid this.

My question is: Is it needed?  You are advocating quite
non-obvious behaviour on a UNIX-like fs.  Can't the end result
be achieved in a more obvious manner?

I see at most 3 types of magic files:

1) regular file - nothing special.  Whether it has CHR/BLK set
   or not is irrelevant.

2) file with subdevs.  As 1) but you can access dev/something
   for subdev 'something'.  Permissions should probably be taken
   from 'dev'.  Of course you can't do 'ls' on the thing.

3) magicdev as directory.  Act as ordinary directory.  Only
   reason is to group devices.

And all those should be manageable by devfsd, so you can tell
devfsd to take subdev and create it as file somewhere else.
So 2) and 3) are more like 'defaults'.
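Type 2) implies a lookup that splits a name into a device part and a subdevice part, with permission checks made against the device part. A minimal sketch; `split_subdev()` is a hypothetical helper, not part of devfs or devfsd, and the device names in the usage are made up.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: split "dev/sub" into base "dev" and subdevice "sub".
 * Returns false if there is no subdevice component or a buffer is too small.
 * Under type 2), permission checks would be made against the base device. */
bool split_subdev(const char *path, char *base, size_t base_len,
                  char *sub, size_t sub_len)
{
    const char *slash = strchr(path, '/');
    if (slash == NULL || slash == path || slash[1] == '\0')
        return false;
    size_t blen = (size_t)(slash - path);
    size_t slen = strlen(slash + 1);
    if (blen >= base_len || slen >= sub_len)
        return false;
    memcpy(base, path, blen);
    base[blen] = '\0';
    memcpy(sub, slash + 1, slen + 1);   /* copies the terminating NUL too */
    return true;
}
```

For example, `split_subdev("sda/part1", ...)` would yield base `sda` and subdevice `part1`.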

So: is an additional type, with a non-obvious mix of file and
directory behaviour, really required?

-- 
marko


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-27 Thread Daniel Phillips

On Sunday 27 May 2001 15:32, Edgar Toernig wrote:
> Daniel Phillips wrote:
> > It won't, the open for "." is handled in the VFS, not the
> > filesystem - it will open the directory.  (Without needing to be
> > told it's a directory via O_DIRECTORY.)  If you do open("magicdev")
> > you'll get the device, because that's handled by magicdevfs.
>
> You really mean that "magicdev" is a directory and:
>
>   open("magicdev/.", O_RDONLY);
>   open("magicdev", O_RDONLY);
>
> would both succeed but open different objects?

Yes, and:

open("magicdev/.", O_RDONLY | O_DIRECTORY);
open("magicdev", O_RDONLY | O_DIRECTORY);

will both succeed and open the same object.
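For comparison, on today's Linux filesystems `O_DIRECTORY` is only a check: `open()` fails with `ENOTDIR` when the final component exists but is not a directory, while an ordinary directory can still be opened read-only without the flag. The proposal goes further and makes the absence of the flag select a different object. A sketch of the existing behaviour, with `open_as_dir()` as an illustrative helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Try to open `path` with O_DIRECTORY. Returns true on success; on failure
 * stores errno in *err (ENOTDIR: the name exists but is not a directory). */
bool open_as_dir(const char *path, int *err)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        *err = errno;
        return false;
    }
    close(fd);
    *err = 0;
    return true;
}
```

Note that the `ENOTDIR` check happens during lookup, so a device node is never actually opened when `O_DIRECTORY` is given.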

> > I'm not claiming there isn't breakage somewhere,
>
> you break UNIX fundamentals.  But I'm quite relieved now because I'm
> pretty sure that something like that will never go into the kernel.

OK, I'll take that as "I couldn't find a piece of code that breaks, so 
it's on to the legal issues".

SUS doesn't seem to have a lot to say about this.  The nearest thing to 
a ruling I found was "The special filename dot refers to the directory 
specified by its predecessor".  Which is not the same thing as:

   open("foo", O_RDONLY) == open ("foo/.", O_RDONLY)

I don't know about POSIX (I don't have it: a pox on standards 
organizations that don't make their standards freely available) but SUS 
doesn't seem to forbid this.
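The `==` above is shorthand: two separately opened descriptors are never equal as integers, so "opens the same object" has to be checked by comparing the `st_dev` and `st_ino` that `fstat()` reports. On an ordinary filesystem the check below is true for any directory and its `/.`; under the magicdev proposal it would be false for a magic device name. `same_object()` is an illustrative name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return true if opening a and b (read-only) yields the same underlying
 * object, judged by (st_dev, st_ino); false if either open fails. */
bool same_object(const char *a, const char *b)
{
    int fa = open(a, O_RDONLY);
    if (fa < 0)
        return false;
    int fb = open(b, O_RDONLY);
    if (fb < 0) {
        close(fa);
        return false;
    }
    struct stat sa, sb;
    bool same = fstat(fa, &sa) == 0 && fstat(fb, &sb) == 0 &&
                sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
    close(fa);
    close(fb);
    return same;
}
```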

--
Daniel

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-27 Thread Edgar Toernig

Daniel Phillips wrote:
> 
> It won't, the open for "." is handled in the VFS, not the filesystem -
> it will open the directory.  (Without needing to be told it's a
> directory via O_DIRECTORY.)  If you do open("magicdev") you'll get the
> device, because that's handled by magicdevfs.

You really mean that "magicdev" is a directory and:

open("magicdev/.", O_RDONLY);
open("magicdev", O_RDONLY);

would both succeed but open different objects?

> I'm not claiming there isn't breakage somewhere,

you break UNIX fundamentals.  But I'm quite relieved now because I'm
pretty sure that something like that will never go into the kernel.

Ciao, ET.

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-25 Thread Edgar Toernig


Daniel Phillips wrote:
> 
> Oops, oh wait, there's already another open point: your breakage
> examples both rely on opening ".".  You're right, "." should always be
> a directory and I believe that's enforced by the VFS.  So we don't have
> an example of breakage yet.

That's just because I did a simple "ls".  But it doesn't make a
difference.  The magicdevs _are_ directories and

chdir("magicdev");
open(".", O_RDONLY);

shouldn't open the device.
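Edgar's invariant can be checked from user space: after `chdir()` into a directory, `open(".")` followed by `fstat()` must report `S_ISDIR`. A sketch (the helper name `dot_is_directory()` is illustrative) that restores the previous working directory with `fchdir()`:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* chdir() into `dir`, open ".", and report whether fstat() says S_ISDIR.
 * The previous working directory is restored via fchdir(). */
bool dot_is_directory(const char *dir)
{
    int saved = open(".", O_RDONLY);
    if (saved < 0)
        return false;
    bool is_dir = false;
    if (chdir(dir) == 0) {
        int fd = open(".", O_RDONLY);
        struct stat st;
        if (fd >= 0) {
            is_dir = fstat(fd, &st) == 0 && S_ISDIR(st.st_mode);
            close(fd);
        }
        if (fchdir(saved) != 0)
            is_dir = false;   /* report failure if the old cwd is lost */
    }
    close(saved);
    return is_dir;
}
```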

Ciao, ET.


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-25 Thread Daniel Phillips

On Thursday 24 May 2001 22:59, Edgar Toernig wrote:
> Daniel Phillips wrote:
> > > > Readdir fills in a directory type, so ls sees it as a directory
> > > > and does the right thing.  On the other hand, we know we're on
> > > > a device filesystem so we will next open the name as a regular
> > > > file, and find ISCHR or ISBLK: good.
> > >
> > > ??? The kernel may know it, but the app?  Or do you really want
> > > to give different stat data on stat(2) and fstat(2)?  These flags
> > > are currently used by archive/backup prgs.  It's a hint that
> > > these files are not regular files and shouldn't be opened for
> > > reading. Having a 'd' would mean that they would really try to
> > > > enter the directory and save its contents.  Don't know what
> > > happens in this case to your "special" files ;-)
> >
> > I guess that's much like the question 'what happens in proc?'.
>
> And that's already bad enough.  Most of the "files" in proc should
> be fifos!  And using proc as an excuse to introduce another set of
> magic dirs?  No, thanks.

Wait a second, I thought proc was here to stay.  Wait another
second, device nodes are already magic.  Magic is magic, just
choose your color ;-)

This set of magic dirs is supposed to clean things up, not mess things 
up.  We already saw how the side-effects-on-open problem in ls -l goes 
away.  There's a much bigger problem I'd love to deal with: the 'no 
hierarchy can please everybody' problem.  In database terms, a hierarchy 
is an insufficiently general model for real-world problems; in other 
words, hierarchies never worked.  Tables work.  That's where I'm trying to go 
with this, so please bear with me.  This is not just a solution in 
search of a problem.

> > Correct me if I'm wrong, but what we learn from the proc example
> > is that tarring your whole source tree starting at / is not
> > something you want to do.
>
> IMHO it would be better to fix proc instead of adding more magic.  At
> the moment you have to exclude /proc.  You want to add /dev.

Well, actually no, ls -R, tar, zip, etc, work pretty well with the 
scheme I've described.

> And
> next? Exclude all $HOME/dev (in case process name spaces get added)? 
> Or make fifos magic too and add all of them to the exclude list?  But
> there's no central place for fifos.  So let's add more magic :-(

No, no, no, agreed.  But sometimes magic is good.  It's not deep magic.  
The only new thing here is the interpretation of the O_DIRECTORY flag, 
or rather, the lack of it.

> > What *won't* happen is, you won't get side effects from opening
> > your serial ports (you'd have to open them without O_DIRECTORY
> > to get that) so that seems like a little step forward.
>
> As already said: depending on O_DIRECTORY breaks POSIX compliance
> and that alone should kill this idea...

Thanks, two good points:
  - libc5 will get confused when doing ls in /magicdev
  - POSIX specifically forbids this

I'll put this away until I've specifically dug into both of them.  OK, 
over and out, thanks for your commentary.

/me peruses man pages

Oops, oh wait, there's already another open point: your breakage 
examples both rely on opening ".".  You're right, "." should always be 
a directory and I believe that's enforced by the VFS.  So we don't have 
an example of breakage yet.

--
Daniel


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-25 Thread Daniel Phillips

On Friday 25 May 2001 00:00, Hans Reiser wrote:
> Daniel Phillips wrote:
> > I suppose I'm just reiterating the obvious, but we should
> > eventually have a generic filesystem transaction API at the VFS
> > level, once we have enough data points to know what the One True
> > API should be.
>
> Daniel, implementing transactions is not a trivial thing as you
> probably know. It requires that you resolve such issues as, what
> happens if the user forgets to close the transaction, issues of
> lock/transaction duration, of transaction batching, of levels of
> isolation, of concurrent transactions modifying global fs metadata
> and some but not all of those concurrent transactions receiving a
> rollback, and of permissions relating to keeping transactions open. 
> I would encourage you to participate in the reiser4 design discussion
> we will be having over the next 6 months, and give us your opinions. 
> Josh will be leading that design effort for the ReiserFS team.

Graciously accepted.  Coming up with something sensible in a mere 6 
months would be a minor miracle. ;-)

- what happens if the user forgets to close the transaction?

   I plan to set a checkpoint there (because the transaction got
   too big) and log the fact that it's open.

- issues of lock/transaction duration

   Once again relying on checkpoints, when the transaction gets
   uncomfortably big for cache, set a checkpoint.  I haven't thought
   about locks

- transaction batching

   1) Explicit transaction batch close 2) Cache gets past a certain 
   fullness.  In both cases, no new transactions are allowed to start
   and as soon as all current ones are closed we close the batch.

- of levels of isolation
- concurrent transactions modifying global fs metadata
   and some but not all of those concurrent transactions receiving a
   rollback

   First I was going to write 'huh?' here, then I realized you're   
   talking about real database ops, not just filesystem ops.  I had
   in mind something more modest: transactions are 'mv', 'read/write'
   (if the 'atomic read/write' is set), other filesystem operations I've
   forgotten, and anything the user puts between open_xact and  
   close_xact.  You are raising the ante a little ;-)

   In my case (Tux2) I could do an efficient rollback to the beginning
   of the batch (phase), then I would have had to have kept an
   in-memory log of the transactions for selective replay.  With a  
   journal log you can obviously do the same thing, but perhaps more
   efficiently if your journal design supports undo/redo.

   The above is a pure flight of fancy, we won't be seeing anything
   so fancy as an API across filesystems.

- permissions relating to keeping transactions open. 
   We can see this one in the light of a simple filesystem  
   transaction: what happens if we are in the middle of a mv and
   someone changes the permissions?  Go with the starting or
   ending permissions?
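The batching rule in the list above (stop admitting transactions on an explicit close or when the cache passes a fullness threshold, then close the batch once the open count drops to zero) can be modelled as a single-threaded state machine. Every name here is hypothetical, and a real filesystem would need locking and would make late arrivals wait for the next batch rather than fail.

```c
#include <stdbool.h>

/* Hypothetical model of one transaction batch (Tux2 calls it a phase). */
struct batch {
    int open_xacts;        /* transactions currently open in this batch */
    bool closing;          /* set by either trigger; no new transactions start */
    bool closed;           /* set once closing and open_xacts has reached 0 */
    unsigned cache_used;   /* dirty cache charged to this batch */
    unsigned cache_limit;  /* fullness threshold (trigger 2) */
};

static void batch_maybe_close(struct batch *b)
{
    if (b->closing && b->open_xacts == 0)
        b->closed = true;
}

/* Trigger 1: explicit request to close the batch. */
void batch_request_close(struct batch *b)
{
    b->closing = true;
    batch_maybe_close(b);
}

/* Start a transaction; fails once the batch has stopped accepting new ones. */
bool batch_begin(struct batch *b)
{
    if (b->closing)
        return false;
    b->open_xacts++;
    return true;
}

/* Trigger 2: charge dirty cache to the batch; past the limit, start closing. */
void batch_note_dirty(struct batch *b, unsigned bytes)
{
    b->cache_used += bytes;
    if (b->cache_used >= b->cache_limit)
        b->closing = true;
}

/* A transaction finished; the last one out closes a closing batch. */
void batch_end(struct batch *b)
{
    if (b->open_xacts > 0)
        b->open_xacts--;
    batch_maybe_close(b);
}
```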

Well, the database side of this is really interesting, but to get 
something generic across filesystems, the scope pretty well has to be 
limited to journal-type transactions, don't you think?
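The Tux2 rollback-and-replay idea from the list above can be illustrated on a toy state of integer slots: restore the snapshot taken at the start of the batch, then replay the in-memory log while skipping the transaction being rolled back. A sketch under stated simplifications; `struct op`, `replay_without()`, and `NSLOTS` are hypothetical, and a filesystem would log metadata operations rather than slot writes.

```c
#include <stddef.h>
#include <string.h>

#define NSLOTS 8

/* One logged write: transaction `xact` set slot `slot` to `value`. */
struct op {
    int xact;
    size_t slot;
    int value;
};

/* Rebuild `state` from the snapshot taken at the start of the batch,
 * replaying every logged op except those of transaction `aborted`. */
void replay_without(int state[NSLOTS], const int start[NSLOTS],
                    const struct op *log, size_t nops, int aborted)
{
    memcpy(state, start, NSLOTS * sizeof state[0]);   /* rollback */
    for (size_t i = 0; i < nops; i++)                  /* selective replay */
        if (log[i].xact != aborted && log[i].slot < NSLOTS)
            state[log[i].slot] = log[i].value;
}
```

This is only correct when the surviving transactions never read anything the aborted one wrote; detecting such dependencies is the isolation question Hans raises.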

--
Daniel

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-25 Thread Daniel Phillips


On Thursday 24 May 2001 23:26, Alexander Viro wrote:
> On Thu, 24 May 2001, Edgar Toernig wrote:
> > > What *won't* happen is, you won't get side effects from opening
> > > your serial ports (you'd have to open them without O_DIRECTORY
> > > to get that) so that seems like a little step forward.
> >
> > As already said: depending on O_DIRECTORY breaks POSIX compliance
> > and that alone should kill this idea...
>
> What really kills that idea is the fact that you can trick
> applications into opening your serial ports _without_ O_DIRECTORY.

Err, I thought we already had that problem, but worse: an ordinary
ls -l will do it.  This way, we harmlessly list the device's properties 
instead.

--
Daniel


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Daniel Phillips


On Thursday 24 May 2001 16:39, Oliver Xymoron wrote:
> On Thu, 24 May 2001, Marko Kreen wrote:
> > On Thu, May 24, 2001 at 02:23:27AM +0200, Edgar Toernig wrote:
> > > Daniel Phillips wrote:
> > > > > > It's going to be marked 'd', it's a directory, not a file.
> > > > >
> > > > > Aha.  So you lose the S_ISCHR/BLK attribute.
> > > >
> > > > Readdir fills in a directory type, so ls sees it as a directory
> > > > and does the right thing.  On the other hand, we know we're on
> > > > a device filesystem so we will next open the name as a regular
> > > > file, and find ISCHR or ISBLK: good.
> > >
> > > ??? The kernel may know it, but the app?  Or do you really want
> > > to give different stat data on stat(2) and fstat(2)?  These flags
> > > are currently used by archive/backup prgs.  It's a hint that
> > > these files are not regular files and shouldn't be opened for
> > > reading. Having a 'd' would mean that they would really try to
> > > enter the directory and save its contents.  Don't know what
> > > happens in this case to your "special" files ;-)
> >
> > IMHO the CHR/BLK is not needed.  Think of /proc.  In the future,
> > the backup tools will be told to ignore /dev, that's all.
>
> The /dev dir should not be special. At least not to the kernel. I
> have device files in places other than /dev, and you probably do too
> (hint: anonymous FTP).

True.  If we're using a special filesystem for devices we can express
the desired restriction in terms of 'don't back up this filesystem type'
or 'don't go outside the root filesystem'.
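"Don't go outside the root filesystem" is what `find -xdev` and GNU tar's `--one-file-system` already do: compare each entry's `st_dev` with that of the starting point and do not descend where it differs. A minimal recursive sketch using `opendir()`/`readdir()`/`lstat()`; `count_same_fs()` is an illustrative name, and a real backup tool would archive the entries rather than count them.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Count entries under `path` that live on device `dev`, without following
 * symlinks and without descending into other mounted filesystems. */
long count_same_fs(const char *path, dev_t dev)
{
    struct stat st;
    if (lstat(path, &st) != 0 || st.st_dev != dev)
        return 0;                       /* unreadable, or another filesystem */
    long n = 1;
    if (!S_ISDIR(st.st_mode))
        return n;                       /* devices are counted, never opened */
    DIR *d = opendir(path);
    if (d == NULL)
        return n;
    struct dirent *e;
    char child[4096];
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        int len = snprintf(child, sizeof child, "%s/%s", path, e->d_name);
        if (len > 0 && (size_t)len < sizeof child)
            n += count_same_fs(child, dev);
    }
    closedir(d);
    return n;
}
```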

--
Daniel

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Hans Reiser

Daniel Phillips wrote:
> 
> On Tuesday 22 May 2001 22:10, Andreas Dilger wrote:
> > Peter Braam writes:
> > > File system journal recovery can corrupt a snapshot, because it
> > > copies data that needs to be preserved in a snapshot. During
> > > journal replay such data may be copied again, but the source can
> > > have new data already.
> >
> > The way it is implemented in reiserfs is to wait for existing
> > transactions to complete, entirely flush the journal and block all
> > new transactions from starting.  Stephen implemented a journal flush
> > API to do this for ext3, but the hooks to call it from LVM are not in
> > place yet.  This way the journal is totally empty at the time the
> > snapshot is done, so the read-only copy does not need to do journal
> > recovery, so no problems can arise.
> 
> I suppose I'm just reiterating the obvious, but we should eventually
> have a generic filesystem transaction API at the VFS level, once we
> have enough data points to know what the One True API should be.
> 
> --
> Daniel

Daniel, implementing transactions is not a trivial thing as you probably know. 
It requires that you resolve such issues as, what happens if the user forgets to
close the transaction, issues of lock/transaction duration, of transaction
batching, of levels of isolation, of concurrent transactions modifying global fs
metadata and some but not all of those concurrent transactions receiving a
rollback, and of permissions relating to keeping transactions open.  I would
encourage you to participate in the reiser4 design discussion we will be having
over the next 6 months, and give us your opinions.  Josh will be leading that
design effort for the ReiserFS team.

Hans

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Alexander Viro




On Thu, 24 May 2001, Edgar Toernig wrote:

> > What *won't* happen is, you won't get side effects from opening
> > your serial ports (you'd have to open them without O_DIRECTORY
> > to get that) so that seems like a little step forward.
> 
> As already said: depending on O_DIRECTORY breaks POSIX compliance
> and that alone should kill this idea...

What really kills that idea is the fact that you can trick applications
into opening your serial ports _without_ O_DIRECTORY.
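Viro's objection can be made concrete. A careful program that wants to read only regular files might check with `lstat()` and then `open()`, but another process can replace the name between the two calls, so the open (and any side effects of opening a device) happens anyway. The most the program can do is refuse a final symlink with `O_NOFOLLOW`, avoid blocking in `open()` with `O_NONBLOCK` (for example on a FIFO with no writer), avoid acquiring a controlling terminal with `O_NOCTTY`, and re-check the result with `fstat()`. A sketch with an illustrative function name:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open `path` for reading only if it is a regular file; returns an fd or -1.
 * Limit: if `path` is swapped for a device between lstat() and open(), the
 * device IS opened (with any side effects) before fstat() rejects it. */
int open_regular_only(const char *path)
{
    struct stat before, after;
    if (lstat(path, &before) != 0 || !S_ISREG(before.st_mode))
        return -1;
    int fd = open(path, O_RDONLY | O_NOFOLLOW | O_NONBLOCK | O_NOCTTY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &after) != 0 || !S_ISREG(after.st_mode) ||
        after.st_dev != before.st_dev || after.st_ino != before.st_ino) {
        close(fd);          /* the name changed under us */
        return -1;
    }
    return fd;
}
```

The check detects the swap only after the fact, which is Viro's point: no user-space discipline prevents the open itself.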


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Edgar Toernig

Daniel Phillips wrote:
> 
> > > Readdir fills in a directory type, so ls sees it as a directory and
> > > does the right thing.  On the other hand, we know we're on a device
> > > filesystem so we will next open the name as a regular file, and
> > > find ISCHR or ISBLK: good.
> >
> > ??? The kernel may know it, but the app?  Or do you really want to
> > give different stat data on stat(2) and fstat(2)?  These flags are
> > currently used by archive/backup prgs.  It's a hint that these files
> > are not regular files and shouldn't be opened for reading.
> > Having a 'd' would mean that they would really try to enter the
> > directory and save its contents.  Don't know what happens in this
> > case to your "special" files ;-)
> 
> I guess that's much like the question 'what happens in proc?'.

And that's already bad enough.  Most of the "files" in proc should
be fifos!  And using proc as an excuse to introduce another set of
magic dirs?  No, thanks.

> Correct me if I'm wrong, but what we learn from the proc example
> is that tarring your whole source tree starting at / is not something
> you want to do.

IMHO it would be better to fix proc instead of adding more magic.  At
the moment you have to exclude /proc.  You want to add /dev.  And next?
Exclude all $HOME/dev (in case process name spaces get added)?  Or make
fifos magic too and add all of them to the exclude list?  But there's
no central place for fifos.  So let's add more magic :-(

> What *won't* happen is, you won't get side effects from opening
> your serial ports (you'd have to open them without O_DIRECTORY
> to get that) so that seems like a little step forward.

As already said: depending on O_DIRECTORY breaks POSIX compliance
and that alone should kill this idea...

Over and out, ET.


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Daniel Phillips


On Tuesday 22 May 2001 22:10, Andreas Dilger wrote:
> Peter Braam writes:
> > File system journal recovery can corrupt a snapshot, because it
> > copies data that needs to be preserved in a snapshot. During
> > journal replay such data may be copied again, but the source can
> > have new data already.
>
> The way it is implemented in reiserfs is to wait for existing
> transactions to complete, entirely flush the journal and block all
> new transactions from starting.  Stephen implemented a journal flush
> API to do this for ext3, but the hooks to call it from LVM are not in
> place yet.  This way the journal is totally empty at the time the
> snapshot is done, so the read-only copy does not need to do journal
> recovery, so no problems can arise.

I suppose I'm just reiterating the obvious, but we should eventually
have a generic filesystem transaction API at the VFS level, once we
have enough data points to know what the One True API should be.
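The quiesce step Andreas describes for reiserfs (wait for running transactions to complete, block new ones from starting, then flush) is a likely building block for any generic transaction API. A minimal sketch of such a gate with POSIX threads; `struct xact_gate` and its functions are hypothetical, and the journal flush and snapshot themselves are left to the caller between `gate_freeze()` and `gate_thaw()`.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical gate: transactions enter and leave; a freezer blocks new
 * entries and waits until every running transaction has left. */
struct xact_gate {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int active;        /* transactions currently running */
    bool frozen;       /* new transactions must wait while set */
};

#define XACT_GATE_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

void gate_enter(struct xact_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->frozen)
        pthread_cond_wait(&g->changed, &g->lock);
    g->active++;
    pthread_mutex_unlock(&g->lock);
}

void gate_leave(struct xact_gate *g)
{
    pthread_mutex_lock(&g->lock);
    if (--g->active == 0)
        pthread_cond_broadcast(&g->changed);
    pthread_mutex_unlock(&g->lock);
}

/* Block new transactions, then wait for running ones to drain.
 * On return the caller may flush the journal and take the snapshot. */
void gate_freeze(struct xact_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->frozen)                 /* one freezer at a time */
        pthread_cond_wait(&g->changed, &g->lock);
    g->frozen = true;
    while (g->active > 0)
        pthread_cond_wait(&g->changed, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

void gate_thaw(struct xact_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->frozen = false;
    pthread_cond_broadcast(&g->changed);
    pthread_mutex_unlock(&g->lock);
}
```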

--
Daniel

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Andreas Dilger

Malcolm Beattie writes:
> Andreas Dilger writes:
> > PS - I used to think shrinking a filesystem online was useful, but there
> >  are a huge amount of problems with this and very few real-life
> >  benefits, as long as you can at least do offline shrinking.  With
> >  proper LVM usage, the need to shrink a filesystem never really
> >  happens in practise, unlike the partition case where you always
> >  have to guess in advance how big a filesystem needs to be, and then
> >  add 10% for a safety margin.  With LVM you just create the minimal
> >  sized device you need now, and freely grow it in the future.
> 
> In an attempt to nudge you back towards your previous opinion: consider
> a system-wide spool or tmp filesystem. It would be nice to be able to
> add in a few extra volumes for a busy period but then shrink it down
> again when usage returns to normal. In the absence of the ability to
> shrink a live filesystem, storage management becomes a much harder job.
> You can't throw in a spare volume or two where it's needed without
> careful thought because you'll be ratchetting up the space on that one
> filesystem without being able to change your mind and reduce it again
> later. You'll end up with stingy storage admins who refuse to give you
> a bunch of extra filesystem space for a while because they can't get it
> back again afterwards.

I suppose it depends a bit on how your system is administered.  On LVM
systems, I tend to allocate new volumes for special situations like this.
When the special need is gone, you simply remove the whole thing.  Yes,
this is a bit of a hack for not having online shrinking, but I have not
really had a _big_ need to do that.

The only time I've really needed online shrinking is when someone
screwed up and made / or /var way too huge for some (bad) reason and
you can't unmount it conveniently.  Under AIX, you can't shrink JFS
even unmounted, so it meant a backup/restore.  Even so, having empty
space in a filesystem is not a reason to panic, while having no free
space in a filesystem _is_ a reason to panic, hence online growing
of ext2.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Daniel Phillips

On Thursday 24 May 2001 02:23, Edgar Toernig wrote:
> Daniel Phillips wrote:
> > > > It's going to be marked 'd', it's a directory, not a file.
> > >
> > > Aha.  So you lose the S_ISCHR/BLK attribute.
> >
> > Readdir fills in a directory type, so ls sees it as a directory and
> > does the right thing.  On the other hand, we know we're on a device
> > filesystem so we will next open the name as a regular file, and
> > find ISCHR or ISBLK: good.
>
> ??? The kernel may know it, but the app?  Or do you really want to
> give different stat data on stat(2) and fstat(2)?  These flags are
> currently used by archive/backup prgs.  It's a hint that these files
> are not regular files and shouldn't be opened for reading.
> Having a 'd' would mean that they would really try to enter the
> directory and save its contents.  Don't know what happens in this
> case to your "special" files ;-)
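The hint Edgar means is the file type in `st_mode`. A backup program typically calls `lstat()`, which never opens the file, and dispatches roughly as follows; the enum and `classify()` are illustrative and not taken from any particular tar implementation. If a magic device reported a directory type here, the program would take the recurse branch instead of recording a device node.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* What a backup program would do with each name it encounters. */
enum backup_action {
    ACT_MISSING,      /* lstat() failed */
    ACT_READ_DATA,    /* regular file: open and copy contents */
    ACT_RECURSE,      /* directory: descend into it */
    ACT_RECORD_NODE,  /* char/block device, fifo, socket: store metadata only */
    ACT_RECORD_LINK   /* symlink: store the link itself, do not follow */
};

enum backup_action classify(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return ACT_MISSING;
    if (S_ISREG(st.st_mode))
        return ACT_READ_DATA;
    if (S_ISDIR(st.st_mode))
        return ACT_RECURSE;
    if (S_ISLNK(st.st_mode))
        return ACT_RECORD_LINK;
    return ACT_RECORD_NODE;   /* S_ISCHR, S_ISBLK, S_ISFIFO, S_ISSOCK */
}
```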

I guess that's much like the question 'what happens in proc?'.

Recursively entering the device directory is ok as long as everything
inside it is ok.  I tried zip -r on /proc/bus and what I got is what I'd
expect if I'd cat'ed every non-directory entry.  Maybe that's not
right: zipping /proc/kcore is kind of interesting.  Regardless, we are
no worse than proc here.  In fact, since we don't anticipate putting an
elephant like kcore in as a device property, we're a little nicer to get
along with.

Correct me if I'm wrong, but what we learn from the proc example
is that tarring your whole source tree starting at / is not something
you want to do.  Just extend that idea to /dev - however, if you do
it, it will produce pretty reasonable results.

What *won't* happen is, you won't get side effects from opening
your serial ports (you'd have to open them without O_DIRECTORY
to get that) so that seems like a little step forward.

I'm still thinking about some of your other comments.

--
Daniel

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device

2001-05-24 Thread Albert D. Cahalan

Oliver Xymoron writes:

> The /dev dir should not be special. At least not to the kernel. I have
> device files in places other than /dev, and you probably do too (hint:
> anonymous FTP).

This is a horribly broken FTP server.

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Oliver Xymoron


On Thu, 24 May 2001, Marko Kreen wrote:

> On Thu, May 24, 2001 at 02:23:27AM +0200, Edgar Toernig wrote:
> > Daniel Phillips wrote:
> > > > > It's going to be marked 'd', it's a directory, not a file.
> > > >
> > > > Aha.  So you lose the S_ISCHR/BLK attribute.
> > >
> > > Readdir fills in a directory type, so ls sees it as a directory and does
> > > the right thing.  On the other hand, we know we're on a device
> > > filesystem so we will next open the name as a regular file, and find
> > > ISCHR or ISBLK: good.
> >
> > ??? The kernel may know it, but the app?  Or do you really want to
> > give different stat data on stat(2) and fstat(2)?  These flags are
> > currently used by archive/backup programs.  It's a hint that these files
> > are not regular files and shouldn't be opened for reading.
> > Having a 'd' would mean that they would really try to enter the
> > directory and save its contents.  Don't know what happens in this
> > case to your "special" files ;-)
>
> IMHO the CHR/BLK is not needed.  Think of /proc.  In the future,
> the backup tools will be told to ignore /dev, that's all.

The /dev dir should not be special. At least not to the kernel. I have
device files in places other than /dev, and you probably do too (hint:
anonymous FTP).

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Malcolm Beattie

[cc list reduced]

Andreas Dilger writes:
> PS - I used to think shrinking a filesystem online was useful, but there
>  are a huge amount of problems with this and very few real-life
>  benefits, as long as you can at least do offline shrinking.  With
>  proper LVM usage, the need to shrink a filesystem never really
>  happens in practice, unlike the partition case where you always
>  have to guess in advance how big a filesystem needs to be, and then
>  add 10% for a safety margin.  With LVM you just create the minimal
>  sized device you need now, and freely grow it in the future.

In an attempt to nudge you back towards your previous opinion: consider
a system-wide spool or tmp filesystem. It would be nice to be able to
add in a few extra volumes for a busy period but then shrink it down
again when usage returns to normal. In the absence of the ability to
shrink a live filesystem, storage management becomes a much harder job.
You can't throw in a spare volume or two where it's needed without
careful thought because you'll be ratchetting up the space on that one
filesystem without being able to change your mind and reduce it again
later. You'll end up with stingy storage admins who refuse to give you
a bunch of extra filesystem space for a while because they can't get it
back again afterwards.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Andreas Dilger

Linus writes:
> There are some strong arguments that we should have filesystem
> "backdoors" for maintenance purposes, including backup. 
> 
> You can, of course, do parts of this on an LVM level, and doing backups
> with "disk snapshots" may be a valid approach. However, even that is
> debatable: there is very little that says that the disk image has to be
> up-to-date at any particular point in time, so even with a disk snapshot
> capability (which is not necessarily reasonable under all circumstances)
> there are arguments for maintenance interfaces.

Actually, the LVM snapshot interface has (optional) hooks into the filesystem
to ensure that it is consistent at the time the snapshot is created.  For
most filesystems, it will call fsync_dev(dev) so that all buffers are written
to disk.  However, for journalled filesystems, LVM needs to write out the
journal and mark the filesystem clean because the snapshot is a read-only
block device.  In this case it calls fsync_dev_lockfs(dev) which will call
the write_super_lockfs() method for the filesystem (if it exists) which
tells the filesystem to flush the journal, block transactions, and mark the
filesystem clean until the unlockfs() method is called.
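A minimal userspace sketch of the dispatch described above, for illustration only.  struct fs_model, prepare_snapshot() and the other helpers are hypothetical and are not the kernel's super_operations or the LVM code; only the method names write_super_lockfs() and unlockfs() come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a filesystem's snapshot hooks. The two method
 * names follow the text above; this is NOT struct super_operations. */
struct fs_model {
    int journal_entries;   /* records still sitting in the journal */
    int clean;             /* 1 once the fs is marked clean */
    int frozen;            /* 1 while new transactions are blocked */
    void (*write_super_lockfs)(struct fs_model *fs);  /* optional */
    void (*unlockfs)(struct fs_model *fs);            /* optional */
};

/* What a journalled fs would supply: flush the journal, block new
 * transactions and mark the fs clean until unlockfs() is called. */
static void journal_lockfs(struct fs_model *fs)
{
    fs->journal_entries = 0;
    fs->frozen = 1;
    fs->clean = 1;
}

static void journal_unlockfs(struct fs_model *fs)
{
    fs->frozen = 0;
}

/* Stand-in for fsync_dev(): dirty buffers would be written here. */
static void model_fsync_dev(struct fs_model *fs)
{
    (void)fs;   /* buffer writeback is not modelled */
}

/* Before taking the snapshot: sync, then freeze if the fs can. */
static void prepare_snapshot(struct fs_model *fs)
{
    assert(fs != NULL);
    model_fsync_dev(fs);
    if (fs->write_super_lockfs)           /* fsync_dev_lockfs() path */
        fs->write_super_lockfs(fs);
}

/* After the snapshot exists: let transactions run again. */
static void finish_snapshot(struct fs_model *fs)
{
    if (fs->unlockfs)
        fs->unlockfs(fs);
}
```

A filesystem without the hook just gets the sync; one that provides it stays frozen, with an empty journal, until finish_snapshot() runs.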

Reiserfs and XFS both use this to make consistent snapshots of the live
filesystem.  Unfortunately, XFS checks filesystem UUIDs at mount time,
which means you can't mount two copies of the same filesystem (even read-only).

> Things like "lazy fsck" (ie fsck while already running the filesystem) and
> defragmentation are simply not feasible on an LVM level.

Yes, with consistent LVM snapshots you can do fsck on the read-only copy.
In over 99.9% of cases you will not detect any errors and you can continue.  If
you _do_ detect an error you probably want to stop everything and fix it
(fsck repairing an in-use filesystem is too twisted and dangerous, IMHO,
and a huge amount of effort for an extremely rare situation).

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Andreas Dilger

Peter Braam writes:
> On Tue, 22 May 2001, Andreas Dilger wrote:
> > Actually, the LVM snapshot
> > interface has (optional) hooks into the filesystem to ensure that it
> > is consistent at the time the snapshot is created.
> 
> File system journal recovery can corrupt a snapshot, because it copies
> data that needs to be preserved in a snapshot. During journal replay such
> data may be copied again, but the source can have new data already.

The way it is implemented in reiserfs is to wait for existing transactions
to complete, entirely flush the journal and block all new transactions from
starting.  Stephen implemented a journal flush API to do this for ext3, but
the hooks to call it from LVM are not in place yet.  This way the journal is
totally empty at the time the snapshot is done, so the read-only copy does
not need to do journal recovery, so no problems can arise.
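A single-threaded sketch of that freeze sequence (wait for running transactions, flush the journal, block new ones), assuming a simplified model where callers retry instead of sleeping.  journal_model, txn_begin() and freeze_for_snapshot() are hypothetical names, not the reiserfs or ext3 API.

```c
#include <assert.h>
#include <errno.h>

/* Simplified journal state; real code would sleep on a wait queue
 * instead of returning -EBUSY / -EAGAIN. */
struct journal_model {
    int active;    /* transactions currently open */
    int logged;    /* committed records still in the journal */
    int frozen;    /* 1 while new transactions are blocked */
};

static int txn_begin(struct journal_model *j)
{
    if (j->frozen)
        return -EAGAIN;      /* new transactions are blocked */
    j->active++;
    return 0;
}

static void txn_end(struct journal_model *j)
{
    assert(j->active > 0);
    j->active--;
    j->logged++;             /* its records now sit in the journal */
}

/* Block new transactions first, then succeed only once the running
 * ones have finished, emptying the journal at that point. */
static int freeze_for_snapshot(struct journal_model *j)
{
    j->frozen = 1;
    if (j->active > 0)
        return -EBUSY;       /* caller waits and tries again */
    j->logged = 0;           /* journal flushed: nothing left to replay */
    return 0;
}

static void thaw(struct journal_model *j)
{
    j->frozen = 0;
}
```

Because the journal is empty when the freeze succeeds, a snapshot taken at that moment has nothing to replay, which is the property described above.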

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Andreas Dilger

Jeff writes:
> Here's a dumb question, and I apologize if I am questioning computer
> science dogma...
> 
> Why are LVM and EVMS(competing LVM project) needed at all?
> 
> Surely the same can be accomplished with
> * md
> * snapshot blkdev (attached in previous e-mail)
> * giving partitions and blkdevs the ability to grow and shrink
> * giving filesystems the ability to grow and shrink
> 
> On-line optimization (defrag, etc) shouldn't be hard once you have the
> ability to move blocks and files around, which would come with the
> ability to grow and shrink blkdevs and fs's.

You're missing virtual->physical block mapping allowing you to move parts
of the device around, freedom from the need for contiguous disk space.

In the end, what you've described above is pretty much what LVM does (and
EVMS does better).  Having the various components inside a single layer
like EVMS gives you a lot more flexibility, IMHO.  You also don't have
the issue of wasted minor numbers for unused partitions, or too few minor
numbers in other cases.

For example, with MD RAID you still need devices of equal size to create
a RAID 1 mirror, or part of one device is wasted.  With EVMS you can (in
the future, or right now with AIX/HPUX LVM) do the RAID 1 mirroring on a
per-logical-extent basis and you get your physical extents from any device.
Because your virtual->physical mapping is already abstract, it also allows
you to add mirroring to any existing LVM device without interruption.
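To make the virtual->physical point concrete, here is a toy per-extent mirror map, assuming fixed-size extents and at most two copies.  le_map, pv and add_mirror() are hypothetical and do not reflect LVM's or EVMS's actual metadata; the data copy (resync) to the new extent is not modelled.

```c
#include <assert.h>

#define MAX_MIRRORS 2

/* One physical extent (PE) on one device. */
struct pe_ref { int dev; int pe; };

/* Hypothetical logical extent (LE): one PE per mirror copy. */
struct le_map {
    int nmirrors;
    struct pe_ref copy[MAX_MIRRORS];
};

/* Per-device allocator; devices may have different PE counts. */
struct pv { int npe; int next_free; };

static int alloc_pe(struct pv *pvs, int dev)
{
    if (pvs[dev].next_free >= pvs[dev].npe)
        return -1;                    /* device full */
    return pvs[dev].next_free++;
}

/* Give an existing extent a second copy on any other device with a
 * free PE. The first copy stays mapped throughout. */
static int add_mirror(struct le_map *le, struct pv *pvs, int npvs)
{
    assert(le->nmirrors >= 1 && le->nmirrors < MAX_MIRRORS);
    for (int dev = 0; dev < npvs; dev++) {
        if (dev == le->copy[0].dev)
            continue;                 /* keep copies on different disks */
        int pe = alloc_pe(pvs, dev);
        if (pe >= 0) {
            le->copy[le->nmirrors].dev = dev;
            le->copy[le->nmirrors].pe = pe;
            le->nmirrors++;
            return 0;
        }
    }
    return -1;                        /* no space on any other device */
}
```

With three devices of 4, 1 and 3 extents, three extents living on the first device can all be mirrored: the first copy lands on the one-extent device and the rest spill over to the third, which a whole-device RAID 1 pairing could not do without wasting space.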

Cheers, Andreas

PS - I used to think shrinking a filesystem online was useful, but there
 are a huge amount of problems with this and very few real-life
 benefits, as long as you can at least do offline shrinking.  With
 proper LVM usage, the need to shrink a filesystem never really
 happens in practice, unlike the partition case where you always
 have to guess in advance how big a filesystem needs to be, and then
 add 10% for a safety margin.  With LVM you just create the minimal
 sized device you need now, and freely grow it in the future.
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-24 Thread Marko Kreen


On Thu, May 24, 2001 at 02:23:27AM +0200, Edgar Toernig wrote:
> Daniel Phillips wrote:
> > > > It's going to be marked 'd', it's a directory, not a file.
> > >
> > > Aha.  So you lose the S_ISCHR/BLK attribute.
> > 
> > Readdir fills in a directory type, so ls sees it as a directory and does
> > the right thing.  On the other hand, we know we're on a device
> > filesystem so we will next open the name as a regular file, and find
> > ISCHR or ISBLK: good.
> 
> ??? The kernel may know it, but the app?  Or do you really want to
> give different stat data on stat(2) and fstat(2)?  These flags are
> currently used by archive/backup programs.  It's a hint that these files
> are not regular files and shouldn't be opened for reading.
> Having a 'd' would mean that they would really try to enter the
> directory and save its contents.  Don't know what happens in this
> case to your "special" files ;-)

IMHO the CHR/BLK is not needed.  Think of /proc.  In the future,
the backup tools will be told to ignore /dev, that's all.

-- 
marko


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Edgar Toernig

Daniel Phillips wrote:
> On Wednesday 23 May 2001 06:19, Edgar Toernig wrote:
> > Daniel Phillips wrote:
> > > On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
> > > > On Mon, 21 May 2001, Daniel Phillips wrote:
> > > > > On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > > > > > What I'd like to see:
> > > > > >
> > > > > > - An interface for registering an array of related devices
> > > > > > (almost always two: raw and ctl) and their legacy device
> > > > > > numbers with a single userspace callout that does whatever
> > > > > > /dev/ creation needs to be done. Thus, naming and permissions
> > > > > > live in user space. No "device node is also a directory"
> > > > > > weirdness...
> > > > >
> > > > > Could you be specific about what is weird about it?
> > > >
> > > > *boggle*
> > > >
> > > >[general sense of unease]
> >
> > I fully agree with Oliver.  It's an abomination.
> 
> We are, or at least, I am, investigating this question purely on
> technical grounds - name calling is a noop.

Right.  But sometimes new ideas raise these kind of feelings ;)

> > > It's going to be marked 'd', it's a directory, not a file.
> >
> > Aha.  So you lose the S_ISCHR/BLK attribute.
> 
> Readdir fills in a directory type, so ls sees it as a directory and does
> the right thing.  On the other hand, we know we're on a device
> filesystem so we will next open the name as a regular file, and find
> ISCHR or ISBLK: good.

??? The kernel may know it, but the app?  Or do you really want to
give different stat data on stat(2) and fstat(2)?  These flags are
currently used by archive/backup programs.  It's a hint that these files
are not regular files and shouldn't be opened for reading.
Having a 'd' would mean that they would really try to enter the
directory and save its contents.  Don't know what happens in this
case to your "special" files ;-)
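For reference, a sketch of the S_ISCHR/S_ISBLK test a backup program might make before opening anything.  classify() and the action names are hypothetical.  If stat(2) reported a device node as a directory, as the 'd' marking implies, that node would take the ACT_RECURSE branch instead of ACT_RECORD_NODE.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

enum backup_action { ACT_ERROR, ACT_READ, ACT_RECURSE, ACT_RECORD_NODE, ACT_OTHER };

/* Device nodes are recorded (type, major/minor) and never opened for
 * reading; directories are descended into; regular files are read. */
static enum backup_action classify(const char *path)
{
    struct stat st;
    assert(path != NULL);
    if (lstat(path, &st) != 0)
        return ACT_ERROR;
    if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode))
        return ACT_RECORD_NODE;
    if (S_ISDIR(st.st_mode))
        return ACT_RECURSE;
    if (S_ISREG(st.st_mode))
        return ACT_READ;
    return ACT_OTHER;   /* symlinks, FIFOs, sockets */
}
```

On a typical Linux system classify("/dev/null") yields ACT_RECORD_NODE and classify("/") yields ACT_RECURSE.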

> The rule for this filesystem is: if you open with O_DIRECTORY then
> directory operations are permitted, nothing else.  If you open without
> O_DIRECTORY then directory operations are forbidden (as
> usual) and normal device semantics apply.

As usual?  I think you've just changed the rules for O_DIRECTORY.  Up
to now it's only a flag that tells open it should fail if the name
does not refer to a directory.  Nothing else.  It was introduced to
remove a race condition in user space applications.  In particular, it
is optional - everything works the same whether you give the flag
or not (except the race avoidance of course).  And there are a lot
of programs that do not use O_DIRECTORY (it's a Linux private flag,
not even mentioned in POSIX).  Every program that does:

fd = open(foo, O_RDONLY);
fchdir(fd);
x = opendir(".");

will break.  And that code is POSIX-conformant.  And I know that there are
programs that use this when recursively scanning directories (avoids
name mangling and repeated name lookups of the directory on later
stat calls).
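A runnable sketch of the pattern above, assuming a Linux/glibc system.  count_via_fchdir() opens a directory without O_DIRECTORY, fchdir()s into it and lists it with opendir("."); open_dir_only() shows that O_DIRECTORY merely turns a non-directory into ENOTDIR.  Both function names are hypothetical.  (O_DIRECTORY was Linux-specific at the time; POSIX.1-2008 later added it.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Opens 'dir' WITHOUT O_DIRECTORY, fchdir()s into it and lists it via
 * opendir("."). Returns the number of entries other than "." and "..",
 * or -1 on error. Changes the current directory as a side effect. */
static int count_via_fchdir(const char *dir)
{
    assert(dir != NULL);
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fchdir(fd);
    close(fd);                  /* the cwd reference survives close() */
    if (rc != 0)
        return -1;
    DIR *d = opendir(".");
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            n++;
    closedir(d);
    return n;
}

/* O_DIRECTORY only adds a check: a non-directory fails with ENOTDIR.
 * Returns the fd, or -errno on failure. */
static int open_dir_only(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    return fd >= 0 ? fd : -errno;
}
```

Under the rule proposed for device directories, the plain open() in count_via_fchdir() would return the device instead, so the fchdir() step is where such a program would break.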

> > Directories are not allowed to be read from/written to.  The VFS may
> > support it, but it's not (current) UNIX.
> 
> Here, we obey this rule: if you open it with O_DIRECTORY then you
> can't read from or write to it.

IMHO you've just invented opendir(2).

> Nothing breaks here, ls works as it always did.
> 
> This is what ls does:
> 
> open("foobar", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
> fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> fcntl64(0x3, 0x2, 0x1, 0x2) = -1 ENOSYS (Function not implemented)
> fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
> brk(0x805b000)  = 0x805b000
> getdents64(0x3, 0x8058270, 0x1000, 0x26) = -1 ENOSYS (Function not implemented)
> getdents(3, /* 2 entries */, 2980)  = 28
> getdents(3, /* 0 entries */, 2980)  = 0
> close(3) = 0
> 
> Note that ls doesn't do anything as inconvenient as opening
> foobar as a normal file first, expecting that operation to fail.

Well, your ls does not work "as it always did".  Here's an strace of
my libc5 system ls:

open(".", O_RDONLY) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
getdents(3, /* 64 entries */, 4096) = 1216
getdents(3, /* 9 entries */, 4096)  = 168
getdents(3, /* 0 entries */, 4096)  = 0
close(3) = 0

And my find(1) does:

open(".", O_RDONLY) = 3
[scan all dirs]
fchdir(3)   = 0

to return to its initial dir.  Will break too.
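The find(1) pattern, sketched under the same Linux/glibc assumption.  visit_and_return() is a hypothetical helper that keeps an fd on "." (opened without O_DIRECTORY), changes directory, and uses fchdir() to come back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 if the starting directory was restored, -1 otherwise. */
static int visit_and_return(const char *elsewhere)
{
    char before[PATH_MAX], after[PATH_MAX];
    assert(elsewhere != NULL);
    int start = open(".", O_RDONLY);      /* as in the trace: no O_DIRECTORY */
    if (start < 0)
        return -1;
    if (getcwd(before, sizeof before) == NULL || chdir(elsewhere) != 0) {
        close(start);
        return -1;
    }
    /* ... scan the tree under 'elsewhere' here ... */
    int rc = fchdir(start);               /* back to the initial dir */
    close(start);
    if (rc != 0 || getcwd(after, sizeof after) == NULL)
        return -1;
    return strcmp(before, after) == 0 ? 0 : -1;
}
```

Keeping an fd rather than a path means the return trip works even if the starting directory is renamed while the scan runs.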

> No, you would get side effects only if you open as a regular file.

IMHO your assumption that opening a dir _requires_ O_DIRECTORY is
wrong.  You've put in a new semantic that has not been there and
that will break programs and POSIX conformance.

> Please, if you know something that actually breaks, tell me.

Yeah, see above ;)

Ciao, ET.


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Oliver Xymoron

On Wed, 23 May 2001, Daniel Phillips wrote:

> > > > *boggle*
> > > >
> > > >[general sense of unease]
> >
> > I fully agree with Oliver.  It's an abomination.
>
> We are, or at least, I am, investigating this question purely on
> technical grounds - name calling is a noop.  I'd be happy to find a
> real reason why this is a bad idea but so far none has been
> presented.

I will agree that the thing can be done in principle. You're not going to
find anyone who's going to argue that part. All other things being equal,
I actually think it's a neat idea.

The part that is a problem is people, namely people who write programs.
They've had decades to expect that directories are not also files, and if
they happen to do things like check whether a file is not a directory
before opening it, it's _our fault_ if they get confused.

Consider the recent subtle change to fork() that was reversed because it
uncovered an unforeseen bug in bash. The proposed change is not at all
subtle, is entirely without precedent, and is likely to break much.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Daniel Phillips


On Wednesday 23 May 2001 06:19, Edgar Toernig wrote:
> IMO the whole idea of arguments following the device name is junk
> (incl a "/ctrl").

You know I didn't suggest that, right?  I find it pretty strange too, but
I'm listening to hear the technical arguments.

> Just think about the implications of the original "/dev/ttyS0/19200"
> suggestion.  It sounds nice and tempting.  But which programs will
> benefit?  Which get confused?  What will be cleaned up?  After some
> thoughts you'll find out that it's useless ;-)

You know I didn't suggest that either, right?  But I'm with you, I don't
like it at all, not least because we might change baud rate on the fly.

> And with special "ctrl" devices (ie /dev/ttyS0 and /dev/ttyS0ctrl):
> This _may_ work for some kind of devices.  But serial ports are one
> example where it simply will _not_.  It requires that you know the
> name of the device.  For ttys this is often not the case.
> Even if you manage to get some name for stdin for example - now I 
> should simply attach a "ctrl" to that name to get a control channel???
> At least dangerous.  If I'm lucky I only get an EPERM...

Again, I'll provide a sympathetic ear, but it wasn't my suggestion.

> Ciao, ET.

And you were referring to who?

--
Daniel

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Daniel Phillips

On Wednesday 23 May 2001 06:19, Edgar Toernig wrote:
> Daniel Phillips wrote:
> > On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
> > > On Mon, 21 May 2001, Daniel Phillips wrote:
> > > > On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > > > > What I'd like to see:
> > > > >
> > > > > - An interface for registering an array of related devices
> > > > > (almost always two: raw and ctl) and their legacy device
> > > > > numbers with a single userspace callout that does whatever
> > > > > /dev/ creation needs to be done. Thus, naming and permissions
> > > > > live in user space. No "device node is also a directory"
> > > > > weirdness...
> > > >
> > > > Could you be specific about what is weird about it?
> > >
> > > *boggle*
> > >
> > >[general sense of unease]
>
> I fully agree with Oliver.  It's an abomination.

We are, or at least, I am, investigating this question purely on
technical grounds - name calling is a noop.  I'd be happy to find a
real reason why this is a bad idea but so far none has been
presented.

Don't get me wrong, the fact that people I respect have reservations
about the idea does mean something to me, but this still needs to be
investigated properly.  Now on to the technical content...

> > > I don't think it's likely to be even workable. Just consider the
> > > directory entry for a moment - is it going to be marked d or
> > > [cb]?
> >
> > It's going to be marked 'd', it's a directory, not a file.
>
> Aha.  So you lose the S_ISCHR/BLK attribute.

Readdir fills in a directory type, so ls sees it as a directory and does
the right thing.  On the other hand, we know we're on a device 
filesystem so we will next open the name as a regular file, and find
ISCHR or ISBLK: good.

The rule for this filesystem is: if you open with O_DIRECTORY then
directory operations are permitted, nothing else.  If you open without
O_DIRECTORY then directory operations are forbidden (as
usual) and normal device semantics apply.

If there is weirdness anywhere, it's right here with this rule.  The
question is: what if anything breaks?
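A toy model of the rule as stated, to make the question concrete.  The enum and all three functions are hypothetical.  It compares what a program that opens a directory with plain O_RDONLY (as in the ls and find traces elsewhere in this thread) gets today with what it would get under the proposed rule.

```c
#include <assert.h>
#include <stdbool.h>

/* What open() hands back, in this illustrative model only. */
enum open_result { GOT_DIR_OPS, GOT_FILE_OPS, GOT_ENOTDIR };

/* Proposed rule for a device node that is also a directory:
 * O_DIRECTORY selects directory operations; without it you get the
 * device, with whatever side effects its open() has. */
static enum open_result proposed_open(bool o_directory)
{
    return o_directory ? GOT_DIR_OPS : GOT_FILE_OPS;
}

/* Current semantics: a directory is a directory with or without
 * O_DIRECTORY; the flag only rejects non-directories. */
static enum open_result classic_open(bool is_dir, bool o_directory)
{
    if (is_dir)
        return GOT_DIR_OPS;
    return o_directory ? GOT_ENOTDIR : GOT_FILE_OPS;
}

/* True if a program that opens a directory without O_DIRECTORY and
 * then calls fchdir() or getdents() on it would see a difference. */
static bool plain_open_differs(void)
{
    return classic_open(true, false) != proposed_open(false);
}
```

In this model plain_open_differs() is true: the O_DIRECTORY path is unaffected, but the plain-open path that older ls and find use changes meaning.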

> > > If it doesn't have the directory bit set, Midnight commander
> > > won't let me look at it, and I wouldn't blame cd or ls for
> > > complaining. If it does have the 'd' bit set, I wouldn't blame
> > > cp, tar, find, or a million other programs if they did the wrong
> > > thing. They've had 30 years to expect that files aren't
> > > directories. They're going to act weird.
> >
> > No problem, it's a directory.
>
> Directories are not allowed to be read from/written to.  The VFS may
> support it, but it's not (current) UNIX.

Here, we obey this rule: if you open it with O_DIRECTORY then you
can't read from or write to it.

> > > Linus has been kicking this idea around for a couple years now
> > > and it's still a cute solution looking for a problem. It just
> > > doesn't belong in UNIX.
> >
> > Hmm, ok, do we still have any *technical* reasons?
>
> So with your definition, I have a fs-object that is marked as a
> directory but opening it opens a device.  Pretty nice..

No, you have to open it without O_DIRECTORY to get your device
fd handle.

> How am I supposed to list its contents?  open+readdir?

Nothing breaks here, ls works as it always did.

This is what ls does:

open("foobar", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(0x3, 0x2, 0x1, 0x2) = -1 ENOSYS (Function not implemented)
fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
brk(0x805b000)  = 0x805b000
getdents64(0x3, 0x8058270, 0x1000, 0x26) = -1 ENOSYS (Function not implemented)
getdents(3, /* 2 entries */, 2980)  = 28
getdents(3, /* 0 entries */, 2980)  = 0
close(3) = 0

Note that ls doesn't do anything as inconvenient as opening 
foobar as a normal file first, expecting that operation to fail.

> But the open has nasty side effects.
> So you have a directory that you are not allowed
> to list (because of the possible side effects) but is allowed to be
> read from/written to, and maybe even issue ioctls to?

No, you would get side effects only if you open as a regular file.
I'd agree that that sucks, but that's not what we're trying to fix
just now.

> And you call that sane???

I would hope it seems saner now, after the clarification.
Please, if you know something that actually breaks, tell me.

--
Daniel

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-23 Thread Stephen C. Tweedie

Hi,

On Tue, May 22, 2001 at 01:16:42PM -0600, Peter J. Braam wrote:

> File system journal recovery can corrupt a snapshot, because it copies
> data that needs to be preserved in a snapshot.

Journal recovery may move data from the journal to other locations on
the device, yes, but that doesn't change the logical contents of the
filesystem.  I don't see how that results in "corruption": the
snapshot is (or at least, ought to be!) fully independent of the
original version of the data, so such recovery should only be taking
the snapshot from one consistent state to a different but equivalent
state.
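To illustrate why replay should only move the filesystem between equivalent states, a sketch assuming a journal that logs whole new block images (ext3's JBD journals full blocks).  jrec, replay() and disks_equal() are hypothetical.  Replaying the same committed records after a partial write-back, or replaying them twice, yields the same blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 8

/* One committed journal record: the full new contents of a block
 * (one char per block in this toy). */
struct jrec { int block; char data; };

/* Replay copies each record to its home location. Each write sets a
 * block to a fixed value, so replay is idempotent. */
static void replay(char disk[NBLOCKS], const struct jrec *log, int n)
{
    for (int i = 0; i < n; i++) {
        assert(log[i].block >= 0 && log[i].block < NBLOCKS);
        disk[log[i].block] = log[i].data;
    }
}

static bool disks_equal(const char a[NBLOCKS], const char b[NBLOCKS])
{
    return memcmp(a, b, NBLOCKS) == 0;
}
```

Replaying onto a device that crashed after the first of two home writes gives the same result as a clean write-back, and a second replay changes nothing.  The problem Peter raises arises from what sits underneath the device (the snapshot copy), not from the replay itself.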

> During journal replay such
> data may be copied again, but the source can have new data already.

Only if you are recovering a live volume, surely?  And that is
*guaranteed* to cause problems.  

--Stephen

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Alexander Viro




On Wed, 23 May 2001, Edgar Toernig wrote:

> And with special "ctrl" devices (ie /dev/ttyS0 and /dev/ttyS0ctrl):
> This _may_ work for some kind of devices.  But serial ports are one
> example where it simply will _not_.  It requires that you know the

That's quite funny, you know...


From: Dennis Ritchie ([EMAIL PROTECTED])
Subject: Re: Plan 9 (was Re: Rubouts)
Newsgroups: alt.folklore.computers
Date: 1998/10/12
   
Neil Franklin wrote:
>
> No ioctl()s?
>
> Something like:echo "38400,8,n,1" > /ioctrl/ttyS0?
>
> Now that would be cool.
>
Exactly like that, though it would be /dev/eia80ctl .
No ioctl().

> Is there anyone who has an URL about Plan 9. Code download?
>

 http://plan9.bell-labs.com/plan9


Dennis
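A sketch of what the driver side of such a text control file might parse, using the "38400,8,n,1" format from Neil Franklin's message (baud, data bits, parity, stop bits).  The format and parse_ctl() are hypothetical; Plan 9's actual uart ctl commands use a different syntax.  The trailing newline that echo adds is accepted.

```c
#include <assert.h>
#include <stdio.h>

struct line_cfg { long baud; int bits; char parity; int stop; };

/* Parses "baud,bits,parity,stop" with optional trailing whitespace.
 * Returns 0 on success, -1 on malformed input or trailing junk. */
static int parse_ctl(const char *s, struct line_cfg *c)
{
    char tail;
    assert(s && c);
    /* " %c" skips whitespace, then must find nothing else: exactly
     * four conversions means the whole string was consumed. */
    if (sscanf(s, "%ld,%d,%c,%d %c", &c->baud, &c->bits, &c->parity,
               &c->stop, &tail) != 4)
        return -1;
    if (c->parity != 'n' && c->parity != 'e' && c->parity != 'o')
        return -1;
    return 0;
}
```

A write() of "38400,8,n,1\n" to the control node would then fill in a struct that the driver applies, with no ioctl() involved.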



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Edgar Toernig

Daniel Phillips wrote:
> 
> On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
> > On Mon, 21 May 2001, Daniel Phillips wrote:
> > > On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > > > What I'd like to see:
> > > >
> > > > - An interface for registering an array of related devices
> > > > (almost always two: raw and ctl) and their legacy device numbers
> > > > with a single userspace callout that does whatever /dev/ creation
> > > > needs to be done. Thus, naming and permissions live in user
> > > > space. No "device node is also a directory" weirdness...
> > >
> > > Could you be specific about what is weird about it?
> >
> > *boggle*
> >
> >[general sense of unease]

I fully agree with Oliver.  It's an abomination.

> > I don't think it's likely to be even workable. Just consider the
> > directory entry for a moment - is it going to be marked d or [cb]?
> 
> It's going to be marked 'd', it's a directory, not a file.

Aha.  So you lose the S_ISCHR/BLK attribute.

> > If it doesn't have the directory bit set, Midnight commander won't
> > let me look at it, and I wouldn't blame cd or ls for complaining. If it
> > does have the 'd' bit set, I wouldn't blame cp, tar, find, or a
> > million other programs if they did the wrong thing. They've had 30
> > years to expect that files aren't directories. They're going to act
> > weird.
> 
> No problem, it's a directory.

Directories are not allowed to be read from/written to.  The VFS may
support it, but it's not (current) UNIX.

> > Linus has been kicking this idea around for a couple years now and
> > it's still a cute solution looking for a problem. It just doesn't
> > belong in UNIX.
> 
> Hmm, ok, do we still have any *technical* reasons?

So with your definition, I have a fs-object that is marked as a directory
but opening it opens a device.  Pretty nice.  How am I supposed to list
its contents?  open+readdir?  But the open has nasty side effects.
So you have a directory that you are not allowed to list (because of the
possible side effects) but that can be read from/written to, and maybe
even have ioctls issued to it?  And you call that sane???

IMO the whole idea of arguments following the device name is junk (incl
a "/ctrl").

Just think about the implications of the original "/dev/ttyS0/19200"
suggestion.  It sounds nice and tempting.  But which programs will
benefit?  Which get confused?  What will be cleaned up?  After some
thoughts you'll find out that it's useless ;-)

And with special "ctrl" devices (ie /dev/ttyS0 and /dev/ttyS0ctrl):
This _may_ work for some kind of devices.  But serial ports are one
example where it simply will _not_.  It requires that you know the
name of the device.  For ttys this is often not the case.  Even if
you manage to get some name for stdin for example - now I should
simply attach a "ctrl" to that name to get a control channel???
At least dangerous.  If I'm lucky I only get an EPERM...

Ciao, ET.


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Peter J. Braam

Andreas,

I think that the issue is something different.  Suppose the snapshot has
been created. I know that this can be done safely with the API's you
allude to. Life goes on and the journal FS keeps changing the file system
and if the system doesn't crash, everything is fine: blocks get copied
correctly from the primary volume to the snapshot volume.

Now consider a crash -- not during snapshot creation, but way after that
when "life is going on".  Suppose there is a two block transaction that
has made it to the journal and after writing one block to the fs location
the system crashes.  The journal replay will try to write that block
again.

But during recovery, LVM cannot possibly know if the whole process of
copying out the data from the current to the snapshot area completed
during the previous run. Yes, LVM updates the redirection table first and
then copies, but, still, you don't know _where exactly_ the writes stopped
happening and in particular you don't know if the block was copied already
or not.

So during replay it is quite possible that LVM corrupts the snapshot.
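A toy model of the failure Peter describes, assuming the redirection-table update reaches disk but the copy does not.  struct cow, cow_write() and snap_read() are hypothetical, and '?' stands for whatever stale contents the snapshot area held.

```c
#include <assert.h>
#include <stdbool.h>

#define NB 4

/* Toy copy-on-write snapshot: before the first overwrite of an origin
 * block, mark it remapped (table first), then copy the old data out. */
struct cow {
    char origin[NB];
    char snap[NB];      /* snapshot copies of remapped blocks */
    bool remapped[NB];  /* redirection table */
};

/* Write 'v' to origin block b. If crash_after_remap, stop right after
 * the table update, as a crash at that instant would. */
static void cow_write(struct cow *c, int b, char v, bool crash_after_remap)
{
    assert(b >= 0 && b < NB);
    if (!c->remapped[b]) {
        c->remapped[b] = true;          /* table updated first */
        if (crash_after_remap)
            return;                     /* copy never happened */
        c->snap[b] = c->origin[b];      /* preserve the old contents */
    }
    c->origin[b] = v;
}

/* What the snapshot shows for block b. */
static char snap_read(const struct cow *c, int b)
{
    return c->remapped[b] ? c->snap[b] : c->origin[b];
}
```

Without a crash the snapshot keeps the old contents of block 2.  If the crash lands between the table update and the copy, the journal replay's rewrite of block 2 sees the block as already remapped, skips the copy, and the snapshot returns the stale '?'.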

It's better to keep the snapshot in the old volume and write the new data
to a separate area (that's what most commercial systems do I think).  It
avoids redirections and copying upon write.  When you delete the snapshot
you have to copy, but you can do that as a low priority process.
Finally, as you pointed out a full volume is handled better too in that
way, since you don't terminate the snapshot but you tell the current
volume that it is full.
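The alternative, sketched in the same toy style: a redirect-on-write volume where the snapshot stays in place and new writes go to a separate area.  struct row and its functions are hypothetical.  delete_snapshot() performs the deferred copy mentioned above, which can run at low priority.

```c
#include <assert.h>
#include <stdbool.h>

#define NB 4

/* Snapshot data stays in 'base'; live writes land in 'delta' and a
 * per-block flag redirects live reads there. No copy on write. */
struct row {
    char base[NB];
    char delta[NB];
    bool redirected[NB];
};

static void row_write(struct row *r, int b, char v)
{
    assert(b >= 0 && b < NB);
    r->delta[b] = v;            /* write the new data first ... */
    r->redirected[b] = true;    /* ... then publish the redirection */
}

static char live_read(const struct row *r, int b)
{
    return r->redirected[b] ? r->delta[b] : r->base[b];
}

static char snapshot_read(const struct row *r, int b)
{
    return r->base[b];          /* untouched while the snapshot exists */
}

/* Deleting the snapshot folds the new data back into place. */
static void delete_snapshot(struct row *r)
{
    for (int b = 0; b < NB; b++)
        if (r->redirected[b]) {
            r->base[b] = r->delta[b];
            r->redirected[b] = false;
        }
}
```

Because the original blocks are never overwritten while the snapshot exists, a crash in the middle of row_write() can at worst lose the new live write; it cannot damage the snapshot.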

Hmm, I was expecting a storm of email explaining what I have
misunderstood, but it has in fact been rather quiet...

- Peter -

On Tue, 22 May 2001, Andreas Dilger wrote:

> Peter Braam writes:
> > On Tue, 22 May 2001, Andreas Dilger wrote:
> > > Actually, the LVM snapshot
> > > interface has (optional) hooks into the filesystem to ensure that it
> > > is consistent at the time the snapshot is created.
> >
> > File system journal recovery can corrupt a snapshot, because it copies
> > data that needs to be preserved in a snapshot. During journal replay such
> > data may be copied again, but the source can have new data already.
>
> The way it is implemented in reiserfs is to wait for existing transactions
> to complete, entirely flush the journal and block all new transactions from
> starting.  Stephen implemented a journal flush API to do this for ext3, but
> the hooks to call it from LVM are not in place yet.  This way the journal is
> totally empty at the time the snapshot is done, so the read-only copy does
> not need to do journal recovery, so no problems can arise.
>
> Cheers, Andreas
>

-- 


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Daniel Phillips


On Tuesday 22 May 2001 19:49, Oliver Xymoron wrote:
> On Tue, 22 May 2001, Daniel Phillips wrote:
> > > I don't think it's likely to be even workable. Just consider the
> > > directory entry for a moment - is it going to be marked d or
> > > [cb]?
> >
> > It's going to be marked 'd', it's a directory, not a file.
>
> Are we talking about the same proposal?  The one where I can open
> /dev/dsp and /dev/dsp/ctl? But I can still do 'cat /dev/hda >
> /dev/dsp'?

We already support read/write on directories in the VFS, that's not a
problem.

> It's still a file. If it's not a file anymore, it ain't UNIX.

It's a file with the directory bit set, I believe that's UNIX.

> > > If it doesn't have the directory bit set, Midnight commander
> > > won't let me look at it, and I wouldn't blame cd or ls for
> > > complaining. If it does have the 'd' bit set, I wouldn't blame
> > > cp, tar, find, or a million other programs if they did the wrong
> > > thing. They've had 30 years to expect that files aren't
> > > directories. They're going to act weird.
> >
> > No problem, it's a directory.
> >
> > > Linus has been kicking this idea around for a couple years now
> > > and it's still a cute solution looking for a problem. It just
> > > doesn't belong in UNIX.
> >
> > Hmm, ok, do we still have any *technical* reasons?
>
> If you define *technical* to not include design, sure.

Sorry, I don't see what you mean. Do you mean the design is
difficult?

> Oh, did I mention unnecessary, solvable in userspace?

That's exactly the point: the generic filesystem allows all the
funny-shaped stuff to be dealt with in user space.  The
filesystem itself is lovely and clean.

BTW, I didn't realize I was reinventing Linus's wheel, this just
seemed very obvious and natural to me.  So I had to believe
there's a technical obstacle somewhere.

Has anyone written code to demonstrate the idea?

--
Daniel

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Peter J. Braam

On Tue, 22 May 2001, Linus Torvalds wrote:
> On Tue, 22 May 2001, Andreas Dilger wrote:
> > Actually, the LVM snapshot interface has (optional) hooks into the
> > filesystem to ensure that it is consistent at the time the snapshot
> > is created.

But I think that LVM is implemented "the wrong way around".

File system journal recovery can corrupt a snapshot, because it copies
data that needs to be preserved in a snapshot. During journal replay such
data may be copied again, but the source can have new data already.

Most LVM snapshot systems write the new data to a separate volume and
don't copy the old data. That eliminates this problem (and also eliminates
the copy on write, but introduces a data copy when a snapshot is removed).

- Peter -


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Linus Torvalds

On Tue, 22 May 2001, Andreas Dilger wrote:
> 
> Actually, the LVM snapshot interface has (optional) hooks into the filesystem
> to ensure that it is consistent at the time the snapshot is created.

Note that this is still fundamentally a broken interface: the filesystem
may not _have_ a block device underneath it, yet you might very well like
to do defragmentation and backup nonetheless.

Also, LVM snapshots are fundamentally limited to read-only data, which
means that the LVM interfaces cannot be used for defragmentation and lazy
fsck etc. anyway. You _have_ to do those at a filesystem level.

Disk snapshots are useful, but they are not the answer.

Linus


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Oliver Xymoron


On Tue, 22 May 2001, Daniel Phillips wrote:

> > I don't think it's likely to be even workable. Just consider the
> > directory entry for a moment - is it going to be marked d or [cb]?
>
> It's going to be marked 'd', it's a directory, not a file.

Are we talking about the same proposal?  The one where I can open /dev/dsp
and /dev/dsp/ctl? But I can still do 'cat /dev/hda > /dev/dsp'?

It's still a file. If it's not a file anymore, it ain't UNIX.

> > If it doesn't have the directory bit set, Midnight commander won't
> > let me look at it, and I wouldn't blame cd or ls for complaining. If it
> > does have the 'd' bit set, I wouldn't blame cp, tar, find, or a
> > million other programs if they did the wrong thing. They've had 30
> > years to expect that files aren't directories. They're going to act
> > weird.
>
> No problem, it's a directory.
>
> > Linus has been kicking this idea around for a couple years now and
> > it's still a cute solution looking for a problem. It just doesn't
> > belong in UNIX.
>
> Hmm, ok, do we still have any *technical* reasons?

If you define *technical* to not include design, sure. Oh, did I
mention unnecessary, solvable in userspace?

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."



Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Daniel Phillips


On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
> On Mon, 21 May 2001, Daniel Phillips wrote:
> > On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > > What I'd like to see:
> > >
> > > - An interface for registering an array of related devices
> > > (almost always two: raw and ctl) and their legacy device numbers
> > > with a single userspace callout that does whatever /dev/ creation
> > > needs to be done. Thus, naming and permissions live in user
> > > space. No "device node is also a directory" weirdness...
> >
> > Could you be specific about what is weird about it?
>
> *boggle*
>
>[general sense of unease]
>
> I don't think it's likely to be even workable. Just consider the
> directory entry for a moment - is it going to be marked d or [cb]?

It's going to be marked 'd', it's a directory, not a file.

> If it doesn't have the directory bit set, Midnight commander won't
> let me look at it, and I wouldn't blame cd or ls for complaining. If it
> does have the 'd' bit set, I wouldn't blame cp, tar, find, or a
> million other programs if they did the wrong thing. They've had 30
> years to expect that files aren't directories. They're going to act
> weird.

No problem, it's a directory.

> Linus has been kicking this idea around for a couple years now and
> it's still a cute solution looking for a problem. It just doesn't
> belong in UNIX.

Hmm, ok, do we still have any *technical* reasons?

--
Daniel


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Oliver Xymoron

On Mon, 21 May 2001, Daniel Phillips wrote:

> On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > What I'd like to see:
> >
> > - An interface for registering an array of related devices (almost
> > always two: raw and ctl) and their legacy device numbers with a
> > single userspace callout that does whatever /dev/ creation needs to
> > be done. Thus, naming and permissions live in user space. No "device
> > node is also a directory" weirdness...
>
> Could you be specific about what is weird about it?

*boggle*

Without precedent in any other UNIX? Or other operating systems, for that
matter? Can you honestly say it doesn't strike you as weird? It's beating
the least surprise rule with a big stick, fercryinoutloud.

Ok, so technically UNIX directories were once just files. But it's been a
long time since people thought exposing that implementation detail was a
good idea, and anyway, it's the opposite situation (and no longer true on
modern fses).

I don't think it's likely to be even workable. Just consider the directory
entry for a moment - is it going to be marked d or [cb]? If it doesn't
have the directory bit set, Midnight commander won't let me look at it,
and I wouldn't blame cd or ls for complaining. If it does have the 'd' bit
set, I wouldn't blame cp, tar, find, or a million other programs if they
did the wrong thing. They've had 30 years to expect that files aren't
directories. They're going to act weird.
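Oliver's "d or [cb]?" question turns on the fact that a directory entry carries exactly one file type. The sketch below shows how an `ls`-style tool might derive its type letter from `st_mode` with the standard `S_IS*` macros from `<sys/stat.h>`. The type lives in the single `S_IFMT` field, so a node cannot be both 'd' and 'c'/'b'. The helper name `type_letter` is illustrative.

```c
#define _XOPEN_SOURCE 700   /* expose the S_IF* constants and S_ISSOCK */
#include <sys/stat.h>

/* Map st_mode to the first character of `ls -l` output. The file type is
 * a single value in the S_IFMT bits, so exactly one branch can match. */
char type_letter(mode_t mode)
{
    if (S_ISDIR(mode))  return 'd';
    if (S_ISCHR(mode))  return 'c';
    if (S_ISBLK(mode))  return 'b';
    if (S_ISREG(mode))  return '-';
    if (S_ISLNK(mode))  return 'l';
    if (S_ISFIFO(mode)) return 'p';
#ifdef S_ISSOCK
    if (S_ISSOCK(mode)) return 's';
#endif
    return '?';
}
```

Tools such as `find` and `tar` make the same single-type decision before deciding whether to descend or to copy bytes, which is why a "device that is also a directory" has no natural encoding here.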

Linus has been kicking this idea around for a couple years now and it's
still a cute solution looking for a problem. It just doesn't belong in
UNIX.

More importantly, there's no call for the weirdness. Look, we've already
got to have a userspace callout for new devices so that we can do config,
firmware downloading, automounting, etc. There's no reason we can't stick
the rest of the dynamic /dev/ magic in userspace with the same mechanism.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-22 Thread Daniel Phillips


On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> What I'd like to see:
>
> - An interface for registering an array of related devices (almost
> always two: raw and ctl) and their legacy device numbers with a
> single userspace callout that does whatever /dev/ creation needs to
> be done. Thus, naming and permissions live in user space. No "device
> node is also a directory" weirdness...

Could you be specific about what is weird about it?

> ...which is overkill in the vast majority of cases.

--
Daniel


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Alexander Viro

On Sun, 20 May 2001, Pavel Machek wrote:

> Hi!
> 
> > A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
> > no-op. Breaking that assumption is a Bad Thing(tm).
> 
> Then we have a problem. Just opening /dev/ttyS0 currently *has* side
> effects (it is visible on modem lines from serial port; it can block
> you forever). 
> 
> If this assumption is somewhere, we should fix that place... Or fix
> serial ports.

There is no way to fix it. If process A has ability to create and remove
files in directory foo, then process B has no way to know what file it
will actually open upon the attempt to open file in foo.

Example: you want to open /home/luser/barf, and /home is on the root
filesystem (too many systems have such a setup, and braindead as it is,
it _is_ valid). Luser creates a link to his tty (currently owned by
luser, so no bullshit about "let's restrict link(2) to the case when
target is owned by caller", please). After that he renames that link
to barf.

If you've just decided to open it and rename() comes when you
enter open(3) (in libc, still in userland), you _will_ end up opening
luser's tty.

OTOH, behaviour of serial ports is required by standards.

All we can do is to open it in non-blocking mode and then check whether
we've got what we wanted. You _must_ call fstat(2) after opening a file
that could be replaced under you. If you are not doing that (and you open
a file in a directory controlled by somebody else) - you have an exploitable
race. However, fstat() is too late to avoid side-effects of open() itself.

For serial ports O_NDELAY is enough to avoid that side effect. For something
where it's not enough - well, too bad. Don't do it.
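Here is a sketch of the pattern Al describes, for a program (say, a backup tool) that must open a file in a directory someone else controls. It `lstat(2)`s the name first. It then opens the name with `O_NONBLOCK | O_NOCTTY`, so that a tty swapped in neither blocks on carrier detect nor becomes the controlling terminal. Finally it `fstat(2)`s the descriptor and rejects it unless it is the same regular file. The function name `open_verified` and the choice of `EINVAL` for a mismatch are assumptions. As Al points out, this only detects the substitution: any side effect of `open()` itself has already happened.

```c
#define _XOPEN_SOURCE 700   /* POSIX.1-2008: lstat(), O_NOFOLLOW */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open 'path' read-only if it is a regular file that was not replaced
 * between the lstat() and the open(). Returns an fd, or -1 with errno set. */
int open_verified(const char *path)
{
    struct stat before, after;

    if (lstat(path, &before) < 0)
        return -1;
    if (!S_ISREG(before.st_mode)) {
        errno = EINVAL;
        return -1;
    }

    /* O_NONBLOCK: don't wait for carrier if it turns out to be a tty.
     * O_NOCTTY:   never make it our controlling terminal.
     * O_NOFOLLOW: fail (ELOOP) if a symlink was renamed into place. */
    int fd = open(path, O_RDONLY | O_NONBLOCK | O_NOCTTY | O_NOFOLLOW);
    if (fd < 0)
        return -1;

    /* The name may have been renamed over in the meantime: make sure the
     * descriptor refers to the object we inspected. */
    if (fstat(fd, &after) < 0 || !S_ISREG(after.st_mode) ||
        after.st_dev != before.st_dev || after.st_ino != before.st_ino) {
        close(fd);
        errno = EINVAL;   /* report any mismatch as EINVAL (arbitrary) */
        return -1;
    }
    return fd;
}
```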


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Pavel Machek


Hi!

> So I guess things have already been a bit messy in this
> area for many years, even before Linux existed, and
> in some cases you can't really do anything about it because
> the behaviour is mandated by the applicable standards, like
> POSIX, SUS, or whatever.
> (The blocking of the open on a tty device is explicitly
>  documented in my copy of the X/Open specification.)

If X/Open documents a security hole, then, I guess, X/Open will have to
be changed.
Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Pavel Machek


Hi!

> Yes, and that is exactly the difference between having a side effect
> on the open(2), versus having the effect as a result of a write(2).
> 
> Unfortunately, there are already some cases where an open
> on a device can have unexpected results.  If you don't want
> to get blocked waiting for the carrier-detect signal from the
> modem when opening a tty device, you had better specify the
> O_NONBLOCK option on the open.  If you don't want this flag
> to be active during the actual I/O operations, then you would
> have to do an fcntl to clear the O_NONBLOCK again after the open.
> 
> So I guess things have already been a bit messy in this
> area for many years, even before Linux existed, and
> in some cases you can't really do anything about it because
> the behaviour is mandated by the applicable standards, like
> POSIX, SUS, or whatever.
> (The blocking of the open on a tty device is explicitly
>  documented in my copy of the X/Open specification.)
> 
> Fortunately, blocking the nightly backup program by making it
> accidentally open a tty is not quite as catastrophic as having
> it start a nuclear war, or format the disks, or something,
> just because a user was playing games with symlinks.

Maybe not *as* catastrophic, but a security hole anyway. A user should
not be able to block system backups.

Small demonstration for bugtraq, anyone?
Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
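The tty workaround in the quoted text can be sketched as follows. You open the device with `O_NONBLOCK`, so the open does not wait for carrier detect, and then clear the flag with `fcntl(2)` before doing I/O. The helper name `open_tty_noblock` is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

/* Open a terminal device without blocking inside open() (e.g. waiting
 * for carrier), then switch the descriptor back to blocking I/O. */
int open_tty_noblock(const char *path)
{
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* Clear O_NONBLOCK so later read()/write() calls block normally. */
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On a real modem line you would usually also set `CLOCAL` in `c_cflag` with `tcsetattr(3)`, so that later I/O ignores the modem status lines. That step is omitted here.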

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Oliver Xymoron


On Mon, 21 May 2001, David Lang wrote:

> what makes you think it's safe to say there's only one floppy drive?

Read as: it doesn't make sense to have per-fd state on a single floppy
device given that there's only one actual hardware instance associated
with it and multiple openers don't make sense. Opening a floppy at
different densities with magic filenames was an example Linus used earlier
in the thread. Surely there can be more than one drive and more than one
serial port.

> On Mon, 21 May 2001, Oliver Xymoron wrote:
>
> > On Sat, 19 May 2001, Alexander Viro wrote:
> >
> > > Let's distinguish between per-fd effects (that's what name in
> > > open(name, flags) is for - you are asking for descriptor and telling
> > > what behaviour do you want for IO on it) and system-wide side effects.
> > >
> > > IMO encoding the former into name is perfectly fine, and no write on
> > > another file can be sanely used for that purpose. For the latter, though,
> > > we need to write commands into files and here your miscdevices (or procfs
> > > files, or /dev/foo/ctl - whatever) is needed.
> >
> > I'm a little skeptical about the necessity of these per-fd effects in the
> > first place - after all, Plan 9 does without them.  There's only one
> > floppy drive, yes? No concurrent users of serial ports? The counter that
> > comes to mind is sound devices supporting multiple opens, but I think
> > esound and friends are a better solution to that problem.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread David Lang


what makes you think it's safe to say there's only one floppy drive?

David Lang

On Mon, 21 May 2001, Oliver Xymoron wrote:

> On Sat, 19 May 2001, Alexander Viro wrote:
>
> > Let's distinguish between per-fd effects (that's what name in
> > open(name, flags) is for - you are asking for descriptor and telling
> > what behaviour do you want for IO on it) and system-wide side effects.
> >
> > IMO encoding the former into name is perfectly fine, and no write on
> > another file can be sanely used for that purpose. For the latter, though,
> > we need to write commands into files and here your miscdevices (or procfs
> > files, or /dev/foo/ctl - whatever) is needed.
>
> I'm a little skeptical about the necessity of these per-fd effects in the
> first place - after all, Plan 9 does without them.  There's only one
> floppy drive, yes? No concurrent users of serial ports? The counter that
> comes to mind is sound devices supporting multiple opens, but I think
> esound and friends are a better solution to that problem.
>
> What I'd like to see:
>
> - An interface for registering an array of related devices (almost always
> two: raw and ctl) and their legacy device numbers with a single userspace
> callout that does whatever /dev/ creation needs to be done. Thus, naming
> and permissions live in user space. No "device node is also a directory"
> weirdness which is overkill in the vast majority of cases. No kernel names
> or permissions leaking into userspace.
>
> - An unregister_devices that does the same, giving userspace a
> chance to persist permissions, etc.
>
> - A userspace program that keeps a mapping of kernel names to /dev/ names,
> permissions, etc.
>
> - An autofs hook that does the reverse mapping for running with modules
> (possibly calling modprobe directly)
>
> Possible future extension:
>
> - Allow exporting proc as a large collection of devices. Manage /proc in
> userspace on a tmpfs.
>
> --
>  "Love the dolphins," she advised him. "Write by W.A.S.T.E.."
>

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Oliver Xymoron

On Sat, 19 May 2001, Jeff Garzik wrote:

> Why are LVM and EVMS(competing LVM project) needed at all?
>
> Surely the same can be accomplished with
> * md
> * snapshot blkdev (attached in previous e-mail)
> * giving partitions and blkdevs the ability to grow and shrink
> * giving filesystems the ability to grow and shrink

You can migrate data off disks while the filesystems on top of them are
live. Add disk b, migrate a->b, remove disk a. Perhaps this is intrinsic
in the above somehow but I don't see it.
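Live migration works because LVM keeps a logical-to-physical extent map between the filesystem and the disks. The sketch below is a toy version of "add disk b, migrate a->b, remove disk a": each extent is copied, then its map entry is flipped, so the filesystem's logical addresses never change. All names are hypothetical. The model also assumes that the same physical slot is free on the target disk, and it ignores concurrent writes and crash safety.

```c
#include <string.h>

#define NEXT    4    /* logical extents in the volume */
#define EXTSZ   8    /* bytes per extent (tiny, for illustration) */
#define NDISKS  2

struct extent_ref { int disk; int pe; };    /* physical location */

struct volume {
    char disks[NDISKS][NEXT][EXTSZ];        /* physical extents per disk */
    struct extent_ref map[NEXT];            /* logical extent -> physical */
};

/* The filesystem only ever addresses logical extent numbers. */
char *vol_extent(struct volume *v, int le)
{
    struct extent_ref r = v->map[le];
    return v->disks[r.disk][r.pe];
}

/* Move every extent living on disk 'from' to the same slot on disk 'to'.
 * A real implementation must also hold off or mirror writes to an
 * extent while it is being copied. */
void vol_migrate(struct volume *v, int from, int to)
{
    for (int le = 0; le < NEXT; le++) {
        struct extent_ref r = v->map[le];
        if (r.disk != from)
            continue;
        memcpy(v->disks[to][r.pe], v->disks[from][r.pe], EXTSZ);
        v->map[le].disk = to;               /* flip the mapping after the copy */
    }
}
```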

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-21 Thread Oliver Xymoron


On Sat, 19 May 2001, Alexander Viro wrote:

> Let's distinguish between per-fd effects (that's what name in
> open(name, flags) is for - you are asking for descriptor and telling
> what behaviour do you want for IO on it) and system-wide side effects.
>
> IMO encoding the former into name is perfectly fine, and no write on
> another file can be sanely used for that purpose. For the latter, though,
> we need to write commands into files and here your miscdevices (or procfs
> files, or /dev/foo/ctl - whatever) is needed.

I'm a little skeptical about the necessity of these per-fd effects in the
first place - after all, Plan 9 does without them.  There's only one
floppy drive, yes? No concurrent users of serial ports? The counter that
comes to mind is sound devices supporting multiple opens, but I think
esound and friends are a better solution to that problem.

What I'd like to see:

- An interface for registering an array of related devices (almost always
two: raw and ctl) and their legacy device numbers with a single userspace
callout that does whatever /dev/ creation needs to be done. Thus, naming
and permissions live in user space. No "device node is also a directory"
weirdness which is overkill in the vast majority of cases. No kernel names
or permissions leaking into userspace.

- An unregister_devices that does the same, giving userspace a
chance to persist permissions, etc.

- A userspace program that keeps a mapping of kernel names to /dev/ names,
permissions, etc.

- An autofs hook that does the reverse mapping for running with modules
(possibly calling modprobe directly)

Possible future extension:

- Allow exporting proc as a large collection of devices. Manage /proc in
userspace on a tmpfs.
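The userspace half of this proposal is essentially a policy table consulted by the callout. Below is a minimal sketch. It assumes a hypothetical callout that receives a kernel device name and must choose the /dev path, owner and mode. The kernel names, paths, group IDs and table layout are all illustrative, not an existing API.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* One policy entry: kernel name -> /dev name, ownership, permissions. */
struct dev_policy {
    const char *kernel_name;
    const char *dev_path;
    uid_t uid;
    gid_t gid;
    mode_t mode;
};

/* Hypothetical table a userspace daemon would load from its config;
 * the group IDs are examples only. */
static const struct dev_policy policy[] = {
    { "sound/dsp0",     "/dev/dsp",     0, 29, 0660 },
    { "sound/dsp0-ctl", "/dev/dsp-ctl", 0, 29, 0660 },
    { "tts/0",          "/dev/ttyS0",   0, 20, 0660 },
};

/* Called for a newly registered device: returns its policy or NULL. */
const struct dev_policy *lookup_policy(const char *kernel_name)
{
    for (size_t i = 0; i < sizeof policy / sizeof policy[0]; i++)
        if (strcmp(policy[i].kernel_name, kernel_name) == 0)
            return &policy[i];
    return NULL;
}
```

A real callout would then create the node with `mknod(2)` and apply `chown(2)`/`chmod(2)`. On unregister it would write any permission changes back into the table, which corresponds to the "persist permissions" step in the list.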

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."


Re: Why side-effects on open(2) are evil. (was Re: [RFD

2001-05-20 Thread Alan Cox


> Why are LVM and EVMS(competing LVM project) needed at all?

I prefer to think of it the other way around

> Surely the same can be accomplished with
> * md
> * snapshot blkdev (attached in previous e-mail)
> * giving partitions and blkdevs the ability to grow and shrink
> * giving filesystems the ability to grow and shrink

How about 'partitions are an inferior legacy form of LVM'


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-20 Thread Matthew Kirkwood

On Sat, 19 May 2001 [EMAIL PROTECTED] wrote:

> One would like to have a version of the open() call that was
> guaranteed free of side effects, and gave a fd only -
> perhaps for stat(), perhaps for ioctl().

I did this a while ago, after some discussion.  The
implementation may suck, but I think it's a useful
facility.

http://web.gnu.walfield.org/mail-archive/linux-fsdevel/2000-March/0230.html

Matthew.
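Linux later gained something close to the stat() half of this request: the `O_PATH` flag to `open(2)` (Linux 2.6.39 and later). It returns a descriptor without actually opening the file, so a device driver's open routine is never called. That descriptor supports `fstat(2)` (since Linux 3.6) and the `*at()` calls, but `ioctl(2)` on it fails with `EBADF`, so it does not cover the ioctl() case. Below is a sketch; the helper name is hypothetical.

```c
#define _GNU_SOURCE          /* O_PATH is Linux-specific */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* stat() a path through a descriptor without opening the underlying
 * file, so no device open() side effects (e.g. waiting for carrier)
 * can be triggered. Returns 0 on success, -1 on error. */
int stat_without_open(const char *path, struct stat *st)
{
    int fd = open(path, O_PATH);
    if (fd < 0)
        return -1;
    int rc = fstat(fd, st);
    close(fd);
    return rc;
}
```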


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik


Here's a dumb question, and I apologize if I am questioning computer
science dogma...

Why are LVM and EVMS (competing LVM project) needed at all?

Surely the same can be accomplished with
* md
* snapshot blkdev (attached in previous e-mail)
* giving partitions and blkdevs the ability to grow and shrink
* giving filesystems the ability to grow and shrink

On-line optimization (defrag, etc) shouldn't be hard once you have the
ability to move blocks and files around, which would come with the
ability to grow and shrink blkdevs and fs's.

-- 
Jeff Garzik  | "Do you have to make light of everything?!"
Building 1024| "I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes."

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik


Linus Torvalds wrote:
> There are some strong arguments that we should have filesystem
> "backdoors" for maintenance purposes, including backup.

I think I agree with something Al said over IRC, that fs-level snapshots
are preferred over block level snapshots.

fs-level snapshots should become easy if you have a generic transaction
layer.  The OS spits out file ops, which get processed into a set of fs
transactions.  (remember that fs-level stuff like "change this block
bitmap" is also a transaction, just like the more generic "update this
inode's mtime")
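One way to picture the generic transaction layer is as follows: each file operation expands into a list of low-level update records, which are then applied all-or-nothing. The sketch below is a toy of that shape only. The record kinds, the in-memory "filesystem" (one byte per bitmap bit, for simplicity) and the function names are all hypothetical. A real layer would log the records to a journal before applying them.

```c
#include <stdbool.h>

#define MAX_OPS 8

enum op_kind { OP_SET_BITMAP, OP_SET_MTIME, OP_SET_SIZE };

struct op  { enum op_kind kind; long target; long value; };
struct txn { int n; struct op ops[MAX_OPS]; };

/* Toy filesystem state touched by the records (no bounds checking). */
struct toyfs {
    unsigned char bitmap[64];   /* one byte per block, for simplicity */
    long mtime[16];             /* per-inode mtime */
    long size[16];              /* per-inode size */
};

/* Queue one record; returns false when the transaction is full, in which
 * case the caller would abort or split the operation. */
bool txn_add(struct txn *t, enum op_kind k, long target, long value)
{
    if (t->n == MAX_OPS)
        return false;
    t->ops[t->n++] = (struct op){ k, target, value };
    return true;
}

/* Apply every record. A real layer would first write them to the journal
 * so that a crash leaves either all of them or none in effect. */
void txn_commit(struct toyfs *fs, const struct txn *t)
{
    for (int i = 0; i < t->n; i++) {
        const struct op *o = &t->ops[i];
        switch (o->kind) {
        case OP_SET_BITMAP: fs->bitmap[o->target] = (unsigned char)o->value; break;
        case OP_SET_MTIME:  fs->mtime[o->target]  = o->value; break;
        case OP_SET_SIZE:   fs->size[o->target]   = o->value; break;
        }
    }
}
```

For example, an appending write that allocates block 42 for inode 3 could become three records (set bit 42, update the size of inode 3, update its mtime), all committed together.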

Also, I think there should be generic block allocation strategies that
fs's can use.  Implementing fs-specific strategies such as ext2's
readahead or XFS's delayed allocation is not a solution, IMHO, but
working towards solving the real problem.



> You can, of course, do parts of this on an LVM level, and doing backups
> with "disk snapshots" may be a valid approach. However, even that is
> debatable: there is very little that says that the disk image has to be
> up-to-date at any particular point in time, so even with a disk snapshot
> capability (which is not necessarily reasonable under all circumstances)
> there are arguments for maintenance interfaces.

I've been hacking on the attached, a snapshot block device driver, which
doesn't require LVM at all.  (warning: compiled and updated per outside
review, but very alpha...  do not apply)

The point of the driver is to provide a sync point at snapshot time, at
which all metadata and data is flushed to the block device.

My question... is there a fundamental flaw in this plan?  Ideally when
userspace says "start snapshot", the fsync_dev occurs [a
simplification].  At that point, userspace can safely run dump or tar or
whatever on the virtual snapshot device.

-- 
Jeff Garzik  | "Do you have to make light of everything?!"
Building 1024| "I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes."

Index: linux_2_4/drivers/block/Config.in
diff -u linux_2_4/drivers/block/Config.in:1.1.1.44 linux_2_4/drivers/block/Config.in:1.1.1.44.4.1
--- linux_2_4/drivers/block/Config.in:1.1.1.44  Tue May 15 04:43:24 2001
+++ linux_2_4/drivers/block/Config.in   Wed May 16 15:44:59 2001
@@ -46,4 +46,6 @@
 fi
 dep_bool '  Initial RAM disk (initrd) support' CONFIG_BLK_DEV_INITRD $CONFIG_BLK_DEV_RAM
 
+tristate 'Snapshot device support' CONFIG_BLK_DEV_SNAP
+
 endmenu
Index: linux_2_4/drivers/block/Makefile
diff -u linux_2_4/drivers/block/Makefile:1.1.1.46 linux_2_4/drivers/block/Makefile:1.1.1.46.4.1
--- linux_2_4/drivers/block/Makefile:1.1.1.46   Tue May 15 04:43:24 2001
+++ linux_2_4/drivers/block/Makefile   Wed May 16 15:44:59 2001
@@ -31,6 +31,7 @@
 obj-$(CONFIG_BLK_DEV_DAC960)   += DAC960.o
 
 obj-$(CONFIG_BLK_DEV_NBD)  += nbd.o
+obj-$(CONFIG_BLK_DEV_SNAP) += snap.o
 
 subdir-$(CONFIG_PARIDE) += paride
 
Index: linux_2_4/drivers/block/snap.c
diff -u /dev/null linux_2_4/drivers/block/snap.c:1.1.6.10
--- /dev/null   Sat May 19 17:36:30 2001
+++ linux_2_4/drivers/block/snap.c  Thu May 17 11:48:54 2001
@@ -0,0 +1,1055 @@
+/*
+   Copyright 2001 Jeff Garzik <[EMAIL PROTECTED]>
+   Copyright (C) 2000 Jens Axboe <[EMAIL PROTECTED]>
+  
+   May be copied or modified under the terms of the GNU General Public
+   License.  See linux/COPYING for more information.
+  
+   Several ideas and some code taken from Jens Axboe's pktcdvd.c 0.0.2j.
+  
+   To-Do list:
+   * Write support.  It's easy, and might be useful in isolated circumstances.
+   * Convert MAX_SNAPDEVS to a module parameter.
+   * Wrap use of "%" operator, to prepare for 64-bit-sized blockdevs on 
+ 32-bit processors
+  
+ */
+
+#define VERSION_CODE   "v0.5.0-take6  17 May 2001  Jeff Garzik <[EMAIL PROTECTED]>"
+#define MODNAME        "snap"
+#define PFX            MODNAME ": "
+#define MAX_SNAPDEVS   16 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int *snap_sizes;
+static int *snap_blksize;
+static int *snap_readahead;
+static struct snap_device *snap_devs;
+static int snap_major = -1;
+static spinlock_t snap_lock = SPIN_LOCK_UNLOCKED;
+
+
+/*
+ * a bit of a kludge, but we want to be able to pass source, log,
+ * or snap dev and get the right one.
+ */
+static struct snap_device *snap_find_dev(kdev_t dev)
+{
+	int i, j;
+	struct snap_device *sd;
+
+	spin_lock(&snap_lock);
+
+	for (i = 0; i < MAX_SNAPDEVS; i++) {
+		sd = &snap_devs[i];
+		if ((sd->src.dev == dev) || (sd->snap_dev == dev))
+			goto out;
+		for (j = 0; j < sd->n_logs; j++)
+			if (sd->logs[j].dev == dev)
+				goto out;
+	}
+	sd = NULL;
+
+out:
+	spin_unlock(&snap_lock);
+	return sd;
+}
+
+static request_queue_t *snap_get_queue(kdev_t dev)
+{
+   struct snap_device *sd =

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro

On Sun, 20 May 2001, Edgar Toernig wrote:

> That assumption is totally bogus.  Even for regular files you have side
> effects (atime); for anything else they're unpredictable.

That means only one thing: safe backups are possible only in single-user
mode. For values of safe being "not triggering these side effects on
arbitrary files outside of the area you are trying to backup". You can't
pin an object down until you open it. You can check that it's the same
object you think it is, but that will require fstat(). I.e. opening the
thing.

If all effects of open() either disappear on close() or are something you
don't care about - fine. Otherwise you have a problem. On any UNIX.


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Edgar Toernig

nitpicking: a system call without side effects would be pretty useless.

Alexander Viro wrote:
> A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
> no-op. Breaking that assumption is a Bad Thing(tm).

That assumption is totally bogus.  Even for regular files you have side
effects (atime); for anything else they're unpredictable.

Ciao, ET.

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro

On Sat, 19 May 2001, Jeff Garzik wrote:

> Are we talking about device arguments just for chrdevs and blkdevs? 
> (ie. drivers)  or for regular files too?

Let's distinguish between per-fd effects (that's what the name in open(name, flags)
is for - you are asking for a descriptor and telling what behaviour you
want for IO on it) and system-wide side effects.

IMO encoding the former into the name is perfectly fine, and no write on
another file can be sanely used for that purpose. For the latter, though,
we need to write commands into files and here your miscdevices (or procfs
files, or /dev/foo/ctl - whatever) are needed.


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik


Jeff Garzik wrote:
> Notice also a "metadata miscdev" solves the problem of passing options
> on open -- just pass those options to the miscdev before you open it...

to be more clear, "it" == the data device, not the metadata miscdev

-- 
Jeff Garzik  | "Do you have to make light of everything?!"
Building 1024| "I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes."

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Jeff Garzik


Are we talking about device arguments just for chrdevs and blkdevs? 
(i.e. drivers) or for regular files too?

Speaking about drivers specifically, a controlling miscdev, one per
device or one per group of devices depending on your needs, is a much
cleaner solution for passing ioctl-type data.  You are free to come
up with whatever method of communication with the driver is most
efficient for your needs -- without perverting open(2).

Notice also a "metadata miscdev" solves the problem of passing options
on open -- just pass those options to the miscdev before you open it...

metadata miscdevs are a clean solution to what procfs hacks and ioctls
are trying to accomplish.
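A minimal userspace sketch of the "metadata miscdev" idea, assuming a control node that accepts one text command per write(2): options travel as an explicit write on a control descriptor, so the later open(2) of the data device stays free of side effects. The node name (/dev/foo/ctl), the "key=value\n" syntax and send_ctl_option() are all hypothetical; the test uses a pipe in place of a real control device.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical protocol: one "key=value\n" command per write(2) on a
 * control descriptor (e.g. an assumed /dev/foo/ctl miscdev). */
int send_ctl_option(int ctlfd, const char *key, const char *value)
{
    char cmd[256];
    int len = snprintf(cmd, sizeof cmd, "%s=%s\n", key, value);

    if (len < 0 || (size_t)len >= sizeof cmd) {
        errno = EINVAL;              /* command would not fit in one buffer */
        return -1;
    }
    /* Issue the whole command in a single write so a driver that parses
     * one command per write() (an assumption) never sees a fragment. */
    ssize_t n = write(ctlfd, cmd, (size_t)len);
    if (n < 0)
        return -1;
    if (n != len) {
        errno = EIO;                 /* short write: treat as failure */
        return -1;
    }
    return 0;
}
```

A caller would open the control node with O_WRONLY, call send_ctl_option() once per setting, and only then open the data device with plain flags.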

Jeff


-- 
Jeff Garzik  | "Do you have to make light of everything?!"
Building 1024| "I'm extremely serious about nailing your
MandrakeSoft |  step-daughter, but other than that, yes."

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro

On Sat, 19 May 2001, Matthew Wilcox wrote:

> On Sat, May 19, 2001 at 12:51:07PM -0400, Alexander Viro wrote:
> > clone(), walk(), clunk(), stat() and open() ;-) Basically, we can add
> > unopened descriptors. I.e. no IO until you open it (turning the thing into
> > opened one), but we can do lookups (move to child), we can clone and
> > kill them and we can stat them.
> 
> Those who would like a more detailed explanation can find one at
> http://plan9.bell-labs.com/sys/man/5/INDEX.html

Umm... Yes, it's an allusion to 9P, but no, I'm not serious about exporting
that to userland.


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro

On Sat, 19 May 2001, Linus Torvalds wrote:

> 
> On Sat, 19 May 2001, Alexander Viro wrote:
> >
> > Folks, before you get all excited about cramming side effects into
> > open(2), consider the following case:
> 
> Your argument is stupid, imnsho.
> 
> Side-effects are perfectly fine if they are _local_ to the file
> descriptor. Your example is contrived and idiotic.

Linus, would you _look_ at the uses of open() proposed upthread?

Would you like to argue that close(open("/bin/ls,-l,/etc/passwd", O_RDONLY));
as an equivalent of spawn(3) is _not_ contrived and idiotic?

Would you like to argue that close(open("/dev/md0/..add-...=/foo/bar",O_RDONLY))
as a way to add stripes is not contrived and idiotic?
 
> These are _not_ side effects. They are very much naming conventions. If I

I would say that both examples above (both really proposed) _are_ side
effects by any definition.

> want to open the floppy in one of the special extended modes, it makes a
> LOT more sense to just open it with the naming, than to open a "generic"
> floppy device only to then use a magic and very unreadable ioctl to set
> the mode of the device.

Who argues for ioctls? I'm perfectly OK with the stuff that affects future
IO on the descriptor you've opened. That's what open() is for, after all.
However, IMNSHO examples of abusing open() (see above, grep your mailbox if
you think that I'm making it up) posted to that thread _are_ side effects
- ugly as hell, contrived and bound to be a source of exploits.


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Linus Torvalds

On Sat, 19 May 2001, Alexander Viro wrote:
>
>   Folks, before you get all excited about cramming side effects into
> open(2), consider the following case:

Your argument is stupid, imnsho.

Side-effects are perfectly fine if they are _local_ to the file
descriptor. Your example is contrived and idiotic.

Filename extensions would not replace ioctls. But they are wonderful ways
to avoid unnecessary binary name-spaces, like the ones we have with
"callout" TTY names, and the one that the fb people had.

For example, do a "ls -l /dev/fd0*", and ponder. Also, realize that we
have these hard-coded names in _addition_ to the magic ioctl to set even
more parameters. These are all stupid and bad, and it would have been a
_lot_ cleaner to be able to do

open("/dev/fd0/H1440", O_RDWR)

or

open("/dev/fd0/HD,18,85", O_RDWR)

to open special non-standard high-density modes.

We already did this, in a very limited and stupid way, by encoding the
minor number and generating a standard naming scheme. We can do the same
thing in a _much_ more generic way by just realizing that we wanted the
open to be name-based in the first place.
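To make the naming-convention idea concrete, here is a minimal sketch of decoding a suffix such as "HD,18,85" into per-open parameters. It is a pure function: parsing touches nothing but the result structure, which is the per-descriptor property both Linus and Viro accept. The struct fd_mode type, parse_fd_mode(), and the generic field names "a" and "b" are hypothetical; the thread does not say what the two numbers mean.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-open mode decoded from a name suffix like "HD,18,85". */
struct fd_mode {
    char tag[8];            /* leading tag, e.g. "HD" */
    unsigned long a;        /* first numeric field (meaning not specified) */
    unsigned long b;        /* second numeric field (meaning not specified) */
};

/* Parse one unsigned decimal field at s that must be followed by `stop`.
 * Returns a pointer just past `stop`, or NULL on malformed input. */
static const char *parse_field(const char *s, char stop, unsigned long *val)
{
    char *end;

    if (!isdigit((unsigned char)*s))
        return NULL;                 /* rejects "", "-1", " 5" */
    errno = 0;
    *val = strtoul(s, &end, 10);
    if (errno != 0 || *end != stop)
        return NULL;
    return end + 1;
}

/* Parse "TAG,N,M".  Returns 0, or -1 with errno = EINVAL. */
int parse_fd_mode(const char *name, struct fd_mode *out)
{
    const char *comma = strchr(name, ',');
    size_t taglen = comma ? (size_t)(comma - name) : 0;
    const char *p;

    if (taglen == 0 || taglen >= sizeof out->tag)
        goto bad;
    memcpy(out->tag, name, taglen);
    out->tag[taglen] = '\0';
    p = parse_field(comma + 1, ',', &out->a);
    if (p == NULL || parse_field(p, '\0', &out->b) == NULL)
        goto bad;
    return 0;
bad:
    errno = EINVAL;
    return -1;
}
```

Only the comma form is handled; a name like "H1440" is rejected here and would need its own rule.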

These are _not_ side effects. They are very much naming conventions. If I
want to open the floppy in one of the special extended modes, it makes a
LOT more sense to just open it with the naming, than to open a "generic"
floppy device only to then use a magic and very unreadable ioctl to set
the mode of the device.

In short, I don't buy your arguments for one single second.

Linus


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-19 Thread Matthew Wilcox


On Sat, May 19, 2001 at 12:51:07PM -0400, Alexander Viro wrote:
> clone(), walk(), clunk(), stat() and open() ;-) Basically, we can add
> unopened descriptors. I.e. no IO until you open it (turning the thing into
> opened one), but we can do lookups (move to child), we can clone and
> kill them and we can stat them.

Those who would like a more detailed explanation can find one at
http://plan9.bell-labs.com/sys/man/5/INDEX.html

-- 
Revolutions do not require corporate support.

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro

On Sat, 19 May 2001 [EMAIL PROTECTED] wrote:

> One would like to have a version of the open() call that was
> guaranteed free of side effects, and gave a fd only -
> perhaps for stat(), perhaps for ioctl().
> This guarantee could perhaps be obtained by omitting the
>   f->f_op->open(inode,f);
> call in dentry_open() when the open call is
>   open("file", O_FDONLY);
> Of course it may be that we afterwards decide that fd must
> be used, and then it needs upgrading:
>   fd = f_open(fd, O_RDWR);

clone(), walk(), clunk(), stat() and open() ;-) Basically, we can add
unopened descriptors. I.e. no IO until you open it (turning the thing into
opened one), but we can do lookups (move to child), we can clone and
kill them and we can stat them.

It makes tree traversals much easier, but AFAIK nobody has exported that
API directly to userland. Might be a good idea, but it's completely
non-portable...


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-19 Thread Andries . Brouwer


>> Opening device files often has interesting side effects.

> Too bad. They can be triggered by similar races between attacker
> changing the type of object (file<->symlink) and backup.

Yes. This is a well-known security problem.
Doing
    stat("file", &s);
    if (action desired) {
        action("file");
    }
is no good because there is a race.
But doing
    fd = open("file", flags);
    fstat(fd, &s);
    if (action desired) {
        f_action(fd);
    }
is no good either because the open() has unknown side effects.
It helps to add flags like O_NONBLOCK and perhaps O_NOCTTY,
but that is not quite good enough.
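The open-then-fstat pattern above, with the flags Andries mentions plus O_NOFOLLOW, might look like the following minimal sketch (open_regular_file() is a hypothetical helper name). O_NONBLOCK keeps open() from hanging on a FIFO or tty, O_NOCTTY keeps a terminal from becoming the controlling tty, and O_NOFOLLOW refuses a symlink in the last component. As Andries says, this is not a complete answer: the type check runs after open() has already happened, so a device's open-time side effects have already occurred.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return a read-only descriptor for `path` only if it
 * is a regular file.  The check is made on the descriptor, so a rename
 * between check and use cannot fool it; open-time side effects of a
 * device are NOT prevented, because open() has already run. */
int open_regular_file(const char *path)
{
    struct stat st;
    int saved;
    /* O_NONBLOCK: don't hang on a FIFO/tty; O_NOCTTY: don't acquire a
     * controlling terminal; O_NOFOLLOW: fail on a final-component symlink. */
    int fd = open(path, O_RDONLY | O_NONBLOCK | O_NOCTTY | O_NOFOLLOW);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {      /* FIFO, device, directory, ... */
        close(fd);
        errno = EINVAL;
        return -1;
    }
    return fd;
}
```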

One would like to have a version of the open() call that was
guaranteed free of side effects, and gave a fd only -
perhaps for stat(), perhaps for ioctl().
This guarantee could perhaps be obtained by omitting the
    f->f_op->open(inode,f);
call in dentry_open() when the open call is
    open("file", O_FDONLY);
Of course it may be that we afterwards decide that fd must
be used, and then it needs upgrading:
    fd = f_open(fd, O_RDWR);

Andries

[Such a construction allows various cleanups.
But no doubt it has problems that I have not yet thought of.]
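Linux later gained something close to this proposal: O_PATH (since Linux 2.6.39) returns a descriptor without actually opening the file, and fstat(2) works on such descriptors since Linux 3.6. The minimal sketch below uses O_PATH in the role of the proposed O_FDONLY and performs the "upgrade" by reopening /proc/self/fd/N, which names the same object the descriptor already pins. It assumes Linux with /proc mounted; open_if_regular() is a hypothetical name, not the proposed f_open().

```c
#define _GNU_SOURCE                  /* for O_PATH */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: look `path` up without opening it, check that it is
 * a regular file, then open that same object with `flags`. */
int open_if_regular(const char *path, int flags)
{
    struct stat st;
    char proc[32];
    int fd, saved;
    /* O_PATH: resolve and pin the object; the file itself is not opened. */
    int pfd = open(path, O_PATH | O_NOFOLLOW | O_CLOEXEC);

    if (pfd < 0)
        return -1;
    if (fstat(pfd, &st) < 0) {       /* allowed on O_PATH fds since 3.6 */
        saved = errno;
        close(pfd);
        errno = saved;
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {      /* symlink, FIFO, device, directory */
        close(pfd);
        errno = EINVAL;
        return -1;
    }
    /* "Upgrade": reopen the very object pfd refers to, via /proc. */
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", pfd);
    fd = open(proc, flags);
    saved = errno;
    close(pfd);
    errno = saved;
    return fd;
}
```

Because the FIFO in the test is only looked up with O_PATH, inspecting it does not block even without O_NONBLOCK.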

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Willem Konynenberg

Abramo Bagnara wrote:
> Alexander Viro wrote:
> > Folks, before you get all excited about cramming side effects into
> > open(2), consider the following case:
> > 
> > 1) opening "/dev/zero/start_nuclear_war" has a certain side effect.
[...]
> Can't this be easily avoided if the needed action is not
> 
> < /dev/zero/start_nuclear_war 
> or
> > /dev/zero/start_nuclear_war
> 
> but
> 
> echo "I'm evil" > /dev/zero/start_nuclear_war
> 
> ?

Yes, and that is exactly the difference between having a side effect
on the open(2), versus having the effect as a result of a write(2).

Unfortunately, there are already some cases where an open
on a device can have unexpected results.  If you don't want
to get blocked waiting for the carrier-detect signal from the
modem when opening a tty device, you had better specify the
O_NONBLOCK option on the open.  If you don't want this flag
to be active during the actual I/O operations, then you would
have to do an fcntl to clear the O_NONBLOCK again after the open.
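A minimal sketch of the workaround Willem describes: open with O_NONBLOCK so that open() itself cannot block (on a tty, waiting for carrier detect), then clear the flag with fcntl(F_SETFL) so later I/O blocks normally. The helper name is hypothetical, and O_NOCTTY is added as the usual companion flag when opening a terminal. The test uses a FIFO instead of a tty, since a FIFO opened for reading also blocks in open() until a writer appears.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` without letting open() block, then
 * restore blocking mode so subsequent read()/write() behave normally. */
int open_nonblock_then_block(const char *path, int flags)
{
    int fd = open(path, flags | O_NONBLOCK | O_NOCTTY);
    int fl;

    if (fd < 0)
        return -1;
    fl = fcntl(fd, F_GETFL);                  /* current status flags */
    if (fl < 0 || fcntl(fd, F_SETFL, fl & ~O_NONBLOCK) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```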

So I guess things have already been a bit messy in this
area for many years, since before Linux even existed, and
in some cases you can't really do anything about it because
the behaviour is mandated by the applicable standards, like
POSIX, SUS, or whatever.
(The blocking of the open on a tty device is explicitly
 documented in my copy of the X/Open specification.)

Fortunately, blocking the nightly backup program by making it
accidentally open a tty is not quite as catastrophic as having
it start a nuclear war, or format the disks, or something,
just because a user was playing games with symlinks.

-- 
 Willem Konynenberg <[EMAIL PROTECTED]>
I am not able rightly to apprehend the kind of confusion of ideas
that could provoke such a question  --  Charles Babbage

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro

On Sat, 19 May 2001, Abramo Bagnara wrote:

> Can't this be easily avoided if the needed action is not
> 
> < /dev/zero/start_nuclear_war 
> or
> > /dev/zero/start_nuclear_war
> 
> but
> 
> echo "I'm evil" > /dev/zero/start_nuclear_war

Sure. And that's the right thing to do (not the implied action, that is -
_that_ would be too messy).


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Abramo Bagnara


Alexander Viro wrote:
> 
> Folks, before you get all excited about cramming side effects into
> open(2), consider the following case:
> 
> 1) opening "/dev/zero/start_nuclear_war" has a certain side effect.
> 
> 2) Local user does the following:
> ln -sf /dev/zero/start_nuclear_war bar
> while true; do
>     mkdir foo
>     rmdir foo
>     ln -sf bar foo
>     rm foo
> done
> 
> 3) Comes the night and root runs (from crontab) updatedb(8). Said beast
> includes find(1). With sufficiently bad timing find _will_ be tricked
> into attempting to open foo. It will honestly lstat() it, all right. But
> there's no way to make sure that subsequent open() on the found directory
> will get the same object.
> 
> 4) Side effect happens...
> 
> Similar scenarios can be found for other programs run by/as root, but I
> think that the point is obvious - side effects on open() are not a good
> idea. Yes, we can play with checking for O_DIRECTORY, yadda, yadda, but
> I wouldn't bet a dime on security of a system with such side effects.
> A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
> no-op. Breaking that assumption is a Bad Thing(tm).

Can't this be easily avoided if the needed action is not

< /dev/zero/start_nuclear_war 
or
> /dev/zero/start_nuclear_war

but

echo "I'm evil" > /dev/zero/start_nuclear_war

?
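The O_DIRECTORY mitigation Viro mentions (and explicitly declines to trust as a complete fix) can be sketched for a find(1)-style walker: descend with openat(2) relative to the parent directory's descriptor, passing O_DIRECTORY and O_NOFOLLOW, so that a name swapped for a symlink or a non-directory after lstat() makes the call fail rather than open the new object as a directory. open_child_dir() is a hypothetical helper; the flags are POSIX.1-2008, and the test does not depend on which errno the symlink case returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open child directory `name` of the already-open
 * directory `dirfd` without re-resolving the full path.  A symlink fails
 * because of O_NOFOLLOW; a non-directory fails with ENOTDIR. */
int open_child_dir(int dirfd, const char *name)
{
    return openat(dirfd, name,
                  O_RDONLY | O_DIRECTORY | O_NOFOLLOW |
                  O_NONBLOCK | O_NOCTTY | O_CLOEXEC);
}
```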

-- 
Abramo Bagnara   mailto:[EMAIL PROTECTED]

Opera Unica  Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project   http://www.alsa-project.org
It sounds good!

Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)

2001-05-19 Thread Alexander Viro

On Sat, 19 May 2001 [EMAIL PROTECTED] wrote:

> > A lot of stuff relies on the fact that close(open(foo, O_RDONLY))
> > is a no-op. Breaking that assumption is a Bad Thing(tm).
> 
> Also here I would like to agree. Unfortunately this is false.
> Opening device files often has interesting side effects.

Too bad. They can be triggered by similar races between attacker
changing the type of object (file<->symlink) and backup.


Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)

2001-05-19 Thread Andries . Brouwer

Alexander Viro writes:

> Folks, before you get all excited about cramming side effects
> into open(2), consider ...

I agree completely.

> A lot of stuff relies on the fact that close(open(foo, O_RDONLY))
> is a no-op. Breaking that assumption is a Bad Thing(tm).

Also here I would like to agree. Unfortunately this is false.
Opening device files often has interesting side effects.

Andries
