Re: "restricted" kernel threads implementation from NetBSD via newconfig

1999-06-27 Thread Julian Elischer

please yes..
eventually we'll be using it to fire off a thread for every interrupt
source if we go the BSDI way. (as discussed with various people at USENIX)



I was actually thinking about this today...


now this is threads within the kernel, and not kernel support for user
threads right?



julian


On Mon, 28 Jun 1999, Warner Losh wrote:

> 
> I'd like to bring a kernel thread implementation, ported from NetBSD
> by the newconfig project, into the kernel.  Who would like to review
> things before they go into the tree?  I can see many benefits for
> having this in the tree, but very little downside.  This should allow
> people to more easily port raid-frame from NetBSD if they desire.
> 
> FYI, this is an offshoot of the porting of the newconfig code to
> new-bus.  Each bridge controller has its own event thread to handle
> card events in a sane manner.  It is basically a stripped down
> pccardd in the kernel, but one that has a huge hint database.  I'm not 
> proposing, at this time, to bring it on.  I just want to get the
> kthread stuff in as a separate issue.
> 
> Comments?
> 
> Warner
> 
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-hackers" in the body of the message
> 



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



"restricted" kernel threads implementation from NetBSD via newconfig

1999-06-27 Thread Warner Losh


I'd like to bring a kernel thread implementation, ported from NetBSD
by the newconfig project, into the kernel.  Who would like to review
things before they go into the tree?  I can see many benefits for
having this in the tree, but very little downside.  This should allow
people to more easily port raid-frame from NetBSD if they desire.

FYI, this is an offshoot of the porting of the newconfig code to
new-bus.  Each bridge controller has its own event thread to handle
card events in a sane manner.  It is basically a stripped down
pccardd in the kernel, but one that has a huge hint database.  I'm not 
proposing, at this time, to bring it on.  I just want to get the
kthread stuff in as a separate issue.

Comments?

Warner


Re: setiathome crashes 3.2?

1999-06-27 Thread Bernd Walter

On Sun, Jun 27, 1999 at 10:09:22PM -0400, Thomas David Rivers wrote:
> 
> I seem to recall seeing this somewhere (this may not be the
> right list.)
>
> But - I downloaded the 3.2 Seti@home and started running it
> on a left-over 75 MHz laptop I have.
> 
> It seems to crash the laptop (silently lock it up, actually)
> fairly quickly.
> 
> Did I recall someone else mentioning that?
> 
> Would everyone agree that it's not a "good thing" for a user-mode
> program to be able to lock up the OS?
> 
There are several possible reasons.
One of them is that I got panics when maxusers was set too high in the kernel options.
I don't know if that is still a problem with 3.2.
Another possible reason is CPU overheating. CPUs used under FreeBSD
are typically suspended during idle time - when running seti or other permanently
running programs there is no idle time.
I assume there are several more possibilities.
But it sounds like there is something broken with your configuration.

-- 
B.Walter  COSMO-Project  http://www.cosmo-project.de
[EMAIL PROTECTED] Usergroup[EMAIL PROTECTED]



Re: Inetd and wrapping.

1999-06-27 Thread Aaron Smith

On Sun, 27 Jun 1999 22:26:34 EDT, John Baldwin writes:
>Let's say I have two services, foo and bar, with food and bard.  I want to
>wrap food, but *NOT* bard and they are both in /etc/inetd.conf.  How do
>you propose to solve this with the internal wrapping (which is a good
>idea, IMO as it eliminates an exec())?

i wouldn't...i'd have to either pay the (small) cost of wrapping or pay the
(less small) tcpd exec and not use internal wrapping. it's "nice" to save
the exec, but intensely performance or latency sensitive daemons probably
shouldn't be starting out of inetd, they should be standalone and
preforked or threaded...

aaron


Re: setiathome crashes 3.2?

1999-06-27 Thread Matthew Jacob


Umm- I've been running it for weeks on 3.2 with no problem.


On Sun, 27 Jun 1999, Thomas David Rivers wrote:

> 
> I seem to recall seeing this somewhere (this may not be the
> right list.)
>
> But - I downloaded the 3.2 Seti@home and started running it
> on a left-over 75 MHz laptop I have.
> 
> It seems to crash the laptop (silently lock it up, actually)
> fairly quickly.
> 
> Did I recall someone else mentioning that?
> 
> Would everyone agree that it's not a "good thing" for a user-mode
> program to be able to lock up the OS?
> 
>   - Dave Rivers -



RE: setiathome crashes 3.2?

1999-06-27 Thread Richard Flores

unsubscribe freebsd-hackers
end

> -----Original Message-----
> From: Thomas David Rivers [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, June 27, 1999 7:09 PM
> To: [EMAIL PROTECTED]
> Subject: setiathome crashes 3.2?
> 
> 
> 
> I seem to recall seeing this somewhere (this may not be the
> right list.)
>
> But - I downloaded the 3.2 Seti@home and started running it
> on a left-over 75 MHz laptop I have.
> 
> It seems to crash the laptop (silently lock it up, actually)
> fairly quickly.
> 
> Did I recall someone else mentioning that?
> 
> Would everyone agree that it's not a "good thing" for a user-mode
> program to be able to lock up the OS?
> 
>   - Dave Rivers -


Re: Inetd and wrapping.

1999-06-27 Thread Ben Rosengart

On Fri, 25 Jun 1999, David Malone wrote:

> Some people think that doing the hosts.allow lookup is too expensive
> for some services but not others. (It requires opening /etc/hosts.allow,
> reading it in line by line and possibly doing DNS lookups).

I would hope that anyone concerned about speed would be writing tcp-wrappers
rules with numbers, not names.

--
 Ben

UNIX Systems Engineer, Skunk Group
StarMedia Network, Inc.
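
To illustrate the point about rules written with numbers: a minimal sketch, in Python and not taken from tcp-wrappers itself, of why an address-based rule can be checked without touching a resolver. The function name client_allowed and the example networks are made up for illustration, and real hosts.allow matching supports more pattern forms than this.

```python
import ipaddress

def client_allowed(client_ip, allowed_networks):
    """Return True if client_ip falls inside any listed network.

    This is pure arithmetic on addresses: no DNS lookup is needed,
    unlike a rule written with host names.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)

# Hypothetical rule set using documentation-only address ranges.
rules = ["192.0.2.0/24", "198.51.100.17/32"]
print(client_allowed("192.0.2.44", rules))    # inside the first network
print(client_allowed("203.0.113.5", rules))   # matches no rule
```

A name-based rule would need at least one lookup per connection before a comparison like this could even start, which is the cost being discussed in the quoted text.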



Re: Inetd and wrapping.

1999-06-27 Thread John Baldwin


On 25-Jun-99 Drew Eckhardt wrote:
> In article <[EMAIL PROTECTED]> you write:
>>
>>Here's one possibility,  it adds a wrap/nowrap field that goes beside the
>>wait/nowait field, so you would have:
>>
>>ftp stream  tcp nowait  wrap root   /usr/libexec/ftpd   ftpd -l
> 
> Breaking backwards compatibility is evil.  Do something like this instead -
> 
> ftp stream  tcp nowait&wrap root   /usr/libexec/ftpd   ftpd -l

That's easy to change (just change where it reads the wrap/nowrap whatever in
the last half of the patch).  It was more of a proof of concept to show that it
could be easily done in 10 minutes or so.

---

John Baldwin <[EMAIL PROTECTED]> -- http://members.freedomnet.com/~jbaldwin/
PGP Key: http://members.freedomnet.com/~jbaldwin/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.freebsd.org


Re: Inetd and wrapping.

1999-06-27 Thread John Baldwin


On 25-Jun-99 Aaron Smith wrote:
> On Fri, 25 Jun 1999 10:14:48 +0200, Sheldon Hearn writes:
>>I think I prefer the suggestion I saw from someone else, which would
>>allow
>>
>>ftp   stream  tcp nowait/10/10/wrap root  ...
>>
>>This can be done in such a way as to be backward compatible. Looks like
>>something for the week-end, if I can convince my wife that it's a good
>>idea. :-)
> 
> could you please restate the argument for this? i still haven't heard a
> decent reason for this sort of conf format perturbation. every small whack
> like this makes freebsd weirder to administrate -- there is a value to
> sharing the same inetd.conf format with lots of other platforms.
> 
> if people have their undies in a wad over this, can't they compile inetd
> without LIBWRAP?

Ahem..

Let's say I have two services, foo and bar, with food and bard.  I want to wrap
food, but *NOT* bard and they are both in /etc/inetd.conf.  How do you propose
to solve this with the internal wrapping (which is a good idea, IMO as it
eliminates an exec())?

> aaron

---

John Baldwin <[EMAIL PROTECTED]> -- http://members.freedomnet.com/~jbaldwin/
PGP Key: http://members.freedomnet.com/~jbaldwin/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.freebsd.org
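
For readers following the format debate, here is a rough sketch, in Python rather than inetd's own C, of how a slash-separated field such as the quoted "nowait/10/10/wrap" could be parsed so that plain "wait" and "nowait" entries keep working. The function name parse_wait_field and the returned dictionary are hypothetical, the numeric parts are treated as opaque limits, and this is not the patch discussed in the thread.

```python
def parse_wait_field(field):
    """Split an inetd.conf-style wait field into its parts.

    Assumed (hypothetical) format: wait|nowait, then optional numeric
    limits, then an optional trailing wrap|nowrap token, '/'-separated.
    """
    parts = field.split("/")
    mode = parts[0]
    if mode not in ("wait", "nowait"):
        raise ValueError("bad wait mode: %r" % mode)
    rest = parts[1:]
    wrap = None  # None means "not specified"; the caller applies its default
    if rest and rest[-1] in ("wrap", "nowrap"):
        wrap = rest.pop() == "wrap"
    limits = [int(x) for x in rest]  # int() raises ValueError on junk
    return {"wait": mode == "wait", "limits": limits, "wrap": wrap}

print(parse_wait_field("nowait"))             # old-style entry still parses
print(parse_wait_field("nowait/10/10/wrap"))  # proposed extended entry
```

Because an unadorned entry yields wrap=None, the food/bard case comes down to the daemon's default plus an explicit "wrap" or "nowrap" on whichever service differs.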


Re: Improving the Unix API

1999-06-27 Thread der Mouse
>>  -f  The filesystem is forcibly unmounted.  Active special devices
>>  continue to work, but all other files return errors if further
>>  accesses are attempted.
> I think that returning errors is WRONG, unless [...]
> It means that you can't fix the problem with the filesystem and
> resume operations nicely afterwards;

I think I see part of the problem here.

You are thinking "unmount to fix problem, will remount later".

"umount -f" is more like "it's going away, dammit, and I'd rather crash
a few processes than have to take down the whole system".

It might be worthwhile having an option that causes attempted accesses
to hang until the filesystem comes back online, somewhat akin to
Auspex's filesystem "isolation".

der Mouse

   mo...@rodents.montreal.qc.ca
 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B





setiathome crashes 3.2?

1999-06-27 Thread Thomas David Rivers


I seem to recall seeing this somewhere (this may not be the
right list.)

But - I downloaded the 3.2 Seti@home and started running it
on a left-over 75 MHz laptop I have.

It seems to crash the laptop (silently lock it up, actually)
fairly quickly.

Did I recall someone else mentioning that?

Would everyone agree that it's not a "good thing" for a user-mode
program to be able to lock up the OS?

- Dave Rivers -



Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999 allb...@ece.cmu.edu wrote:

> On 27 Jun, Jason Thorpe wrote:
> +-
> |   Alexander Viro  wrote:
> |   > doesn't unmap the stuff. Oh, shit, there is such thing as pending
> |   > unlink... Does vgone() force it?
> |  
> |  Regarding unlink()... those aren't operations on vnodes.  Those are
> |  operations on the filesystem namespace, and are thus (correctly)
> |  unaffected.
> +--->8
> 
> I believe what he meant is "how is deallocation of a pending-unlink
> file whose only reference is an open fd which has been revoked dealt
> with"?
> 
> (To which my own answer would be:  "deallocated on close as usual, no
> reason to treat this case specially that I know of".)

When it's already remounted r/o?






Re: Improving the Unix API

1999-06-27 Thread allbery
On 27 Jun, To: thor...@nas.nasa.gov wrote:
+-
|  (To which my own answer would be:  "deallocated on close as usual, no
|  reason to treat this case specially that I know of".)
+--->8

Strike that, I was on the wrong page.  (Crossed threads re: general
revoke() on Linux)

-- 
brandon s. allbery  [os/2][linux][solaris][japh] allb...@kf8nh.apk.net
system administrator [WAY too many hats]   allb...@ece.cmu.edu
carnegie mellon / electrical and computer engineeringKF8NH
 We are Linux. Resistance is an indication that you missed the point.






Re: Improving the Unix API

1999-06-27 Thread Brian F. Feldman
On Sun, 27 Jun 1999, Alexander Viro wrote:

> 
> As for the opening with no permissions - well, it would make *big* sense
> if we could narrow down the API and move chown(), chmod(), etc. into libc
> leaving f-variants in the kernel. Binary compatibility... Extreme variant
> might include {set,get}sockopt extended to files and doing both *stat and
> *ch{mod,own,flags} via that. Out of curiosity - did somebody on *BSD side
> play with that?
> 

Actually, instead of *big* sense, that makes *no* sense.


 Brian Fundakowski Feldman  _ __ ___   ___ ___ ___  
 gr...@freebsd.org   _ __ ___ | _ ) __|   \ 
 FreeBSD: The Power to Serve!_ __ | _ \._ \ |) |
   http://www.FreeBSD.org/  _ |___/___/___/ 






Re: Improving the Unix API

1999-06-27 Thread allbery
On 27 Jun, Jason Thorpe wrote:
+-
|   Alexander Viro  wrote:
|   > doesn't unmap the stuff. Oh, shit, there is such thing as pending
|   > unlink... Does vgone() force it?
|  
|  Regarding unlink()... those aren't operations on vnodes.  Those are
|  operations on the filesystem namespace, and are thus (correctly)
|  unaffected.
+--->8

I believe what he meant is "how is deallocation of a pending-unlink
file whose only reference is an open fd which has been revoked dealt
with"?

(To which my own answer would be:  "deallocated on close as usual, no
reason to treat this case specially that I know of".)

-- 
brandon s. allbery  [os/2][linux][solaris][japh] allb...@kf8nh.apk.net
system administrator [WAY too many hats]   allb...@ece.cmu.edu
carnegie mellon / electrical and computer engineeringKF8NH
 We are Linux. Resistance is an indication that you missed the point.






Re: Improving the Unix API

1999-06-27 Thread Colin Wood

Alexander Viro wrote:
> [1]
> BTW, how does NetBSD deal with HFS  forks?
> 

easy, it doesn't :-)  we don't currently have HFS support, mainly b/c the
only freeware implementations of it (that i'm aware of) are GPL'd, and no
one has been able to devote enough time to it to get a BSD-licensed
version.  although the darwin stuff is now available.  i'm not too sure
how much of it is useful (i haven't looked at it either, tho).

later.

colin



Re: Improving the Unix API

1999-06-27 Thread Alexander Viro



On Sun, 27 Jun 1999, Jason Thorpe wrote:

> Regarding unlink()... those aren't operations on vnodes.  Those are
> operations on the filesystem namespace, and are thus (correctly)
> unaffected.

Eh, wait. Those are operations on the namespace, but at some point you need
to clear the bit in the inode bitmap. You can't do it before the last close()
and it definitely alters the filesystem. fsck will pick them up, but that
may *not* be a desired result. A dirty filesystem is definitely not desired
anyway.
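
The pending-unlink situation described here can be observed from userland. The following is a small sketch in Python, assuming a POSIX system and a local filesystem; the helper name unlink_while_open is made up for illustration. It shows that unlink() removes only the name, while the data stays reachable through the open descriptor until the last close().

```python
import os
import tempfile

def unlink_while_open():
    """Unlink a file that is still open and keep using the descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)                  # removes the directory entry only
        name_gone = not os.path.exists(path)
        links = os.fstat(fd).st_nlink    # typically 0 on local POSIX filesystems
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)           # contents are still readable
    finally:
        os.close(fd)                     # last close: only now can the inode be freed
    return name_gone, links, data

print(unlink_while_open())
```

The freeing of the inode at that final close is a write to the filesystem, which is why a read-only remount with such files still open is awkward.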



Re: Improving the Unix API

1999-06-27 Thread Alexander Viro



On Mon, 28 Jun 1999, Doug Rabson wrote:
> I'm talking about the concept of a header file containing something like:
> 
>   #define FL_VFS    0
>   #define FL_FOOFS  1
>   #define FD_BARFS  2
>   ...
> 
> not being scalable.
> 
> Do you have a complete list of filesystem types? Are you prepared to act
> as an Assigned Number authority for that list. For this kind of problem,
> strings are a damn sight easier to manage in the long term.

Augh... It's ugly, indeed, but... sysctl() is not much nicer and all
systems in question manage to deal with it somehow. OTOH doing it as
strings... Hell knows. I'll look at it. Considering that HFS folks
had already asked for more than one value here (creator and type?) it may
be reasonable. I'm afraid that doing that may open the hell gates ;-/
'N' in *ANA can be 'namespace' as well as 'number'...

[1]
BTW, how does NetBSD deal with HFS  forks?


[1] cue current flamew^Wthreads on l-k regarding files-as-directories
hell.
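
As a rough illustration of the strings-versus-numbers argument quoted above, here is a sketch in Python of a registry keyed by name. The names register_fs and lookup_fs and the "foofs"/"barfs" handlers are hypothetical; the only point is that independently written filesystems can register themselves without anyone assigning numbers from a central header.

```python
# Hypothetical registry: names are the keys, so nobody has to hand out numbers.
_fs_types = {}

def register_fs(name, handler):
    """Register a filesystem type under a string name; reject duplicates."""
    if name in _fs_types:
        raise ValueError("filesystem type already registered: " + name)
    _fs_types[name] = handler

def lookup_fs(name):
    """Return the handler for name, or None if no such type is loaded."""
    return _fs_types.get(name)

register_fs("foofs", lambda: "foofs mounted")
register_fs("barfs", lambda: "barfs mounted")
print(lookup_fs("barfs")())
print(lookup_fs("hfs"))   # an unknown type is simply absent
```

Name collisions are still possible, which is the "namespace" half of the *ANA remark, but they are detected at registration time instead of by editing a shared header.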



Re: Improving the Unix API

1999-06-27 Thread Jason Thorpe

On Sun, 27 Jun 1999 20:43:28 -0400 (EDT) 
 Alexander Viro <[EMAIL PROTECTED]> wrote:

 > Forced revoke()? But then there is mmap() and IIRC revoke() on *BSD
 > doesn't unmap the stuff. Oh, shit, there is such thing as pending
 > unlink... Does vgone() force it?

It doesn't unmap the region, but it doesn't allow any more page faults
from that backing vnode (the pager will get an error from the file system,
and thus send a SIGSEGV to the process), and no dirty pages can be cleaned
to that vnode.

I mean, you wouldn't invalidate any buffers the user read the file into
when the file was revoke()'d, would you? :-)

Regarding unlink()... those aren't operations on vnodes.  Those are
operations on the filesystem namespace, and are thus (correctly)
unaffected.

-- Jason R. Thorpe <[EMAIL PROTECTED]>



Re: Improving the Unix API

1999-06-27 Thread Alexander Viro



On Sun, 27 Jun 1999, der Mouse wrote:

> >> (clri didn't work?)
> > Never heard about clri (was under Linux).
> 
> May not have existed, then, which *would* explain it. :-)

# debugfs -w /dev/sda1
debugfs:  clri file
debugfs:  close

It exists, all right ;-) Even documented - man 8 debugfs and there you go.

> The NetBSD manpage doesn't say what happens if you "mount -o
> update,force,rdonly" when there are writeable descriptors open onto the
> filesystem, and then try to use those fds.  I would assume further
> attempts to write would produce errors (EROFS?), unless of course the
> filesystem has been re-remounted read/write.

Forced revoke()? But then there is mmap() and IIRC revoke() on *BSD
doesn't unmap the stuff. Oh, shit, there is such a thing as a pending
unlink... Does vgone() force it?



Re: Improving the Unix API

1999-06-27 Thread Francois-Rene Rideau

On Sun, Jun 27, 1999 at 07:33:32PM -0400, der Mouse wrote:
>> If you re-read the original message, the problem is what to do about
>> processes with open file descriptors on the partition [...]
> Yes, that's the most difficult part. [...] NetBSD manpage:
>  -f  The filesystem is forcibly unmounted.  Active special devices
>  continue to work, but all other files return errors if further
>  accesses are attempted.
I think that returning errors is WRONG,
unless specifically requested by fcntl().
It means that processes will get unexpected errors
from an otherwise validly open file descriptor.
It means that you can't fix the problem with the filesystem
and resume operations nicely afterwards;
or you will have to manually stop processes from userland before unmounting,
which would not be atomic and generate yet another race condition.
Robert seemed to favor atomically stopping processes.
I am personally in favor of defaulting to a blocking behavior.

>> How will you allow for such large table-walking to be compatible with
>> real-time kernel response?
> *What* large table-walking?  All this means you have to do is have
> every write check the relevant mount point to see if it's mounted
> read-only, for downgrading remounts, and mark the filesystem as gone,
> for forced unmounts.  (I suspect this is what deadfs is for.)
That's typical incremental behavior. Again, it's a matter of tradeoff:
do you want a big atomic operation once in a while and
simple operations every time, or complex incremental operations every time?
It's real-time response vs overall-time duration.
See GC for a field of CS where this trade-off has been beaten to death.
Also, the worry was most important in the case
of atomically stopping processes as recommended by Robert.

Hum. It looks like the need to avoid losing file descriptor information
and pending I/O requests would make it a good idea that there be a
mount mode without either read or write permissions,
similarly to opening files without read or write permissions.
Looks to me like an interesting alternative to deadfs, anyway...

>> Competition is _not_ about taunting each other for pride;
> I know this.  I even think most of the people involved know it.
Cool.

> But there seem to be a few - not many, but very poisonous - who seem to
> take any competition - indeed, almost any *difference* - as an
> opportunity for "we're better than you" egoboo.
"Hey, stupid, my underwear is nicer than yours!"
Hum. Let's just send those kiddies to /dev/null; uh, I mean, er, whatever.

[ "Faré" | VN: Ðãng-Vû Bân | Join the TUNES project!   http://www.tunes.org/  ]
[ FR: François-René Rideau | TUNES is a Useful, Nevertheless Expedient System ]
[ Reflection&Cybernethics  | Project for  a Free Reflective  Computing System ]
My opinions may have changed, but not the fact that I am right.


pseudo kernel dma/tee

1999-06-27 Thread Alfred Perlstein


Is there any support or plans for support of "kernel dma",
sort of like the aio stuff, however you just give the kernel
two file descriptors and perhaps some parameters (such as
seeking to a specific point on either or both files and
amount of data to be sent) and
the kernel will then do all the copying for you?

You could avoid a lot of work when doing proxy like
connections this way..
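
For comparison, here is a minimal sketch of the userland loop that such a
call would replace. The helper name copy_fd and the 8 KB buffer are
assumptions made for illustration, not an existing API; every byte goes
through a process buffer, which is the copying the proposal wants to push
into the kernel.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): copy up to len bytes from
 * in_fd to out_fd through a userland buffer.  Returns the number of bytes
 * copied, or -1 on error.  Stops early at end-of-file on in_fd.
 */
static ssize_t
copy_fd(int in_fd, int out_fd, size_t len)
{
	char buf[8192];
	size_t total = 0;

	while (total < len) {
		size_t want = len - total;
		ssize_t nr, nw, off;

		if (want > sizeof(buf))
			want = sizeof(buf);
		nr = read(in_fd, buf, want);
		if (nr == 0)
			break;			/* EOF on the source */
		if (nr < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the read */
			return (-1);
		}
		/* write() may be partial, so loop until the chunk is out */
		for (off = 0; off < nr; off += nw) {
			nw = write(out_fd, buf + off, (size_t)(nr - off));
			if (nw < 0) {
				if (errno == EINTR) {
					nw = 0;	/* retry without advancing */
					continue;
				}
				return (-1);
			}
		}
		total += (size_t)nr;
	}
	return ((ssize_t)total);
}
```

A proxy would call copy_fd(client_fd, server_fd, n) once per direction;
each call costs two system calls and two copies per buffer, which is the
overhead in question.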

This would be like sendfile() however it would be possible 
to mix socket+socket or fd+fd.

Another interesting application would be to implement this
with an option of tee'ing the transfer into the process'
address space as well.

Just something that popped into my head while writing a 
mini proxy today...  It would save a lot of cycles for
certain apps.
 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]] 
systems administrator and programmer
Win Telecom - http://www.wintelcom.net/



Re: Improving the Unix API

1999-06-27 Thread der Mouse

>> (clri didn't work?)
> Never heard about clri (was under Linux).

May not have existed, then, which *would* explain it. :-)

>>> Another problem was the ability to change the mount status of a
>>> partition from read-write to read-only or to unmounted,
>> See NetBSD (and presumably other BSD) "mount -o update,rdonly"
>> and/or "umount -f".
> If you re-read the original message, the problem is what to do about
> processes with open file descriptors on the partition: stop them at
> once? stop them at first file access? block them instead? kill them?

Yes, that's the most difficult part.

The NetBSD manpage doesn't say what happens if you "mount -o
update,force,rdonly" when there are writeable descriptors open onto the
filesystem, and then try to use those fds.  I would assume further
attempts to write would produce errors (EROFS?), unless of course the
filesystem has been re-remounted read/write.

The manpage for umount says

 -f  The filesystem is forcibly unmounted.  Active special devices
 continue to work, but all other files return errors if further
 accesses are attempted.

I haven't looked at the relevant kernel code to see what *really*
happens.

> How will you allow for such large table-walking to be compatible with
> real-time kernel response?

*What* large table-walking?  All this means you have to do is have
every write check the relevant mount point to see if it's mounted
read-only, for downgrading remounts, and mark the filesystem as gone,
for forced unmounts.  (I suspect this is what deadfs is for.)
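
A toy sketch of that per-write check, to make the cost concrete. This is
not kernel code: the structures, the flag names and toy_write() are
invented for the illustration, and the EROFS/EIO return values follow the
guesses made in this thread rather than any verified kernel behaviour.

```c
#include <errno.h>
#include <stddef.h>

/* Invented flag names for this sketch; real kernels differ. */
#define TOY_MNT_RDONLY	0x1	/* downgraded to read-only */
#define TOY_MNT_GONE	0x2	/* forcibly unmounted */

struct toy_mount {
	int	tm_flags;
};

struct toy_file {
	struct toy_mount *tf_mount;	/* mount point the file lives on */
	size_t		  tf_size;	/* bytes "written" so far */
};

/*
 * Every write consults the mount point first, so a remount or forced
 * unmount only has to flip one flag; no walk over open files is needed.
 * Returns 0 on success or a positive errno value.
 */
static int
toy_write(struct toy_file *fp, size_t nbytes)
{
	int flags = fp->tf_mount->tm_flags;

	if (flags & TOY_MNT_GONE)
		return (EIO);		/* filesystem is gone */
	if (flags & TOY_MNT_RDONLY)
		return (EROFS);		/* read-only after downgrade */
	fp->tf_size += nbytes;
	return (0);
}
```

The price is one extra flag test on every write instead of one large
operation at unmount time, and clearing the read-only flag lets later
writes on the same descriptor succeed again.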

>>> I intend to put free unices in competition [...]
>> Reasonable as this sounds, I think the last thing we need is yet
>> another ground on which one free-unix can be doing the "nana nana
>> boo boo" taunt at another.
> Competition is _not_ about taunting each other for pride;

I know this.  I even think most of the people involved know it.

But there seem to be a few - not many, but very poisonous - who seem to
take any competition - indeed, almost any *difference* - as an
opportunity for "we're better than you" egoboo.

der Mouse

   [EMAIL PROTECTED]
 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Improving the Unix API

1999-06-27 Thread Doug Rabson

On Sun, 27 Jun 1999, Alexander Viro wrote:

> 
> 
> On Sun, 27 Jun 1999, Doug Rabson wrote:
> 
> > This looks viable as long as you don't use small integers to represent
> > FL_UFS etc. Having a single header defining constants for all filesystems
> 
>   Erm... sizeof(int)==4. I doubt that you will need more.
> 
> > just doesn't scale at all.
>   Sure. If you don't need fs-specific stuff -  and there
> you go. If you need some particular fs -  and 

I'm talking about the concept of a header file containing something like:

#define FL_VFS      0
#define FL_FOOFS    1
#define FL_BARFS    2
...

not being scalable.

Do you have a complete list of filesystem types? Are you prepared to act
as an Assigned Number authority for that list?  For this kind of problem,
strings are a damn sight easier to manage in the long term.
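
To illustrate the string-based alternative, here is a small sketch of a
registry keyed by filesystem name. All names in it (toy_fstype,
toy_fs_register, toy_fs_lookup) are hypothetical and do not correspond to
any existing kernel interface; the point is only that each filesystem
brings its own name and no central header has to hand out numbers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-filesystem descriptor; all names here are made up. */
struct toy_fstype {
	const char		*fs_name;	/* e.g. "foofs" */
	struct toy_fstype	*fs_next;
};

static struct toy_fstype *toy_fs_list;	/* head of the registry */

/*
 * Each filesystem registers itself under its own name, so no shared
 * header has to hand out numbers.  Returns 0, or -1 if the name is taken.
 */
static int
toy_fs_register(struct toy_fstype *fs)
{
	struct toy_fstype *p;

	for (p = toy_fs_list; p != NULL; p = p->fs_next)
		if (strcmp(p->fs_name, fs->fs_name) == 0)
			return (-1);
	fs->fs_next = toy_fs_list;
	toy_fs_list = fs;
	return (0);
}

/* Look a filesystem up by name; NULL if nothing registered that name. */
static struct toy_fstype *
toy_fs_lookup(const char *name)
{
	struct toy_fstype *p;

	for (p = toy_fs_list; p != NULL; p = p->fs_next)
		if (strcmp(p->fs_name, name) == 0)
			return (p);
	return (NULL);
}
```

A name collision is still possible, but it is detected at registration
time and can be resolved by whoever ships the module, without anyone
maintaining a global list.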

> 
> > You still want a clearly defined set of FS independent flags so that the
> > application doesn't need to care what filesystem it is sitting on.
> 
>   And that's exactly the reason for FL_VFS vs. FL_FOOFS separation -
> some applications should be able to talk with the filesystem in the
> filesystem's terms *and* be sure that they will not mess with another fs;
> the rest shouldn't care for fs differences at all (aside of "did the
> sucker set the bits I wanted?" that you already have for SUID/SGID/sticky).
> 
>   I don't think that porting it to 4.4 will be difficult - all you
> need is a way to tell VOP_SETATTR what level are you talking to (most
> likely the same way as on our side - add a field to the structure and 
> let the methods scratch their heads). I'm going to do the Linux variant
> and see how it will work. If somebody wants to do it with *BSD - fine, it
> shouldn't be a problem.

I'm sure the api would be easy to port.  I wouldn't accept any api for
FreeBSD which involved assigning numbers to filesystem types. It was too
painful to rid it of the last set of numbers from the old mount(2) call.

--
Doug Rabson Mail:  [EMAIL PROTECTED]
Nonlinear Systems Ltd.  Phone: +44 181 442 9037



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Improving the Unix API

1999-06-27 Thread Francois-Rene Rideau
On Sun, Jun 27, 1999 at 12:58:05PM -0400, der Mouse wrote:
> As I think someone already mentioned, BSD has chflags(), [...]
Yup.

>> Robert had to hand-remove the immutable flag
>> (I guess, by accessing the relevant block directly).
> (clri didn't work?)
Never heard about clri (was under Linux). And I dunno what Robert did.
I will ask him, if it matters.

> funlink makes no sense [...] unlink() operates on names, not files [...]
Oops. Indeed. The thinko is purely mine.

> I've often wanted open-with-no-access in conjunction with fchdir().
> This is because you need only execute access to set your cwd to a
> directory, but there's no way to get an fd on a mode-111 directory.
Again and again, open-with-no-access definitely seems
to have lots of applications.
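
For reference, the usual fd-based way to remember a directory and come
back to it looks roughly like the sketch below. The helper names save_cwd
and restore_cwd are made up for the example. Because the descriptor is
obtained with O_RDONLY, it needs read permission on the directory, which
is exactly the limitation on mode-111 directories described above.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Remember the current directory as a descriptor.  With a plain
 * O_RDONLY open this needs read permission on the directory.
 * Returns the fd, or -1 on error.
 */
static int
save_cwd(void)
{
	return (open(".", O_RDONLY));
}

/* Go back to a directory saved by save_cwd() and release the fd. */
static int
restore_cwd(int fd)
{
	int error = fchdir(fd);

	close(fd);
	return (error);
}
```

An open mode that grants neither read nor write access would let
save_cwd() work on any directory the process is allowed to search.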

>> flink (make a new directory link for file given by descriptor),
flink() combined with the ability to create an unlinked file
in a given filesystem would allow for safe temporaries
without race conditions, that could be "published" when ready.

>> freadlink (read link from a file descriptor opened with O_NULL),
>> fexec (execute the binary that we checked), etc.
> freadlink() implies that open() with O_NULL has the peculiar property
> that, unlike all other open()s, it doesn't follow terminal symlinks.
I suggested that there could be a flag O_DONTFOLLOWLINK in such cases;
I'm not fully sure about the feature, but it would allow setting flags on
symlinks, and other goodies.

> While I think there are ways symlinks could be improved, I don't think
> this is one of them.  I can't see any use for opening a symlink except
> use of write() to atomically make the link point somewhere different,
> and I'd prefer to do that by making symlink() do that when the link
> already exists and some appropriate condition is met.
Well, I can imagine opening them to lock them,
so as to prevent other people from making them point somewhere else,
as well as change some filesystem attributes on the right thing, etc.
Again, open() allows locking and prevents race conditions.

>> Of course, you'll want to be able to fcntl(fd,F_SETFL,O_RDWR)
>> or something equivalent, to upgrade your access mode
>> on a file you opened with O_NULL.
> The security weenie in me is _really_ unsure that the ability to
> increase the access modes on an open fd is a good idea.
Well, there could be a flag O_NOINCREASEACCESS to prevent
further increasing of access modes (by e.g. children),
if that makes you feel safer.
And of course, increasing access mode
is subject to usual permission checking.

>> Another problem was the ability to change the mount status of a partition
>> from read-write to read-only or to unmounted,
> See NetBSD (and presumably other BSD) "mount -o update,rdonly" and/or
> "umount -f".  (Last I tried, the latter didn't work as it should, but
> that's a matter of fixing bugs rather than introducing new features.)
If you re-read the original message, the problem is what to do
about processes with open file descriptors on the partition:
stop them at once? stop them at first file access?
block them instead? kill them? Will you do it atomically?
How will you allow for such large table-walking to be compatible
with real-time kernel response? [Hint: either use incremental
data-structures, or don't be atomic and be interruptible instead.]

>> Finally, we discussed about saving _and restoring_ the state of a process,
>> another hack that he did once to preserve a long-winded calculation
>> from the service shutdown of a big unix computer.
> I did this once, long long ago, under (I think) 4.3.  I found that I
> couldn't just dump core, though I forget why.  As for the open file
> descriptor question, I punted - I made the relevant call fail unless
> the process had no fds open.
Again, the difficult part is precisely about fd handling;
and the suggested feature of whole-computer save&restore
(where external connections will still be a problem)
similarly required that device drivers be able to dump restorable state.

>> By posting on all free unix kernel mailing-list I know,
>> I intend to put free unices in competition as to which
>> will implement these features first.
> Reasonable as this sounds, I think the last thing we need is yet
> another ground on which one free-unix can be doing the "nana nana boo
> boo" taunt at another.
Competition is _not_ about taunting each other for pride;
it's about striving to be the best we can in an atmosphere
of creative diversity whereby people copy each other's good ideas
and drop everyone's bad ideas. Diversity and free competition
increase the odds of good and bad ideas being recognized as what they are,
first by one, then by everyone,
which benefits everyone in the form of positive evolution.
But let's reserve such meta-technical discussions to another forum.

>> As for the opening with no permissions - well, it would make *big*
>> sense if we could narrow down the API and move chown(), chmod(), etc.
>> into libc leaving f-variants in the kernel.
> I re

Re: Microsoft performance (was: All this and documentation too?

1999-06-27 Thread Peter Jeremy

Nick Hibma <[EMAIL PROTECTED]> wrote:
>> Programmers need documentation too.
>
>And they are going to scream like mad if there isn't any. But in the end
>they start reading the code anyway, even if there is docu, because they 
>don't trust anything but their own eyes and brain.
>
>It's all documented in C anyway.

Not really.  The C code defines what a piece of code is doing and how
it does it.  It does not explain why it is doing what it is doing,
and most importantly, why it is doing it the way that it is.

In many cases, the code might be written the way it is because that's
the first thing that popped into the author's head.  In this case, it
might not matter if the code is substantially re-arranged.

In some cases, the code is written in a particular way because the
`more obvious' ways of writing the code didn't meet the author's
requirements.  Whilst it is possible that the particular requirement was
`this code must be unintelligible', it's more likely to be a subtle
interaction with some other subsystem.

Peter



[Call for review] apmd for FreeBSD

1999-06-27 Thread Mitsuru IWASAKI

Hi,

I'm ready to import apmd into freefall CVS repository.
A manpage (first version) and a patch for the CURRENT kernel are now prepared :)

Please review them before my commit.  Any comments, suggestions, or
corrections are much appreciated.
The latest (and final?) version of apmd package is available at:

apmd(8):
http://home.jp.freebsd.org/~iwasaki/apm/19990628/apmd-usr.sbin.tar.gz

CURRENT kernel patch (as of 19990628):
http://home.jp.freebsd.org/~iwasaki/apm/19990628/apmd-sys-CURRENT.diff.gz

No changes were made for previous PAO3 and 3.2-RELEASE kernel patch:
http://home.jp.freebsd.org/~iwasaki/apm/19990610/apmd-sys-PAO3.diff.gz
http://home.jp.freebsd.org/~iwasaki/apm/19990610/apmd-sys-R320.diff.gz

Thanks.


sio2 often fails to initialize

1999-06-27 Thread Bart Trzynadlowski

Hi,
I set about to reconfigure my kernel and everything works great
except for sio2. My COM3 port is unusual, it has the port address 0x3e8
with an IRQ of 10 so I commented out the relevant sio2 line in my
configuration file and replaced it with this:

device  sio2  at isa? port 0x3e8 tty irq 10 vector siointr

It works fine the first time I boot FreeBSD. In fact, it also works if I
reboot, BUT if I use the modem on cuaa2 (for user PPP) during a session
and then reboot, sio2 is no longer found.
Before, I used the GENERIC kernel and /boot/loader set the port,
irq, and flags for me. But even if I boot my new kernel through
/boot/loader (I just boot straight to the kernel with my new one
regularly) it doesn't help. I suspect this has something to do with the
modem not properly hanging up or the COM port not being de-initialized.
But why would it work most of the time with the GENERIC kernel?

Thanks,

Bart Trzynadlowski




Re: Improving the Unix API

1999-06-27 Thread Gandhi woulda smacked you
On Sun, 27 Jun 1999, der Mouse wrote:

# > Robert had to hand-remove the immutable flag
# > (I guess, by accessing the relevant block directly).
# 
# (clri didn't work?)

Obviously the guy thinks along the lines that you need a file descriptor
to do things to files.  That, or he didn't want to do an fsck on the
partition once he was done.

# 
# > Indeed, the "open without access rights"
# > is useful not only to modify attributes and do other ioctl's,
# > but also to effect all operations that should be done w/o the ability
# > to open for either read or write
# > (fstat, funlink, ioctl, fchown, fchmod, fsync),

You mean like stat, unlink, chown, chmod?

Why in the world are you going to fsync a file with which you haven't
done anything?

The only one up there that makes sense is ioctl.

# 
# funlink makes no sense, unless the fd it takes is the fd of a directory
# and you pass in the name of the entry to be removed - which I imagine
# is not what most people will think when they think of an fd-based
# variant of unlink.  unlink() operates on names, not files, after all.

I've wanted an fclri(fd) which would clear the dev/ino attached to the
fd, but there's no clean way to do that as the system would then have
to search for all instances of that ino on that dev, and that's something
the system has no business doing.  namei (name-to-inode) is necessary,
and also easier than doing the reverse since name-to-inode mapping
is unique (because pathnames are unique) while inode-to-name mapping
is not (because there are hard links, i.e. multiple names can refer
to the same inode).  [read that carefully, it looks contradictory but
isn't.]
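
The asymmetry is easy to see from userland. The sketch below (the function
name and its out-parameters are invented for the example) creates a file,
gives it a second name with link(), and checks that both names stat() to
the same dev/ino pair with a link count of 2. Going from either name to
the inode is a single lookup; nothing in the inode says what the names
are.

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Create path_a, hard-link it as path_b, and report through the out
 * parameters whether both names resolve to the same inode and what the
 * link count is.  Returns 0 on success, -1 on error.  Both names are
 * removed again before returning.
 */
static int
two_names_one_inode(const char *path_a, const char *path_b,
    int *same_inode, int *nlink)
{
	struct stat sa, sb;
	int fd, error = -1;

	fd = open(path_a, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd < 0)
		return (-1);
	close(fd);
	if (link(path_a, path_b) == 0) {
		if (stat(path_a, &sa) == 0 && stat(path_b, &sb) == 0) {
			*same_inode = (sa.st_dev == sb.st_dev &&
			    sa.st_ino == sb.st_ino);
			*nlink = (int)sa.st_nlink;
			error = 0;
		}
		unlink(path_b);
	}
	unlink(path_a);
	return (error);
}
```

Both paths must be on the same filesystem, since hard links cannot cross
mount points.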

# I've often wanted open-with-no-access in conjunction with fchdir().
# This is because you need only execute access to set your cwd to a
# directory, but there's no way to get an fd on a mode-111 directory.

Playing the Daemon's advocate, here...
What use is a descriptor into a directory you can't read?  What's
the point of fchdir(dd->fd) if you can't figure out where you're going
from there?  You may as well use chdir(dir).

# While I think there are ways symlinks could be improved, I don't think
# this is one of them.  I can't see any use for opening a symlink except
# use of write() to atomically make the link point somewhere different,
# and I'd prefer to do that by making symlink() do that when the link
# already exists and some appropriate condition is met.

That's a dicey proposition.  We already have quite a few "appropriate
condition" cases, and I think we want to avoid special-casing a whole
slew of conditions.

# > Of course, you'll want to be able to fcntl(fd,F_SETFL,O_RDWR)
# > or something equivalent, to upgrade your access mode
# > on a file you opened with O_NULL.
# 
# The security weenie in me is _really_ unsure that the ability to
# increase the access modes on an open fd is a good idea.

Nah.  fd's are inevitably associated with vnodes (which don't get freed
until the last close()); if the vnode doesn't map out to the appropriate
permissions, the fcntl() would fail.

# > About namei() and large directories, Robert suggested
# > that news servers, and other large databases
# > (terminfo, that web cache, and many more come to my mind),
# > should use special database libraries with a well-defined API
# > (possibly inspired by the filesystem interface),
# > rather than abuse the filesystem API as they do;
# 
# At least one news system does this now, I think - instead of keeping
# each post in a separate file, it uses one huge file and does its own
# space allocation out of it.

Another problem with making filesystems for news was that you had to tune
cpg and bpi way down and ipg way up.

When fscking the block device was actually possible, it was also faster
to fsck the block device on a device full of symlinks (but that's a horse
of a different colour, I realise...).  Go figure.

# > Another problem was the ability to change the mount status of a partition
# > from read-write to read-only or to unmounted,
# 
# See NetBSD (and presumably other BSD) "mount -o update,rdonly" and/or
# "umount -f".  (Last I tried, the latter didn't work as it should, but
# that's a matter of fixing bugs rather than introducing new features.)

...really?  umount -f always works for me.

There's a bug running around, though, at the end of a shutdown which prevents
me from umounting /var for some reason (fstat shows nothing).

# > Finally, we discussed about saving _and restoring_ the state of a process,
# > another hack that he did once to preserve a long-winded calculation
# > from the service shutdown of a big unix computer.
# 
# I did this once, long long ago, under (I think) 4.3.  I found that I
# couldn't just dump core, though I forget why.  As for the open file
# descriptor question, I punted - I made the relevant call fail unless
# the process had no fds open.

Yeah, there's just no way to restore fd state from saved state, since
that would require locking that particular set 

Re: building thread-safe Xlibs

1999-06-27 Thread Jeroen Ruigrok/Asmodai

* Francis Jordan ([EMAIL PROTECTED]) [990626 06:03]:
>  xc/include/Xos_r.h
> 
> which contains definitions of same (basically, pwd.h wrappers) for various 
> platforms, but not FreeBSD (I guess at the time FreeBSD didn't have threads). 
> Unfortunately, the wrappers for other platforms are no good, as FreeBSD's pwd
> structures are different from everything else.

Hmmm, one thing that's still missing from at least the POSIX threads is
pthread_cancel. Plus that there's no such thing as a libpthread.

Thread support (POSIX) has a long way to go in FreeBSD, but I lack
the cloo-by-four to do it.

-- 
Jeroen Ruigrok van der Werven  asmodai(at)wxs.nl
The BSD Programmer's Documentation Project 
Network/Security Specialist   BSD: Technical excellence at it's best
Cum angelis et pueris, fideles inveniamur. Quis est iste Rex gloriae...?





ipfilter volunteer

1999-06-27 Thread Guido van Rooij

I'd like to volunteer to maintain ipfilter. I already told several people
at the usenix conference, but as I have seen others taking interest as
well, it seems right to at least spread it more publicly.

I am still waiting for a machine I won at the conference to start on it
though so it might take some weeks before seeing some action.

Current plans are to import a more recent version of ipfilter followed
by having a look at features implemented by ipfw that are currently
missing in ipfilter.

-Guido





Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999, Jan-Simon Pendry wrote:

> Alexander Viro wrote:
> > Proposed API on the Linux side being
> > int chflags(name, level, oldp, newp); where level is FL_VFS for generic
> > attributes (fs may map them on its own set) and FL_{UFS,EXT2,...} for raw
> > flags - corresponding filesystem is free to interpret the thing as it
> > likes and should set the generic attributes in the right way. 
> 
> if linux introduces a different API (ie not just an extension of
> the existing bsd API) then please do *not* call it "chflags".
;-/ Yes, it makes sense.

> it took years just to get over the bsd vs. svr4 gettimeofday()
> fiasco.  btw, what's the proposed API for getting the current
> attribute set?

oldp == NULL ;-)






Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999, der Mouse wrote:

> > Another problem was the ability to change the mount status of a partition
> > from read-write to read-only or to unmounted,
> 
> See NetBSD (and presumably other BSD) "mount -o update,rdonly" and/or
> "umount -f".  (Last I tried, the latter didn't work as it should, but
> that's a matter of fixing bugs rather than introducing new features.)

mount -o remount,ro on Linux. What was the problem? Indeed you can't do it
if you have files opened for write there (or pending removal of files
from unlinks), but that limitation is reasonable, IMHO.

> > As for the opening with no permissions - well, it would make *big*
> > sense if we could narrow down the API and move chown(), chmod(), etc.
> > into libc leaving f-variants in the kernel.
> 
> I really don't like that.  The reasons why are (1) this means you have
> to have an fd free to do them; (2) it triples the number of user/kernel
> crossings involved.

The former is not too terrible, but the latter... Yup.

> > Extreme variant might include {set,get}sockopt extended to files and
> > doing both *stat and *ch{mod,own,flags} via that.
> 
> If done, I think the name should be changed.  They are ?etSOCKopt,
> after all.  I'm not fond of this, though; it amounts to returning to
> using ioctl() for the tasks - albeit with a slightly different name.

The *only* way to make it reasonable would be to have a hierarchical
namespace for the options. Otherwise you are just getting the ioctl()
mess, and that's the last thing I'd like to see.






No Subject

1999-06-27 Thread W Gerald Hicks




Re: Improving the Unix API

1999-06-27 Thread Jan-Simon Pendry
Alexander Viro wrote:
> Proposed API on the Linux side being
> int chflags(name, level, oldp, newp); where level is FL_VFS for generic
> attributes (fs may map them on its own set) and FL_{UFS,EXT2,...} for raw
> flags - corresponding filesystem is free to interpret the thing as it
> likes and should set the generic attributes in the right way. 

if linux introduces a different API (ie not just an extension of
the existing bsd API) then please do *not* call it "chflags".
it took years just to get over the bsd vs. svr4 gettimeofday()
fiasco.  btw, what's the proposed API for getting the current
attribute set?

jan-simon.





Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999, Doug Rabson wrote:

> This looks viable as long as you don't use small integers to represent
> FL_UFS etc. Having a single header defining constants for all filesystems

Erm... sizeof(int)==4. I doubt that you will need more.

> just doesn't scale at all.
Sure. If you don't need fs-specific stuff -  and there
you go. If you need some particular fs -  and 

> You still want a clearly defined set of FS independent flags so that the
> application doesn't need to care what filesystem it is sitting on.

And that's exactly the reason for FL_VFS vs. FL_FOOFS separation -
some applications should be able to talk with the filesystem in the
filesystem's terms *and* be sure that they will not mess with another fs;
the rest shouldn't care about fs differences at all (aside from "did the
sucker set the bits I wanted?" that you already have for SUID/SGID/sticky).

I don't think that porting it to 4.4 will be difficult - all you
need is a way to tell VOP_SETATTR what level you are talking to (most
likely the same way as on our side - add a field to the structure and
let the methods scratch their heads). I'm going to do the Linux variant
and see how it will work. If somebody wants to do it with *BSD - fine, it
shouldn't be a problem.






Re: Improving the Unix API

1999-06-27 Thread Bill Sommerfeld
>   Right. Except that UFS has not only generic attributes. For example,
> you have UF_NODUMP and SF_ARCHIVED. The *only* place in the /sys you
> mention the former is sys/stat.h

Well, right, because backup/restore aren't part of the kernel...

> (BTW, you don't even map it on EXT2_NODUMP_FL).

This was presumably an oversight; I've reported it as a bug.

- Bill





Re: Improving the Unix API

1999-06-27 Thread Doug Rabson
On Sun, 27 Jun 1999, Alexander Viro wrote:

> 
> 
> On Sun, 27 Jun 1999, Bill Sommerfeld wrote:
> 
> > > Usage of ioctl() on Linux was a bad idea and it's going to be fixed. More
> > > or less in the same direction, not exactly the same - 4.4 chflags() works
> > > fine for UFS and leaves other filesystems to map what they can into the
> > > UFS set. 
> > 
> > > Which is bogus - immutable is not a UFS attribute, it's VFS one.
> > 
> > Well, I'd argue that Berkeley defined a bunch of VFS attributes, and
> > then implemented them natively in UFS and LFS; other non-native
> > filesystems have to map their concepts of other file attributes (e.g.,
> > dates, permissions, etc.,) into the native VFS concepts.
> 
>   Right. Except that UFS has not only generic attributes. For example,
> you have UF_NODUMP and SF_ARCHIVED. The *only* place in the /sys you
> mention the former is sys/stat.h (BTW, you don't even map it on
> EXT2_NODUMP_FL). The latter is mentioned only in the msdosfs/msdosfs_vnops.c.
> Hardly a VFS flag, right?
>   Proposed API on the Linux side being
> int chflags(name, level, oldp, newp); where level is FL_VFS for generic
> attributes (fs may map them on its own set) and FL_{UFS,EXT2,...} for raw
> flags - corresponding filesystem is free to interpret the thing as it
> likes and should set the generic attributes in the right way. If you are
> trying to talk with the wrong filesystem (i.e. the level is not FL_VFS and
> not FL_) you are getting an error. If
> oldp is not NULL *oldp contains the attributes to set. if newp is not
> NULL *newp will contain the attributes *after* operation. IMO it's cleaner
> than pushing all attributes into the single bitmap.

This looks viable as long as you don't use small integers to represent
FL_UFS etc. Having a single header defining constants for all filesystems
just doesn't scale at all.

You still want a clearly defined set of FS independent flags so that the
application doesn't need to care what filesystem it is sitting on.

--
Doug Rabson Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.  Phone: +44 181 442 9037







Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999, Bill Sommerfeld wrote:

> > Usage of ioctl() on Linux was a bad idea and it's going to be fixed. More
> > or less in the same direction, not exactly the same - 4.4 chflags() works
> > fine for UFS and leaves other filesystems to map what they can into the
> > UFS set. 
> 
> > Which is bogus - immutable is not a UFS attribute, it's VFS one.
> 
> Well, I'd argue that Berkeley defined a bunch of VFS attributes, and
> then implemented them natively in UFS and LFS; other non-native
> filesystems have to map their concepts of other file attributes (e.g.,
> dates, permissions, etc.,) into the native VFS concepts.

Right. Except that UFS has more than just generic attributes. For example,
you have UF_NODUMP and SF_ARCHIVED. The *only* place in /sys where you
mention the former is sys/stat.h (BTW, you don't even map it on
EXT2_NODUMP_FL). The latter is mentioned only in msdosfs/msdosfs_vnops.c.
Hardly a VFS flag, right?
The proposed API on the Linux side is
int chflags(name, level, oldp, newp); where level is FL_VFS for generic
attributes (fs may map them on its own set) and FL_{UFS,EXT2,...} for raw
flags - the corresponding filesystem is free to interpret the thing as it
likes and should set the generic attributes in the right way. If you are
trying to talk with the wrong filesystem (i.e. the level is not FL_VFS and
not FL_) you are getting an error. If
oldp is not NULL *oldp contains the attributes to set. If newp is not
NULL *newp will contain the attributes *after* operation. IMO it's cleaner
than pushing all attributes into the single bitmap.
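
For readers who want to see the proposed calling convention in action, here is a rough user-space mock in C. It is only a sketch of the semantics as worded above: the names FL_VFS, FL_UFS and FL_EXT2 come from the message, but their numeric values, the mock_inode structure, the function name mock_chflags and the EINVAL error code are all assumptions made for illustration. A real filesystem would also update the generic attributes when raw flags change, which this mock leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Level selectors named in the proposal; the numeric values are made up. */
enum fl_level { FL_VFS = 0, FL_UFS = 1, FL_EXT2 = 2 };

/* Hypothetical stand-in for an inode: owning fs plus two flag words. */
struct mock_inode {
    enum fl_level fs;   /* FL_UFS or FL_EXT2 */
    unsigned generic;   /* FL_VFS-level attributes */
    unsigned raw;       /* filesystem-private flags */
};

/*
 * Mock of chflags(name, level, oldp, newp), operating on a mock inode
 * instead of a pathname. Returns 0 on success, -1 with errno set.
 */
int mock_chflags(struct mock_inode *ino, enum fl_level level,
                 const unsigned *oldp, unsigned *newp)
{
    unsigned *word;

    assert(ino != NULL);
    if (level == FL_VFS) {
        word = &ino->generic;
    } else if (level == ino->fs) {
        word = &ino->raw;
    } else {
        errno = EINVAL;     /* talking to the wrong filesystem (assumed code) */
        return -1;
    }
    if (oldp != NULL)
        *word = *oldp;      /* as worded above: *oldp holds the attributes to set */
    if (newp != NULL)
        *newp = *word;      /* attributes after the operation */
    return 0;
}
```

Passing NULL for the first pointer and a valid second pointer turns the call into a pure query, and asking an ext2 inode for FL_UFS flags fails instead of silently touching the wrong bits.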






Re: Improving the Unix API

1999-06-27 Thread der Mouse
> He realized that the device had an immutable attribute.
> He tried to change the attribute with open() and ioctl()

As I think someone already mentioned, BSD has chflags(), which takes a
pathname.

> Robert had to hand-remove the immutable flag
> (I guess, by accessing the relevant block directly).

(clri didn't work?)

> Indeed, the "open without access rights"
> is useful not only to modify attributes and do other ioctl's,
> but also to effect all operations that should be done w/o the ability
> to open for either read or write
> (fstat, funlink, ioctl, fchown, fchmod, fsync),

funlink makes no sense, unless the fd it takes is the fd of a directory
and you pass in the name of the entry to be removed - which I imagine
is not what most people will think when they think of an fd-based
variant of unlink.  unlink() operates on names, not files, after all.

I've often wanted open-with-no-access in conjunction with fchdir().
This is because you need only execute access to set your cwd to a
directory, but there's no way to get an fd on a mode-111 directory.
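
To make the fchdir() point concrete, a minimal C sketch of the usual "remember where I was" idiom follows. The helper names save_cwd and restore_cwd are made up for illustration. Note that the open() here uses O_RDONLY and therefore needs read permission on the directory, which is exactly the limitation described above.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Remember the current directory as a descriptor.
 * Returns the fd, or -1 with errno set. Needs read access to ".",
 * so it fails on a directory that is only searchable (mode 111).
 */
int save_cwd(void)
{
    return open(".", O_RDONLY);
}

/*
 * Go back to the directory remembered by save_cwd() and release the
 * descriptor. Returns 0 on success, -1 with errno set.
 */
int restore_cwd(int fd)
{
    int rc;

    assert(fd >= 0);
    rc = fchdir(fd);
    close(fd);
    return rc;
}
```

A caller would do fd = save_cwd(), chdir() somewhere else, work there, and then restore_cwd(fd); the return works even if the original directory has been renamed in the meantime.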

> and could be used with new syscalls like
> flink (make a new directory link for file given by descriptor),
> freadlink (read link from a file descriptor opened with O_NULL),
> fexec (execute the binary that we checked), etc.

freadlink() implies that open() with O_NULL has the peculiar property
that, unlike all other open()s, it doesn't follow terminal symlinks.

While I think there are ways symlinks could be improved, I don't think
this is one of them.  I can't see any use for opening a symlink except
use of write() to atomically make the link point somewhere different,
and I'd prefer to do that by making symlink() do that when the link
already exists and some appropriate condition is met.

> Of course, you'll want to be able to fcntl(fd,F_SETFL,O_RDWR)
> or something equivalent, to upgrade your access mode
> on a file you opened with O_NULL.

The security weenie in me is _really_ unsure that the ability to
increase the access modes on an open fd is a good idea.
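
For reference, existing fcntl() does not allow this today: POSIX says the access-mode bits in the F_SETFL argument are ignored. A small C sketch that demonstrates it (the function name try_upgrade_to_rdwr is invented for the example):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to "upgrade" fd to read-write with F_SETFL, then report the access
 * mode the kernel actually has on record. Returns O_RDONLY, O_WRONLY or
 * O_RDWR, or -1 on error.
 */
int try_upgrade_to_rdwr(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    /* Access-mode bits in the F_SETFL argument are ignored by the kernel. */
    if (fcntl(fd, F_SETFL, (flags & ~O_ACCMODE) | O_RDWR) == -1)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return flags & O_ACCMODE;
}
```

On a descriptor opened with O_RDONLY this is expected to still report O_RDONLY afterwards, so the upgrade being proposed would be a genuinely new capability rather than a variation on what F_SETFL already does.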

> About namei() and large directories, Robert suggested
> that news servers, and other large databases
> (terminfo, that web cache, and many more come to my mind),
> should use special database libraries with a well-defined API
> (possibly inspired by the filesystem interface),
> rather than abuse the filesystem API as they do;

At least one news system does this now, I think - instead of keeping
each post in a separate file, it uses one huge file and does its own
space allocation out of it.

> Another problem was the ability to change the mount status of a partition
> from read-write to read-only or to unmounted,

See NetBSD (and presumably other BSD) "mount -o update,rdonly" and/or
"umount -f".  (Last I tried, the latter didn't work as it should, but
that's a matter of fixing bugs rather than introducing new features.)

> Finally, we discussed about saving _and restoring_ the state of a process,
> another hack that he did once to preserve a long-winded calculation
> from the service shutdown of a big unix computer.

I did this once, long long ago, under (I think) 4.3.  I found that I
couldn't just dump core, though I forget why.  As for the open file
descriptor question, I punted - I made the relevant call fail unless
the process had no fds open.

> By posting on all free unix kernel mailing-list I know,
> I intend to put free unices in competition as to which
> will implement these features first.

Reasonable as this sounds, I think the last thing we need is yet
another ground on which one free-unix can be doing the "nana nana boo
boo" taunt at another.  Once upon a time I would have hoped the people
involved were sufficiently mature to avoid doing that, or responding
when on the receiving end of it - and many of them *are*, but I've been
involved in this scene too long to retain any real hope that *all* of
them are.

[And replying to another message...]

> 4.4 chflags() works fine for UFS and leaves other filesystems to map
> what they can into the UFS set.  Which is bogus - immutable is not a
> UFS attribute, it's VFS one.

Perhaps, but it's still something that the underlying filesystem has to
support.  Just because the API bit definitions happen to match what FFS
filesystems save on disk doesn't mean it's inherently an FFS thing.

> As for the opening with no permissions - well, it would make *big*
> sense if we could narrow down the API and move chown(), chmod(), etc.
> into libc leaving f-variants in the kernel.

I really don't like that.  The reasons why are (1) this means you have
to have an fd free to do them; (2) it triples the number of user/kernel
crossings involved.

> Extreme variant might include {set,get}sockopt extended to files and
> doing both *stat and *ch{mod,own,flags} via that.

If done, I think the name should be changed.  They are ?etSOCKopt,
after all.  I'm not fond of this, though; it amounts to returning to
using ioctl() for the tasks - albeit with a slightly diff

Re: Improving the Unix API

1999-06-27 Thread Bill Sommerfeld
> Usage of ioctl() on Linux was a bad idea and it's going to be fixed. More
> or less in the same direction, not exactly the same - 4.4 chflags() works
> fine for UFS and leaves other filesystems to map what they can into the
> UFS set. 

> Which is bogus - immutable is not a UFS attribute, it's VFS one.

Well, I'd argue that Berkeley defined a bunch of VFS attributes, and
then implemented them natively in UFS and LFS; other non-native
filesystems have to map their concepts of other file attributes (e.g.,
dates, permissions, etc.,) into the native VFS concepts.

- Bill





Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999, Bill Sommerfeld wrote:

> > .. but there remained one that garbled meta-data had made into a
> > non-existing block device, that would resist rm -f.  He realized
> > that the device had an immutable attribute.  However, the problem is
> > that to change the attribute, you have to open the file before you
> > can ioctl() on it;
> 
> BSD4.4 and its progeny deal with this by providing both chflags() and
> fchflags() system calls; as you don't need to be able to do an open()
> call to use chflags(), you can just fix the immutable attribute once
> you have the system running at an appropriate securelevel.

Usage of ioctl() on Linux was a bad idea and it's going to be fixed. More
or less in the same direction, not exactly the same - 4.4 chflags() works
fine for UFS and leaves other filesystems to map what they can into the
UFS set. Which is bogus - immutable is not a UFS attribute, it's a VFS one.
I have a patch (still pre-alpha) and I'll post it tomorrow or on Wednesday
when I'm back from CA.

As for the opening with no permissions - well, it would make *big* sense
if we could narrow down the API and move chown(), chmod(), etc. into libc
leaving f-variants in the kernel. Binary compatibility... Extreme variant
might include {set,get}sockopt extended to files and doing both *stat and
*ch{mod,own,flags} via that. Out of curiosity - did somebody on *BSD side
play with that?






Re: All this and documentation too? (was: Microsoft performance (was: All this and documentation too? (was: cvs commit: src/sys/isa sio.c)))

1999-06-27 Thread Karl Pielorz

Greg Lehey wrote:

> > I've come to understanding that lack of documentation is probably one of
> > the factors that keep the system healthy, because it keeps the unskilled
> > people away. I don't know whether it's true but I read in books that
> > reading code is one of the methods to learn programming. Since FreeBSD
> > does ship with source code, docs are not necessary. NT ships with poorly
> > written docs instead, and, that is what kills it all the time, despite of
> > its perfect design that I really like. People write NT drivers without
> > full understanding what is going on, so they destabilize the system.
> 
> I can't agree with this theory.  Lack of documentation just moves the
> degree of skill needed to, for example, write device drivers.
> Document less well and your average device driver writer will write a
> worse driver, with or without source code access.  Source code access
> helps too, of course.

Coming from someone who's struggled to write a device driver, and then had to
move the driver from 2.X, through to 3.X to 4.X (it's currently languishing
somewhere along the line of 3.X) - I would wholly agree with Greg.
Documentation is _very important_, even more so in a rapidly moving system...

Having access to the source code is one thing, but C was not designed for
documentation, it was designed to program in... Looking at the current array
of drivers in -current you get the idea everyone's done it 'slightly
differently', and no one comments their code enough to make it 'self
documenting', nor has anyone singled out any of the vast array of drivers and
said "this is a good example if you're writing ISA drivers", or "this is a good
one to go from if you're writing PCI".

Just my annoyed $0.02's worth! :)

-Kp





Re: Improving the Unix API

1999-06-27 Thread Werner Almesberger
Francois-Rene Rideau wrote:
> Robert told me that in some Unix flavors of old,
> it was possible to open a file by path with a null access mode (O_NULL ?)

E.g. Linux. Very undocumented, but has been around for ages ('92 or
such). The main purpose is to keep the floppy drive from spinning up
to check for a media change when you open it to access parameters and
such. E.g. fdformat, setfdprm, and LILO use this. (NB: some versions
of strace print the flags argument in this case as "0x4", although
it's really 3.)
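
A short C sketch of what this looks like from user space, for the curious. It assumes Linux behaviour: access mode 3 checks for both read and write permission on the file and hands back a descriptor that can be used for fstat() or ioctl() but not for read() or write(). The helper names are made up, and other systems may reject or reinterpret the value.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Linux-specific: access mode 3 means "neither read nor write". */
#define ACCMODE_NOACCESS 3

/*
 * Open path for metadata/ioctl use only.
 * Returns the fd, or -1 with errno set.
 */
int open_noaccess(const char *path)
{
    assert(path != NULL);
    return open(path, ACCMODE_NOACCESS);
}

/* Returns 1 if read() on fd is refused with EBADF, 0 otherwise. */
int read_is_refused(int fd)
{
    char byte;
    ssize_t n = read(fd, &byte, 1);

    return n == -1 && errno == EBADF;
}
```

This is how a tool can reach a device's ioctl() interface without performing a real read or write open on it.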

- Werner

-- 
  _
 / Werner Almesberger, ICA, EPFL, CH   werner.almesber...@ica.epfl.ch /
/_IN_R_131__Tel_+41_21_693_6621__Fax_+41_21_693_6610_/





Re: Improving the Unix API

1999-06-27 Thread Bill Sommerfeld
> .. but there remained one that garbled meta-data had made into a
> non-existing block device, that would resist rm -f.  He realized
> that the device had an immutable attribute.  However, the problem is
> that to change the attribute, you have to open the file before you
> can ioctl() on it;

BSD4.4 and its progeny deal with this by providing both chflags() and
fchflags() system calls; as you don't need to be able to do an open()
call to use chflags(), you can just fix the immutable attribute once
you have the system running at an appropriate securelevel.
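
As a rough illustration of the path-based approach, here is a C sketch that clears the immutable bits without ever opening the file. The function name clear_immutable is invented; the chflags() call and the UF_IMMUTABLE / SF_IMMUTABLE flags are the BSD ones, so the real code is only compiled on BSD-derived systems and the sketch reports ENOSYS elsewhere. Clearing SF_IMMUTABLE additionally needs superuser rights and a permissive securelevel, as noted above.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Clear the user and system immutable flags on path without opening it.
 * Returns 0 on success, -1 with errno set on failure.
 */
int clear_immutable(const char *path)
{
    assert(path != NULL);
#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    /* Keep every other flag, drop only the two immutable bits. */
    return chflags(path, st.st_flags & ~(UF_IMMUTABLE | SF_IMMUTABLE));
#else
    /* No BSD file flags on this platform (e.g. Linux). */
    errno = ENOSYS;
    return -1;
#endif
}
```

Because only a pathname is involved, this works even on an entry that cannot be opened, such as the bogus block device from the original report.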

- Bill





Improving the Unix API

1999-06-27 Thread Francois-Rene Rideau

Improving the Unix Kernels' API
A Kernel Discussion with Hacker Robert Ehrlich

Summary: after a discussion with R.E., I submit a suggestion about improving
the API of free Unices with useful features such as open(path,O_NULL);


Dear Free *n*x Kernel Hackers,
   I've been discussing today with old-time Unix hacker (since V6 or so)
[EMAIL PROTECTED] about possible improvements in the design
of Unix APIs in general, and to the Linux kernel in particular.
I'd like to share a summary with you,
since you are the ones fit to implement them or not.
(Comments in parentheses are purely mine).

The starting point of the discussion was an unexplained corruption
of a Linux ext2 fs partition on a friend's machine.
Our common friend ([EMAIL PROTECTED]) had found
that 4 subdirectories in his large persistent web cache were corrupted:
their type, size, dates, access rights, attributes, etc, were garbled.
Obviously, a directory inode has been filled with random garbage.
As I happened to pass by, I helped Bernard kill the processes
that had files opened on the mounted drive, so as to fsck it.
The corrupted directories were lost.
Happily, they weren't critical data, but it was still annoying to lose them
(who knows, maybe some of the files are now lost pearls of the Internet?).
After fsck, Bernard tried to remove the files,
but there remained one that garbled meta-data had made into
a non-existing block device, that would resist rm -f.

On Friday morning (I guess, since Robert wasn't there on Thursday),
Bernard asked help from Robert. Robert tried to figure out what went wrong,
and soon ended up examining a binary dump of the bad block
and reading the kernel source code to understand.
He realized that the device had an immutable attribute.
He tried to change the attribute with open() and ioctl()
(having learnt about the immutable flag and its behavior
by reading kernel sources for rm, and grep'ing for the flag
in the rest of the kernel; he didn't know about chattr;
chattr must do the same, anyway).
However, the problem is that to change the attribute,
you have to open the file before you can ioctl() on it;
but the file didn't exist (a non-existing device!)
and thus couldn't be opened successfully.
Robert had to hand-remove the immutable flag
(I guess, by accessing the relevant block directly).

We met afterwards, before lunch (he did all that during that morning at work,
including diagnosis and correction of the problem by reading the kernel code;
and he didn't know about the existence of lsattr and chattr -- impressive!).

Robert told me that in some Unix flavors of old,
it was possible to open a file by path with a null access mode (O_NULL ?)
granting neither read nor write access,
of value -1 (bytewise? or 2-bit-wise?),
so that adding 1 to the open mode you get 0 for O_NULL, 1 for O_RDONLY,
2 for O_WRONLY, 3 for O_RDWR, and you get a 2-bit capability bitmask
for read and write. He argued that it would be useful
to be able to do that in modern Unices.
An alternative would be to provide additional system calls
to change attributes, as well as for everything that should
be done on files without requiring to open them.
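
(To make the arithmetic concrete, here is a minimal sketch in C.
It is only an illustration: O_NULL is not a real flag, so the name
MY_O_NULL and the helper access_bits() are hypothetical, and the code
assumes the usual values O_RDONLY=0, O_WRONLY=1, O_RDWR=2.)

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical: no such flag exists; -1 is the value recalled in the text. */
#define MY_O_NULL (-1)

/* The two capability bits obtained after adding 1 to the access mode. */
enum { CAP_READ = 1, CAP_WRITE = 2 };

/* Turn an access mode into a 2-bit capability mask by adding 1.
 * Assumes O_RDONLY == 0, O_WRONLY == 1 and O_RDWR == 2. */
static int access_bits(int mode)
{
    assert(mode >= MY_O_NULL && mode <= O_RDWR);
    return mode + 1;
}
```

With these assumptions, MY_O_NULL maps to no bits at all, and O_RDWR
maps to CAP_READ | CAP_WRITE.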

Indeed, the "open without access rights"
is useful not only to modify attributes and do other ioctl's,
but also to effect all operations that should be done w/o the ability
to open for either read or write
(fstat, funlink, ioctl, fchown, fchmod, fsync),
and could be used with new syscalls like
flink (make a new directory link for file given by descriptor),
freadlink (read link from a file descriptor opened with O_NULL),
fexec (execute the binary that we checked), etc.
open(path,O_NULL) allows you to do all these things _atomically_,
without all those nasty race conditions that happen all the time
in its absence, when you have to check a file by name
and then use whichever same-named file happens to be there
by the time of the next system call, without any kernel-enforced way
to ensure that it is still the same file.
Of course, you'll want to be able to fcntl(fd,F_SETFL,O_RDWR)
or something equivalent, to upgrade your access mode
on a file you opened with O_NULL.
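
(A rough sketch of the "check the descriptor, not the name" idea,
using only calls that exist today. Since there is no O_NULL, the
sketch opens with O_RDONLY as a stand-in, which of course requires
read permission and so loses exactly the property argued for above.
The helper name open_regular_file() is made up for the example.)

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a path, then check the object behind the descriptor rather than
 * the name.  Returns an open read-only descriptor if the object is a
 * regular file, or -1 otherwise. */
static int open_regular_file(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);   /* stand-in for the proposed O_NULL */

    if (fd < 0)
        return -1;

    /* fstat() examines the file we actually opened, so renaming or
     * replacing `path` after open() cannot change the answer. */
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The racy variant would call stat(path) first and open(path) afterwards;
nothing guarantees that both calls see the same file.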

It looked like the Linux kernel did immutability checking in the wrong places:
not only can't you modify the attributes of a file you cannot open,
but you cannot do it for symlinks, either (actually,
the situation of symlinks with respect to attributes and fstat, etc,
is very peculiar; maybe open should have an O_DONTFOLLOWLINK option
for use with mode O_NULL, so that you can do the equivalent
of lstat on a file descriptor; again, such a thing could be a life-saver
when dealing with files atomically in the presence of symlinks).
I remember having been very disappointed not being able to chattr +i
symlinks from /etc to /trans/etc or /proc so as to ensure
that given "files" would always point to the zone where I store
machine/network-dependent configuration files
that I generate automatically from templates when I move
f

Re: ufs/ffs resize?

1999-06-27 Thread Greg Lehey

On Sunday, 27 June 1999 at  9:33:09 +0200, [EMAIL PROTECTED] wrote:
>>> Another datapoint to consider, it seems that Linux (at least the derivative
>>> version maintained by Alan Cox -- the other one :) ) has now grown an LVM
>>> system (probably à la HP or AIX). That's what I was told yesterday during
>>> a small conference about Linux and free software in France (and where I did a
>>> talk about FreeBSD *grin*).
>>
>> Hmmm.  It might be from SGI.  SGI has donated XFS to Linux and is actively
>> marketing it on their Intel based systems.
>>
>> http://www.news.com/News/Item/0,4,36807,00.html?st.ne.fd.tohhed.ni
>
> As far as I know it's way too early for the Linux LVM to be based on XFS,
> since SGI hasn't even released the source code yet (just stated that
> they intend to do so).

There's another reason: XFS is a file system, not a volume manager.  A
volume manager is more like a disk than a file system.

In fact, I've seen the Linux LVM before; it's been around for a while,
but last time I looked at it, it wasn't very interesting.

Greg
--
See complete headers for address, home page and phone numbers
finger [EMAIL PROTECTED] for PGP public key


Re: [Re: [Re: coarse vs fine-grained locking in SMP systems]]

1999-06-27 Thread Warner Losh

In message <[EMAIL PROTECTED]> Matthew Dillon writes:
: Here's the basic problem:  The kernel is currently designed for 
: single-threaded operation plus interrupt handling.  A piece of code
: in the kernel can temporarily disable certain interrupts with the
: spl*() codes to cover situations where a race on some system resource
: might occur.
: 
: But with SMP, several cpu's may be running in supervisor mode
: simultaneously.  The spl*() model breaks down because while one
: can block interrupts, one cannot easily block another cpu that
: might be running conflicting code.  Resource races can now occur between
: mainline code running on several cpu's simultaneously as well as between
: mainline code and interrupt code.

Yes.  However, the spl* model could also be viewed as a few very basic
locks.  So splnet would block the net interrupts and take out the net
mutex, etc.  When splx is executed, the interrupts are restored to
their old value and the net mutex could be released.  In this case the
return value of spl* becomes a cookie that can be used both to restore
the prior interrupt context and to release the mutex acquired.
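
A user-space sketch of that idea, purely to show the shape of the
cookie.  Everything in it is hypothetical: my_splnet(), my_splx(),
cur_mask and net_lock are stand-ins invented for the example, not
kernel symbols, and a real kernel would keep the interrupt mask per
CPU rather than in one global.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the interrupt mask and the "net mutex". */
static unsigned cur_mask;
static atomic_flag net_lock = ATOMIC_FLAG_INIT;

#define MASK_NET  0x1u          /* pretend "net interrupts blocked" bit */
#define HOLDS_NET 0x80000000u   /* cookie bit: this caller took net_lock */

/* Take the net lock, block "net interrupts", and return a cookie that
 * records the previous mask plus the fact that the lock is held. */
static unsigned my_splnet(void)
{
    while (atomic_flag_test_and_set_explicit(&net_lock, memory_order_acquire))
        ;   /* spin until the other "CPU" lets go */

    unsigned cookie = cur_mask | HOLDS_NET;
    cur_mask |= MASK_NET;
    return cookie;
}

/* Restore the previous mask and release whatever the cookie says we hold. */
static void my_splx(unsigned cookie)
{
    cur_mask = cookie & ~HOLDS_NET;
    if (cookie & HOLDS_NET)
        atomic_flag_clear_explicit(&net_lock, memory_order_release);
}
```

Note that a nested my_splnet() on the same CPU would spin forever in
this naive version, which hints at the kind of trouble such a scheme
can run into.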

There are problems with this approach, as I believe early efforts in
the FreeBSD/SMP project can attest, but I don't recall the details of
them.  It was originally thought that this could be made to work, if I
recall the few messages about SMP that I saw, since you are
effectively emulating the spl mechanism across CPUs.

VMS 5.0 introduced a similar concept as well.  To get access to a
resource, you'd raise the SPL level of the CPU (to keep the hardware
devices from interrupting you) and then take out a spin lock (to keep
the other CPUs from doing the same).

: In order to make SMP operation work better, pieces of the kernel are
: slowly being moved outside the "big giant lock".  Linux developers,
: in fact, have already moved their core data copying code and their TCP
: stack outside the lock.  At the moment the FreeBSD-current kernel has
: not moved anything outside the lock, but John Dyson has shown that it
: is fairly easy to move certain specific pieces such as the uiomove()
: code outside the lock, though inefficiencies from side-effects currently
: make the improvement in performance less than stellar.

That is correct.  At Solbourne[*], we were honest enough to call the one
big lock approach ASMP (any CPU could run in kernel mode, but only one 
at a time).  Linux's (and FreeBSD's) SMP has really been mostly ASMP,
with a little bit of fine grain locking in the corners.

[*] Solbourne, for those of you that don't know, made SPARC servers
(and one workstation) several years ago.  They were SMP years before
Sun managed to ship SMP support in Solaris.  Many of my SMP "gut
feelings" were developed while working there.

: The real question is how to manage concurrency as pieces get moved outside
: the lock.  There are lots of ways to do it.   One can use spin locks to
: protect resources or, as someone pointed out earlier, to protect sections
: of code.  I don't know which is better myself, it probably depends on the
: situation so a hybrid will probably be the end result.  One can also use
: kernel threads to simplify resource management.  The advantage of a
: kernel thread versus a normal process is in the ability to switch between
: kernel threads very quickly, allowing the time normally wasted spinning in
: certain types of locks to be used more efficiently.

Solaris wound up using mutexes, condition variables, and semaphores to
accomplish this.  I don't know the exact details of what they did on a
resource stall, however.  The DDK tended to discourage exploration of
this.  I believe the thread simply stalled and another thread
was allowed to run.  I don't know how the scheduler itself was
protected.  Given that you have a threading kernel, making it SMP safe
is generally fairly easy, modulo locking issues.

The biggest problem that Solbourne, VMS and Solaris all had in their
early versions was making sure that deadlock didn't happen.  Locks
were always a real SOB to get right, and generally the cause of all
kinds of problems.  When I was testing Solbourne OS/MP 4.0C, I'd say
that 95% of the difficult to reproduce problems turned out to be
locking related and 60% of the easily reproducible were locking
related.  The years may have colored my remembrances of the
percentages and the version numbers for OS/MP, but it is the one thing
that stands out in my mind across the 9 years it has been since I was
doing that.

Warner


Re: [Re: [Re: coarse vs fine-grained locking in SMP systems]]

1999-06-27 Thread Matthew Dillon

Here's the basic problem:  The kernel is currently designed for 
single-threaded operation plus interrupt handling.  A piece of code
in the kernel can temporarily disable certain interrupts with the
spl*() codes to cover situations where a race on some system resource
might occur.

But with SMP, several cpu's may be running in supervisor mode
simultaneously.  The spl*() model breaks down because while one
can block interrupts, one cannot easily block another cpu that
might be running conflicting code.  Resource races can now occur between
mainline code running on several cpu's simultaneously as well as between
mainline code and interrupt code.

The traditional BSD kernel code cannot deal with this new type of
race.  At the moment every entry into supervisor mode is being
governed by a "big giant lock" which only allows one cpu to run
mainline code in supervisor mode at any given moment.  Both cpu's
can run usermode code simultaneously just fine, but only one can
run supervisor code.

In order to make SMP operation work better, pieces of the kernel are
slowly being moved outside the "big giant lock".  Linux developers,
in fact, have already moved their core data copying code and their TCP
stack outside the lock.  At the moment the FreeBSD-current kernel has
not moved anything outside the lock, but John Dyson has shown that it
is fairly easy to move certain specific pieces such as the uiomove()
code outside the lock, though inefficiencies from side-effects currently
make the improvement in performance less than stellar.

The real question is how to manage concurrency as pieces get moved outside
the lock.  There are lots of ways to do it.   One can use spin locks to
protect resources or, as someone pointed out earlier, to protect sections
of code.  I don't know which is better myself, it probably depends on the
situation so a hybrid will probably be the end result.  One can also use
kernel threads to simplify resource management.  The advantage of a
kernel thread versus a normal process is in the ability to switch between
kernel threads very quickly, allowing the time normally wasted spinning in
certain types of locks to be used more efficiently.
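
As a rough illustration of a spin lock that protects a resource (the
data) rather than a section of code, here is a small sketch using C11
atomics.  The types and function names are invented for the example
and are not taken from any kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A minimal spin lock: one flag, set while the lock is held. */
struct spinlock {
    atomic_flag held;
};

/* Try once; returns true if the lock was free and is now ours. */
static bool spin_trylock(struct spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is ours. */
static void spin_lock(struct spinlock *l)
{
    while (!spin_trylock(l))
        ;   /* the time burned here is what a fast thread switch could reuse */
}

static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* The lock lives inside the resource it protects. */
struct counter {
    struct spinlock lock;
    long value;
};

static void counter_add(struct counter *c, long n)
{
    spin_lock(&c->lock);
    c->value += n;
    spin_unlock(&c->lock);
}
```

Because each counter carries its own lock, two cpu's working on
different counters never wait for each other; they only serialize
when they touch the same one.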

The problem that generally needs to be solved is the problem of stalling
on a resource.  For example, if you have several threads running
simultaneously and they all need access to the same resource, serialization
of the threads occurs due to the 'blockage' on access to the resource.
(serialization means that only one thread can run at a time within the
resource, which means your efficiency drops to the efficiency of only a
single cpu).  There are lots of other issues (such as cache efficiency),
but that is the big one.

-Matt



Re: ufs/ffs resize?

1999-06-27 Thread sthaug

> > Another datapoint to consider, it seems that Linux (at least the derivative
> > version maintained by Alan Cox -- the other one :) ) has now grown an LVM
> > system (probably à la HP or AIX). That's what I've been told yesterday during
> > a small conference about Linux and free software in France (and where I did a
> > talk about FreeBSD *grin*).
> 
> Hmmm.  It might be from SGI.  SGI has donated XFS to Linux and is actively
> marketing it on their Intel based systems.
> 
> http://www.news.com/News/Item/0,4,36807,00.html?st.ne.fd.tohhed.ni

As far as I know it's way too early for the Linux LVM to be based on XFS,
since SGI hasn't even released the source code yet (just stated that
they intend to do so).

Steinar Haug, Nethelp consulting, [EMAIL PROTECTED]


Re: Volume managers (was: ufs/ffs resize?)

1999-06-27 Thread Bernd Walter

On Sun, Jun 27, 1999 at 09:33:45AM +0930, Greg Lehey wrote:
> On Sunday, 27 June 1999 at  0:35:54 +0200, Ollivier Robert wrote:
> > I think one of the difficulty of growing a FS is that you have to
> > choose whether you need the FS to be contiguous or not. The latter
> > case makes it much more difficult...
> 
> Why shouldn't it be contiguous?  That's what the volume manager's
> there for.
> 
It should always be possible to add some blocks to an existing volume
without a gap in between.
The problem is shrinking.
You usually want to shrink a fs not because you want it smaller but because
you want to free a hard disk.
Murphy says that this disk is at the beginning or in the middle of the
partition.
If Murphy had a good day, he says this disk is part of a stripe...

What system should handle this case?
For FFS it might be better handled by the volume manager, because it would
be difficult to implement gap handling in FFS.
LFS's daily job is to do such things as a kind of garbage collection, so it
should be much faster and easier if handled by the fs - but that would mean
the fs needs to know about the volume's details.

There are some other points left about the interaction of volume managers
and filesystems besides shrinking and growing.
Say you have a concatenated volume with 3 disks.
One drive fails without any chance of getting it up again.
Vinum takes down the plex and, because it's the only one, the volume too.
There is no kind of emergency mode in which you can say: bring that volume
up with a large defective area in it, so I can mount it read-only and try
to read everything I can.
Unlike CCD, vinum knows how big the failed disk was.
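
To show why knowing the size of the failed disk matters, here is a
small sketch of block mapping in a concatenated volume.  It is not
how vinum or ccd are implemented; struct concat_volume and
concat_map() are made up for the illustration.

```c
#include <stddef.h>

/* Hypothetical description of a concatenated volume. */
struct concat_volume {
    size_t ndisks;
    const unsigned long *disk_blocks;   /* size of each disk, in blocks */
    const int *disk_failed;             /* nonzero if that disk is gone */
};

/* Map a volume block number to a (disk, offset) pair.
 * Returns 0 on success, -1 if the block is past the end of the volume,
 * and -2 if the block lies on a failed disk. */
static int concat_map(const struct concat_volume *v, unsigned long block,
                      size_t *disk, unsigned long *offset)
{
    for (size_t i = 0; i < v->ndisks; i++) {
        if (block < v->disk_blocks[i]) {
            if (v->disk_failed[i])
                return -2;   /* an emergency mode could report an I/O error here */
            *disk = i;
            *offset = block;
            return 0;
        }
        /* Skipping a disk only needs its size, even if the disk is dead. */
        block -= v->disk_blocks[i];
    }
    return -1;
}
```

As long as the sizes are known, blocks on the surviving disks still
map to the right place, and only the range of the failed disk has to
be treated as a defective area.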

One other point is that it is not possible with vinum or ccd to add space to
a striped/raid5 volume.
One way might be to use ccd to concat striped vinum volumes - never tried.
But in my opinion that's not the way it should be.

-- 
B.Walter  COSMO-Project  http://www.cosmo-project.de
[EMAIL PROTECTED]  [EMAIL PROTECTED]


